The EventStore is queryable in flexible ways, so it needs random access and 
indexing of events. In short, it needs to be backed by a DB.

There are ML algorithms that do not need stored events, like online learners in 
the Kappa style. But existing Templates and the PIO architecture do not yet 
support Kappa, so unless you are developing your own Template you will find 
that Kafka will not fill all the requirements.

It is, however, an excellent way to manage streams, so I tend to see it as a 
source for events and have seen it used as such.

To be more precise, the reason your method may not work is that not all events 
can be aged out. The item property events in PIO ($set, $unset, and $delete) are 
embedded in the stream but are only meaningful in aggregate, since they record a 
set of changes to an object. If you drop one of these events, the current state 
of the object cannot be calculated with certainty.
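To illustrate the point, here is a minimal sketch of how property events fold into object state. The event names ($set, $unset, $delete) are PIO's; the fold logic and sample properties are illustrative assumptions, not PIO's actual implementation.

```python
# Sketch: replay PIO-style property events, in order, to reconstruct an
# object's current state. The fold logic here is an illustrative assumption.
def current_state(events):
    state = {}
    for e in events:
        if e["event"] == "$set":
            state.update(e["properties"])
        elif e["event"] == "$unset":
            for key in e["properties"]:
                state.pop(key, None)
        elif e["event"] == "$delete":
            state = {}
    return state

events = [
    {"event": "$set", "properties": {"category": "phones"}},
    {"event": "$set", "properties": {"in_stock": True}},
    {"event": "$unset", "properties": {"in_stock": None}},
]
print(current_state(events))      # {'category': 'phones'}
# Age out the first event and the category is silently lost:
print(current_state(events[1:]))  # {}
```

Dropping any one event changes the replayed state, which is why a naive retention policy on the raw stream corrupts object properties.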

To truly use a streaming model we have to introduce the idea of watermarks that 
snapshot state in the stream, so that any event before the watermark can be 
dropped. This is how Kappa learners work, but for existing Lambda learners it is 
often not possible. Many of the Lambda learners can be converted, but not 
easily, and not all of them.
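The watermark idea can be sketched as: fold everything up to the watermark into a snapshot, then discard the pre-watermark events and replay only from the snapshot forward. The function and field names below are illustrative assumptions, not a PIO or Kafka API.

```python
# Sketch: fold a batch of $set/$unset events into an existing snapshot so that
# events older than the watermark can be safely dropped. Illustrative only.
def fold_snapshot(state, events):
    state = dict(state)  # do not mutate the caller's snapshot
    for e in events:
        if e["event"] == "$set":
            state.update(e["properties"])
        elif e["event"] == "$unset":
            for key in e["properties"]:
                state.pop(key, None)
    return state

# Take a snapshot at the watermark...
pre_watermark = [{"event": "$set", "properties": {"category": "phones"}}]
snap = fold_snapshot({}, pre_watermark)
# ...now pre_watermark can be discarded; replay continues from the snapshot.
post_watermark = [{"event": "$set", "properties": {"in_stock": True}}]
print(fold_snapshot(snap, post_watermark))
```

With the snapshot materialized, retention on the raw topic is safe because no replay ever needs the dropped events.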

On Jul 6, 2017, at 1:59 AM, Thomas POCREAU <[email protected]> wrote:

After some internal discussions, I realized I had misunderstood our needs.
Indeed, we will use the Kafka HDFS connector 
(http://docs.confluent.io/current/connect/connect-hdfs/docs/index.html) to dump 
expired data.
So we will basically have Kafka for the fresh events and HDFS for the past 
events.


2017-07-06 7:06 GMT+02:00 Thomas POCREAU <[email protected]>:
Hi,

Thanks for your responses.

Our goal is to use Kafka as our main event store for event sourcing. I'm pretty 
sure that Kafka can be used with an infinite retention time.

We could use KStream and the Java SDK, but I would like to try an 
implementation of PStore on top of spark-streaming-kafka 
(https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html).
My main concern is the HTTP interface for pushing events. We will probably use 
KStream or websockets to load events into the Kafka topic used by an app 
channel.

Are you planning on supporting websockets as an alternative to batch import? 

Regards, 
Thomas Pocreau 

On Jul 5, 2017 at 21:36, "Pat Ferrel" <[email protected]> wrote:
No, we try not to fork :-) But it would be nice, as you say. It can be done with 
a small intermediary app that just reads from a Kafka topic and sends events to 
a localhost EventServer, which would allow events to be custom-extracted from, 
say, log files (typical contents of Kafka). We’ve done this in non-PIO projects.

The intermediary app should use Spark Streaming. I may have a snippet of code 
around if you need it, but it just saves to micro-batch files. You’d have to use 
the PIO Java SDK to send them to the EventServer. A relatively simple thing.
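The intermediary app could be sketched roughly as below: translate each raw Kafka record into a PIO event payload, then POST it to the EventServer. The record field names, topic contents, and entity types are assumptions for illustration; the payload keys (event, entityType, entityId, ...) follow the PIO EventServer's JSON format. In a real deployment, a Spark Streaming job would consume the topic in micro-batches and send each payload to the EventServer's /events.json endpoint.

```python
# Sketch: convert a raw Kafka record (a JSON string) into a PIO event payload.
# The incoming field names ("action", "user", "item") are hypothetical; adjust
# to whatever your topic actually carries.
import json

def to_pio_event(record):
    raw = json.loads(record)
    return {
        "event": raw["action"],          # e.g. "view" or "buy"
        "entityType": "user",
        "entityId": raw["user"],
        "targetEntityType": "item",
        "targetEntityId": raw["item"],
    }

payload = to_pio_event('{"action": "buy", "user": "u1", "item": "i1"}')
print(payload)
```

The sending side is then a plain HTTP POST of each payload (with the app's access key) to the EventServer, which either an SDK or any HTTP client can do from within the streaming job.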

Donald, what did you have in mind for deeper integration? I guess we could cut 
out the intermediary app and integrate a new Kafka-aware EventServer endpoint 
where the raw topic input is stored in the EventStore. This would push any log 
filtering onto the Kafka source.


On Jul 5, 2017, at 10:20 AM, Donald Szeto <[email protected]> wrote:

Hi Thomas,

Supporting Kafka is definitely interesting and desirable. Are you looking to 
sink your Kafka messages into the event store for batch processing, or to 
stream-process directly from Kafka? The latter would require more work because 
Apache PIO does not yet support streaming properly.

Folks from ActionML might have a flavor of PIO that works with Kafka.

Regards,
Donald

On Tue, Jul 4, 2017 at 8:34 AM, Thomas POCREAU <[email protected]> wrote:
Hi,

Thanks a lot for this awesome project.

I have a question regarding Kafka and its possible integration as an Event 
Store.
Do you have any plans on this matter?
Are you aware of anyone working on a similar subject?

Regards,
Thomas.
