Re: Reprocessing and windowing

2015-02-23 Thread Yi Pan
Hey, Geoffry, We have started some work in SAMZA-552 to create a window operator API in Samza, as part of an effort to implement support for a high-level language. I will probably be able to share something in a few days and would love to get feedback on the window operator. Thank

Re: Reprocessing and windowing

2015-02-23 Thread Roger Hoover
Hi Geoffry, You might find the Google MillWheel paper and recent talk relevant. That system supports windows based on event time as well as reprocessing. > On Feb 23, 2015, at 4:49 PM, Geoffry Sumter wrote: > Hey everyone, I've been thinking about reprocessing
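To illustrate why windows based on event time fit reprocessing, here is a minimal sketch of an event-time tumbling window. It is not the SAMZA-552 window operator API or MillWheel's implementation; the event shape `(event_time_ms, key, value)` and the function name are hypothetical. Each event is bucketed by its own timestamp, so replaying the same input later, or in a different order, yields the same windows.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape for illustration: (event_time_ms, key, value).
Event = Tuple[int, str, int]


def tumbling_window_sums(events: Iterable[Event], window_ms: int) -> Dict[Tuple[int, str], int]:
    """Sum values per (window_start, key), bucketing by each event's own timestamp.

    The window is derived from the event's timestamp, not from the wall clock at
    processing time, so a reprocessing run assigns every event to the same window.
    """
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for event_time, key, value in events:
        window_start = event_time - (event_time % window_ms)
        totals[(window_start, key)] += value
    return dict(totals)


events = [(1_000, "a", 1), (59_000, "a", 2), (61_000, "a", 5), (30_000, "b", 4)]
live = tumbling_window_sums(events, 60_000)
replayed = tumbling_window_sums(reversed(events), 60_000)  # same input, different order
print(live == replayed)  # True
```

This sketch deliberately ignores when a window may be emitted and how late events are handled; MillWheel addresses that with low watermarks, which are out of scope here.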

Reprocessing and windowing

2015-02-23 Thread Geoffry Sumter
Hey everyone, I've been thinking about reprocessing when my job has windowed state and I have a few questions. Contex

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Jay Kreps
I find it useful to delineate two kinds of things: 1. Mutations, such as database table updates. These always have a key. 2. Immutable events, such as clicks, sales, orders, etc. The whole premise of compaction is that you have some redundant updates, as in case (1). In order to have updates you have t
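The premise that compaction removes redundant keyed updates can be sketched as keeping only the latest record per key. This is a simplified model of the idea, not Kafka's log cleaner; the `(offset, key, value)` record shape and the `compact` name are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (offset, key, value).
Record = Tuple[int, str, str]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record for each key, returned in offset order.

    An earlier update to a key is redundant once a later update to the same key
    exists, which is the situation of case (1), keyed mutations.
    """
    latest: Dict[str, Record] = {}
    for record in log:
        _offset, key, _value = record
        latest[key] = record  # a later offset overwrites an earlier one
    return sorted(latest.values(), key=lambda r: r[0])


updates = [(0, "user1", "addr=A"), (1, "user2", "addr=B"), (2, "user1", "addr=C")]
print(compact(updates))  # [(1, 'user2', 'addr=B'), (2, 'user1', 'addr=C')]
```

Under this model, applying the same reduction to immutable events keyed by, say, user ID would silently discard all but the last click per user, which is why compaction only fits case (1).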

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Thomas Bernhardt
Assume there is data that doesn't have a key: how would you handle that? Would you always have a key, and therefore generate one? Best regards, Tom > From: Felix GV > To: "dev@samza.apache.org" > Sent: Monday, February 23, 2015 2:15 PM > Subject: RE: Re-processing a la Kappa/Liquid
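One hypothetical answer to Tom's question is to assign each keyless event a fresh unique key. The sketch below (the helper names are illustrative, not from any Samza or Kafka API) shows the consequence under that assumption: because no two records share a key, compaction keeps every record and saves nothing, consistent with the premise that compaction only pays off for redundant keyed updates.

```python
import uuid
from typing import List, Tuple

# Hypothetical record shape: (key, value), in log order.
Record = Tuple[str, str]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the last record per key, preserving log order."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]


def with_synthetic_keys(values: List[str]) -> List[Record]:
    """Hypothetical approach: give each keyless event its own random UUID key."""
    return [(str(uuid.uuid4()), v) for v in values]


clicks = ["click", "click", "click"]
keyed = with_synthetic_keys(clicks)
print(len(compact(keyed)) == len(keyed))  # True: nothing is redundant, nothing removed
```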

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Roger Hoover
Ah, right, thanks. > On Feb 23, 2015, at 11:15 AM, Felix GV wrote: > A recently-compacted topic is pretty similar to a snapshot in terms of semantics, no? If you read the topic up until the point where the compaction ended, you effectively read every key just once, s

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Roger Hoover
Ah, right. To save historical snapshots, one could periodically read the whole compacted topic and save it somewhere. > On Feb 23, 2015, at 11:11 AM, Jay Kreps wrote: > Basically log compaction == snapshot in a logical format. You can optimize a tiny bit more, of cours

RE: Re-processing a la Kappa/Liquid

2015-02-23 Thread Felix GV
A recently compacted topic is pretty similar to a snapshot in terms of semantics, no? If you read the topic up until the point where the compaction ended, you effectively read every key just once, same as a snapshot. I agree that the guaranteed uncompacted/dirty retention period would be useful.

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Jay Kreps
Basically log compaction == snapshot in a logical format. You can optimize a tiny bit more, of course, if you store the data files themselves for whatever store, but that is going to be very storage-engine-specific. -Jay > On Mon, Feb 23, 2015 at 10:33 AM, Roger Hoover wrote: > Thanks, Julian.
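The claim that log compaction is a snapshot in logical format can be checked on a toy model: materializing a table from the compacted log gives the same result as replaying every update, while reading each key only once. The function names and `(key, value)` records below are assumptions for illustration; the sketch also assumes the whole log has been compacted, whereas in Kafka the most recent (dirty) portion may still contain duplicate keys.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (key, value), in log order.
Record = Tuple[str, str]


def replay(log: List[Record]) -> Dict[str, str]:
    """Materialize a key-value table by applying every update in order."""
    table: Dict[str, str] = {}
    for key, value in log:
        table[key] = value
    return table


def compact(log: List[Record]) -> List[Record]:
    """Keep only the last update per key, in the order those updates appeared."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]


log = [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4"), ("b", "5")]
compacted = compact(log)
print(compacted)                          # [('a', '3'), ('c', '4'), ('b', '5')]
print(replay(compacted) == replay(log))   # True: same state as a full replay
```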

Re: Re-processing a la Kappa/Liquid

2015-02-23 Thread Roger Hoover
Thanks, Julian. I didn't see any mention of checkpoints in the Kappa or Liquid material I've read, but it does seem like a very useful optimization to make re-processing and failure recovery much faster. Databus supports snapshots, I believe, so that DB replicas can be initialized in a practical
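Why checkpoints speed up re-processing and recovery can be shown with a small sketch: restore state saved at some log offset, then replay only the records after that offset, and the result matches a full replay from the beginning. This is not Samza's checkpoint mechanism; `take_checkpoint`, `recover`, and the `(key, delta)` records are hypothetical, and the sketch assumes the saved state and its offset are recorded together.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (key, delta), in log order.
Record = Tuple[str, int]
Checkpoint = Tuple[int, Dict[str, int]]  # (next offset to read, state so far)


def apply(state: Dict[str, int], records: List[Record]) -> Dict[str, int]:
    """Fold records into a running per-key sum, returning a new dict."""
    result = dict(state)
    for key, delta in records:
        result[key] = result.get(key, 0) + delta
    return result


def take_checkpoint(log: List[Record], offset: int) -> Checkpoint:
    """Hypothetical checkpoint: the state after consuming log[:offset], tagged with offset."""
    return offset, apply({}, log[:offset])


def recover(log: List[Record], checkpoint: Checkpoint) -> Dict[str, int]:
    """Restore from the checkpoint, then replay only the records after it."""
    offset, state = checkpoint
    return apply(state, log[offset:])


log = [("a", 1), ("b", 2), ("a", 3), ("b", -1), ("c", 5)]
cp = take_checkpoint(log, 3)
print(recover(log, cp) == apply({}, log))  # True
```

The saving comes from skipping `log[:offset]` on restart; if the offset and the state were saved inconsistently, records would be double-counted or skipped.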

Apache Samza Meetup Announced (March 4 @6PM hosted at LinkedIn's campus in Mountain View, CA)

2015-02-23 Thread Ed Yakabosky
Hi all, I would like to announce the first Bay Area Apache Samza Meetup, hosted at LinkedIn in Mountain View, CA on March 4, 2015 at 6 PM. We plan to host the event every two months to encourage knowledge sharing and collaboration in Samz