Re: Review Request 32147: SAMZA-465

2015-04-02 Thread Naveen Somasundaram
> On April 2, 2015, 11:22 p.m., Chris Riccomini wrote: > > samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java, > > line 43 > > > > > > Can we make this private final, and call new in the cons

Re: Review Request 32147: SAMZA-465

2015-04-02 Thread Naveen Somasundaram
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32147/ --- (Updated April 3, 2015, 12:52 a.m.) Review request for samza. Changes ---

Re: Review Request 32147: SAMZA-465

2015-04-02 Thread Naveen Somasundaram
> On April 2, 2015, 11:35 p.m., Chris Riccomini wrote: > > samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java, > > line 458 > > > > > > typo fixed > On April 2, 2015, 11:35

Re: How do you serve the data computed by Samza?

2015-04-02 Thread Roger Hoover
Thanks for the great explanation, Felix! On Thu, Apr 2, 2015 at 4:08 PM, Felix GV wrote: > Hi Roger, > > The slow storage shard situation is indeed a concern. > > The slow storage shard will back up your pusher process for all shards if > the incoming Kafka stream partitions don't line up. Alter

Re: Review Request 32147: SAMZA-465

2015-04-02 Thread Chris Riccomini
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32147/#review78743 --- samza-core/src/main/java/org/apache/samza/coordinator/stream/Coordi

Re: Review Request 32147: SAMZA-465

2015-04-02 Thread Chris Riccomini
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32147/#review78680 --- samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointMana

RE: How do you serve the data computed by Samza?

2015-04-02 Thread Felix GV
Hi Roger, The slow storage shard situation is indeed a concern. The slow storage shard will back up your pusher process for all shards if the incoming Kafka stream partitions don't line up. Alternatively, your pusher process will keep going with the healthy shards but will then need to re-cons

Re: How do you serve the data computed by Samza?

2015-04-02 Thread Roger Hoover
Is it because the Kafka partitioning might not be the same as the storage partitioning? So that a slow storage shard will prevent unrelated shards from getting their messages? Ah, I think I see what you mean. If so, then the solution is to make the Kafka partitioning match the storage partitioni
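
A minimal sketch of the alignment point discussed in this thread, under stated assumptions: the hash function (`zlib.crc32`), the key names, and the partition and shard counts are all hypothetical and are not taken from Kafka or any storage system. It only illustrates that when both sides bucket keys the same way, each Kafka partition feeds exactly one storage shard, whereas mismatched bucketing makes one partition fan out to several shards, so a single slow shard can stall consumers of otherwise unrelated data.

```python
import zlib


def bucket(key, n):
    # Hypothetical partitioner: a deterministic hash modulo the bucket count.
    return zlib.crc32(key.encode("utf-8")) % n


def shards_per_partition(keys, num_partitions, num_shards):
    """Map each Kafka partition to the set of storage shards its keys land on."""
    fanout = {}
    for key in keys:
        partition = bucket(key, num_partitions)
        shard = bucket(key, num_shards)
        fanout.setdefault(partition, set()).add(shard)
    return fanout


keys = ["user-%d" % i for i in range(1000)]

# Same function and same count on both sides: one shard per partition.
aligned = shards_per_partition(keys, num_partitions=4, num_shards=4)

# Different counts: each partition's messages spread over several shards.
misaligned = shards_per_partition(keys, num_partitions=4, num_shards=3)

for name, fanout in (("aligned", aligned), ("misaligned", misaligned)):
    for partition in sorted(fanout):
        print(name, "partition", partition, "-> shards", sorted(fanout[partition]))
```

In the aligned case a consumer of one partition only ever waits on one shard; in the misaligned case it has to either block on the slowest shard it touches or track progress per shard.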

Re: How do you serve the data computed by Samza?

2015-04-02 Thread Roger Hoover
Chinmay, Thanks for your input. I'm not understanding what the difference is. With the design that Felix laid out, the co-located Kafka consumer is still doing a push to the storage system, right? It just happens to be on the same machine. How is this different from pushing batches from a non

RE: How do you serve the data computed by Samza?

2015-04-02 Thread Felix GV
That's a good point, Chinmay. Doing smart throttling is more complex and not as reliable in the push model. Your point that storage partitioning and Kafka partitioning need to line up is fair. It is indeed an assumption I am making in the proposed design. -- Felix GV Data Infrastructure Enginee

Re: How do you serve the data computed by Samza?

2015-04-02 Thread Chinmay Soman
My 2 cents: one thing to note about the push model is multi-tenancy. When your storage system (Druid, for example) is used in a multi-tenant fashion, the push model is a bit difficult to operate, primarily because there is no real feedback loop from the storage system. Yes - if the storage system

Re: How do you serve the data computed by Samza?

2015-04-02 Thread Roger Hoover
Felix, I see your point about simple Kafka consumers. My thought was that if you're already managing a Samza/YARN deployment then these types of jobs would be "just another job" and not require an additional process management/monitoring/operations setup. If you've already got a way to handle va

Re: Store changelog

2015-04-02 Thread Dan
Ah, that makes sense now, thanks Chris. Re-reading that page, it is clear. I think what confused me is this section from the configuration documentation for `stores.store-name.changelog`: "... Any output stream can be used as changelog, but you must ensure that only one job ever writes to a given cha

Re: Store changelog

2015-04-02 Thread Chinmay Soman
Also documented here: http://samza.apache.org/learn/documentation/0.9/container/state-management.html Check the "Local state in Samza" section - the diagram (and the description) explains this clearly. On Thu, Apr 2, 2015 at 10:36 AM, Chris Riccomini wrote: > Hey Dan, > > I think you might have

Re: Store changelog

2015-04-02 Thread Chris Riccomini
Hey Dan, I think you might have a misunderstanding of how changelogs work with Samza. Suppose you have a job with two tasks, and a single kv-store is configured with a changelog attached. The changelog, in Kafka, will have two partitions. Each task will use one partition of the changelog topic. Yo
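
The two-task example above can be sketched as a toy model. This is not Samza code: `ChangelogTopic`, `Task`, and the topic name `my-store-changelog` are hypothetical stand-ins, and the replay-on-restore step is an assumption about how a task would rebuild its store from its own partition. The sketch only shows one changelog topic shared by the job, with one partition per task.

```python
class ChangelogTopic:
    """Toy stand-in for a Kafka changelog topic: one append-only log per partition."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, key, value):
        self.partitions[partition].append((key, value))


class Task:
    """Toy task with a local key-value store backed by one changelog partition."""

    def __init__(self, task_id, changelog):
        self.task_id = task_id      # assumed to double as the partition index
        self.changelog = changelog
        self.store = {}

    def put(self, key, value):
        # Every local write is also logged to this task's own partition.
        self.store[key] = value
        self.changelog.append(self.task_id, key, value)

    def restore(self):
        # Rebuild local state by replaying only this task's partition.
        self.store = {}
        for key, value in self.changelog.partitions[self.task_id]:
            self.store[key] = value


# One job, two tasks, one store, one changelog topic with two partitions.
changelog = ChangelogTopic("my-store-changelog", num_partitions=2)
tasks = [Task(i, changelog) for i in range(2)]

tasks[0].put("a", 1)
tasks[1].put("b", 2)
tasks[0].put("a", 3)

# Simulate task 0 restarting on another machine with an empty local store.
recovered = Task(0, changelog)
recovered.restore()
print(recovered.store)
```

Because each task only ever reads and writes its own partition, the job needs a single changelog topic per store rather than one topic per task.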

Store changelog

2015-04-02 Thread Dan
Hi all, We're just starting out using Samza to process streams we've already got in Kafka. Some of the jobs we've written use the per-task KV store, which is persisted to a changelog topic in Kafka. As you need a different changelog topic per task we are wondering how people are dealin