Re: Per task/topic checkpoint?

2017-11-01 Thread Jacob Maes
Ahh, sorry, I misunderstood the problem. Does the application use InMemoryKeyValueStore to accumulate the deltas? If so, I would think you could enable the changelog on that store and commit as often as you like, because having the deltas backed up durably should allow you to decouple commit()

Re: Per task/topic checkpoint?

2017-11-01 Thread Gaurav Agarwal
Thanks Jake. In this application we control the checkpointing explicitly to accumulate certain amount of delta in memory before committing them to stores, and checkpointing. This is to reduce the commit counts and some other business case like deduplication of deltas. The scenario that I was

Re: Per task/topic checkpoint?

2017-11-01 Thread Jacob Maes
Hey Gaurav, Samza automatically keeps track of the offsets your job has successfully processed for each SSP. When your task requests a checkpoint, Samza will write the offset of the latest successfully-processed message for each SSP that task consumes. So if task0 consumes partition 0 of two

Re: Per task/topic checkpoint?

2017-11-01 Thread Gaurav Agarwal
Thanks, I'll check it out. I have a samza application that is consuming a lot of different types of messages (these messages are related to each other but do not require join - think of these like different configuration and metric information of virtual machines that modify some central sates

Re: Per task/topic checkpoint?

2017-10-28 Thread Jagadish Venkatraman
In Samza, the logical unit of processing (and hence, checkpointing) is a task. Hence, you cannot selectively checkpoint SSPs within a task. However, you can configure how you group your SSPs into tasks by choosing a Grouper. If you want to control checkpointing at the granularity of an SSP, then

Per task/topic checkpoint?

2017-10-28 Thread Gaurav Agarwal
Hi All, If I had Samza Tasks that were consuming message from multiple topics, how would checkpoint/commit work in that case? On calling taskCordinator.commit(), would current offset of all topics be saved for the caller task (only the partitions assigned to the caller task)? Is there a way to