You can also have a look at this blog and linked example that specifically covers exactly-once with input from Kafka:
https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/ On Tue, Jun 14, 2016 at 2:47 PM, Thomas Weise <[email protected]> wrote: > See response below: > > On Tue, Jun 14, 2016 at 12:41 PM, Ananth Gundabattula < > [email protected]> wrote: > >> Hello Siyuan/All, >> >> I have a couple of questions regarding the Kafka 0.9 operator. Could you >> please help me in understanding this operator a bit better? >> >> >> - As stated in >> http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator >> , kafka 0.9 operator stores it "check-pointed offsets" in Kafka itself >> using the App name ? It sounds like -originalAppID is not used by this >> operator at all - In other words, I cant force an app to process starting >> from the beginning until I change the App name if the App is based on the >> Kafka 0.9 operator as the input operator >> >> The start offset configuration option should determine where the operator > starts consuming on cold start (earliest, latest, last consumed). If that's > not the case then it would be a bug. Siyuan, please comment. > >> >> - >> - How does the kafka 0.9 operator handle downstream operators failure >> ? By this I mean, an Apex downstream operator fails, and is brought back >> up >> by STRAM. However this operator was significantly lagging behind the >> current window of the kafka 0.9 operator window. Does the buffer server >> within the Kafka 0.9 operator buffer many windows to handle this situation >> ? ( and hence replays accordingly ? ) . I ask this to fine tune the buffer >> memory property. >> >> The upstream buffer server will hold the data until processed by the > downstream operator. The buffer server, by default, will start to spool to > disk when the allocated memory is used up. Back pressure will cause the > consumer to slow down accordingly. > >> >> - Is EXACTLY_ONCE processing supported in this operator ? if yes, is >> it fair to assume that HDFS would be used to manage this type of >> configuration ? >> >> Yes, when you enable idempotency on the operator, exactly once processing > semantics in downstream operators are supported (affects those that write > to external systems). To enable this you can configure to use the window > data manager that writes to HDFS, essentially it will keep track of the > consumer offsets for each window. > >> >> - >> - Is EXACTLY_ONCE based off the streaming window or the Application >> Window in Apex ? >> >> The operator only sees the "application window". Make sure to align the > checkpoint window interval. > > For more information about the Kafka input operator, please see: > http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator > > > >
