See response below:

On Tue, Jun 14, 2016 at 12:41 PM, Ananth Gundabattula <[email protected]> wrote:
> Hello Siyuan/All,
>
> I have a couple of questions regarding the Kafka 0.9 operator. Could you
> please help me in understanding this operator a bit better?
>
>    - As stated in
>    http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator,
>    the Kafka 0.9 operator stores its "check-pointed offsets" in Kafka itself
>    using the App name? It sounds like -originalAppID is not used by this
>    operator at all. In other words, I can't force an app to process starting
>    from the beginning until I change the App name, if the App is based on the
>    Kafka 0.9 operator as the input operator.
>

The start offset configuration option should determine where the operator starts consuming on cold start (earliest, latest, last consumed). If that's not the case, then it would be a bug. Siyuan, please comment.

>    - How does the Kafka 0.9 operator handle downstream operator failure?
>    By this I mean, an Apex downstream operator fails and is brought back up
>    by STRAM. However, this operator was significantly lagging behind the
>    current window of the Kafka 0.9 operator. Does the buffer server
>    within the Kafka 0.9 operator buffer many windows to handle this situation
>    (and hence replay accordingly)? I ask this to fine-tune the buffer
>    memory property.
>

The upstream buffer server will hold the data until it has been processed by the downstream operator. By default, the buffer server will start to spool to disk when the allocated memory is used up. Back pressure will cause the consumer to slow down accordingly.

>    - Is EXACTLY_ONCE processing supported in this operator? If yes, is
>    it fair to assume that HDFS would be used to manage this type of
>    configuration?
>

Yes, when you enable idempotency on the operator, exactly-once processing semantics in downstream operators are supported (this affects those that write to external systems).
To enable this, you can configure the operator to use the window data manager that writes to HDFS; essentially, it will keep track of the consumer offsets for each window.

>    - Is EXACTLY_ONCE based off the streaming window or the Application
>    Window in Apex?
>

The operator only sees the "application window". Make sure to align the checkpoint interval with the application window.

For more information about the Kafka input operator, please see:
http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator
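
To make the idempotency point a bit more concrete, here is a minimal sketch of the idea of tracking consumer offsets per window so that a replayed window emits exactly the same tuples as it did the first time. This is only an illustration under stated assumptions and not the operator's actual code: the class and method names are hypothetical, the in-memory map stands in for the window data manager that writes to HDFS, and the list stands in for a single Kafka partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-window offset tracking; not the Malhar API. */
final class IdempotentReplaySketch {
    /** Stand-in for one Kafka partition. */
    private final List<String> log;
    /** windowId -> {start offset, end offset (exclusive)}; stand-in for state kept on HDFS. */
    private final Map<Long, long[]> windowOffsets = new HashMap<>();
    /** Current consumer position. */
    private long nextOffset = 0;

    IdempotentReplaySketch(List<String> log) {
        this.log = log;
    }

    /**
     * Emits the tuples for a window. A window seen for the first time consumes
     * up to maxTuples and records its offset range; a replayed window ignores
     * maxTuples and re-emits exactly the recorded range.
     */
    List<String> processWindow(long windowId, int maxTuples) {
        long[] range = windowOffsets.get(windowId);
        if (range == null) {
            long end = Math.min(nextOffset + maxTuples, log.size());
            range = new long[] {nextOffset, end};
            windowOffsets.put(windowId, range);
        }
        nextOffset = range[1];
        return new ArrayList<>(log.subList((int) range[0], (int) range[1]));
    }

    /** Simulates recovery: the position rolls back, the recorded ranges survive. */
    void restoreToCheckpoint(long checkpointedOffset) {
        nextOffset = checkpointedOffset;
    }

    public static void main(String[] args) {
        IdempotentReplaySketch op =
            new IdempotentReplaySketch(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println("window 1: " + op.processWindow(1L, 2));
        System.out.println("window 2: " + op.processWindow(2L, 2));
        // Fail after window 2; the last checkpoint was taken after window 1.
        op.restoreToCheckpoint(2L);
        // The replay asks for more tuples, but the recorded range wins.
        System.out.println("window 2 (replay): " + op.processWindow(2L, 3));
        System.out.println("window 3: " + op.processWindow(3L, 3));
    }
}
```

Because the replayed window reproduces the same offset range, a downstream operator that writes to an external system can skip or overwrite the output of windows it has already committed, which is what gives the exactly-once effect.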
