You can also have a look at this blog and linked example that specifically
covers exactly-once with input from Kafka:

https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/


On Tue, Jun 14, 2016 at 2:47 PM, Thomas Weise <[email protected]>
wrote:

> See response below:
>
> On Tue, Jun 14, 2016 at 12:41 PM, Ananth Gundabattula <
> [email protected]> wrote:
>
>> Hello Siyuan/All,
>>
>> I have a couple of questions regarding the Kafka 0.9 operator. Could you
>> please help me in understanding this operator a bit better?
>>
>>
>>    - As stated in
>>    http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator
>>    , the Kafka 0.9 operator stores its "check-pointed offsets" in Kafka itself
>>    using the App name? It sounds like -originalAppID is not used by this
>>    operator at all. In other words, I can't force an app to process starting
>>    from the beginning unless I change the App name, if the App uses the
>>    Kafka 0.9 operator as the input operator.
>>
> The start offset configuration option should determine where the operator
> starts consuming on cold start (earliest, latest, last consumed). If that's
> not the case then it would be a bug. Siyuan, please comment.
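
To make the three cold-start choices mentioned above concrete, here is a minimal, self-contained sketch of how a start offset could be resolved. It is not the operator's code: the `StartPolicy` enum, the `resolve` method, and the fallback to the earliest offset when nothing has been committed are all assumptions made for illustration.

```java
import java.util.Optional;

class StartOffsetSketch {
    // Hypothetical policy names mirroring "earliest, latest, last consumed".
    enum StartPolicy { EARLIEST, LATEST, LAST_CONSUMED }

    /**
     * Picks the offset to start consuming from on a cold start.
     * earliest and latest are the current bounds of the partition log;
     * committed is the offset previously stored for this application, if any.
     */
    static long resolve(StartPolicy policy, long earliest, long latest,
                        Optional<Long> committed) {
        switch (policy) {
            case EARLIEST:
                return earliest;
            case LATEST:
                return latest;
            case LAST_CONSUMED:
                // Assumption: fall back to the earliest offset when no
                // offset has been stored for this application yet.
                return committed.orElse(earliest);
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        // A stored offset exists but the policy asks for the beginning of the log.
        System.out.println(resolve(StartPolicy.EARLIEST, 0L, 100L, Optional.of(42L)));
        // Resume from the stored offset.
        System.out.println(resolve(StartPolicy.LAST_CONSUMED, 0L, 100L, Optional.of(42L)));
    }
}
```

Under this model, reprocessing from the beginning is a matter of choosing the earliest policy rather than renaming the application, which is the behaviour the reply above describes as expected.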
>
>>
>>    - How does the Kafka 0.9 operator handle downstream operator failure?
>>    By this I mean, an Apex downstream operator fails and is brought back up
>>    by STRAM. However, this operator was significantly lagging behind the
>>    current window of the Kafka 0.9 operator. Does the buffer server
>>    within the Kafka 0.9 operator buffer many windows to handle this situation
>>    (and hence replay accordingly)? I ask this to fine-tune the buffer
>>    memory property.
>>
> The upstream buffer server will hold the data until processed by the
> downstream operator. The buffer server, by default, will start to spool to
> disk when the allocated memory is used up. Back pressure will cause the
> consumer to slow down accordingly.
>
>>
>>    - Is EXACTLY_ONCE processing supported in this operator? If yes, is
>>    it fair to assume that HDFS would be used to manage this type of
>>    configuration?
>>
> Yes, when you enable idempotency on the operator, exactly once processing
> semantics in downstream operators are supported (affects those that write
> to external systems). To enable this, you can configure the operator to use
> the window data manager that writes to HDFS; essentially, it will keep track
> of the consumer offsets for each window.
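
The idea of tracking consumer offsets per window can be sketched in plain Java as follows. This is a toy model under stated assumptions, not the Apex window data manager: the class `WindowOffsetSketch`, its methods, the in-memory map standing in for HDFS, and the list standing in for a Kafka partition are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WindowOffsetSketch {
    /** Offset range [from, to) consumed during one window. */
    static final class Range {
        final long from;
        final long to;
        Range(long from, long to) { this.from = from; this.to = to; }
    }

    // Stand-in for durable storage (the thread mentions HDFS); an in-memory
    // map keeps the sketch runnable.
    private final Map<Long, Range> store = new HashMap<>();
    // Stand-in for a single Kafka partition.
    private final List<String> log;
    private long position = 0;

    WindowOffsetSketch(List<String> log) { this.log = log; }

    /**
     * Emits the tuples for a window. A window seen before is replayed from
     * its recorded range; a new window consumes up to maxTuples records and
     * records the range it covered.
     */
    List<String> emitWindow(long windowId, int maxTuples) {
        Range r = store.get(windowId);
        if (r == null) {
            long to = Math.min(position + maxTuples, log.size());
            r = new Range(position, to);
            store.put(windowId, r);
        }
        position = r.to;
        return new ArrayList<>(log.subList((int) r.from, (int) r.to));
    }

    /** Simulates a restart that rewinds the consumer to an earlier offset. */
    void restoreTo(long offset) { position = offset; }
}
```

Because a replayed window always yields the same tuples it produced the first time, a downstream operator that writes to an external system can detect and skip windows it has already committed.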
>
>>
>>    - Is EXACTLY_ONCE based off the streaming window or the Application
>>    Window in Apex?
>>
> The operator only sees the "application window". Make sure to align the
> checkpoint window interval.
>
> For more information about the Kafka input operator, please see:
> http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator
>
>
>
>
