Thanks a lot for the answers Thomas. The blog entry really helps.

Regarding the comment about offset management initialization, we are using
APPLICATION_OR_EARLIEST.  I was interpreting APPLICATION in this
configuration as the "ApplicationID" ( and hence my question regarding
originalAppId parameter ) and the behavior for the Kafka 0.9 operator seems
to be interpreting it as the application name. I shall wait for Siyuan's
thoughts on this.

Regards,
Ananth

On Wed, Jun 15, 2016 at 7:57 AM, Thomas Weise <[email protected]>
wrote:

> You can also have a look at this blog and linked example that specifically
> covers exactly-once with input from Kafka:
>
> https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/
>
>
> On Tue, Jun 14, 2016 at 2:47 PM, Thomas Weise <[email protected]>
> wrote:
>
>> See response below:
>>
>> On Tue, Jun 14, 2016 at 12:41 PM, Ananth Gundabattula <
>> [email protected]> wrote:
>>
>>> Hello Siyuan/All,
>>>
>>> I have a couple of questions regarding the Kafka 0.9 operator. Could you
>>> please help me in understanding this operator a bit better?
>>>
>>>
>>>    - As stated in
>>>    http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator
>>>    , kafka 0.9 operator stores it "check-pointed offsets" in Kafka itself
>>>    using the App name ? It sounds like -originalAppID is not used by this
>>>    operator at all - In other words, I cant force an app to process starting
>>>    from the beginning until I change the App name if the App is based on the
>>>    Kafka 0.9 operator as the input operator
>>>
>>> The start offset configuration option should determine where the
>> operator starts consuming on cold start (earliest, latest, last consumed).
>> If that's not the case then it would be a bug. Siyuan, please comment.
>>
>>>
>>>    -
>>>    - How does the kafka 0.9 operator handle downstream operators
>>>    failure ? By this I mean, an Apex downstream operator fails, and is 
>>> brought
>>>    back up by STRAM. However this operator was significantly lagging behind
>>>    the current window of the kafka 0.9 operator window. Does the buffer 
>>> server
>>>    within the Kafka 0.9 operator buffer many windows to handle this 
>>> situation
>>>    ? ( and hence replays accordingly ? ) . I ask this to fine tune the 
>>> buffer
>>>    memory property.
>>>
>>> The upstream buffer server will hold the data until processed by the
>> downstream operator. The buffer server, by default, will start to spool to
>> disk when the allocated memory is used up. Back pressure will cause the
>> consumer to slow down accordingly.
>>
>>>
>>>    - Is EXACTLY_ONCE processing supported in this operator ? if yes, is
>>>    it fair to assume that HDFS would be used to manage this type of
>>>    configuration ?
>>>
>>> Yes, when you enable idempotency on the operator, exactly once
>> processing semantics in downstream operators are supported (affects those
>> that write to external systems). To enable this you can configure to use
>> the window data manager that writes to HDFS, essentially it will keep track
>> of the consumer offsets for each window.
>>
>>>
>>>    -
>>>    - Is EXACTLY_ONCE based off the streaming window or the Application
>>>    Window in Apex ?
>>>
>>> The operator only sees the "application window". Make sure to align the
>> checkpoint window interval.
>>
>> For more information about the Kafka input operator, please see:
>> http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator
>>
>>
>>
>>
>

Reply via email to