See response below:

On Tue, Jun 14, 2016 at 12:41 PM, Ananth Gundabattula <
[email protected]> wrote:

> Hello Siyuan/All,
>
> I have a couple of questions regarding the Kafka 0.9 operator. Could you
> please help me in understanding this operator a bit better?
>
>
>    - As stated in
>    http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator
>    , the Kafka 0.9 operator stores its "check-pointed offsets" in Kafka itself
>    using the App name? It sounds like -originalAppID is not used by this
>    operator at all. In other words, I can't force an app to start processing
>    from the beginning unless I change the App name, if the App uses the
>    Kafka 0.9 operator as the input operator.
>
The start offset configuration option should determine where the operator
starts consuming on a cold start (earliest, latest, or last consumed). If that's
not the case, then it is a bug. Siyuan, please comment.
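
To make the expected cold-start behavior concrete, here is a rough sketch in
plain Java. It is only a model, not the operator's code: the class, the enum
names, and the fallback to the earliest offset when nothing has been consumed
yet are all assumptions made for illustration.

```java
// Illustrative model only; this is not the Apex/Malhar operator code.
class StartOffsetResolver {
    // Hypothetical names mirroring the three options mentioned above.
    enum Policy { EARLIEST, LATEST, LAST_CONSUMED }

    // lastConsumed is null when nothing has been committed for this consumer yet.
    static long resolve(Policy policy, long earliest, long latest, Long lastConsumed) {
        switch (policy) {
            case EARLIEST:
                return earliest;
            case LATEST:
                return latest;
            case LAST_CONSUMED:
                // Assumed fallback: start from the earliest offset if nothing was committed.
                return lastConsumed != null ? lastConsumed : earliest;
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        // Partition holds offsets 100..500; the consumer previously stopped at 250.
        System.out.println(resolve(Policy.EARLIEST, 100L, 500L, 250L));
        System.out.println(resolve(Policy.LAST_CONSUMED, 100L, 500L, 250L));
        System.out.println(resolve(Policy.LAST_CONSUMED, 100L, 500L, null));
    }
}
```

If the operator ignores the configured option and always resumes from the last
consumed offset, that would be the bug mentioned above.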

>
>    - How does the Kafka 0.9 operator handle downstream operator failure?
>    By this I mean: an Apex downstream operator fails and is brought back up
>    by STRAM, but it was lagging significantly behind the current window of
>    the Kafka 0.9 operator. Does the buffer server within the Kafka 0.9
>    operator buffer many windows to handle this situation (and hence replay
>    accordingly)? I ask in order to fine-tune the buffer memory property.
>
The upstream buffer server will hold the data until it has been processed by
the downstream operator. By default, the buffer server starts to spool to
disk when its allocated memory is used up. Back pressure will cause the
consumer to slow down accordingly.
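
As a mental model of the spill-over behavior only (this is not the buffer
server implementation, and every name below is made up for the sketch), think
of a bounded in-memory queue with an overflow queue standing in for the disk
spool. Replay after a downstream failure is not modelled here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of "hold in memory, spool the rest"; not the Apex buffer server.
class SpoolingBuffer {
    private final int memoryCapacity;
    private final Deque<String> memory = new ArrayDeque<>();
    private final Deque<String> spool = new ArrayDeque<>(); // stands in for on-disk storage

    SpoolingBuffer(int memoryCapacity) {
        if (memoryCapacity < 1) {
            throw new IllegalArgumentException("memoryCapacity must be at least 1");
        }
        this.memoryCapacity = memoryCapacity;
    }

    // Upstream side: keep the tuple in memory if there is room, otherwise spool it.
    void publish(String tuple) {
        if (spool.isEmpty() && memory.size() < memoryCapacity) {
            memory.addLast(tuple);
        } else {
            spool.addLast(tuple); // spooling once started keeps arrival order intact
        }
    }

    // Downstream side: take the oldest tuple, then pull one spooled tuple back into memory.
    String poll() {
        String head = memory.pollFirst();
        if (!spool.isEmpty() && memory.size() < memoryCapacity) {
            memory.addLast(spool.pollFirst());
        }
        return head; // null when the buffer is empty
    }

    int inMemory() { return memory.size(); }

    int spooled() { return spool.size(); }

    public static void main(String[] args) {
        SpoolingBuffer buffer = new SpoolingBuffer(2);
        for (int i = 1; i <= 5; i++) {
            buffer.publish("t" + i);
        }
        System.out.println(buffer.inMemory() + " in memory, " + buffer.spooled() + " spooled");
        for (String t = buffer.poll(); t != null; t = buffer.poll()) {
            System.out.println(t);
        }
    }
}
```

A slow downstream operator makes the spool grow instead of dropping data, so
the buffer memory setting mainly decides how soon disk gets involved.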

>
>    - Is EXACTLY_ONCE processing supported in this operator? If yes, is
>    it fair to assume that HDFS would be used to manage this type of
>    configuration?
>
Yes, when you enable idempotency on the operator, exactly-once processing
semantics in downstream operators are supported (this affects those that
write to external systems). To enable it, configure the operator to use the
window data manager that writes to HDFS; it keeps track of the consumer
offsets for each window.
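
The idea behind the per-window offset tracking can be sketched in a few lines
of plain Java. This is a toy model under stated assumptions, not the Malhar
window data manager: the class name is hypothetical, an in-memory map stands
in for the state written to HDFS, and a single partition is modelled as a
list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of idempotent replay; NOT the Apex/Malhar API.
class WindowOffsetLog {
    // windowId -> {start offset (inclusive), end offset (exclusive)};
    // stands in for the per-window state that would be persisted to HDFS.
    private final Map<Long, long[]> savedRanges = new HashMap<>();
    private long nextOffset = 0;

    // Emit the records belonging to one window of a single partition.
    List<String> emitWindow(long windowId, List<String> partition, int maxRecords) {
        long[] range = savedRanges.get(windowId);
        if (range == null) {
            // First pass: consume up to maxRecords and remember the range used.
            long end = Math.min(partition.size(), nextOffset + maxRecords);
            range = new long[] {nextOffset, end};
            savedRanges.put(windowId, range);
        }
        // On replay the saved range wins, so the window's contents are identical.
        nextOffset = range[1];
        return new ArrayList<>(partition.subList((int) range[0], (int) range[1]));
    }

    // Simulate recovery: the in-memory position is lost, the saved ranges survive.
    void restoreTo(long offset) {
        nextOffset = offset;
    }

    public static void main(String[] args) {
        List<String> partition = List.of("a", "b", "c", "d", "e");
        WindowOffsetLog log = new WindowOffsetLog();
        System.out.println(log.emitWindow(1, partition, 2));
        System.out.println(log.emitWindow(2, partition, 2));
        log.restoreTo(0); // failure and restart from an older checkpoint
        System.out.println(log.emitWindow(1, partition, 5)); // same records as the first pass
    }
}
```

Because a replayed window always carries the same records, a downstream
operator that writes to an external system can detect and skip windows it has
already committed.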

>
>    - Is EXACTLY_ONCE based off the streaming window or the Application
>    Window in Apex ?
>
The operator only sees the "application window". Make sure to align the
checkpoint window interval with it.

For more information about the Kafka input operator, please see:
http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator
