[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

koeninger Tue, 20 Sep 2016 12:59:42 -0700

Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/15102
  
    My fork is not following auto.offset.reset, it's following what the
    (potentially user-provided) consumer does when it sees a new partition.
    Maybe that's auto.offset.reset, maybe it's something else.
    
    Either way, who are you to presume that a user doesn't know what she is
    doing when she configured a consumer to start at a particular position for
    an added partition?  You have no guarantee that offset 0L even exists at
    that time.
    
    On Tue, Sep 20, 2016 at 2:52 PM, Shixiong Zhu <[email protected]>
    wrote:
    
    > You should not be assuming 0 for a starting offset for partitions you've
    > just learned about. You should be asking the underlying driver consumer
    > what its position is.
    >
    > Yes, there are two approaches for getting new partitions' offsets:
    >
    >    1. From a user perspective, if I set "auto.offset.reset" to latest, I
    >    would like to process new data after the query starts. So we should
    >    start from the earliest offset for a new partition.
    >    2. Follow auto.offset.reset as what you did in your fork.
    >
    > However, option 2 makes the query result indeterminate when
    > auto.offset.reset is latest, depending on how the query runs. E.g., if you
    > add new partitions and push new data very quickly, the new data may be
    > lost; if you add new partitions and push new data when the app is being
    > recovered, the new data may be lost as well.
    >
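
The two concerns in this exchange can be illustrated with a toy model. This is
only a sketch: ToyPartition and start_offset are hypothetical names invented
for illustration, not Kafka or Spark APIs, and the model ignores everything
about a real consumer except where reading begins on a newly discovered
partition. It assumes that records can be written to a new partition before
the reader notices it, and that the oldest retained offset may be above 0.

```python
# Toy model only: ToyPartition and start_offset are hypothetical names,
# not part of Kafka or Spark.
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """An append-only log whose oldest retained offset may be above 0."""
    first_offset: int = 0                      # earliest offset still retained
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)

    @property
    def end_offset(self):
        # Offset that the next appended record would receive.
        return self.first_offset + len(self.records)

    def read_from(self, offset):
        if offset < self.first_offset:
            raise LookupError(f"offset {offset} is no longer retained")
        return self.records[offset - self.first_offset:]


def start_offset(partition, policy):
    """Pick where a reader begins on a partition it has just discovered."""
    if policy == "earliest":
        return partition.first_offset
    if policy == "latest":
        return partition.end_offset
    raise ValueError(f"unknown policy: {policy}")


# A partition is added and receives data before the reader notices it.
new_partition = ToyPartition()
new_partition.append("a")
new_partition.append("b")

# Starting at "earliest" sees both records; starting at "latest" skips them.
seen_earliest = new_partition.read_from(start_offset(new_partition, "earliest"))
seen_latest = new_partition.read_from(start_offset(new_partition, "latest"))

# A partition whose oldest records are already gone: offset 0 cannot be read.
aged_partition = ToyPartition(first_offset=5, records=["f", "g"])
```

In this model, a reader that starts at "latest" misses the records written
before discovery, which is the indeterminacy described in the quoted message,
while a reader that hard-codes offset 0 fails on aged_partition, which is the
objection raised in the reply.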



