GitHub user koeninger commented on the issue:

    https://github.com/apache/spark/pull/15102
  
    > For streaming you already know what the global order is, because you know
    > when you asked for A and B. I agree that we should probably remove the
    > comparable requirement from Offset in favor of just having equality.
    
    Sounds good, as long as the execution side of it doesn't, for example,
    store timestamps for A and then run into issues on driver failover to a
    machine with clock drift.
    
    
    > Assuming A was retrieved before B, then it seems like you emit a warning
    > that data was possibly missed from A (since it was deleted before we could
    > get it) and you start a new batch on topic B from offsets 0-1. Right?
    
    I think that, in the absence of prior information about the position in a
    topicpartition, you start a new batch on topic B from wherever the
    consumer's position was at the time it acquired the subscription, which
    might not be 0. That is, you call position() before seekToEnd(). This
    might mean you need to record two Kafka offsets in an SQL Offset the first
    time you see that topicpartition.
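
As a rough sketch of that ordering (not the actual implementation): the `PositionSource` interface, `InitialRange`, `FirstBatch` and `FakeConsumer` below are hypothetical stand-ins made up for illustration, and topicpartitions are modelled as plain strings. The real consumer takes `TopicPartition` arguments.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the two consumer calls discussed above.
interface PositionSource {
    long position(String topicPartition);   // next offset the consumer would read
    void seekToEnd(String topicPartition);  // move the position to the log end
}

// Hypothetical pair of offsets recorded the first time a topicpartition is seen.
final class InitialRange {
    final long from;   // position when the subscription was acquired
    final long until;  // position after seeking to the end
    InitialRange(long from, long until) {
        this.from = from;
        this.until = until;
    }
}

final class FirstBatch {
    // Read the current position BEFORE seeking to the end, so the starting
    // point (which may be non-zero) is not lost.
    static InitialRange initialRange(PositionSource consumer, String tp) {
        long from = consumer.position(tp);
        consumer.seekToEnd(tp);
        long until = consumer.position(tp);
        return new InitialRange(from, until);
    }
}

// In-memory fake, used only to exercise the sketch without a broker.
final class FakeConsumer implements PositionSource {
    private final Map<String, Long> positions = new HashMap<>();
    private final Map<String, Long> logEnds = new HashMap<>();

    void assign(String tp, long position, long logEnd) {
        positions.put(tp, position);
        logEnds.put(tp, logEnd);
    }

    @Override
    public long position(String tp) {
        return positions.get(tp);
    }

    @Override
    public void seekToEnd(String tp) {
        positions.put(tp, logEnds.get(tp));
    }
}
```

With a fake consumer positioned at offset 5 on a partition whose log ends at 12, `FirstBatch.initialRange` yields `from = 5` and `until = 12`, so both values would have to be carried in the first Offset for that topicpartition.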
    
    > Are there arguments that we do support that you think are confusing?
    
    I think the main source of confusion would be specifying topics one way
    (a custom-delimited string) for one configuration and another way
    (structured JSON) for another configuration.

