Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/15102
  
    @koeninger
    I did some independent brainstorming with @zsxwing on topic deletion, and 
yeah, I agree with you that attempting to account for deleted topics in the 
KafkaSourceOffset such that compareTo is satisfied is more complicated than 
just eliminating compareTo. That said, there is still a corner case: the same 
topic being deleted and recreated. I am not familiar with how often this can 
happen (let us know your thoughts). But the general idea we can implement is 
that we attach a unique id to the topic in the KafkaSourceOffset. Whenever a 
new topic is detected (while running or across query restarts), we generate a 
unique id so that it is considered a new topic. 
Here are the options:
    
    **Option 1: When getOffset detects a new topic, if the topic existed in a 
previous offset, create a new (topic, unique id) pair**
    - Pro: Simple
    - Con: Cannot detect if a topic gets deleted+recreated between triggers 
(possibly across query restarts)
    
    **Option 2: Use RebalanceListener to know when a topic has been deleted**
    - Pro: Handles topic deletion+recreation between triggers while the query 
is active
    - Con: Misses deletion+recreation during query restarts
    - Con: The listener is called on a different thread, so race conditions 
are possible
    
    **Option 3: Use the creation time / cZxid of the topic info stored in ZK 
to disambiguate**
    - Pro: ZooKeeper maintains uniqueness across any component restarts
    - Con: Requires depending on full Kafka + ZK
    - Con: Requires knowing the exact ZK path where topics are saved, though 
we can add tests to make sure this does not break when we upgrade Kafka
    
    I feel that we should just keep it simple for now, and go for Option 1. 
What do you think?
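
    To make Option 1 concrete, here is a minimal sketch of the bookkeeping, 
written in Python only for brevity. `TopicIdTracker` and `observe` are 
hypothetical names, not part of the Kafka source; `observe` stands in for the 
part of getOffset that sees the current topic list at each trigger.

```python
import uuid
from typing import Dict, Iterable


class TopicIdTracker:
    """Hypothetical helper: tags each topic with an id that changes when the topic reappears."""

    def __init__(self) -> None:
        # topic name -> unique id assigned when the topic was first seen
        self._ids: Dict[str, str] = {}

    def observe(self, topics: Iterable[str]) -> Dict[str, str]:
        """Record the topics visible at this trigger and return topic -> id."""
        seen = set(topics)
        # A tracked topic that is missing at this trigger is treated as deleted.
        for topic in list(self._ids):
            if topic not in seen:
                del self._ids[topic]
        # A topic that is not tracked is new (or recreated): assign a fresh id.
        for topic in seen:
            if topic not in self._ids:
                self._ids[topic] = uuid.uuid4().hex
        return dict(self._ids)


tracker = TopicIdTracker()
first = tracker.observe(["events"])      # new topic: gets an id
same = tracker.observe(["events"])       # still present: id unchanged
gone = tracker.observe([])               # deletion is visible at this trigger
recreated = tracker.observe(["events"])  # reappears: gets a different id
```

    The sketch also shows the Con: if the topic is deleted and recreated 
entirely between two `observe` calls, the tracker never sees it missing and 
keeps the old id.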
    
