Github user tdas commented on the issue:
https://github.com/apache/spark/pull/15102
@koeninger
I did some independent brainstorming with @zsxwing on topic deletion, and
yeah, I agree with you that attempting to account for deleted topics in the
KafkaSourceOffset such that compareTo is satisfied is more
complicated than just eliminating compareTo. That said, there is still a
corner case: the same topic being deleted and recreated. I am not familiar
with how often this can happen (let us know your thoughts). But the general
idea we can implement is that we attach a unique id to the topic in the
KafkaSourceOffset. Whenever a new topic is detected (while running or across
query restarts), generate a unique id so that it is considered a new topic.
Here are the options:
**Option 1: When getOffset detects a new topic, if the topic existed in
the previous offset, create a new (topic, unique id)**
- Pro: Simple
- Con: Cannot detect if a topic gets deleted+recreated between triggers
(possibly across query restarts)
**Option 2: Use RebalanceListener to know when topic has been deleted**
- Pro: Handles topic deletion+recreation between triggers while query is
active
- Con: Misses deletion+recreation during query restarts
- Con: Listener called on different thread, so possible race conditions
**Option 3: Use the creation time / cZxid of topic info stored in ZK to
disambiguate**
- Pro: ZooKeeper maintains uniqueness across any component restarts
- Con: Requires depending on full Kafka + ZK
- Con: Requires knowing the exact ZK path where topics are saved, but this
can be tested and made sure that it never fails when we upgrade Kafka
I feel that we should just keep it simple for now, and go for Option 1.
What do you think?
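
A minimal sketch of how Option 1 could look, under one reading of it: a topic that is absent from the previous offset is treated as new and given a fresh id, while a topic that is already present keeps its id. The names `TopicId`, `TaggedOffset`, and `TopicTagger.nextOffset` are hypothetical and are not the actual KafkaSourceOffset API; the offsets are modeled as plain maps for illustration only.

```scala
import java.util.UUID

// Hypothetical types for illustration; not Spark's KafkaSourceOffset.
final case class TopicId(topic: String, uniqueId: String)

// topic (with unique id) -> partition -> offset
final case class TaggedOffset(offsets: Map[TopicId, Map[Int, Long]])

object TopicTagger {
  /**
   * Build the next offset from the previous one and the topics currently visible.
   *
   * @param previous the offset produced by the previous trigger
   * @param latest   topic -> partition -> offset as currently reported
   * @param newId    id generator (assumption: random UUIDs are unique enough)
   */
  def nextOffset(
      previous: TaggedOffset,
      latest: Map[String, Map[Int, Long]],
      newId: () => String = () => UUID.randomUUID().toString): TaggedOffset = {
    // Topics already tracked keep their existing id.
    val known: Map[String, TopicId] =
      previous.offsets.keys.map(id => id.topic -> id).toMap

    val tagged = latest.map { case (topic, partitions) =>
      // A topic missing from the previous offset is considered new.
      val id = known.getOrElse(topic, TopicId(topic, newId()))
      id -> partitions
    }
    // Topics missing from `latest` simply drop out of the new offset.
    TaggedOffset(tagged)
  }
}
```

With this shape, a topic that disappears for at least one trigger and then comes back gets a new id. A topic that is deleted and recreated between two consecutive triggers keeps its old id, which is exactly the limitation listed as the con of Option 1.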