Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/15102
  
    @tdas moving this conversation back to the PR that's linked from the public 
jira
    
    > yeah, i am trying to figure out all the options and write up something 
so that we are clear on the pros and cons of each approach. At a high level, 
they match the ones you suggested. Though I am trying to tease out what needs 
to be done if deletion is to be supported, and what needs to be done if 
deletion+recreation of the same topic needs to be supported.
    
    > Also, at a high level I think that supporting deletion of topics does not 
require timestamps; it's supporting deletion+recreation of the same topic that 
would require more disambiguating information like timestamps. Here are a few 
questions,
    > 
    > Just to confirm, there is no unique GUID kind of thing for topics in 
Kafka?
    > 
    > When a topic is deleted and recreated, what happens to the offsets? Does 
the recreated topic's offset start from 0? Or does it start from where the 
previous topic left off?
    
    I don't think it's useful to focus too hard on deletion; it's a symptom, 
not the cause.  Subscription changes for other reasons would also expose the 
same issue.  I think the underlying issue is that the Offset interface is 
asking for a global monotonic order, which is hard in a distributed system.
    
    To answer the questions: a topicpartition is a folder on disk (containing 
messages and offsets), and a node in ZK.  Both are deleted if the 
topicpartition is successfully deleted, so offsets start over.  ZK nodes have a 
cZxid, but I don't think it's a good idea to rely on Kafka exposing ZK 
internals, and I'm not sure what it buys you.  Say you do have zxid, and say 
you have two consumer states:
    
    State A:  topic A, partition 0, offset 1, zxid 0x20
    
    State B: topic B, partition 0, offset 1, zxid 0x21
    
    At one point in time the consumer is in state A, and at a different point 
in time it is in state B.  Without more information, how can you tell if A < B 
or B < A ?
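
    To make the ordering problem concrete, here is a minimal sketch in Python.
    It is not Spark's Offset API; ConsumerState and compare are hypothetical
    names, and the comparison rule (per-partition, same subscription only) is
    an assumption chosen for illustration.  It shows that such a comparison is
    only a partial order: states A and B above come out as incomparable.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: (topic, partition) -> offset.
ConsumerState = Dict[Tuple[str, int], int]


def compare(a: ConsumerState, b: ConsumerState) -> Optional[int]:
    """Return -1, 0 or 1 if a and b can be ordered, or None if they cannot.

    Assumed rule: two states are comparable only when they cover exactly the
    same topicpartitions and no partition moves in the opposite direction
    from the others.
    """
    if a.keys() != b.keys():
        return None  # different subscriptions: no basis for ordering
    # Sign of the offset difference for each topicpartition.
    signs = {(a[tp] > b[tp]) - (a[tp] < b[tp]) for tp in a}
    signs.discard(0)
    if not signs:
        return 0  # identical offsets everywhere
    if len(signs) == 1:
        return signs.pop()  # every changed partition moved the same way
    return None  # some partitions ahead, others behind


# The two states from the comment (zxid left out of the key).
state_a = {("A", 0): 1}
state_b = {("B", 0): 1}

# Same subscription, one partition advanced: this pair can be ordered.
earlier = {("A", 0): 1, ("A", 1): 5}
later = {("A", 0): 3, ("A", 1): 5}
```

    Adding the zxid to each key would not change the result for state_a and
    state_b in this sketch, since their keys would still differ.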


