[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

koeninger Tue, 20 Sep 2016 15:53:09 -0700

Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/15102
  
    > We are not giving the developer the option to manually configure a 
consumer in this way for this PR precisely because I don't think we can while 
still maintaining the semantics that structured streaming has promised.
    
    You've got this backwards.  As soon as you give someone the ability to set 
auto.offset.reset to largest, it opens up the can of worms as to whether resets 
should happen at the beginning of a stream, during a stream, and/or when a 
partition is added.  Giving people the ability to configure a consumer doesn't 
cause that problem, it allows them to solve that problem until such time as the 
Kafka project has a unified way to solve it.  Similarly, as soon as you allow 
pattern subscriptions, it opens up the can of worms as to adding/removing 
topics and whether the SQL Offset interface as is makes sense for Kafka.  Just 
saying you aren't going to handle deletions for right now doesn't solve that 
problem.  
    
    If you really don't want to consider changing the Offset interface, and 
want to tell people who need the details of Kafka in order to get work done to 
use the DStream, then you should probably eliminate all configuration options 
except brokers, a list of topics, and maybe SSL.
    
    I'll try one more time, and then I'm done:
    - Months ago you came up with an interface that realistically will only 
work with Kafka / Kinesis / lookalikes, yet had no implementation for any of 
those.
    - Actually attempting an implementation raised some notable differences 
between what the interface allowed for and what the implementation needed.
    - I offered some specific suggestions, including considering changes to the 
interface.
    - I offered to help with implementation.
    
    Your response, from my point of view, has been
    - Decline to consider changes to the interface.
    - Decline any assistance with actual implementation.
    - Only (re)implement a subset of Kafka functionality that you can see is 
"safe", regardless of whether it's congruent with the way Kafka is already 
being used by users.
    
    Under those circumstances, I'm happy to answer specific directed questions 
you may have, but I'm not interested in continuing to argue.  If you guys say 
you've got this and you're going to do it your way, then you've got it.
    
    Let me know if you change your mind, I'll still be around.



