Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15102
> My bigger concern is that it looks like you guys are continuing to hack
> in a particular direction, without addressing my points or answering whether
> you're willing to let me help work on this.
> Have you made up your mind?
Cody, I think we have been addressing your points, though I know we are not
done yet. It would be helpful if you could make specific comments on the code,
preferably with pointers to what you think the correct implementation would
look like. Otherwise it's hard to track which points you think have been
resolved and which are still in question.
I appreciate that you are concerned that some of this code is duplicated,
but I'm going to have to respectfully disagree on that point. I think this is
the right choice both for the stability of the DStream implementation and our
ability to optimize the SQL implementation.
> You should not be assuming 0 for a starting offset for partitions you've
> just learned about. You should be asking the underlying driver consumer what
> its position is.
I'll let Ryan comment further here, but I'm not sure if this is correct.
It sounds like if we rely on Kafka to manage its position, there will be
cases where a partial failure could result in data loss. In general, I think we
need to be careful about relying on Kafka internals when our end goal is to
provide a much [higher level
abstraction](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#programming-model).
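
To make the concern concrete, here is a minimal, purely illustrative Scala sketch of the offset-tracking tradeoff under discussion. None of these names come from the PR; `CheckpointedOffsets` and `resolveStart` are hypothetical. The idea is that the query's own durable log, not the consumer's in-memory position, decides where a known partition resumes, while a newly discovered partition starts from the broker's earliest *available* offset rather than a hard-coded 0 (retention or compaction may have already deleted the first records):

```scala
// Hypothetical sketch: offsets the streaming query has durably recorded
// per partition (keyed here by a "topic-partition" string for simplicity).
case class CheckpointedOffsets(byPartition: Map[String, Long]) {
  // For a partition we already track, resume from the checkpoint and
  // ignore whatever position the Kafka consumer currently holds.
  // For a newly discovered partition, fall back to the earliest offset
  // the broker still retains, not an assumed 0.
  def resolveStart(partition: String, earliestAvailable: Long): Long =
    byPartition.getOrElse(partition, earliestAvailable)
}

// Known partition: resume from the checkpointed offset 42, even though
// the broker would report an earlier (or later) position.
val checkpoint = CheckpointedOffsets(Map("topic-0" -> 42L))
assert(checkpoint.resolveStart("topic-0", earliestAvailable = 10L) == 42L)

// Newly discovered partition whose log now begins at offset 17 because
// older records expired: assuming a start of 0 would request data that
// Kafka no longer has, while asking the live consumer for its position
// after a partial failure could silently skip records instead.
assert(checkpoint.resolveStart("topic-1", earliestAvailable = 17L) == 17L)
```

This is only a model of the failure mode, not the PR's implementation; the real source must also handle offset ranges per micro-batch and partitions that disappear.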