[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

pwendell Tue, 03 Feb 2015 01:13:33 -0800

Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/3798#issuecomment-72617088
  
    I took a pass through the public API. I'm not very familiar with Kafka so 
it was somewhat slow going. However, some reactions:
    
    1. We should try to tighten, simplify, and clarify the way we name and
document everything in this public API. Most of the comments were about this.
The most important, IMO, is coming up with a good name for the new streams
returned and clearly explaining how they differ from the old Kafka stream. To
me, the main differences seem to be in the way we (a) decide what goes into
which batch and (b) actually ingest the data. I proposed Javadoc and a naming
scheme that emphasize that distinction.
    2. Are there plans to add Java and Python wrappers here next? Those are
straightforward and it would be good to have them. Maybe in a follow-up PR?



