[jira] [Created] (SPARK-18682) Batch Source for Kafka

Michael Armbrust (JIRA) Thu, 01 Dec 2016 17:36:08 -0800

Michael Armbrust created SPARK-18682:
----------------------------------------


             Summary: Batch Source for Kafka
                 Key: SPARK-18682
                 URL: https://issues.apache.org/jira/browse/SPARK-18682
             Project: Spark
          Issue Type: New Feature
          Components: SQL, Structured Streaming
            Reporter: Michael Armbrust


Today, you can start a stream that reads from Kafka.  However, given Kafka's 
configurable retention period, sometimes you might just want to read all of 
the data that is available now.  As such, we should add a batch version that 
works with {{spark.read}} as well.

The options should be the same as the streaming kafka source, with the 
following differences:
 - {{startingOffsets}} should default to {{earliest}}, and should not allow 
{{latest}} (which would always be empty).
 - {{endingOffsets}} should also be allowed and should default to {{latest}}. 
The same assign JSON format as {{startingOffsets}} should also be accepted.
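
As a rough illustration of the option rules listed above, here is a minimal 
sketch in plain Python.  It is not Spark code: {{resolve_batch_offsets}} and 
{{_parse}} are hypothetical names used only for this example, and the JSON 
shape (topic -> partition -> offset) is assumed to match what the streaming 
source accepts for {{startingOffsets}}.

```python
import json

# Hypothetical helper, not part of Spark: it only mirrors the option rules
# proposed in this issue for a batch read from Kafka.
DEFAULT_STARTING = "earliest"
DEFAULT_ENDING = "latest"


def _parse(value):
    """Return a keyword unchanged, or decode a JSON offset specification."""
    if value in ("earliest", "latest"):
        return value
    # Assumed format: {"topic": {"partition": offset, ...}, ...}
    parsed = json.loads(value)  # raises ValueError on malformed JSON
    if not isinstance(parsed, dict):
        raise ValueError("offsets must be a keyword or a JSON object")
    return parsed


def resolve_batch_offsets(options):
    """Apply the proposed defaults and restrictions to a dict of options."""
    starting = options.get("startingOffsets", DEFAULT_STARTING)
    ending = options.get("endingOffsets", DEFAULT_ENDING)

    # A batch query that starts at "latest" would always be empty.
    if starting == "latest":
        raise ValueError("startingOffsets must not be 'latest' for a batch read")

    return _parse(starting), _parse(ending)
```

With no options set, the sketch falls back to the proposed defaults; an 
explicit {{startingOffsets}} of {{latest}} is rejected, and a JSON string 
such as {{{"topicA": {"0": 23}}}} is decoded for either option.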

It would be really good if things like {{.limit(n)}} were enough to prevent 
all of the data from being read (this might just work).



