Re: How to stream all data out of a Kafka topic once, then terminate job?

2015-04-30 Thread Akhil Das
Have a look at KafkaRDD: https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/kafka/KafkaRDD.html

Thanks
Best Regards

On Wed, Apr 29, 2015 at 10:04 AM, dgoldenberg dgoldenberg...@gmail.com wrote:
Hi, I'm wondering about the use-case where you're not doing continuous,

Re: How to stream all data out of a Kafka topic once, then terminate job?

2015-04-29 Thread Dmitry Goldenberg
Yes, and Kafka topics are basically queues. So perhaps what's needed is just KafkaRDD with the starting offset being 0 and the finish offset being a very large number...

Sent from my iPhone

On Apr 29, 2015, at 1:52 AM, ayan guha guha.a...@gmail.com wrote:
I guess what you mean is not streaming.

Re: How to stream all data out of a Kafka topic once, then terminate job?

2015-04-29 Thread ayan guha
I guess what you mean is not streaming. If you create a streaming context at time t, you will receive data that arrives after time t, not data published before t. Looks like you want a queue: let Kafka write to a queue, consume messages from the queue, and stop when the queue is empty.

Re: How to stream all data out of a Kafka topic once, then terminate job?

2015-04-29 Thread Dmitry Goldenberg
Part of the issue is, when you read messages from a topic, the messages are peeked, not polled, so there is no "when the queue is empty", as I understand it. So it would seem I'd want to do KafkaUtils.createRDD, which takes an array of OffsetRange's. Each OffsetRange is characterized by topic, partition, and a starting and ending offset.
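For the record, a minimal sketch of the createRDD approach described above (spark-streaming-kafka, Spark 1.3+). The broker address, topic name, and offset bounds are illustrative assumptions; a real job would look the per-partition offsets up rather than hard-coding them, and this requires a running Kafka broker to execute:

```java
import java.util.HashMap;
import java.util.Map;

import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.kafka.OffsetRange;

public class ReadTopicOnce {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("read-topic-once")
            .setMaster("local[2]");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        Map<String, String> kafkaParams = new HashMap<String, String>();
        kafkaParams.put("metadata.broker.list", "localhost:9092"); // assumed broker

        // One OffsetRange per partition: topic, partition, fromOffset, untilOffset.
        // The bounds here are placeholders; in practice query the brokers for the
        // earliest and latest offsets of each partition.
        OffsetRange[] ranges = new OffsetRange[] {
            OffsetRange.create("mytopic", 0, 0L, 1000L)
        };

        // Batch read: the job processes exactly these offsets and then finishes.
        // No StreamingContext, so nothing keeps running afterwards.
        JavaPairRDD<String, String> rdd = KafkaUtils.createRDD(
            jsc,
            String.class, String.class,
            StringDecoder.class, StringDecoder.class,
            kafkaParams, ranges);

        rdd.foreach(record -> System.out.println(record._2()));
        jsc.stop();
    }
}
```

Because the RDD is bounded by the offset ranges, the job terminates on its own once those offsets are processed, which is the "consume once, then stop" behavior the thread is after.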

Re: How to stream all data out of a Kafka topic once, then terminate job?

2015-04-29 Thread Cody Koeninger
The idea of peek vs. poll doesn't apply to Kafka, because Kafka is not a queue. There are two ways of doing what you want: either using KafkaRDD or a direct stream. The KafkaRDD approach would require you to find the beginning and ending offsets for each partition. For an example of this, see
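One way to find those per-partition beginning and ending offsets is Kafka 0.8's SimpleConsumer offset API. A hedged sketch, assuming a broker at localhost:9092 that is the leader for partition 0 of a hypothetical topic "mytopic" (it needs a live broker to run):

```java
import java.util.HashMap;
import java.util.Map;

import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class PartitionOffsets {

    // Fetches one offset for a partition: pass EarliestTime() for the start
    // of the log, LatestTime() for the end.
    static long fetchOffset(SimpleConsumer consumer, String topic,
                            int partition, long whichTime, String clientId) {
        TopicAndPartition tp = new TopicAndPartition(topic, partition);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
            new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
        requestInfo.put(tp, new PartitionOffsetRequestInfo(whichTime, 1));

        kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
            requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientId);
        OffsetResponse response = consumer.getOffsetsBefore(request);
        return response.offsets(topic, partition)[0];
    }

    public static void main(String[] args) {
        // Assumes this broker is the leader for the partition being queried.
        SimpleConsumer consumer = new SimpleConsumer(
            "localhost", 9092, 100000, 64 * 1024, "offset-lookup");
        long from = fetchOffset(consumer, "mytopic", 0,
            kafka.api.OffsetRequest.EarliestTime(), "offset-lookup");
        long until = fetchOffset(consumer, "mytopic", 0,
            kafka.api.OffsetRequest.LatestTime(), "offset-lookup");
        consumer.close();
        System.out.println("OffsetRange bounds: [" + from + ", " + until + ")");
    }
}
```

The resulting pair feeds directly into OffsetRange.create(topic, partition, from, until), one range per partition, for KafkaUtils.createRDD.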

Re: How to stream all data out of a Kafka topic once, then terminate job?

2015-04-29 Thread Dmitry Goldenberg
Thanks for the comments, Cody. Granted, Kafka topics aren't queues. I was merely wishing that Kafka's topics had some queue behaviors supported, because often that is exactly what one wants. The ability to poll messages off a topic seems like what lots of use-cases would want. I'll explore both approaches.

How to stream all data out of a Kafka topic once, then terminate job?

2015-04-28 Thread dgoldenberg
Hi, I'm wondering about the use-case where you're not doing continuous, incremental streaming of data out of Kafka but rather want to publish data once with your Producer(s) and consume it once, in your Consumer, then terminate the consumer Spark job.

JavaStreamingContext jssc = new