Part of the issue is that when you read messages from a topic, they are peeked, not polled: consuming doesn't remove them from the log, so there's no natural "when the queue is empty" signal, as I understand it. The closest equivalent seems to be comparing a consumer's current offset against each partition's log-end offset.
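For what it's worth, here's roughly what that check could look like against Kafka 0.8's SimpleConsumer API. The broker host/port, topic, partition, and client id are placeholders of mine, and a real version would first have to locate the leader broker for the partition:

import java.util.HashMap;
import java.util.Map;

import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

// Ask the partition's leader broker for the log-end ("latest") offset.
// If your consumer's next offset equals this value, there is currently
// nothing left to read in this partition.
SimpleConsumer consumer =
    new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "offset-lookup");

Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
    new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
info.put(new TopicAndPartition("my-topic", 0),
    new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));

OffsetResponse response = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
    info, kafka.api.OffsetRequest.CurrentVersion(), "offset-lookup"));
long logEndOffset = response.offsets("my-topic", 0)[0];
consumer.close();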
So it would seem I'd want to do KafkaUtils.createRDD, which takes an array of OffsetRanges. Each OffsetRange is characterized by a topic, a partition, a fromOffset, and an untilOffset. In my case, I want to read all the data, i.e. from all partitions, but I don't know how many partitions there may be, nor do I know the untilOffset values. In essence, I just want something like createRDD(new OffsetRangeAllData());

In addition, I'd ideally want the option of not peeking but polling the messages off the topics involved. But I'm not sure whether the Kafka APIs support that, and then whether Spark does/will support it as well... (I've sketched building the OffsetRange array by hand below the quoted messages.)

On Wed, Apr 29, 2015 at 1:52 AM, ayan guha <guha.a...@gmail.com> wrote:

> I guess what you mean is not streaming. If you create a stream context at
> time t, you will receive data coming through starting at time t++, not
> before time t.
>
> Looks like you want a queue. Let Kafka write to a queue, consume msgs from
> the queue and stop when the queue is empty.
>
> On 29 Apr 2015 14:35, "dgoldenberg" <dgoldenberg...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm wondering about the use case where you're not doing continuous,
>> incremental streaming of data out of Kafka, but rather want to publish
>> data once with your Producer(s), consume it once in your Consumer, and
>> then terminate the consumer Spark job.
>>
>> JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,
>> Durations.milliseconds(...));
>>
>> The batchDuration parameter is "the time interval at which streaming data
>> will be divided into batches". Can this be worked somehow to cause Spark
>> Streaming to just get all the available data, then let all the RDDs
>> within the Kafka discretized stream get processed, and then just be done
>> and terminate, rather than wait another period and try to process any
>> more data from Kafka?
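Here is a rough sketch of the "read everything currently in the topic, once, then terminate" idea done by hand: Kafka 0.8's SimpleConsumer to discover partitions and offsets, then Spark 1.3's KafkaUtils.createRDD to pull that fixed slice as a plain batch RDD. The broker host/port, topic name, and the getOffset() helper are assumptions of mine, not an existing OffsetRangeAllData; a real version would also need to route each offset lookup to that partition's leader and add error handling.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.PartitionMetadata;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.kafka.OffsetRange;

public class ReadTopicOnce {

  public static void main(String[] args) {
    String topic = "my-topic";  // placeholder
    JavaSparkContext jsc =
        new JavaSparkContext(new SparkConf().setAppName("read-topic-once"));

    // Discover the topic's partitions with a metadata request.
    SimpleConsumer consumer =
        new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "lookup");
    TopicMetadataResponse meta =
        consumer.send(new TopicMetadataRequest(Collections.singletonList(topic)));

    // For each partition, span everything currently in the log:
    // from the earliest available offset up to the log-end offset.
    List<OffsetRange> ranges = new ArrayList<OffsetRange>();
    for (TopicMetadata tm : meta.topicsMetadata()) {
      for (PartitionMetadata pm : tm.partitionsMetadata()) {
        int partition = pm.partitionId();
        long earliest = getOffset(consumer, topic, partition,
            kafka.api.OffsetRequest.EarliestTime());
        long latest = getOffset(consumer, topic, partition,
            kafka.api.OffsetRequest.LatestTime());
        ranges.add(OffsetRange.create(topic, partition, earliest, latest));
      }
    }
    consumer.close();

    // Read that fixed slice as one batch RDD. There is no streaming
    // context here, so the job simply ends when processing finishes.
    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "broker-host:9092");

    JavaPairRDD<String, String> rdd = KafkaUtils.createRDD(
        jsc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, ranges.toArray(new OffsetRange[ranges.size()]));

    System.out.println("Read " + rdd.count() + " messages");  // process once here
    jsc.stop();
  }

  // Same OffsetRequest lookup as in the earlier sketch, parameterized on
  // time: EarliestTime() or LatestTime().
  private static long getOffset(SimpleConsumer consumer, String topic,
                                int partition, long time) {
    Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
        new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
    info.put(new TopicAndPartition(topic, partition),
        new PartitionOffsetRequestInfo(time, 1));
    return consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
            info, kafka.api.OffsetRequest.CurrentVersion(), "lookup"))
        .offsets(topic, partition)[0];
  }
}

This still doesn't "poll" anything off the topic, of course; the messages stay in the log, and rerunning the job would read them again unless the untilOffsets were persisted somewhere and used as the next run's fromOffsets.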