Part of the issue is that when you read messages from a topic, they are
peeked, not polled -- consuming doesn't remove them from the broker -- so
there's no natural "when the queue is empty" condition, as I understand it.

So it would seem I'd want to use KafkaUtils.createRDD, which takes an array
of OffsetRanges. Each OffsetRange is characterized by topic, partition,
fromOffset, and untilOffset. In my case, I want to read all the data, i.e.
from all partitions, but I don't know how many partitions there may be, nor
do I know the 'untilOffset' values.
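
For what it's worth, here's roughly what I picture the manual route looking
like (an untested sketch against the Spark 1.3 direct Kafka API; the broker
address, topic name, partition count and offset values are all made up --
the point being that the ranges have to be known up front):

import java.util.HashMap;
import java.util.Map;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.kafka.OffsetRange;

public class ReadTopicOnce {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(
        new SparkConf().setAppName("ReadTopicOnce"));

    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "broker1:9092");

    // One OffsetRange per partition. The partition count and the
    // from/until offsets are hard-coded here; in reality they'd have to
    // be fetched from Kafka first (e.g. via the simple consumer's offset
    // request API) -- which is exactly the boilerplate I'd like to avoid.
    OffsetRange[] ranges = new OffsetRange[] {
        OffsetRange.create("mytopic", 0, 0L, 1000L),
        OffsetRange.create("mytopic", 1, 0L, 1000L)
    };

    JavaPairRDD<String, String> rdd = KafkaUtils.createRDD(
        jsc,
        String.class, String.class,
        StringDecoder.class, StringDecoder.class,
        kafkaParams, ranges);

    System.out.println("Read " + rdd.count() + " messages");
    jsc.stop();
  }
}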

In essence, I just want something like createRDD(new OffsetRangeAllData());

In addition, I'd ideally want the option of not just peeking but actually
polling the messages off the topics involved. But I'm not sure whether the
Kafka APIs support that, and whether Spark does/will support it as well...
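
For reference, the closest I've come up with to the "consume what's there,
then terminate" behavior from my original question below is to run the
direct stream, watch for an empty batch, and stop the context from a
separate thread. Another untested sketch -- the empty-batch heuristic and
the graceful-stop flags are my assumptions, and the broker/topic names are
made up:

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class DrainTopicThenStop {
  public static void main(String[] args) throws Exception {
    final JavaStreamingContext jssc = new JavaStreamingContext(
        new SparkConf().setAppName("DrainTopicThenStop"),
        Durations.milliseconds(1000));

    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "broker1:9092");
    // Start from the beginning of the topic so one run sees all the data.
    kafkaParams.put("auto.offset.reset", "smallest");
    Set<String> topics = new HashSet<String>(Arrays.asList("mytopic"));

    JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
        jssc, String.class, String.class,
        StringDecoder.class, StringDecoder.class, kafkaParams, topics);

    final AtomicBoolean drained = new AtomicBoolean(false);
    stream.foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {
      @Override
      public Void call(JavaPairRDD<String, String> rdd) {
        if (rdd.count() == 0) {
          drained.set(true);  // empty batch: assume we've caught up
        } else {
          // process the batch here
        }
        return null;
      }
    });

    jssc.start();
    // Stop from a separate thread; stopping inside foreachRDD can hang.
    new Thread(new Runnable() {
      public void run() {
        while (!drained.get()) {
          try { Thread.sleep(2000); } catch (InterruptedException e) { return; }
        }
        jssc.stop(true, true);  // stop the SparkContext too, gracefully
      }
    }).start();
    jssc.awaitTermination();
  }
}

The empty-batch check is obviously racy if producers are still writing, so
this is a stopgap rather than a real "drain the topic" primitive.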



On Wed, Apr 29, 2015 at 1:52 AM, ayan guha <guha.a...@gmail.com> wrote:

> I guess what you mean is not streaming. If you create a streaming context
> at time t, you will receive data arriving after time t, not before it.
>
> Looks like you want a queue: let Kafka write to a queue, consume messages
> from the queue, and stop when the queue is empty.
> On 29 Apr 2015 14:35, "dgoldenberg" <dgoldenberg...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm wondering about the use-case where you're not doing continuous,
>> incremental streaming of data out of Kafka but rather want to publish data
>> once with your Producer(s) and consume it once, in your Consumer, then
>> terminate the consumer Spark job.
>>
>> JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,
>> Durations.milliseconds(...));
>>
>> The batchDuration parameter is "The time interval at which streaming data
>> will be divided into batches". Can this be worked somehow to cause Spark
>> Streaming to just fetch all the available data, let all the RDDs within
>> the Kafka discretized stream get processed, and then be done and
>> terminate, rather than wait another interval and try to process more
>> data from Kafka?
>>