I don't think there is any easier way.
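For completeness, roughly what that looks like with the spark-streaming-kafka-0-10 createRDD API. This is a minimal untested sketch: the broker address, topic name, and group id are placeholders, sc is assumed to be an existing SparkContext, and beginningOffsets/endOffsets on the consumer need kafka-clients 0.10.1 or later. (A sketch of the batch DataFrame read TD suggested is at the bottom of this message.)

import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",            // placeholder brokers
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "batch-reader"                        // placeholder group id
)

// Use a plain KafkaConsumer just to discover the topic's partitions
// and their earliest/latest offsets (needs kafka-clients 0.10.1+).
val consumer = new KafkaConsumer[String, String](kafkaParams.asJava)
val partitions = consumer.partitionsFor("mytopic").asScala
  .map(pi => new TopicPartition(pi.topic, pi.partition))
val earliest = consumer.beginningOffsets(partitions.asJava).asScala
val latest = consumer.endOffsets(partitions.asJava).asScala
consumer.close()

// One OffsetRange per partition, spanning everything currently in Kafka.
val offsetRanges = partitions.map { tp =>
  OffsetRange(tp, earliest(tp), latest(tp))
}.toArray

// sc is an existing SparkContext; yields an RDD[ConsumerRecord[String, String]].
val rdd = KafkaUtils.createRDD[String, String](
  sc, kafkaParams.asJava, offsetRanges, LocationStrategies.PreferConsistent)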
On Mon, Aug 7, 2017 at 7:32 PM, shyla deshpande <deshpandesh...@gmail.com> wrote:

> Thanks TD for the response. I forgot to mention that I am not using
> structured streaming.
>
> I was looking into KafkaUtils.createRDD, and it looks like I need to get
> the earliest and the latest offset for each partition to build the
> Array(offsetRange). I wanted to know if there was an easier way.
>
> One reason why we are hesitating to use structured streaming is that I
> need to persist the data in a Cassandra database, which I believe is not
> production ready.
>
> On Mon, Aug 7, 2017 at 6:11 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>
>> It's best to use DataFrames. You can read from Kafka as streaming or as
>> batch. More details here:
>>
>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
>> https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
>>
>> On Mon, Aug 7, 2017 at 6:03 PM, shyla deshpande <deshpandesh...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> What is the easiest way to read all the data from Kafka in a batch
>>> program for a given topic?
>>> I have 10 Kafka partitions, but the data is not much. I would like to
>>> read from the earliest offset in all the partitions for a topic.
>>>
>>> I appreciate any help. Thanks
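P.S. For anyone finding this thread later, the batch DataFrame read TD
describes above looks roughly like this, per the linked docs. Sketch only:
broker address and topic name are placeholders, and it needs the
spark-sql-kafka-0-10 package on the classpath (Spark 2.x).

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-batch").getOrCreate()

// Batch read of the whole topic; "earliest"/"latest" are the defaults
// for batch queries, spelled out here for clarity.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
  .option("subscribe", "mytopic")                      // placeholder
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .load()

// key and value arrive as binary; cast to strings before use.
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show()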