Thanks Cody.

On Wed, Aug 9, 2017 at 8:46 AM, Cody Koeninger <c...@koeninger.org> wrote:
> org.apache.spark.streaming.kafka.KafkaCluster has methods
> getLatestLeaderOffsets and getEarliestLeaderOffsets
>
> On Mon, Aug 7, 2017 at 11:37 PM, shyla deshpande
> <deshpandesh...@gmail.com> wrote:
> > Thanks TD.
> >
> > On Mon, Aug 7, 2017 at 8:59 PM, Tathagata Das
> > <tathagata.das1...@gmail.com> wrote:
> >>
> >> I don't think there is any easier way.
> >>
> >> On Mon, Aug 7, 2017 at 7:32 PM, shyla deshpande
> >> <deshpandesh...@gmail.com> wrote:
> >>>
> >>> Thanks TD for the response. I forgot to mention that I am not using
> >>> structured streaming.
> >>>
> >>> I was looking into KafkaUtils.createRDD, and it looks like I need to
> >>> get the earliest and the latest offset for each partition to build the
> >>> Array(offsetRange). I wanted to know if there was an easier way.
> >>>
> >>> One reason we are hesitating to use structured streaming is that I
> >>> need to persist the data in a Cassandra database, and I believe that
> >>> path is not production ready.
> >>>
> >>> On Mon, Aug 7, 2017 at 6:11 PM, Tathagata Das
> >>> <tathagata.das1...@gmail.com> wrote:
> >>>>
> >>>> It's best to use DataFrames. You can read from Kafka as a stream or
> >>>> as a batch. More details here:
> >>>>
> >>>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
> >>>>
> >>>> https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
> >>>>
> >>>> On Mon, Aug 7, 2017 at 6:03 PM, shyla deshpande
> >>>> <deshpandesh...@gmail.com> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> What is the easiest way to read all the data from Kafka in a batch
> >>>>> program for a given topic?
> >>>>> I have 10 Kafka partitions, but the data is not much. I would like
> >>>>> to read from the earliest offset in all the partitions for a topic.
> >>>>>
> >>>>> I appreciate any help. Thanks.
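
[Editor's note] Cody's suggestion (use KafkaCluster's offset lookups to build the OffsetRange array for KafkaUtils.createRDD) can be sketched roughly as below. This is a minimal sketch, not Cody's exact code: it assumes Spark with the spark-streaming-kafka-0-8 artifact on the classpath, a broker list supplied via `metadata.broker.list`, and String keys/values; `readWholeTopic` is a hypothetical helper name, and the `.right.get` calls skip the error handling you would want on the `Either` results in real code.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.kafka.{KafkaCluster, KafkaUtils, OffsetRange}

// Hypothetical helper: read everything currently in a topic as one batch RDD.
def readWholeTopic(sc: SparkContext, topic: String,
                   brokers: String): RDD[(String, String)] = {
  val kafkaParams = Map("metadata.broker.list" -> brokers)
  val kc = new KafkaCluster(kafkaParams)

  // Each lookup returns Either[Err, ...]; .right.get is for brevity only.
  val partitions = kc.getPartitions(Set(topic)).right.get
  val earliest   = kc.getEarliestLeaderOffsets(partitions).right.get
  val latest     = kc.getLatestLeaderOffsets(partitions).right.get

  // One OffsetRange per partition, spanning earliest..latest.
  val offsetRanges = partitions.toArray.map { tp =>
    OffsetRange(tp.topic, tp.partition, earliest(tp).offset, latest(tp).offset)
  }

  KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
    sc, kafkaParams, offsetRanges)
}
```

Note that KafkaCluster is marked experimental in the 0-8 integration, so the API may differ between Spark versions.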
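
[Editor's note] For comparison, the structured-streaming route TD links (a Kafka source for batch queries) looks roughly like this. Sketch only, assuming Spark 2.x with the spark-sql-kafka-0-10 artifact; the broker address and topic name are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-batch-read").getOrCreate()

// Batch (non-streaming) read of everything currently in the topic,
// across all partitions, from earliest to latest offset.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "mytopic")
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .load()

// Kafka key/value columns arrive as binary; cast them before use.
val records = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
```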