Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?
Thanks Cody.
Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?
org.apache.spark.streaming.kafka.KafkaCluster has methods getLatestLeaderOffsets and getEarliestLeaderOffsets.
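A rough sketch of the approach Cody describes, assuming the spark-streaming-kafka-0-8 artifact (where KafkaCluster is public) and placeholder broker address and topic name; error handling on the Either results is elided:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{KafkaCluster, KafkaUtils, OffsetRange}

val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")  // placeholder broker
val topic = "mytopic"                                              // placeholder topic
val kc = new KafkaCluster(kafkaParams)

// Look up the topic's partitions, then the earliest and latest leader offsets.
val partitions = kc.getPartitions(Set(topic)).right.get
val earliest   = kc.getEarliestLeaderOffsets(partitions).right.get
val latest     = kc.getLatestLeaderOffsets(partitions).right.get

// One OffsetRange per partition, covering everything currently in the topic.
val offsetRanges: Array[OffsetRange] = partitions.toArray.map { tp =>
  OffsetRange(tp.topic, tp.partition, earliest(tp).offset, latest(tp).offset)
}

// Batch-read the whole topic as an RDD.
val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)
```

In production you would pattern-match on the Either values instead of calling `.right.get`, since leader lookups can fail transiently.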
Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?
Thanks TD.
Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?
I don't think there is any easier way.
Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?
Thanks TD for the response. I forgot to mention that I am not using structured streaming.

I was looking into KafkaUtils.createRDD, and it looks like I need to get the earliest and the latest offset for each partition to build the Array(offsetRange). I wanted to know if there was an easier way.

One reason we are hesitating to use structured streaming is that I need to persist the data in a Cassandra database, and I believe that integration is not production ready.
Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?
It's best to use DataFrames. You can read from Kafka as a stream or as a batch. More details here:

https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html

On Mon, Aug 7, 2017 at 6:03 PM, shyla deshpande wrote:
> Hi all,
>
> What is the easiest way to read all the data from kafka in a batch program for a given topic?
> I have 10 kafka partitions, but the data is not much. I would like to read from the earliest from all the partitions for a topic.
>
> I appreciate any help. Thanks
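Following the batch-query section of the linked integration guide, the DataFrame approach TD suggests looks roughly like this (broker address and topic name are placeholders):

```scala
// Batch read: pulls everything currently in the topic, across all partitions,
// without building per-partition offset ranges by hand.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
  .option("subscribe", "mytopic")                       // placeholder topic
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .load()

// Kafka keys/values arrive as binary; cast to strings for processing.
val records = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
```

The "earliest"/"latest" offset options make Spark resolve the per-partition ranges for you, which is exactly the bookkeeping that KafkaUtils.createRDD otherwise requires.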