Re: Spark Streaming with Kafka Use Case
If by smaller block interval you mean the value in seconds passed to the streaming context constructor, no. You'll still get everything from the starting offset until now in the first batch.

On Thu, Feb 18, 2016 at 10:02 AM, praveen S wrote:
> Sorry, rephrasing: can this issue be resolved by having a smaller block
> interval?
>
> Regards,
> Praveen
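To make the distinction concrete, a short sketch of the constructor value Cody means; conf is assumed to be an existing SparkConf:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Seconds(10) here is the batch interval Cody refers to: it sets how
    // often new batches are formed. It does not bound how much of the
    // Kafka backlog the first batch reads after a restart.
    val ssc = new StreamingContext(conf, Seconds(10))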
Re: Spark Streaming with Kafka Use Case
Sorry, rephrasing: can this issue be resolved by having a smaller block interval?

Regards,
Praveen

On 18 Feb 2016 21:30, "praveen S" wrote:
> Can having a smaller block interval alone resolve this?
>
> Regards,
> Praveen
Re: Spark Streaming with Kafka Use Case
Can having a smaller block interval alone resolve this?

Regards,
Praveen

On 18 Feb 2016 21:13, "Cody Koeninger" wrote:
> Backpressure won't help you with the first batch; you'd need
> spark.streaming.kafka.maxRatePerPartition for that.
Re: Spark Streaming with Kafka Use Case
Backpressure won't help you with the first batch; you'd need spark.streaming.kafka.maxRatePerPartition for that.

On Thu, Feb 18, 2016 at 9:40 AM, praveen S wrote:
> Have a look at the spark.streaming.backpressure.enabled property.
>
> Regards,
> Praveen
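A minimal sketch of capping the first batch this way; the app name, rate of 10000 records/sec, and partition count in the comment are illustrative values, not recommendations:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("streaming-app")  // placeholder name
      // Upper bound per Kafka partition, in records per second. With a
      // 10-second batch and 4 partitions, the first batch is capped at
      // 10 * 10000 * 4 = 400,000 records instead of the whole backlog.
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")

    val ssc = new StreamingContext(conf, Seconds(10))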
Re: Spark Streaming with Kafka Use Case
Have a look at the spark.streaming.backpressure.enabled property.

Regards,
Praveen

On 18 Feb 2016 00:13, "Abhishek Anand" wrote:
> I have a Spark Streaming application running in production. I am trying
> to find a solution for a particular use case where my application has a
> downtime of, say, 5 hours and is then restarted. [...]
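For reference, backpressure is a plain SparkConf setting (available from Spark 1.5 onward); a minimal sketch, with the app name as a placeholder:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("streaming-app")  // placeholder name
      // Lets the rate controller adapt the ingestion rate to how fast
      // previous batches actually completed.
      .set("spark.streaming.backpressure.enabled", "true")

Note that, as Cody points out above, the rate controller needs at least one completed batch to react to, so this does nothing for the oversized first batch after a restart.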
Re: Spark Streaming with Kafka Use Case
Just use a Kafka RDD in a batch job or two, then start your streaming job.

On Wed, Feb 17, 2016 at 12:57 AM, Abhishek Anand wrote:
> I have a Spark Streaming application running in production. I am trying
> to find a solution for a particular use case where my application has a
> downtime of, say, 5 hours and is then restarted. When I restart the
> streaming application after those 5 hours, there will be a considerable
> amount of data in Kafka, and my cluster will be unable to repartition
> and process all of it at once.
>
> Is there any workaround so that when my streaming application starts, it
> first takes the data for 1-2 hours and processes it, then takes the data
> for the next hour and processes that, and so on? Once it has finished
> processing the 5 hours of missed data, normal streaming should resume
> with the given slide interval.
>
> Please suggest any ideas and whether this is feasible.
>
> Thanks!!
> Abhi
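For illustration, a minimal Scala sketch of the batch-backfill approach Cody describes, using KafkaUtils.createRDD from the Spark 1.x Kafka direct API. The broker address, topic name, partitions, and offsets below are hypothetical placeholders, not values from this thread:

    import kafka.serializer.StringDecoder
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

    val sc = new SparkContext(new SparkConf().setAppName("kafka-backfill"))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // One OffsetRange per topic-partition, covering a 1-2 hour slice of
    // the backlog. fromOffset would come from your saved offsets;
    // untilOffset is wherever you choose to cut the slice.
    val offsetRanges = Array(
      OffsetRange("mytopic", 0, 100000L, 200000L),
      OffsetRange("mytopic", 1, 100000L, 200000L)
    )

    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, offsetRanges)

    rdd.foreachPartition { iter =>
      // apply the same processing logic as the streaming job
    }
    sc.stop()

Running one or two such jobs over successive offset slices drains the backlog in bounded chunks; the streaming job can then be started from the last offset the batch jobs processed.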