Re: Spark streaming. Strict discretizing by time

2016-07-07 Thread rss rss
…auto.offset.reset largest, backpressure alone should be sufficient. On Wed, Jul 6, 2016 at 12:23 PM, rss rss <rssde...@gmail.com> wrote: > If you aren't processing messages as fast as you receive them, you're going to run out of kafka retention regardless…
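
A minimal sketch of the consumer settings being suggested here (the broker address and the Spark 1.6-era direct-stream API are assumptions, not from the thread): starting from the largest offsets keeps the first batch from inheriting the whole retained backlog, and backpressure governs the rate from there.

    // Illustrative only: Kafka params for the direct stream (Spark 1.6-era API).
    val kafkaParams = Map(
      "metadata.broker.list" -> "localhost:9092", // assumed broker address
      "auto.offset.reset" -> "largest"            // start at the newest offsets
    )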

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
…if by "growing of a processing window time of Spark" you mean a processing time that exceeds batch time... that's what backpressure and maxRatePerPartition are for. As long as those are set reasonably, you'll have a reasonably fixed output interval. Have you actually tested this…

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
…limit you have set until the processing time approaches the batch time. On Wed, Jul 6, 2016 at 11:06 AM, rss rss <rssde...@gmail.com> wrote: > Ok, with: .set("spark.streaming.backpressure.enabled","true")…
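
For context, a sketch of the configuration under discussion, with an illustrative rate cap placed alongside the backpressure flag; the app name and the numeric value are assumptions, not from the thread:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("KafkaStreamingApp") // hypothetical name
      .set("spark.streaming.backpressure.enabled", "true")
      // Caps the first batches, before the backpressure estimator has any
      // completed-batch history to work with:
      .set("spark.streaming.kafka.maxRatePerPartition", "1000") // msgs/s per partition, illustrative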

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
…are better for your use case than the tradeoffs that DStreams make, you may be better off using Flink (or testing out spark 2.0 structured streaming, although there's no kafka integration available for that yet). On Wed, Jul 6, 2016 at 10:25 AM, rss rss <rssde...@gmail.com>…

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
…some reason, and unwilling to empirically determine a reasonable maximum partition size, you should be able to estimate an upper bound such that the first batch does not encompass your entire kafka retention. Backpressure will kick in once it has some information to work with.…
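
A back-of-the-envelope version of that estimate, with wholly illustrative numbers: choose the per-partition rate cap so that one batch interval's worth of messages stays below the retained backlog.

    // Illustrative arithmetic: if a partition retains ~10,000,000 messages and
    // the batch interval is 10 s, any maxRatePerPartition below 1,000,000
    // msgs/s keeps the first batch from encompassing the entire retention.
    val retainedPerPartition = 10000000L // assumed backlog per partition
    val batchIntervalSec = 10L           // assumed batch interval
    val rateUpperBound = retainedPerPartition / batchIntervalSec // 1,000,000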

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
…number of messages... isn't a reasonable way to test steady state performance. Flink can't magically give you a correct answer under those circumstances either. On Tue, Jul 5, 2016 at 10:41 AM, rss rss <rssde...@gmail.com> wrote: > Hi, thanks.…

Re: Spark streaming. Strict discretizing by time

2016-07-05 Thread rss rss
…http://spark.apache.org/docs/latest/configuration.html; look for backpressure and maxRatePerPartition. But if you're only seeing zeros after your job runs for a minute, it sounds like something else is wrong. On Tue, Jul 5, 2016 at 10:02 AM, rss…

Spark streaming. Strict discretizing by time

2016-07-05 Thread rss rss
Hello, I'm trying to organize processing of messages from Kafka, and there is a typical case where the number of messages in Kafka's queue is more than the Spark application can process. But I need a strict time limit for preparing a result, at least for a part of the data. Code example:
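
The code example is truncated in the archive. Below is a hypothetical minimal reconstruction of the setup being described, using the Spark 1.6-era direct Kafka stream; the topic name, broker address, and 10-second batch interval are assumptions, not the original poster's code:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object StrictBatches {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("StrictBatches")
        // The batch interval is the strict time slice: each batch's result
        // should be ready within roughly this period.
        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("events")) // "events" is an assumed topic name

        // Per-batch aggregate, printed once per interval.
        stream.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }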