auto.offset.reset largest,
> backpressure alone should be sufficient.
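(A minimal sketch of what "auto.offset.reset largest plus backpressure" looks like in code, assuming the Kafka 0.8 direct-stream configuration the DStream integration used at the time; the broker address is a placeholder, not from the thread:)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // let Spark's rate controller throttle input based on observed processing times
  .set("spark.streaming.backpressure.enabled", "true")

val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092",   // placeholder broker
  // start from the newest offsets so the job never tries to replay the whole retention
  "auto.offset.reset" -> "largest")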
>
>
>
> On Wed, Jul 6, 2016 at 12:23 PM, rss rss <rssde...@gmail.com> wrote:
> >> If you aren't processing messages as fast as you receive them, you're
> >> going to run out of kafka retention regardless.
> If by "growing of a processing window time of Spark" you mean a
> processing time that exceeds batch time... that's what backpressure
> and maxRatePerPartition are for. As long as those are set reasonably,
> you'll have a reasonably fixed output interval. Have you actually
> tested this?
> You'll keep getting batches at whatever limit you have set until the processing time
> approaches the batch time.
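(Illustrative arithmetic with made-up numbers: with spark.streaming.kafka.maxRatePerPartition set to 1000 records/sec, 8 Kafka partitions and a 5 second batch interval, no batch can contain more than 1000 * 8 * 5 = 40,000 records, so per-batch processing time stays bounded and the output interval stays close to the batch interval; backpressure only pushes the actual rate below that cap once processing time starts approaching the batch time.)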
>
> On Wed, Jul 6, 2016 at 11:06 AM, rss rss <rssde...@gmail.com> wrote:
> > Ok, with:
> >
> > .set("spark.streaming.backpressure.enabled","true")
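(A hedged sketch of the same SparkConf with the per-partition cap from the advice above added as well; the rate value is an arbitrary placeholder, not a recommendation:)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  // cap the very first batches too, before backpressure has any processing history to work from
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")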
> If the tradeoffs that Flink makes are better for your use
> case than the tradeoffs that DStreams make, you may be better off
> using Flink (or testing out spark 2.0 structured streaming, although
> there's no kafka integration available for that yet)
>
> On Wed, Jul 6, 2016 at 10:25 AM, rss rss <rssde...@gmail.com> wrote:
> If you're unwilling to use auto.offset.reset largest for some reason, and
> unwilling to empirically determine a reasonable maximum partition
> size, you should be able to estimate an upper bound such that the
> first batch does not encompass your entire kafka retention.
> Backpressure will kick in once it has some information to work with.
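(A back-of-the-envelope version of that estimate, with every number assumed purely for illustration:)

import org.apache.spark.SparkConf

// Assumption: the topic retains on the order of 10 million messages per partition.
val retainedPerPartition = 10000000L
// Assumption: 5 second batch interval, and the first batch should cover
// at most 50,000 records per partition, far less than the full retention.
val batchSeconds = 5
val firstBatchPerPartition = 50000L
val maxRatePerPartition = firstBatchPerPartition / batchSeconds   // = 10,000 records/sec

val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.kafka.maxRatePerPartition", maxRatePerPartition.toString)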
>
>
> Starting a stream against a topic that already holds a large number
> of messages... isn't a reasonable way to test steady state
> performance. Flink can't magically give you a correct answer under
> those circumstances either.
>
> On Tue, Jul 5, 2016 at 10:41 AM, rss rss <rssde...@gmail.com> wrote:
> > Hi, thanks.
> >
> >
> http://spark.apache.org/docs/latest/configuration.html
>
> look for backpressure and maxRatePerPartition
>
>
> But if you're only seeing zeros after your job runs for a minute, it
> sounds like something else is wrong.
>
>
> On Tue, Jul 5, 2016 at 10:02 AM, rss rss <rssde...@gmail.com> wrote:
Hello,
I'm trying to organize the processing of messages from Kafka, and there is a
typical case where the number of messages in Kafka's queue is more than the
Spark app is able to process. But I need a strict time limit for preparing a
result for at least a part of the data.
Code example:
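(The actual code example is not preserved in this excerpt. Purely as an invented stand-in for the kind of job being described, here is a minimal direct-stream sketch using the settings discussed above; every name, address, topic and value is a placeholder:)

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaBoundedProcessing {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[*]")                  // placeholder master
      .setAppName("kafka-bounded-processing")
      .set("spark.streaming.backpressure.enabled", "true")
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")

    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map(
      "metadata.broker.list" -> "broker1:9092",
      "auto.offset.reset" -> "largest")

    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    // Each batch only covers whatever slice of data fit under the rate cap,
    // so a result for part of the data is produced every batch interval.
    messages.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}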