I'm using branch-1.6, built for Scala 2.11 yesterday. Below is the part of my
actor receiver that stores data. The log reports millions of stored messages,
while according to the UI the job is apparently back-pressured (i.e. 2000
records per 10s batch).
    store((key, msg))
    // Log progress once every 100,000 stored messages.
    val count = storeCount.incrementAndGet()
    if (count % 100000 == 0) {
      logger.info(s"Stored $count messages to spark")
    }
From: Iulian Dragoș <[email protected]>
Date: Thursday, January 28, 2016 at 5:33 AM
To: Lin Zhao <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Spark streaming flow control and back pressure
Calling `store` should get you there. What version of Spark are you using? Can
you share your code?
iulian
On Thu, Jan 28, 2016 at 2:28 AM, Lin Zhao <[email protected]> wrote:
I have an actor receiver that reads data and calls store() to save it to
Spark. I was hoping spark.streaming.receiver.maxRate and
spark.streaming.backpressure.enabled would make that call block when needed, to
avoid overflowing the pipeline. But they don't. My actor pumps millions of
lines into Spark even while backpressure and the rate limit are in effect. The
data flows slowly into the input blocks, so the data already created sits
around and causes memory problems.
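
For reference, the two settings mentioned above would typically be set on the
SparkConf, roughly as in the sketch below. The application name and the rate
of 200 records per second are placeholder values chosen for illustration, not
taken from the actual job.

```scala
import org.apache.spark.SparkConf

// Sketch only: the app name and the numeric rate are placeholder values.
val conf = new SparkConf()
  .setAppName("backpressure-example") // hypothetical name
  // Let Spark adapt the ingestion rate to observed batch processing times.
  .set("spark.streaming.backpressure.enabled", "true")
  // Upper bound per receiver, in records per second.
  .set("spark.streaming.receiver.maxRate", "200")
```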
Is there a guideline on how to handle this? What's the best way for my actor to
know it should slow down, so it doesn't keep creating millions of messages? A
blocking store() call seems appropriate.
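
To illustrate what a blocking hand-off could look like, here is a minimal
sketch using only the JDK's bounded queue. BoundedHandoff, its capacity
parameter and the sink function are hypothetical names, not part of Spark. The
sketch assumes that whatever is passed as sink (for example a call that ends
up in store()) blocks or slows down when the downstream is saturated; if it
returns immediately, the queue never fills and the producer is never slowed.

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper, not part of Spark: a bounded buffer between the code
// that produces messages and the code that hands them on (e.g. to store()).
final class BoundedHandoff[T <: AnyRef](capacity: Int)(sink: T => Unit) {
  private val queue = new ArrayBlockingQueue[T](capacity)
  private val delivered = new AtomicLong(0)
  @volatile private var running = true

  // Background thread that drains the queue into the sink.
  private val drainer = new Thread(new Runnable {
    def run(): Unit = {
      // Keep going until close() was called and the queue is empty.
      while (running || !queue.isEmpty) {
        // poll returns null on timeout; Option(null) is None.
        Option(queue.poll(50, TimeUnit.MILLISECONDS)).foreach { item =>
          sink(item)
          delivered.incrementAndGet()
        }
      }
    }
  })
  drainer.setDaemon(true)
  drainer.start()

  // Blocks the caller while the queue is full; this is the flow-control point.
  def put(item: T): Unit = queue.put(item)

  def deliveredCount: Long = delivered.get()

  // Stops accepting work from the drainer's point of view and waits for the
  // remaining items to be delivered. Do not call put() after close().
  def close(): Unit = {
    running = false
    drainer.join()
  }
}
```

The producer would call put() instead of calling the sink directly, so at most
capacity messages are ever held in memory. Note that blocking inside an
actor's receive ties up a dispatcher thread, so this kind of hand-off is
better placed in the code that feeds the actor, or on a dedicated dispatcher.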
Thanks, Lin
--
Iulian Dragos
------
Reactive Apps on the JVM
www.typesafe.com