Re: Spark Streaming: BatchDuration and Processing time
Hi Silvio,

Can you go into a little more detail about how back pressure works? Does it block the receiver, or does it temporarily save the incoming messages in memory/disk?

I have a custom actor receiver that uses store() to save data to Spark. Would back pressure make the store() call block?

On 1/17/16, 10:15 AM, "Silvio Fiorito" wrote:

>It can handle spikes in processing
>time, but if you know you're consistently running over your batch
>duration you either need to increase the duration or look at enabling
>back pressure support.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
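The question above contrasts two possible behaviours under back pressure: blocking the producer, or buffering the excess in memory/disk. The Python sketch below illustrates only the blocking option: a rate limiter whose `acquire()` sleeps until the next permit is due, a `store()` that calls it, and the simplest possible feedback rule for updating the rate. Every name here (`BlockingRateLimiter`, `ThrottledReceiver`, `estimate_rate`) is hypothetical. This is not Spark's receiver code and does not settle how Spark's own `store()` behaves; Spark's default rate estimator is a PID controller, far more elaborate than the one-line rule used here.

```python
import time
from typing import Callable, List


class BlockingRateLimiter:
    """Hypothetical pacing limiter: acquire() blocks until the next permit is due."""

    def __init__(self, rate_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.clock = clock
        self.sleep = sleep
        self.interval = 1.0 / rate_per_sec  # seconds between permits
        self.next_free = clock()            # earliest time the next permit is available

    def set_rate(self, rate_per_sec: float) -> None:
        # A back-pressure controller would call this after each batch.
        self.interval = 1.0 / rate_per_sec

    def acquire(self) -> float:
        """Block the caller until a permit is available; return the time waited."""
        now = self.clock()
        wait = max(0.0, self.next_free - now)
        if wait > 0:
            self.sleep(wait)
        self.next_free = max(now, self.next_free) + self.interval
        return wait


def estimate_rate(records_processed: int, processing_seconds: float) -> float:
    """Simplest feedback rule (assumption): admit records no faster than the last batch processed them."""
    return records_processed / processing_seconds


class ThrottledReceiver:
    """Hypothetical receiver whose store() blocks instead of buffering extra data."""

    def __init__(self, limiter: BlockingRateLimiter) -> None:
        self.limiter = limiter
        self.stored: List[object] = []

    def store(self, record: object) -> None:
        self.limiter.acquire()  # the calling thread waits here when over the rate
        self.stored.append(record)
```

The `clock` and `sleep` parameters exist only so the pacing can be exercised without real waiting; by default they use `time.monotonic` and `time.sleep`. With this design, a slow consumer shows up as a producer thread parked inside `store()`, rather than as a growing in-memory queue.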
Re: Spark Streaming: BatchDuration and Processing time
If you are using Kafka as the message queue, Spark will still process every time slice in order, even when it is running late, as in your example. But it will eventually fail, because your job will ask for a message that is older than the oldest message Kafka still retains.

If your job only takes longer than the batch interval at certain times, let's say during your system's daytime peak, but takes much less time at night, when your system is mostly idle, the stream will catch up and process everything correctly (though it's risky if the late time slices don't finish during the idle period).

The best thing to do is to optimize your job so that it fits within the batch interval and avoids overflows. :)

Regards,
Ricardo

On Sun, Jan 17, 2016 at 2:32 PM, pyspark2555 [via Apache Spark User List] <ml-node+s1001560n25986...@n3.nabble.com> wrote:

> If BatchDuration is set to 1 second in StreamingContext and the actual
> processing time is longer than one second, then how does Spark handle that?
--
Ricardo Paiva
Big Data / Semântica
2483-6432
*globo.com* <http://www.globo.com>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-BatchDuration-and-Processing-time-tp25986p25989.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
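Ricardo's two points (a backlog that drains once load drops, and a failure once the backlog outlives Kafka's retention) can be sketched with a small Python model. Assumptions: batches run one at a time and in order, each batch needs the data produced during its own interval, and Kafka deletes data exactly `retention` seconds after it is written (real brokers delete whole log segments, so this check is only approximate). The function names are hypothetical and the numbers are toy values in seconds; real retention is usually hours or days.

```python
from typing import List


def backlog_over_time(batch_interval: float, processing_times: List[float]) -> List[float]:
    """Return each batch's scheduling delay (seconds spent queued before it starts)."""
    delays: List[float] = []
    previous_end = 0.0
    for i, processing in enumerate(processing_times):
        batch_time = (i + 1) * batch_interval    # when this batch's interval closes
        start = max(batch_time, previous_end)    # wait for the previous batch to finish
        delays.append(start - batch_time)
        previous_end = start + processing
    return delays


def first_expired_batch(delays: List[float], batch_interval: float, retention: float) -> int:
    """Index of the first batch whose oldest input would already be deleted, or -1.

    The oldest record a batch needs was written one interval before the batch
    closed, so at start time it is (delay + batch_interval) seconds old.
    """
    for i, delay in enumerate(delays):
        if delay + batch_interval > retention:
            return i
    return -1


if __name__ == "__main__":
    # "Peak": four 1.5 s batches on a 1 s interval; "night": four 0.5 s batches.
    delays = backlog_over_time(1.0, [1.5] * 4 + [0.5] * 4)
    print(delays)  # [0.0, 0.5, 1.0, 1.5, 2.0, 1.5, 1.0, 0.5]
    print(first_expired_batch(delays, 1.0, retention=2.75))  # 4
    print(first_expired_batch(delays, 1.0, retention=4.0))   # -1
```

The delay climbs by 0.5 s per batch during the peak and drains by 0.5 s per batch once batches finish early, which is the catch-up Ricardo describes; with a short enough retention, the fifth batch would ask for data Kafka has already dropped.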
Re: Spark Streaming: BatchDuration and Processing time
It will just queue up the subsequent batches. However, if this delay is constant, you may start losing batches. It can handle spikes in processing time, but if you know you're consistently running over your batch duration, you either need to increase the duration or look at enabling back pressure support (Spark 1.5+). See: http://spark.apache.org/docs/latest/configuration.html#spark-streaming

From: pyspark2555
Sent: Sunday, January 17, 2016 11:32 AM
To: user@spark.apache.org
Subject: Spark Streaming: BatchDuration and Processing time

If BatchDuration is set to 1 second in StreamingContext and the actual processing time is longer than one second, then how does Spark handle that?
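To make the "queue up" behaviour concrete, here is a toy model in Python. It is a sketch under stated assumptions, not Spark's scheduler: batches are generated every `batch_interval` seconds, processed strictly one at a time in order, and nothing is ever dropped, so it shows how the queue and the delay build up but not when data would be lost. The names `simulate` and `Batch` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    batch_time: float  # when the batch interval closed (seconds)
    start: float       # when processing actually began
    end: float         # when processing finished

    @property
    def scheduling_delay(self) -> float:
        # Time spent queued behind earlier batches.
        return self.start - self.batch_time


def simulate(batch_interval: float, processing_times: List[float]) -> List[Batch]:
    """Run batches one at a time, in order, as each interval closes."""
    batches: List[Batch] = []
    previous_end = 0.0
    for i, processing in enumerate(processing_times):
        batch_time = (i + 1) * batch_interval
        start = max(batch_time, previous_end)  # wait for the previous batch
        end = start + processing
        batches.append(Batch(batch_time, start, end))
        previous_end = end
    return batches


if __name__ == "__main__":
    # Constant overrun: 1.5 s of work every 1 s, so the delay grows every batch.
    steady = simulate(1.0, [1.5] * 5)
    print([b.scheduling_delay for b in steady])  # [0.0, 0.5, 1.0, 1.5, 2.0]

    # One spike: the extra time is absorbed by the faster batches after it.
    spike = simulate(1.0, [0.8, 1.6, 0.7, 0.6])
    print([round(b.scheduling_delay, 2) for b in spike])  # [0.0, 0.0, 0.6, 0.3]
```

With a constant overrun the scheduling delay grows without bound, which is the case Silvio warns about; a single slow batch is absorbed once the following batches finish in under a second. In a real job, back pressure is switched on with the `spark.streaming.backpressure.enabled` property (Spark 1.5+).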
Spark Streaming: BatchDuration and Processing time
Hi,

If BatchDuration is set to 1 second in StreamingContext and the actual processing time is longer than one second, then how does Spark handle that?

For example, I am receiving a continuous input stream. Every 1 second (the batch duration), the RDDs will be processed. What if this processing takes longer than 1 second? What happens in the next batch duration?

Thanks.
Amit

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-BatchDuration-and-Processing-time-tp25986.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.