Re: Spark Streaming: BatchDuration and Processing time

2016-01-22 Thread Lin Zhao
Hi Silvio,

Can you go into a little detail on how the back pressure works? Does it block
the receiver? Or does it temporarily save the incoming messages in
memory/on disk? I have a custom actor receiver that uses store() to save data
to Spark. Would back pressure make the store() call block?
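For context, a minimal sketch of the kind of custom receiver I mean, written
against the generic Receiver API rather than the actor helper; the class name,
the fetchNextRecord() helper, and the storage level are placeholders, not my
actual code:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Minimal custom receiver sketch (placeholder names): a background thread
// pulls records from some external source and hands them to Spark via store().
// Usage: ssc.receiverStream(new CustomQueueReceiver)
class CustomQueueReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {

  override def onStart(): Unit = {
    new Thread("custom-receiver-thread") {
      override def run(): Unit = {
        while (!isStopped()) {
          // Single-record store() goes through Spark's block generator, whose
          // rate limiter is what back pressure adjusts, so this call can block
          // when the computed maximum rate is exceeded.
          store(fetchNextRecord())
        }
      }
    }.start()
  }

  override def onStop(): Unit = {
    // The polling thread checks isStopped(), so nothing else to clean up here.
  }

  // Placeholder: a real receiver would read from the actual message source.
  private def fetchNextRecord(): String = {
    Thread.sleep(10)
    "record"
  }
}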

On 1/17/16, 10:15 AM, "Silvio Fiorito" 
wrote:

>It will just queue up the subsequent batches; however, if this delay is
>constant you may start losing batches. Spark can handle spikes in processing
>time, but if you know you're consistently running over your batch
>duration, you either need to increase the duration or look at enabling
>back pressure support (1.5+). See:
>http://spark.apache.org/docs/latest/configuration.html#spark-streaming
>
>
>From: pyspark2555 
>Sent: Sunday, January 17, 2016 11:32 AM
>To: user@spark.apache.org
>Subject: Spark Streaming: BatchDuration and Processing time
>
>Hi,
>
>If BatchDuration is set to 1 second in StreamingContext and the actual
>processing time is longer than one second, then how does Spark handle
>that?
>
>For example, I am receiving a continuous Input stream. Every 1 second
>(batch
>duration), the RDDs will be processed. What if this processing time is
>longer than 1 second? What happens in the next batch duration?
>
>Thanks.
>Amit
>


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Streaming: BatchDuration and Processing time

2016-01-18 Thread Ricardo Paiva
If you are using Kafka as the message queue, Spark will still process the
time slices in order, even when it falls behind, as in your example. But it
will eventually fail, because at some point the job will ask for a message
that is older than the oldest message still retained in Kafka.
If your processing takes longer than the batch duration at your system's peak
time during the day, but takes much less time at night when the system is
mostly idle, the streaming job will keep working and catch up correctly
(though it's risky if the late time slices don't finish during the idle
period).

The best thing to do is to optimize your job so that its processing time fits
within the batch duration, and avoid these overruns. :)
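
As a rough sketch of that Kafka setup under the 1.3-1.6 direct stream API
(the broker address and topic name below are placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-direct-example")   // placeholder name
val ssc  = new StreamingContext(conf, Seconds(1))               // 1-second batch duration

// Direct stream: each batch reads an exact offset range, so a late batch can
// still be computed as long as those offsets are retained by Kafka.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))                              // placeholder topic

stream.map(_._2).count().print()

ssc.start()
ssc.awaitTermination()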

Regards,

Ricardo





On Sun, Jan 17, 2016 at 2:32 PM, pyspark2555 [via Apache Spark User List] <
ml-node+s1001560n25986...@n3.nabble.com> wrote:

> Hi,
>
> If BatchDuration is set to 1 second in StreamingContext and the actual
> processing time is longer than one second, then how does Spark handle that?
>
> For example, I am receiving a continuous Input stream. Every 1 second
> (batch duration), the RDDs will be processed. What if this processing time
> is longer than 1 second? What happens in the next batch duration?
>
> Thanks.
> Amit
>



-- 
Ricardo Paiva
Big Data / Semântica
2483-6432
*globo.com* <http://www.globo.com>





Re: Spark Streaming: BatchDuration and Processing time

2016-01-17 Thread Silvio Fiorito
It will just queue up the subsequent batches; however, if this delay is constant
you may start losing batches. Spark can handle spikes in processing time, but if
you know you're consistently running over your batch duration, you either need
to increase the duration or look at enabling back pressure support (1.5+). See:
http://spark.apache.org/docs/latest/configuration.html#spark-streaming
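
A minimal sketch of enabling it (the app name and rate values are placeholders;
the config keys are the documented ones from that page):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("backpressure-example")                          // placeholder name
  // Spark 1.5+: let Spark adapt the ingestion rate to what the pipeline
  // can actually sustain, instead of ingesting at full speed.
  .set("spark.streaming.backpressure.enabled", "true")
  // Optional hard ceilings that still apply with back pressure on.
  .set("spark.streaming.receiver.maxRate", "10000")            // receiver-based streams
  .set("spark.streaming.kafka.maxRatePerPartition", "10000")   // Kafka direct streams

val ssc = new StreamingContext(conf, Seconds(1))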


From: pyspark2555 
Sent: Sunday, January 17, 2016 11:32 AM
To: user@spark.apache.org
Subject: Spark Streaming: BatchDuration and Processing time

Hi,

If BatchDuration is set to 1 second in StreamingContext and the actual
processing time is longer than one second, then how does Spark handle that?

For example, I am receiving a continuous Input stream. Every 1 second (batch
duration), the RDDs will be processed. What if this processing time is
longer than 1 second? What happens in the next batch duration?

Thanks.
Amit





-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark Streaming: BatchDuration and Processing time

2016-01-17 Thread pyspark2555
Hi,

If BatchDuration is set to 1 second in StreamingContext and the actual
processing time is longer than one second, then how does Spark handle that?

For example, I am receiving a continuous input stream. Every 1 second (batch
duration), the RDDs will be processed. What if this processing time is
longer than 1 second? What happens in the next batch duration?
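
For concreteness, here is a minimal sketch of the setup I mean, plus a listener
that prints each batch's processing time next to its scheduling delay; the
socket source, host, and port are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

val conf = new SparkConf().setAppName("one-second-batches")   // placeholder name
val ssc  = new StreamingContext(conf, Seconds(1))             // 1-second batch duration

// Logs how long each batch took versus how long it waited before starting.
// A steadily growing scheduling delay means processing time keeps exceeding
// the batch duration.
ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    println(s"batch ${info.batchTime}: " +
      s"processing ${info.processingDelay.getOrElse(-1L)} ms, " +
      s"scheduling delay ${info.schedulingDelay.getOrElse(-1L)} ms")
  }
})

// Placeholder source; any input DStream behaves the same way here.
val lines = ssc.socketTextStream("localhost", 9999)
lines.count().print()

ssc.start()
ssc.awaitTermination()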

Thanks.
Amit




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org