Re: windowing & max spout pending

2016-09-01 Thread Balázs Kossovics
In this case would you recommend keeping the bolt's parallelism at 1, to
avoid having to set max.spout.pending too high?


Thanks,
Balazs


On 09/01/2016 02:04 PM, Arun Mahadevan wrote:

Hi Balazs,

Yes, you are right. For event time windows, the progress (of time) is based on 
the timestamps of the events that arrive. So if no new events arrive, the 
topology can get stuck since the time does not progress.

max.spout.pending > parallelism * (number of tuples in window length + sliding
interval) is required to make progress.
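
For example, with the 60 s window and 1 s slide discussed further down the
thread, parallelism 1 and an assumed rate of 500 tuples/s (the rate is made up
purely for illustration), that works out to

    max.spout.pending > 1 * (60 + 1) * 500 = 30,500 tuples in flight.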

Thanks,
Arun


On 9/1/16, 9:51 PM, "Balázs Kossovics"  wrote:


Hi Arun,

Thanks for your prompt answer.

As we have problems with our windowed topology (the average processing
time keeps increasing, while the processing speed keeps dropping until it
hits zero), I kept thinking about the value of the max.spout.pending
parameter in the case of field-based windowing.

Imagine the following situation: we have a window length of 1 s and a
sliding interval of 0.1 s, with max.spout.pending set to 2, so each second
the expired tuples are evicted. Now if our windowed bolt has a parallelism
of 2, and the messages are partitioned in a round-robin fashion between the
component instances (not a partitioning scheme that actually exists in
Storm, but let's assume it for the example's sake), then the topology is
going to stop processing messages, as the watermarks are emitted on a
per-component basis.
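
In code, the scenario I have in mind would look roughly like this (MySpout
and MyWindowedBolt are placeholders, shuffleGrouping stands in for the
hypothetical round-robin partitioning, and Duration is the nested
BaseWindowedBolt.Duration class):

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new MySpout(), 1);
    builder.setBolt("windowed-bolt",
            new MyWindowedBolt()                                    // extends BaseWindowedBolt
                .withWindow(Duration.seconds(1), Duration.of(100))  // 1 s window, 0.1 s slide
                .withTimestampField("ts"),                          // field-based (event-time) windowing
            2)                                                      // parallelism 2
           .shuffleGrouping("spout");

    Config conf = new Config();
    conf.setMaxSpoutPending(2);  // the value that stalls the topology in this example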

Unless I'm mistaken somewhere, max.spout.pending should be higher than
parallelism * (number of tuples in window length + sliding interval), right?

Best regards,
Balazs

On 08/24/2016 05:31 PM, Arun Mahadevan wrote:

Hi Balazs,



Tuples are acked only when the window slides and the events fall out of the
window.

So max.spout.pending should be more than the maximum number of tuples in
(window length + sliding interval).
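
In config terms that is just the usual knob, e.g. (70000 below is only a
placeholder; size it from your own tuple rate and window):

    Config conf = new Config();
    // should comfortably exceed the tuples held across (window length + sliding interval)
    conf.setMaxSpoutPending(70000);
    // equivalently: conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 70000);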

Thanks,
Arun

On 8/24/16, 8:33 PM, "Balázs Kossovics"  wrote:


Hello,

I've recently hit a problem with my topology using the windowing mode
when I set TOPOLOGY_MAX_SPOUT_PENDING too low.

I have a window length of 60 seconds and a sliding interval of 1 second
with field-based timestamping. In my first tests I didn't specify
TOPOLOGY_MAX_SPOUT_PENDING, so when I was working on historical data my
topology died with an OutOfMemoryError because of the practically
unbounded dataflow. Nothing surprising here.
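
For context, the windowed bolt itself follows the standard pattern, roughly
as sketched below (the class name, field names and the aggregation are
placeholders):

    import java.util.List;

    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseWindowedBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.windowing.TupleWindow;

    // With a 60 s window, every instance buffers all tuples whose timestamps
    // fall inside the window; they are acked only once they fall out of it.
    public class MyWindowedBolt extends BaseWindowedBolt {
        @Override
        public void execute(TupleWindow window) {
            List<Tuple> inWindow = window.get();        // all tuples currently in the window
            List<Tuple> expired  = window.getExpired(); // tuples evicted since the last slide
            // ... aggregate inWindow and emit the result downstream ...
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("aggregate"));  // placeholder output field
        }
    }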

Just to make sure I avoided the OOMError on the next run, I set
TOPOLOGY_MAX_SPOUT_PENDING to 10, which basically pushed my topology into
a deadlock, because the first 10 tuples were all in the same second, so
the WatermarkTimeEvictionPolicy never triggered Action.EXPIRE. Following
this train of thought, imagine that the number of messages during a given
window reaches this limit: your spout stops emitting, so the tuple which
would trigger the expire event never comes.
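
(To put rough numbers on it: if the historical feed delivers, say, 1,000
tuples per second (a made-up rate), a 60-second window holds on the order of
60,000 un-acked tuples, so a cap of 10 cannot even cover a single sliding
interval, let alone let a later-timestamped tuple through to advance the
watermark.)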

Am I right in thinking that TOPOLOGY_MAX_SPOUT_PENDING must always be
greater than the number of messages during any time period equivalent to
the window length?

Best regards,
Balazs
