Re: Reliability in storm 0.9.4

2018-03-12 Thread Ajeesh
In my experience, I'd suggest upgrading to the latest version. I saw
stability issues in 0.9.4, such as the disruptor queue filling up and
Storm hanging. I'm in the middle of an upgrade myself, migrating from
0.9.4 to 1.2.1.



Re: Reliability in storm 0.9.4

2018-03-12 Thread Shubham Gupta
Thanks a lot

Regards
Shubham Gupta



Re: Reliability in storm 0.9.4

2018-03-12 Thread Stig Rohde Døssing
I'm basing this off of later Storm versions, because I'm not familiar with
0.9, so buyer beware. As far as I know the logic hasn't changed though, so
I'll be linking to the 2.0.0 classes.

1. No, you don't need to ack or fail unanchored tuples. Storm isn't keeping
track of those tuples at all. Here's the OutputCollector implementation for
bolts in 2.0.0
https://github.com/apache/storm/blob/ffa607e2464a361a8f2fa548cc8043f5a8818d04/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltOutputCollectorImpl.java#L128.
As you can see, if you call ack or fail on a tuple with no anchors, nothing
really happens.
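
To make the distinction concrete, here is a minimal bolt sketch against the current Storm Java API (the class name, field names, and stream contents are hypothetical; the 2.x prepare signature is assumed) showing an anchored emit, an unanchored emit, and the ack call that only matters for the anchored path:

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Hypothetical bolt illustrating anchored vs. unanchored emits.
public class ExampleBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context,
                        OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Anchored: the new tuple joins the input's tuple tree, so it
        // participates in the spout's ack/fail tracking.
        collector.emit(input, new Values(input.getString(0)));

        // Unanchored: no anchors, so Storm does not track this tuple at all.
        collector.emit(new Values("side-channel"));

        // Acking the input completes this part of the tuple tree. Calling
        // ack/fail on a tuple with no anchors is effectively a no-op.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```

This fragment depends on the Storm client library and is meant as an illustration of the anchoring distinction, not a complete topology.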

2. If the tuple is anchored at the spout (i.e. the spout emitted the tuple
with a message id), then yes. If you want to disable acking globally, you
can do so by setting topology.acker.executors to 0; see
https://github.com/apache/storm/blob/ffa607e2464a361a8f2fa548cc8043f5a8818d04/storm-client/src/jvm/org/apache/storm/Config.java#L375.
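
For reference, the key name below is taken from the Config class linked above; disabling acking is a one-line topology configuration (if I recall correctly, the Java helper Config.setNumAckers(0) does the same thing at submission time):

```yaml
# storm.yaml or per-topology config: run zero acker executors,
# which disables at-least-once tracking for the whole topology
topology.acker.executors: 0
```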



Fwd: Reliability in storm 0.9.4

2018-03-12 Thread Shubham Gupta
Hi,

I had a few doubts regarding Storm's reliability mechanism:
- Do we need to ack (or fail) unanchored tuples?
- Is it always necessary to ack or fail every tuple received from the spout
in the first bolt, to avoid an out-of-memory error in Storm?

Thanks in advance

Regards
Shubham Gupta


Data Stream Processing Workshop with Elsevier Parallel Computing Special Issue

2018-03-12 Thread Gmail
New workshop on Data Stream Processing (best papers invited to a special
issue of Elsevier Parallel Computing)



**
Auto-DaSP 2018: a Euro-Par 2018 International Workshop
Autonomic Solutions for Parallel and Distributed Data Stream Processing
(A special issue of the best papers will appear in Elsevier Parallel
Computing!)

Date: 27-28, August 2018
Location: Turin, Italy
Workshop web page: http://www.di.unipi.it/auto-dasp-18/
Euro-Par web page: https://europar2018.org/
**

* Call for Papers
We are living in a hyper-connected world with a proliferation of devices 
continuously producing unbounded data flows that have to be processed 
“on the fly”. This extends to a wide spectrum of applications with high 
socio-economic impact, like systems for healthcare, emergency 
management, surveillance, intelligent transportation and many others.


Data Stream Processing frameworks usually ingest high-frequency flows of 
incoming data and process application queries while respecting strict 
performance requirements in terms of throughput and response time. 
Maintaining these constraints is often essential despite unplanned or 
unexpected workload variability and changes due to the dynamism of the 
execution environment.


High-volume data streams can be efficiently handled through the adoption 
of novel high-performance solutions targeting today’s highly parallel 
hardware. This comprises multicore-based platforms and heterogeneous 
systems equipped with GPU and FPGA co-processors, aggregated at rack 
level by low-latency/high-bandwidth networks. The capacity of these 
highly-dense/highly-parallel rack-scale solutions has grown remarkably 
over the years, offering tens of thousands of heterogeneous cores and 
multiple terabytes of aggregated RAM, reaching the computing, memory and 
storage capacity of a large warehouse-scale cluster of just a few years ago.


However, despite this large computing power, high-performance data 
streaming solutions need to be equipped with flexible and autonomic 
logics in order to adapt the framework/application configuration to 
rapidly changing execution conditions and workloads. This calls for 
mechanisms and strategies to adapt the query and operator placement 
policies, the intra-operator parallelism degree, scheduling strategies, 
the load shedding rate and so forth, and it fosters novel interdisciplinary 
approaches that exploit Control Theory and Artificial Intelligence methods.


In this landscape, the workshop aims to attract contributions in the 
area of Data Stream Processing, with particular emphasis on support for 
highly parallel platforms and on autonomic features to deal with 
variable workloads. A partial list of topics of interest for this 
workshop is the following:

  - Highly parallel models for streaming applications
  - Parallel sliding-window query processing
  - Streaming parallel patterns
  - Autonomic intra-operator parallel solutions
  - Strategies for dynamic operator and query placement
  - Elastic techniques to cope with burstiness and workload variations
  - Integration of elasticity support in stream processing frameworks
  - Stream processing on heterogeneous and reconfigurable hardware
  - Stream scheduling strategies and load balancing
  - Adaptive load shedding techniques
  - Techniques to deal with out-of-order data streams
  - Power- and energy-aware management of parallel stream processing 
systems
  - Applications and use cases in various domains including Smart 
Cities, Internet of Things, Finance, Social Media, and Healthcare


* Submission Instructions
Submissions in PDF format should be between 10 and 12 pages in the Springer 
LNCS style, which can be downloaded from the Springer web site. The 12-page 
limit is a hard limit, while the 10-page minimum is required for the paper 
to be published in the formal Springer proceedings. The limit includes 
everything (text, figures, references) and will be strictly 
enforced by the submission system. Complete LaTeX sources must be 
provided for accepted papers. All submitted research papers will be 
peer-reviewed. Only contributions that are not submitted elsewhere or 
currently under review will be considered. Accepted papers will be 
included in the workshop proceedings, published by Springer in the 
ARCoSS/LNCS series. Authors of accepted papers will have to sign a 
Springer copyright form.


* Papers have to be submitted through EasyChair at:
https://easychair.org/conferences/?conf=europar2018ws

* Special Issue
A number of selected papers presented at the workshop will be invited to 
the special issue titled “Data Stream Processing in HPC Systems: new 
frameworks and architectures for high-frequency streaming” that will 
appear in Parallel Computing (Elsevier PARCO).


* Important Dates
May 4, 2018            Paper submission deadline
June 15, 2018