Re: Reliability in storm 0.9.4
In my experience, I would suggest upgrading to the latest version: I saw stability issues in 0.9.4, such as the disruptor queue filling up and Storm hanging. I am in the middle of that upgrade now, migrating from 0.9.4 to 1.2.1.

On Tue, Mar 13, 2018, 2:05 AM Shubham Gupta wrote:
> Thanks a lot
>
> Regards
> Shubham Gupta
Re: Reliability in storm 0.9.4
Thanks a lot

Regards
Shubham Gupta

On Mon, Mar 12, 2018 at 1:33 PM, Stig Rohde Døssing wrote:
Re: Reliability in storm 0.9.4
I'm basing this off of later Storm versions, because I'm not familiar with 0.9, so buyer beware. As far as I know the logic hasn't changed, though, so I'll be linking to the 2.0.0 classes.

1. No, you don't need to ack or fail unanchored tuples; Storm isn't keeping track of those tuples at all. Here's the OutputCollector implementation for bolts in 2.0.0: https://github.com/apache/storm/blob/ffa607e2464a361a8f2fa548cc8043f5a8818d04/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltOutputCollectorImpl.java#L128. As you can see, if you call ack or fail on a tuple with no anchors, nothing really happens.

2. If the tuple is anchored at the spout (i.e. the spout emitted the tuple with a message id), then yes. If you want to disable acking globally, you can do so by setting topology.acker.executors to 0: https://github.com/apache/storm/blob/ffa607e2464a361a8f2fa548cc8043f5a8818d04/storm-client/src/jvm/org/apache/storm/Config.java#L375

2018-03-12 20:34 GMT+01:00 Shubham Gupta:
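To see why acking an unanchored tuple is a no-op, it helps to look at the bookkeeping Storm's acker actually does: each spout tuple is tracked by XOR-ing the ids of all anchors emitted and acked in its tuple tree, and the tree counts as fully processed when the XOR reaches zero. A tuple emitted without anchors never enters this ledger, so there is nothing to ack or fail. Below is a simplified, self-contained model of that XOR scheme; it is not the Storm API, and the `AckerModel` class and its method names are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of Storm's acker bookkeeping (NOT the Storm API).
// Each spout tuple's pending state is the XOR of every anchor id emitted
// or acked in its tuple tree; when the XOR reaches 0, the tree is done.
public class AckerModel {
    private final Map<Long, Long> pending = new HashMap<>();

    // Spout emits a tuple with a message id: start tracking its tree.
    public void spoutEmit(long rootId, long initialAnchor) {
        pending.merge(rootId, initialAnchor, (a, b) -> a ^ b);
    }

    // Bolt acks an input tuple and (optionally) emits new anchored tuples:
    // XOR the acked anchor and each new anchor into the pending state.
    public void boltAck(long rootId, long ackedAnchor, long... newAnchors) {
        long delta = ackedAnchor;
        for (long n : newAnchors) {
            delta ^= n;
        }
        pending.merge(rootId, delta, (a, b) -> a ^ b);
        if (pending.get(rootId) == 0L) {
            pending.remove(rootId); // tree fully acked
        }
    }

    // An unanchored tuple carries no root id, so acking/failing it would
    // touch nothing here -- mirroring the no-op in BoltOutputCollectorImpl.
    public boolean isFullyAcked(long rootId) {
        return !pending.containsKey(rootId);
    }

    public static void main(String[] args) {
        AckerModel m = new AckerModel();
        m.spoutEmit(1L, 0xAL);              // spout tuple, anchor 0xA
        m.boltAck(1L, 0xAL, 0xBL);          // bolt acks 0xA, emits anchored 0xB
        System.out.println(m.isFullyAcked(1L)); // false: 0xB still pending
        m.boltAck(1L, 0xBL);                // last bolt acks 0xB
        System.out.println(m.isFullyAcked(1L)); // true: XOR reached 0
    }
}
```

This also makes point 2 concrete: only trees that were started with a message id at the spout occupy memory in the acker, which is why every anchored tuple must eventually be acked or failed (or time out). If I remember correctly, the global switch can also be set programmatically via Config#setNumAckers(0), which writes the same topology.acker.executors setting.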
Fwd: Reliability in storm 0.9.4
Hi,

I had a few doubts regarding Storm's reliability mechanism:
- Do we need to ack (or fail) unanchored tuples?
- Is it always necessary to ack or fail every tuple received from the spout in the first bolt to avoid an Out of Memory error in Storm?

Thanks in advance

Regards
Shubham Gupta
Data Stream Processing Workshop with Elsevier Parallel Computing Special Issue
New workshop on Data Stream Processing (best papers invited to a Special Issue of Elsevier Parallel Computing)

** Auto-DaSP 2018: a Euro-Par 2018 International Workshop
Autonomic Solutions for Parallel and Distributed Data Stream Processing
(A Special Issue of the best papers will appear in Elsevier Parallel Computing!)

Date: 27-28 August 2018
Location: Turin, Italy
Workshop web page: http://www.di.unipi.it/auto-dasp-18/
Euro-Par web page: https://europar2018.org/
**

* Call for Papers

We are living in a hyper-connected world with a proliferation of devices continuously producing unbounded data flows that have to be processed "on the fly". This extends to a wide spectrum of applications with high socio-economic impact, such as systems for healthcare, emergency management, surveillance, intelligent transportation, and many others. Data Stream Processing frameworks ingest high-frequency flows of incoming data and process application queries while respecting strict performance requirements in terms of throughput and response time. Maintaining these constraints is often fundamental despite unplanned or unexpected workload variability, or changes due to the dynamism of the execution environment.

High-volume data streams can be handled efficiently through the adoption of novel high-performance solutions targeting today's highly parallel hardware. This comprises multicore-based platforms and heterogeneous systems equipped with GPU and FPGA co-processors, aggregated at rack level by low-latency/high-bandwidth networks. The capacity of these highly dense, highly parallel rack-scale solutions has grown remarkably over the years, offering tens of thousands of heterogeneous cores and multiple terabytes of aggregated RAM, reaching the computing, memory, and storage capacity of a large warehouse-scale cluster of just a few years ago.
However, despite this large computing power, high-performance data streaming solutions need to be equipped with flexible and autonomic logic in order to adapt the framework/application configuration to rapidly changing execution conditions and workloads. This calls for mechanisms and strategies to adapt query and operator placement policies, intra-operator parallelism degrees, scheduling strategies, load shedding rates, and so forth, and fosters novel interdisciplinary approaches that exploit Control Theory and Artificial Intelligence methods.

In this landscape, the workshop aims to attract contributions in the area of Data Stream Processing, with particular emphasis on support for highly parallel platforms and autonomic features to deal with variable workloads. A partial list of topics of interest for this workshop is the following:

- Highly parallel models for streaming applications
- Parallel sliding-window query processing
- Streaming parallel patterns
- Autonomic intra-operator parallel solutions
- Strategies for dynamic operator and query placement
- Elastic techniques to cope with burstiness and workload variations
- Integration of elasticity support in stream processing frameworks
- Stream processing on heterogeneous and reconfigurable hardware
- Stream scheduling strategies and load balancing
- Adaptive load shedding techniques
- Techniques to deal with out-of-order data streams
- Power- and energy-aware management of parallel stream processing systems
- Applications and use cases in various domains including Smart Cities, Internet of Things, Finance, Social Media, and Healthcare

* Submission Instructions

Submissions in PDF format should be between 10 and 12 pages in the Springer LNCS style, which can be downloaded from the Springer web site. The 12-page limit is a hard limit, while the minimum of 10 pages is required for the paper to be published in the formal Springer proceedings.
The page limit includes everything (text, figures, references) and will be strictly enforced by the submission system. Complete LaTeX sources must be provided for accepted papers. All submitted research papers will be peer-reviewed. Only contributions that are not submitted elsewhere or currently under review will be considered. Accepted papers will be included in the workshop proceedings, published by Springer in the ARCoSS/LNCS series. Authors of accepted papers will have to sign a Springer copyright form.

* Papers have to be submitted through EasyChair at: https://easychair.org/conferences/?conf=europar2018ws

* Special Issue

A number of selected papers presented at the workshop will be invited to the special issue titled "Data Stream Processing in HPC Systems: new frameworks and architectures for high-frequency streaming", which will appear in Parallel Computing (Elsevier PARCO).

* Important Dates

May 4, 2018: Paper submission deadline
June 15, 2018