I hope this helps https://github.com/pict2014/storm-redis On Feb 7, 2014 12:07 AM, "Cheng-Kang Hsieh (Andy)" <[email protected]> wrote:
> Sorry, I realized that question was badly written. Simply put, my question > is that is there a recommended way to store the tuples emitted by a BOLT so > that the tuples can be replayed after crash without repeating the process > all the way up from the source spout? any advice would be appreciated. > Thank you! > > Best, > Andy > > > On Tue, Feb 4, 2014 at 11:58 AM, Cheng-Kang Hsieh (Andy) < > [email protected]> wrote: > >> Hi all, >> >> First of all, Thank Nathan and all the contributors for pulling out such a >> great framework! I am learning a lot, even just reading the discussion >> threads. >> >> I am building a topology that contains one spout along with a chain of >> bolts. (e.g. S -> A -> B, where S is the spout, A, B are bolts.) >> >> When S emits a tuple, the next bolt A will buffer the tuple in a DFS, and >> compute some aggregated values when it has received a sufficient amount of >> data and then emit the aggregation results to the next bolt B. >> >> Here comes my question, is there a recommended way to store the >> intermediate results emitted by a bolt, so that when machine crashes, the >> results can be replayed to the downstreaming bolts (i.e. bolt B)? >> >> One possible solution could be that: Don't keep any intermediate results, >> but resort to the storm's ack framework, so that the raw data will be >> replay from spout S when crash happened. >> >> However, this approach might not be appropriate in my case, as it might >> take pretty long time (like a couple of hours) before bolt A has received >> all the required data and emit the aggregated results, so that it will be >> very expensive for ack framework to keep tracking that many tuples for >> that >> long. >> >> An alternative solution could be: *making bolt A also a spout* and keep >> the >> emitted data in a DFS queue. When a result has been acked, the bolt A >> removes it from the queue. >> >> I am wondering if it is reasonable to make a task both bolt and spout at >> the same time? or if there is any better approach to do so. >> >> Thank you! >> >> -- >> Cheng-Kang Hsieh >> UCLA Computer Science PhD Student >> M: (310) 990-4297 >> A: 3770 Keystone Ave. Apt 402, >> Los Angeles, CA 90034 >> > > > > -- > Cheng-Kang Hsieh > UCLA Computer Science PhD Student > M: (310) 990-4297 > A: 3770 Keystone Ave. Apt 402, > Los Angeles, CA 90034 >
