Hi,

Actually, the tuples are not persisted in ZooKeeper. If your spout emits a tuple with a unique message id, it will automatically be tracked internally by Storm (i.e., by the acker tasks). Thus, if the emitted tuple fails because of a bolt failure, Storm invokes the fail() method on the originating spout task with that unique id as the argument.
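A minimal sketch of that re-emit pattern in plain Java (hypothetical class and method names, not the actual Storm ISpout API): the spout keeps an in-memory map of in-flight tuples keyed by their unique id, drops an entry on ack, and re-emits the stored tuple on fail.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the ack/fail bookkeeping described above.
class ReliableSpoutSketch {
    // In-flight tuples, keyed by the unique message id passed to emit().
    private final Map<UUID, String> pending = new ConcurrentHashMap<>();

    // Emit a tuple with a fresh unique id and remember it until acked.
    UUID emit(String tuple) {
        UUID id = UUID.randomUUID();
        pending.put(id, tuple);
        return id;
    }

    // Storm calls ack(id) once the whole tuple tree is fully processed.
    void ack(UUID id) {
        pending.remove(id);
    }

    // Storm calls fail(id) on a processing failure: re-emit the original
    // tuple (it gets a new id, so the old entry is dropped first).
    UUID fail(UUID id) {
        String tuple = pending.remove(id);
        return tuple == null ? null : emit(tuple);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Note that this map lives only in the spout's JVM, which is exactly the weakness discussed below: it does not survive a crash of the spout itself.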
It's then up to you to re-emit the failed tuple. In sample code, spouts typically use a Map to track which tuples have not yet been fully processed by the topology, so that they can be re-emitted after a bolt failure. However, if the failure comes not from a bolt but from your spout itself, the in-memory Map is lost and your topology will not be able to re-emit the failed tuples.

For such a scenario you can rely on Kafka. The Kafka spout stores its read offset in ZooKeeper, so if a spout task goes down, it can read its offset back from ZooKeeper after restarting.

Hope this helps.

2015-02-24 18:58 GMT+01:00 Douglas Alan <[email protected]>:

> As I understand things, ZooKeeper will persist tuples emitted by bolts, so
> if a bolt crashes (or the computer with the bolt crashes, or the entire
> cluster crashes), the tuples emitted by the bolt will not be lost. Once
> everything is restarted, the tuples will be fetched from ZooKeeper, and
> everything will continue on as if nothing bad ever happened.
>
> What I don't yet understand is whether the same thing is true for spouts.
> If a spout emits a tuple (i.e., the emit() function within a spout is
> executed), and the computer the spout is running on crashes shortly
> thereafter, will that tuple be resurrected by ZooKeeper? Or do we need
> Kafka in order to guarantee this?
>
> |>oug
>
> P.S. I understand that the tuple emitted by the spout must be assigned a
> unique ID in the call to emit().
>
> P.P.S. I see sample code in books that uses something like
> ConcurrentHashMap<UUID, Values> to track which spouted tuples have not yet
> been acked. Is this somehow automatically persisted with ZooKeeper? I
> suspect not, and if not, then I shouldn't really be doing that, should I?
> What should I be doing instead? Using Kafka?

--
Florian HUSSONNOIS
Tel +33 6 26 92 82 23
