Well… I am a total Storm newbie, so I can't speak with any authority. The IBolt Javadoc says this: "The IBolt does not have to process the Tuple immediately. It is perfectly fine to hang onto a tuple and process it later… (and)… It is required that all input tuples are acked or failed at some point using the OutputCollector."
I would assume, then, that when you do the commit, you will need to loop through each saved tuple and ack it. So you can't just process a tuple by saving the data to a database and then do a commit later (without acking), or ack each tuple along the way while postponing the commit (and then crash). Seems to me that you need to save (without committing), stick the tuple in a cache, and then at commit time, if the commit is successful, ack each saved tuple using the output collector and clear out the cache. Perhaps someone with more experience can chime in…

Craig

On Mar 21, 2014, at 1:00 AM, Manthosh Kumar T <[email protected]> wrote:

Hi Craig,

Can you elaborate on how to handle the ack part? I already added a TimerTask to each bolt. Since this is the last bolt in the process flow, I don't emit any tuples, so how can I manually ack the tuples after committing? Moreover, the TimerTask only performs the commit, so how will I be able to handle the ack there?

On 21 March 2014 12:59, King, Craig A. <[email protected]> wrote:

Just some thoughts…

Is there only a single instance of the bolt that is responsible for committing the transactions? If there is more than one instance of that bolt running (let's say 10), then they could all be sitting there with data to commit just below your threshold… but with no spout pushing tuples, there would be no commit.

You could add a timer task to each bolt instance that forces a commit after a certain amount of time. That seems like a better option than counting, since the counts are distributed throughout the cluster. You would also want to commit when the bolt has an error, or when the cleanup method is called.

It also seems that you would not want to ack a tuple until the commit has occurred. Otherwise it will look like everything has been processed; then the cord gets pulled on the computer and you have lost data.
Craig

On Mar 20, 2014, at 10:29 PM, Manthosh Kumar T <[email protected]> wrote:

Hi Alexei,

Thanks. I can't use autocommit because it degrades performance. My spout emits tuples at a rate of 4000/second, and I need to add them to the DB after some processing, so I commit in batches in the bolt. But my spout sometimes stops emitting tuples for a while, in which case the uncommitted data will only be committed when the bolt receives another tuple. So during this time gap there is a lot of uncommitted data. Is there a way to commit periodically? Is creating a separate thread for this in the prepare() method of the bolt a good option?

On 20 March 2014 20:52, Alexei Osipov <[email protected]> wrote:

If you don't care about choosing the moment to commit your data, then why don't you just use "autocommit" mode?

On 3/20/2014 3:06 PM, Manthosh Kumar T wrote:

I just want to call Connection.commit() to add data to the DB. Right now I call commit in the bolt after some threshold, but when the spout stops emitting for a while, the uncommitted data is not yet added to the DB. So if calling a function when a topology is killed isn't possible, is it possible to have a timeout in the bolt, so that I can commit the data when no tuple has been received for some time?

On 20 March 2014 16:39, Manthosh Kumar T <[email protected]> wrote:

Hi All,

When running a topology in a cluster, is it possible to call a function in a bolt when the topology is killed? Or will cluster.shutdown() work the same way in cluster mode as in local mode? If so, is it possible to call a function in a bolt when the cluster is shut down?

--
Cheers,
Manthosh Kumar. T
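[Editor's note] Putting the suggestions in this thread together, a bolt along these lines would buffer tuples, commit either when the batch fills up or when a periodic timer fires (so data is not stuck waiting for the next tuple), and ack only after the commit succeeds. This is only a sketch, not the actual Storm API: `Tuple` and `OutputCollector` below are simplified stand-ins so the example compiles on its own, and the JDBC insert/commit calls are hypothetical placeholders in comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Simplified stand-ins so the sketch is self-contained; a real bolt would
// use Storm's own Tuple and OutputCollector classes instead.
interface Tuple { String getValue(); }

class OutputCollector {
    final List<Tuple> acked = new ArrayList<>();
    final List<Tuple> failed = new ArrayList<>();
    void ack(Tuple t)  { acked.add(t); }
    void fail(Tuple t) { failed.add(t); }
}

// Buffer tuples, commit when the batch is full OR when the timer fires,
// and only ack once the commit has gone through. If the commit throws,
// fail the whole batch so the spout can replay it.
class BatchCommitBolt {
    private final OutputCollector collector;
    private final List<Tuple> pending = new ArrayList<>();
    private final int batchSize;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    int commits = 0;  // exposed only so the example is observable

    BatchCommitBolt(OutputCollector collector, int batchSize) {
        this.collector = collector;
        this.batchSize = batchSize;
    }

    // Roughly what would happen in prepare(): start the periodic flush.
    void startTimer(long periodMs) {
        timer.scheduleAtFixedRate(this::flush, periodMs, periodMs,
                TimeUnit.MILLISECONDS);
    }

    // Roughly execute(): write the row (uncommitted), park the tuple.
    synchronized void execute(Tuple tuple) {
        // insertIntoDb(tuple);  // hypothetical uncommitted JDBC insert
        pending.add(tuple);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Called from execute(), from the timer thread, and from cleanup(),
    // hence synchronized: the pending buffer is shared between threads.
    synchronized void flush() {
        if (pending.isEmpty()) return;
        try {
            // connection.commit();  // hypothetical JDBC commit
            commits++;
            for (Tuple t : pending) collector.ack(t);   // done only now
        } catch (RuntimeException e) {
            for (Tuple t : pending) collector.fail(t);  // spout replays
        } finally {
            pending.clear();
        }
    }

    // Roughly cleanup(): commit whatever is left and stop the timer.
    synchronized void cleanup() {
        flush();
        timer.shutdown();
    }
}
```

With a batch size around 1000 and a flush period of a few seconds this roughly matches the 4000 tuples/second figure mentioned above. Note that failing the whole batch assumes the transaction is rolled back (or the writes are idempotent), so that replayed tuples do not create duplicate rows.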
