I have been working on a project that requires a lot of calculation and retention of values in the bolts. Here are some considerations that I think will help you:
- You should read this if you haven't already. I must admit I had to read through it many times before I got the concept: http://storm.apache.org/documentation/Trident-state.html
- When a bolt goes down, Storm will recover it automatically. Any in-memory values that had been calculated will be lost unless you persist the state using Trident.
- When persisting the state in Trident (saving it somewhere so Storm can reconstitute the values when restarting the bolt), you have to decide how accurate the values calculated by the bolt need to be. This point is not discussed in the information that I found on Storm/Trident.

Without writing thousands of words, my project required that the values calculated in a Trident bolt never be incorrect (complex financial calculations). So I had to make sure that when Storm read the Trident state back from a persistent store to recover a bolt, the store holding those values was ACID compliant. Therefore, I couldn't use Cassandra or any other non-ACID-compliant persistent storage because of the risk (however large or small) of the stored values not being completely accurate.

After a lot of analysis and lost sleep, I decided to use MySQL to persist the in-process state of any bolts. There are other persistence solutions that will scale better than MySQL. However, MySQL is still in use in huge implementations, and I estimated that I don't need a solution that can process a million events a second, but rather one that will process thousands of events a second and make sure that, during start-up and recovery, the values it uses reflect all the changes to the data.

There are also other ACID-compliant persistence solutions that claim to be faster than MySQL. MemSQL and VoltDB looked promising. However, they are nowhere near as mature as MySQL, and I have a lot of MySQL experience.
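To make the idea in the Trident state page a bit more concrete, here is a minimal plain-Java sketch of the "store the transaction id next to the value" trick it describes. This is not Storm code and not what my project runs: the class name, the method names and the HashMap standing in for a MySQL table are all made up for illustration. It also assumes that a replayed batch always carries the same txid and the same contents, which is what the page calls a transactional spout.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: no Storm classes are used and all names are hypothetical. */
class TransactionalCounterSketch {

    /** A stored value together with the id of the last batch applied to it. */
    private static final class Entry {
        final long txid;
        final long value;

        Entry(long txid, long value) {
            this.txid = txid;
            this.value = value;
        }
    }

    // Stand-in for the persistent store. In a real setup the value and the
    // txid would be written to the database together in one transaction.
    private final Map<String, Entry> store = new HashMap<>();

    /**
     * Adds delta to the value under key on behalf of batch txid.
     * Returns false when the batch was already applied (a replay) and skipped.
     */
    boolean applyBatch(String key, long txid, long delta) {
        Entry current = store.get(key);
        if (current != null && current.txid == txid) {
            // Same txid as the stored one: this batch is already reflected
            // in the value, so applying it again would double count.
            return false;
        }
        long base = (current == null) ? 0L : current.value;
        store.put(key, new Entry(txid, base + delta));
        return true;
    }

    /** Returns the current value for key, or 0 if nothing was stored yet. */
    long get(String key) {
        Entry e = store.get(key);
        return (e == null) ? 0L : e.value;
    }

    public static void main(String[] args) {
        TransactionalCounterSketch state = new TransactionalCounterSketch();
        state.applyBatch("acct-1", 1L, 100L); // first delivery of batch 1
        state.applyBatch("acct-1", 1L, 100L); // replay of batch 1, skipped
        state.applyBatch("acct-1", 2L, 50L);  // batch 2
        System.out.println(state.get("acct-1")); // prints 150
    }
}
```

The point of the sketch is only that the recovered value is trustworthy if the value and its txid are always written together, which is why I wanted an ACID-compliant store underneath.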
I would include more links to articles and git repos but I have to take my child to school :-)

Craig Charleton
[email protected]

> On Nov 7, 2015, at 6:27 AM, Miguel Ángel Fernández Fernández <[email protected]> wrote:
>
> In a trident scenario, a realtime operation needs to know the previous calculated result.
>
> My current solution is very poor and probably incorrect (a hashmap in bolts). Now I'm thinking to incorporate a cache (redis, memcached ...)
>
> However, I suppose that there is a standard solution for this problem in Trident (maybe a special state).
>
> What do you think is the best approach?
>
> Thanks for your time
