I have been working on a project that requires a lot of calculation and 
retention of values in the bolts.  Here are some questions/considerations 
that I think will help you:

- You should read this if you haven't already.  I must admit I had to read 
through it many times before I got the concept: 
http://storm.apache.org/documentation/Trident-state.html

- When a bolt goes down, Storm will restart it automatically.  Any in-memory 
values that the bolt had calculated will be lost unless you persist the state 
using Trident.

- When persisting state in Trident (saving it somewhere so Storm can 
reconstitute the values when it restarts the bolt), you have to decide how 
accurate the values calculated by the bolt need to be.  This point is not 
discussed in the information that I found on Storm/Trident.  Without writing 
thousands of words: my project (complex financial calculations) required that 
the values calculated in a Trident bolt never be incorrect.  So I had to make 
sure that when Storm reads the Trident state back from the persistent store to 
recover a bolt, the values it gets come from an ACID-compliant store.  
Therefore, I couldn't use Cassandra or any other non-ACID-compliant persistent 
storage because of the risk (however large or small) that the stored values 
would not be completely accurate.  After a lot of analysis and lost sleep, I 
decided to use MySQL to persist the in-process state of any bolts.  Other 
persistence solutions will scale better than MySQL.  However, MySQL is still 
in use in huge implementations, and I estimated that I don't need a solution 
that can process a million events a second but rather one that will process 
thousands of events a second and make sure that, during start-up and recovery, 
the values it uses reflect all the changes to the data.  Some other 
ACID-compliant persistence solutions claim to be faster than MySQL; MemSQL and 
VoltDB looked promising.  However, they are nowhere near as mature as MySQL, 
and I have a lot of MySQL experience.
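
To make the idea in that Trident state page a bit more concrete, here is a 
rough sketch (not my project's code) of the "transactional" approach it 
describes: keep the id of the last applied batch (txid) next to the value, and 
ignore a batch whose txid has already been applied.  It assumes a transactional 
spout, so a replayed txid always carries the same tuples.  The class and method 
names are made up for illustration, and the HashMap only stands in for a row in 
an ACID store such as MySQL, where the read and the write would happen inside 
one database transaction.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the "store the txid with the value" idea from the Trident state
 * docs. All names are hypothetical; the HashMap stands in for an ACID store
 * (for example, a MySQL table with columns key, value, txid).
 */
class TxidCounterStore {
    /** One stored row: the running value plus the txid that produced it. */
    static final class Row {
        final long value;
        final long txid;
        Row(long value, long txid) { this.value = value; this.txid = txid; }
    }

    private final Map<String, Row> rows = new HashMap<>();

    /**
     * Adds batchSum to the value for key unless this txid was already applied.
     * In a real database the read and the write would share one transaction.
     * Returns true if the batch changed the stored value.
     */
    boolean applyBatch(String key, long txid, long batchSum) {
        Row current = rows.get(key);
        if (current != null && current.txid == txid) {
            return false; // replay of an already-applied batch: skip it
        }
        long base = (current == null) ? 0L : current.value;
        rows.put(key, new Row(base + batchSum, txid));
        return true;
    }

    /** Returns the stored value for key, or 0 if nothing was stored yet. */
    long get(String key) {
        Row r = rows.get(key);
        return r == null ? 0L : r.value;
    }

    public static void main(String[] args) {
        TxidCounterStore store = new TxidCounterStore();
        store.applyBatch("acct-1", 1L, 100L);
        store.applyBatch("acct-1", 2L, 50L);
        store.applyBatch("acct-1", 2L, 50L); // batch 2 replayed after a failure
        System.out.println(store.get("acct-1")); // prints 150
    }
}
```

The point of the sketch is that a replay after a bolt failure does not 
double-count, which is why the value and its txid need to be written together 
atomically.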

I would include more links to articles and git repos but I have to take my 
child to school :-)



Craig Charleton
[email protected]


> On Nov 7, 2015, at 6:27 AM, Miguel Ángel Fernández Fernández 
> <[email protected]> wrote:
> 
> In a trident scenario, a realtime operation needs to know the previous 
> calculated result. 
> 
> My current solution is very poor and probably incorrect (a hashmap in bolts). 
> Now I'm thinking to incorporate a cache (redis, memcached ...)
> 
> However, I suppose that there is a standard solution for this problem in 
> Trident (maybe a special state). 
> 
> What do you think is the best approach?
> 
> Thanks for your time
