Re: Store previous calculated result

craig . charleton Mon, 09 Nov 2015 10:15:12 -0800

Great!  I may roll back to Hibernate to bring it more in line with my other 
Java EE stuff later.


Let's stay in touch,


Craig Charleton
[email protected]


> On Nov 9, 2015, at 11:45 AM, Stephen Powis <[email protected]> wrote:
> 
> Awesome, sounds like we've come to similar conclusions on several
> points.  At our company we prototyped with a single topology and as we
> iterate we've been breaking it up into multiple smaller topologies
> communicating via Kafka topics.  We're using Hibernate with c3p0 for
> talking to MySQL and also aren't passing these entities between bolts
> to avoid serialization/session issues that come up with that.
> 
> Thanks for the insight :)
> 
>> On Mon, Nov 9, 2015 at 11:13 PM,  <[email protected]> wrote:
>> Stephen,
>> 
>> I originally looked at using the storm-jdbc external component but very
>> quickly realized that it is only available in Storm 0.10.x.  So, I looked
>> at the source code for storm-jdbc, and it discussed using
>> http://brettwooldridge.github.io/HikariCP/ as a high-performance connection
>> pool for MySQL.  I have used JPA with Hibernate and EclipseLink before but I
>> thought I would give HikariCP a try.  So far, it works really well.
>> However, I haven't deployed it into production yet.
>> 
>> JPA allows you to work with POJOs as entities and can be easier to code.
>> However, I wanted to avoid any potential serialization issues that might
>> arise in my system because I am already doing
>> POJO->Avro->Kafka->Avro->Kryo->Storm.
>> 
>> The way I am interacting with all of it together (I would share the code
>> but it isn't share-ready yet) is kind of in a pseudo-stateless manner.  I
>> don't know if this will make sense but here goes:
>> 
>> I assume that when a calculation is performed in a bolt that it will not be
>> able to persist its state.  So, I persist values at places where I would ack
>> a tuple in Storm.  Ultimately, my tuples come from Kafka topics.  Therefore,
>> if I don't ack a tuple, it will get replayed in the case of a failure.  In
>> some places I have a bolt output its product to a Kafka topic as well as
>> write it somewhere in MySQL.  This allows me to break up a big topology into
>> smaller topologies that have different performance needs, calculation
>> frequencies, and characteristics without losing the speed and scalability.
>> (Think inbound data cleaning, filtering, transformation versus complex event
>> processing)
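
The persist-then-ack idea described above can be sketched in plain Java. This is only a simulation under stated assumptions: `PersistThenAck`, `store`, `acked`, and `process` are hypothetical names, the map stands in for a MySQL table, and no Storm or Kafka API is used. It assumes each tuple carries a stable key (here a Kafka offset), so a replayed tuple overwrites its earlier row instead of adding a second one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation; no Storm, Kafka, or MySQL APIs are used. */
public class PersistThenAck {
    /** Hypothetical stand-in for a MySQL table keyed by Kafka offset. */
    public static final Map<Long, Double> store = new HashMap<>();
    /** Offsets that have been acknowledged back to the simulated source. */
    public static final List<Long> acked = new ArrayList<>();

    /** Process one tuple: compute, persist, and only then ack. */
    public static void process(long offset, double input, boolean crashBeforeAck) {
        double result = input * 2;      // placeholder calculation
        store.put(offset, result);      // keyed write: a replay overwrites, never duplicates
        if (crashBeforeAck) {
            throw new IllegalStateException("simulated worker failure");
        }
        acked.add(offset);              // ack only after the write succeeded
    }

    public static void main(String[] args) {
        try {
            process(42L, 10.0, true);   // first attempt persists, then fails before the ack
        } catch (IllegalStateException e) {
            // The tuple was never acked, so the source would replay it.
        }
        process(42L, 10.0, false);      // replayed tuple
        System.out.println(store + " acked=" + acked);
    }
}
```

Because the write is keyed by offset, the failed first attempt and the replay leave a single row behind, and the offset is acked exactly once. A write that appended rows instead would need some other way to detect the replay.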
>> 
>> I am still working on elements of the whole solution.  However, it all adds
>> up to this: Storm and Kafka are made for each other.  I am leveraging
>> Kafka's speed, storage, scalability, etc. to help Storm when something goes
>> wrong.
>> Storm is awesome but it was built for speed and scalability (which is
>> super-awesome).  I just have to remind myself to use it for what I really
>> need, which is to spread many processes across many commodity servers.
>> 
>> 
>> 
>> Craig Charleton
>> [email protected]
>> 
>> 
>> On Nov 9, 2015, at 8:36 AM, Stephen Powis <[email protected]> wrote:
>> 
>> Hey Craig,
>> 
>> Just out of curiosity, how are you interacting with MySQL?  Via
>> Hibernate or something else?
>> 
>> Thanks!
>> 
>> 
>> On Mon, Nov 9, 2015 at 9:32 PM,  <[email protected]> wrote:
>> 
>> I have been working on a project that requires a lot of calculation and
>> retention of values in the bolts and here are some questions/considerations
>> that I think will help you:
>> 
>> 
>> - You should read this if you haven't already.  I must admit I had to read
>> through it many times before I got the concept:
>> http://storm.apache.org/documentation/Trident-state.html
>> 
>> 
>> - When a bolt goes down, Storm will recover it automatically.  Any of the
>> in-memory values that have been calculated will be lost unless you persist
>> the state using Trident.
>> 
>> 
>> - When persisting the state in Trident (saving it somewhere so Storm can
>> reconstitute the values when restarting the Bolt) you have to decide how
>> accurate the values calculated by the bolt need to be.  This point is not
>> discussed in the information that I found on Storm/Trident.  Without writing
>> thousands of words, my project required that the values calculated in a
>> Trident Bolt never be incorrect (complex financial).  So I had to make sure
>> that, when Storm obtained the Trident state from a persistent store to place
>> into a Bolt for recovery, the values it used came from an ACID-compliant
>> store.  Therefore, I couldn't use Cassandra or any other non-ACID-compliant
>> persistent storage because of the risk (however large or small) of the
>> values stored in Cassandra not being completely accurate.  After a lot of
>> analysis and lost sleep, I decided to use MySQL to persist the in-process
>> state of any Bolts.  There are some other persistence solutions that will
>> scale better than MySQL.  However, MySQL is still in use in huge
>> implementations and I estimated that I don't need a solution that can
>> process a million events a second but rather one that will process thousands
>> of events a second and make sure that, during start-up and recovery, the
>> values it uses reflect all the changes to the data.  There are some other
>> persistence solutions that are ACID-compliant and say they can process
>> faster than MySQL.  MemSQL and VoltDB looked promising.  However, they are
>> nowhere near as mature as MySQL and I have a lot of MySQL experience.
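
One way to keep recovered values trustworthy is the approach the Trident state documentation linked above describes for transactional state: store the id of the batch that last updated a value next to the value itself, and skip the update when a replayed batch carries the same id. The following is a minimal sketch, assuming batches are replayed with identical contents and applied in order. `TxidGuardedState`, `Entry`, and `applyBatch` are hypothetical names, and the map stands in for a MySQL table; this is not Trident's actual State API.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of txid-guarded state; the map stands in for a MySQL table. */
public class TxidGuardedState {
    /** A stored value together with the id of the batch that last updated it. */
    static final class Entry {
        final long txid;
        final long value;

        Entry(long txid, long value) {
            this.txid = txid;
            this.value = value;
        }
    }

    private final Map<String, Entry> table = new HashMap<>();

    /** Applies a batch's delta once; a replay with the same txid is ignored. */
    public long applyBatch(String key, long txid, long delta) {
        Entry prev = table.get(key);
        if (prev != null && prev.txid == txid) {
            return prev.value;                          // batch already applied: skip
        }
        long base = (prev == null) ? 0L : prev.value;   // previous calculated result
        Entry next = new Entry(txid, base + delta);
        table.put(key, next);   // in a real database: one transaction writing both columns
        return next.value;
    }

    public static void main(String[] args) {
        TxidGuardedState state = new TxidGuardedState();
        state.applyBatch("balance", 1, 100);
        state.applyBatch("balance", 2, 50);
        long afterReplay = state.applyBatch("balance", 2, 50);  // replayed batch
        System.out.println(afterReplay);
    }
}
```

Writing the value and the batch id in one transaction is where an ACID-compliant store matters: if the two could be written separately, a crash between them would make a replay either double-count or drop an update.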
>> 
>> 
>> I would include more links to articles and git repos but I have to take my
>> child to school :-)
>> 
>> 
>> 
>> 
>> Craig Charleton
>> [email protected]
>> 
>> 
>> 
>> On Nov 7, 2015, at 6:27 AM, Miguel Ángel Fernández Fernández
>> <[email protected]> wrote:
>> 
>> 
>> In a Trident scenario, a realtime operation needs to know the previous
>> calculated result.
>> 
>> 
>> My current solution is very poor and probably incorrect (a hashmap in
>> bolts). Now I'm thinking of incorporating a cache (Redis, Memcached ...)
>> 
>> 
>> However, I suppose that there is a standard solution for this problem in
>> Trident (maybe a special state).
>> 
>> 
>> What do you think is the best approach?
>> 
>> 
>> Thanks for your time
