Awesome, sounds like we've come to similar conclusions on several
points. At our company we prototyped with a single topology, and as we
iterate we've been breaking it up into multiple smaller topologies
communicating via Kafka topics. We're using Hibernate with c3p0 for
talking to MySQL, and we also aren't passing these entities between bolts,
to avoid the serialization/session issues that come up with that.
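
A minimal sketch of the "don't pass entities between bolts" approach, assuming a hypothetical `Order` class standing in for a Hibernate-mapped entity: before emitting, copy out only the primitive fields a downstream bolt needs, so no session-bound proxy or lazy collection ever has to be serialized. `Order`, `EntityToTuple` and `toTupleValues` are made up for illustration. In a real bolt the returned list would go into Storm's `Values` (which extends `ArrayList<Object>`), and a downstream bolt that needs the full entity would reload it by id in its own session.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical entity standing in for a Hibernate-managed class.
// In a real mapping, 'lineItems' could be a lazy proxy tied to an open Session.
class Order {
    long id;
    String customerId;
    long totalCents;
    List<String> lineItems = new ArrayList<>();

    Order(long id, String customerId, long totalCents) {
        this.id = id;
        this.customerId = customerId;
        this.totalCents = totalCents;
    }
}

class EntityToTuple {
    // Copy only immutable, serializable fields (Long, String) into the tuple.
    // The entity itself, and anything lazily loaded through it, stays in this bolt.
    static List<Object> toTupleValues(Order order) {
        return Arrays.<Object>asList(order.id, order.customerId, order.totalCents);
    }

    public static void main(String[] args) {
        Order o = new Order(42L, "cust-7", 1999L);
        System.out.println(toTupleValues(o)); // [42, cust-7, 1999]
    }
}
```
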

Thanks for the insight :)

On Mon, Nov 9, 2015 at 11:13 PM,  <[email protected]> wrote:
> Stephen,
>
> I originally looked at using the storm-jdbc external component but very
> quickly realized that it is only available in Storm 0.10.x. So, I looked
> at the source code for storm-jdbc, and it discussed using
> http://brettwooldridge.github.io/HikariCP/ as a high-performance connection
> pool for MySQL. I have used JPA with Hibernate and EclipseLink before, but I
> thought I would give HikariCP a try. So far it works really well.
> However, I haven't deployed it into production yet.
>
> JPA allows you to work with POJOs as entities and can be easier to code.
> However, I wanted to avoid any potential serialization issues that might
> arise in my system because I am already doing
> POJO->Avro->Kafka->Avro->Kryo->Storm.
>
> The way I am interacting with all of it together (I would share the code,
> but it isn't share-ready yet) is kind of pseudo-stateless. I don't
> know if this will make sense, but here goes:
>
> I assume that when a calculation is performed in a bolt, the bolt will not
> be able to persist its state. So, I persist values at the places where I would
> ack a tuple in Storm. Ultimately, my tuples come from Kafka topics. Therefore,
> if I don't ack a tuple, it will get replayed in the case of a failure. In
> some places I have a bolt output its product to a Kafka topic as well as
> write it somewhere in MySQL. This allows me to break up a big topology into
> smaller topologies that have different performance needs, calculation
> frequencies, and characteristics without losing speed or scalability.
> (Think inbound data cleaning, filtering, and transformation versus complex
> event processing.)
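
The persist-before-ack idea can be simulated without any Storm or Kafka dependencies. This is a sketch under stated assumptions, not Craig's code: a `HashMap` stands in for the MySQL table, and the `acked`/`failed` counters mark where a real bolt would call `OutputCollector.ack(tuple)` or `fail(tuple)`. The write is assumed to be an upsert keyed by the message's Kafka offset (any stable, unique message key would do; in MySQL, `INSERT ... ON DUPLICATE KEY UPDATE`), so a tuple replayed after a failure rewrites its own row instead of being counted twice.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a MySQL table keyed by message offset; put() behaves like an
// upsert, so writing the same key again is harmless.
class ResultTable {
    final Map<Long, Long> rows = new HashMap<>();
    boolean failNextWrite = false; // lets the demo simulate a database outage

    void upsert(long messageOffset, long result) {
        if (failNextWrite) {
            failNextWrite = false;
            throw new IllegalStateException("simulated database failure");
        }
        rows.put(messageOffset, result);
    }
}

class PersistThenAckBolt {
    final ResultTable table;
    int acked = 0;
    int failed = 0;

    PersistThenAckBolt(ResultTable table) { this.table = table; }

    // Mirrors the shape of a bolt's execute(): compute, persist, and only then ack.
    // If the write throws, the tuple is failed, so the Kafka spout would replay it.
    boolean execute(long offset, long input) {
        long result = input * input; // placeholder calculation (hypothetical)
        try {
            table.upsert(offset, result);
        } catch (RuntimeException e) {
            failed++; // collector.fail(tuple) in a real bolt
            return false;
        }
        acked++;      // collector.ack(tuple) in a real bolt
        return true;
    }

    public static void main(String[] args) {
        ResultTable table = new ResultTable();
        PersistThenAckBolt bolt = new PersistThenAckBolt(table);
        table.failNextWrite = true;
        bolt.execute(100L, 3L); // write fails: tuple failed, nothing acked
        bolt.execute(100L, 3L); // replay of the same offset succeeds
        bolt.execute(100L, 3L); // a duplicate delivery just rewrites the same row
        System.out.println(table.rows + " acked=" + bolt.acked + " failed=" + bolt.failed);
        // {100=9} acked=2 failed=1
    }
}
```

Because Storm's guarantee here is at-least-once, the idempotent write is what keeps the stored value correct when the same tuple arrives more than once.
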
>
> I am still working on elements of the whole solution. However, it all adds
> up to this: Storm and Kafka are made for each other. I am leveraging Kafka's
> speed, storage, scalability, etc. to help Storm when something goes wrong.
> Storm is awesome, but it was built for speed and scalability (which is
> super-awesome). I just have to remind myself to use it for what I really
> need, which is to spread many processes across many commodity servers.
>
>
>
> Craig Charleton
> [email protected]
>
>
> On Nov 9, 2015, at 8:36 AM, Stephen Powis <[email protected]> wrote:
>
> Hey Craig,
>
> Just out of curiosity, how are you interacting with MySQL? Via
> Hibernate or something else?
>
> Thanks!
>
>
> On Mon, Nov 9, 2015 at 9:32 PM,  <[email protected]> wrote:
>
> I have been working on a project that requires a lot of calculation and
> retention of values in the bolts and here are some questions/considerations
> that I think will help you:
>
> - You should read this if you haven't already. I must admit I had to read
> through it many times before I got the concept:
> http://storm.apache.org/documentation/Trident-state.html
>
> - When a bolt goes down, Storm will recover it automatically. Any in-memory
> values that have been calculated will be lost unless you persist the
> state using Trident.
>
> - When persisting the state in Trident (saving it somewhere so Storm can
> reconstitute the values when restarting the bolt), you have to decide how
> accurate the values calculated by the bolt need to be. This point is not
> discussed in the information that I found on Storm/Trident. Without writing
> thousands of words: my project required that the values calculated in a
> Trident bolt never be incorrect (complex financial). So I had to make sure
> that when Storm obtained the Trident state from a persistent store to place
> into a bolt for recovery, the values it used came from an ACID-compliant
> store. Therefore, I couldn't use Cassandra or any other non-ACID-compliant
> persistent storage because of the risk (however large or small) of the
> values stored in Cassandra not being completely accurate. After a lot of
> analysis and lost sleep, I decided to use MySQL to persist the in-process
> state of any bolts. There are some other persistence solutions that will
> scale better than MySQL. However, MySQL is still used in huge
> implementations, and I estimated that I don't need a solution that can
> process a million events a second, but rather one that will process
> thousands of events a second and make sure that, during start-up and
> recovery, the values it uses reflect all the changes to the data. There are
> some other persistence solutions that are ACID-compliant and claim to be
> faster than MySQL. MemSQL and VoltDB looked promising. However, they are
> nowhere near as mature as MySQL, and I have a lot of MySQL experience.
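
The Trident-state page linked above describes "transactional" state: each stored value is kept together with the transaction id (txid) of the batch that last updated it, so a replayed batch can be recognized and skipped. A minimal plain-Java sketch of that update rule, with a `HashMap` standing in for the MySQL table; the class names are hypothetical and this is not Trident's own API. In MySQL the value and txid would be two columns of the same row, written in one transaction, which is exactly where an ACID store matters.

```java
import java.util.HashMap;
import java.util.Map;

// One stored row: the aggregate plus the txid of the batch that last updated it.
final class TxValue {
    final long txid;
    final long value;
    TxValue(long txid, long value) { this.txid = txid; this.value = value; }
}

class TransactionalCounterState {
    private final Map<String, TxValue> table = new HashMap<>();

    // Assumes transactional-spout semantics: batches are applied in txid order,
    // and a replayed txid always carries exactly the same tuples, so a matching
    // stored txid means this batch was already committed.
    void applyBatch(long txid, String key, long delta) {
        TxValue current = table.get(key);
        if (current != null && current.txid == txid) {
            return; // replay of a batch already applied: skip it
        }
        long base = (current == null) ? 0L : current.value;
        table.put(key, new TxValue(txid, base + delta));
    }

    long get(String key) {
        TxValue v = table.get(key);
        return v == null ? 0L : v.value;
    }

    public static void main(String[] args) {
        TransactionalCounterState state = new TransactionalCounterState();
        state.applyBatch(1, "acct-1", 10);
        state.applyBatch(2, "acct-1", 5);
        state.applyBatch(2, "acct-1", 5); // batch 2 replayed after a failure
        System.out.println(state.get("acct-1")); // 15
    }
}
```
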
>
> I would include more links to articles and git repos but I have to take my
> child to school :-)
>
> Craig Charleton
> [email protected]
>
> On Nov 7, 2015, at 6:27 AM, Miguel Ángel Fernández Fernández
> <[email protected]> wrote:
>
> In a Trident scenario, a real-time operation needs to know the previously
> calculated result.
>
> My current solution is very poor and probably incorrect (a hashmap in the
> bolts). Now I'm thinking of incorporating a cache (Redis, memcached, ...).
>
> However, I suppose that there is a standard solution for this problem in
> Trident (maybe a special state).
>
> What do you think is the best approach?
>
> Thanks for your time
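
For the "previously calculated result" question, the Trident-state page describes an "opaque transactional" variant that stores, per key, the current value, the value before the last batch, and that batch's txid. Because an opaque spout may replay a txid with different tuples, a replay is recomputed from the stored previous value instead of being skipped. A minimal plain-Java sketch of that rule, assuming a `HashMap` in place of the persistent store; the names are hypothetical and this is not Trident's API (in a topology you would reach this behaviour through `persistentAggregate` with a state factory backed by your store).

```java
import java.util.HashMap;
import java.util.Map;

// Stored per key: the txid of the last batch, the value before it, and the current value.
final class OpaqueValue {
    final long txid;
    final long prev;
    final long curr;
    OpaqueValue(long txid, long prev, long curr) {
        this.txid = txid;
        this.prev = prev;
        this.curr = curr;
    }
}

class OpaqueCounterState {
    private final Map<String, OpaqueValue> table = new HashMap<>();

    // A replayed txid may carry different tuples than the first attempt,
    // so recompute from 'prev' rather than skipping the batch.
    void applyBatch(long txid, String key, long delta) {
        OpaqueValue cur = table.get(key);
        if (cur == null) {
            table.put(key, new OpaqueValue(txid, 0L, delta));
        } else if (cur.txid == txid) {
            table.put(key, new OpaqueValue(txid, cur.prev, cur.prev + delta));
        } else {
            table.put(key, new OpaqueValue(txid, cur.curr, cur.curr + delta));
        }
    }

    long previous(String key) {
        OpaqueValue v = table.get(key);
        return v == null ? 0L : v.prev;
    }

    long current(String key) {
        OpaqueValue v = table.get(key);
        return v == null ? 0L : v.curr;
    }

    public static void main(String[] args) {
        OpaqueCounterState state = new OpaqueCounterState();
        state.applyBatch(1, "k", 10);
        state.applyBatch(2, "k", 5);
        state.applyBatch(2, "k", 7); // batch 2 replayed with different contents
        System.out.println(state.previous("k") + " " + state.current("k")); // 10 17
    }
}
```

Unlike a hashmap held inside a bolt, this state survives a worker restart once the map is replaced by a durable store, and the previous value is always available for the next calculation.
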
