Stephen, I originally looked at using the storm-jdbc external component, but I quickly realized that it is only available in Storm 0.10.x. So I looked at the storm-jdbc source code, and it uses HikariCP (http://brettwooldridge.github.io/HikariCP/) as a high-performance JDBC connection pool for MySQL. I have used JPA with Hibernate and EclipseLink before, but I thought I would give HikariCP a try. So far it works really well. However, I haven't deployed it into production yet.
JPA lets you work with POJOs as entities and can be easier to code against. However, I wanted to avoid any potential serialization issues in my system, because my data already goes POJO -> Avro -> Kafka -> Avro -> Kryo -> Storm.

The way I am tying it all together (I would share the code, but it isn't share-ready yet) is a kind of pseudo-stateless approach. I don't know if this will make sense, but here goes: I assume that any calculation performed in a bolt will not be able to persist its state. So I persist values at the points where I would ack a tuple in Storm. Ultimately, my tuples come from Kafka topics, so if I don't ack a tuple, it will be replayed in the case of a failure.

In some places I have a bolt output its product to a Kafka topic as well as write it to MySQL. This lets me break up a big topology into smaller topologies that have different performance needs, calculation frequencies, and characteristics (think inbound data cleaning, filtering, and transformation versus complex event processing) without losing speed or scalability.

I am still working on elements of the whole solution. However, it all adds up to this: Storm and Kafka are made for each other. I am leveraging Kafka's speed, storage, scalability, etc. to help Storm when something goes wrong. Storm is awesome, but it was built for speed and scalability (which is super-awesome). I just have to remind myself to use it for what I really need, which is to spread many processes across many commodity servers.

Craig Charleton
[email protected]

> On Nov 9, 2015, at 8:36 AM, Stephen Powis <[email protected]> wrote:
>
> Hey Craig,
>
> Just out of curiosity, how are you interacting with mysql? Via
> hibernate or something else?
>
> Thanks!
>
>
>> On Mon, Nov 9, 2015 at 9:32 PM, <[email protected]> wrote:
>> I have been working on a project that requires a lot of calculation and
>> retention of values in the bolts, and here are some questions and
>> considerations that I think will help you:
>>
>> - You should read this if you haven't already. I must admit I had to read
>> through it many times before I got the concept:
>> http://storm.apache.org/documentation/Trident-state.html
>>
>> - When a bolt goes down, Storm will recover it automatically. However, any
>> in-memory values it has calculated will be lost unless you persist the
>> state using Trident.
>>
>> - When persisting the state in Trident (saving it somewhere so Storm can
>> reconstitute the values when restarting the bolt), you have to decide how
>> accurate the values calculated by the bolt need to be. This point is not
>> discussed in the information that I found on Storm/Trident. Without writing
>> thousands of words: my project (complex financial calculations) required
>> that the values calculated in a Trident bolt never be incorrect. So I had to
>> make sure that, when Storm loaded the Trident state from a persistent store
>> to recover a bolt, the values it read came from an ACID-compliant store.
>> Therefore, I couldn't use Cassandra or any other non-ACID-compliant
>> persistent storage, because of the risk (however large or small) of the
>> values stored there not being completely accurate. After a lot of
>> analysis and lost sleep, I decided to use MySQL to persist the in-process
>> state of the bolts. There are some other persistence solutions that will
>> scale better than MySQL. However, MySQL is still used in huge
>> implementations, and I estimated that I don't need a solution that can
>> process a million events a second, but rather one that will process
>> thousands of events a second and make sure that, during start-up and
>> recovery, the values it uses reflect all the changes to the data. There are
>> some other persistence solutions that are ACID-compliant and claim to be
>> faster than MySQL. MemSQL and VoltDB looked promising. However, they are
>> nowhere near as mature as MySQL, and I have a lot of MySQL experience.
>>
>> I would include more links to articles and git repos, but I have to take my
>> child to school :-)
>>
>>
>>
>> Craig Charleton
>> [email protected]
>>
>>
>> On Nov 7, 2015, at 6:27 AM, Miguel Ángel Fernández Fernández
>> <[email protected]> wrote:
>>
>> In a Trident scenario, a real-time operation needs to know the previously
>> calculated result.
>>
>> My current solution is very poor and probably incorrect (a hashmap in
>> bolts). Now I'm thinking of incorporating a cache (Redis, Memcached, ...).
>>
>> However, I suppose that there is a standard solution for this problem in
>> Trident (maybe a special state).
>>
>> What do you think is the best approach?
>>
>> Thanks for your time
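The "transactional state" idea from the Trident state documentation linked in this thread can be sketched in a few lines. Store the batch's transaction id (txid) next to each value, and skip any update whose txid matches the stored one. This works because a transactional spout replays a failed batch with the same txid and exactly the same tuples. The following is plain Java, not Trident's API. The class and method names are hypothetical, and the HashMap stands in for the ACID store (MySQL, in Craig's setup).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, in-memory sketch of Trident-style "transactional state".
 * The HashMap is a stand-in for a durable ACID store (assumption: in
 * production, a table row per key holding both value and txid).
 */
public class TransactionalCounterSketch {

    /** A stored value together with the txid of the batch that last updated it. */
    static final class Stored {
        final long value;
        final long txid;

        Stored(long value, long txid) {
            this.value = value;
            this.txid = txid;
        }
    }

    private final Map<String, Stored> store = new HashMap<>();

    /**
     * Applies {@code delta} from batch {@code txid} to {@code key}.
     * Assumes transactional-spout semantics: a replayed txid carries exactly
     * the same tuples, so a matching txid means "already applied".
     *
     * @return true if the update was applied, false if skipped as a replay
     */
    public boolean apply(String key, long txid, long delta) {
        Stored current = store.get(key);
        if (current != null && current.txid == txid) {
            return false; // this batch was already persisted before the failure
        }
        long base = (current == null) ? 0L : current.value;
        // Value and txid are written together; with MySQL this would be one transaction.
        store.put(key, new Stored(base + delta, txid));
        return true;
    }

    /** Returns the current value for {@code key}, or 0 if nothing has been stored. */
    public long get(String key) {
        Stored current = store.get(key);
        return (current == null) ? 0L : current.value;
    }

    public static void main(String[] args) {
        TransactionalCounterSketch state = new TransactionalCounterSketch();
        state.apply("acct-42", 1, 100); // batch 1 applied
        state.apply("acct-42", 2, -30); // batch 2 applied
        state.apply("acct-42", 2, -30); // batch 2 replayed after a failure: skipped
        state.apply("acct-42", 3, 5);   // batch 3 applied
        System.out.println(state.get("acct-42")); // prints 75
    }
}
```

If this were backed by MySQL instead of a HashMap, the value and its txid would have to be written in the same transaction, which is where the ACID requirement discussed above comes in. Opaque-transactional spouts do not guarantee identical replays. They need the additional previous-value bookkeeping described in the same Trident state document.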
