Great! I may roll back to Hibernate to bring it more in line with my other Java EE stuff later.
Let's stay in touch,

Craig Charleton
[email protected]

> On Nov 9, 2015, at 11:45 AM, Stephen Powis <[email protected]> wrote:
>
> Awesome, sounds like we've come to similar conclusions on several
> points. At our company we prototyped with a single topology, and as we
> iterate we've been breaking it up into multiple smaller topologies
> communicating via Kafka topics. We're using Hibernate with c3p0 for
> talking to MySQL, and we also aren't passing those entities between
> bolts, to avoid the serialization/session issues that come up with that.
>
> Thanks for the insight :)
>
>> On Mon, Nov 9, 2015 at 11:13 PM, <[email protected]> wrote:
>> Stephen,
>>
>> I originally looked at using the storm-jdbc external component, but very
>> quickly realized that it is only available in Storm 0.10.x. So I looked
>> at the source code for storm-jdbc, and it discussed using
>> http://brettwooldridge.github.io/HikariCP/ as a high-performance
>> connection pool for MySQL. I have used JPA with Hibernate and
>> EclipseLink before, but I thought I would give HikariCP a try. So far,
>> it works really well. However, I haven't deployed it into production yet.
>>
>> JPA lets you work with POJOs as entities and can be easier to code.
>> However, I wanted to avoid any potential serialization issues that might
>> arise in my system, because I am already doing
>> POJO->Avro->Kafka->Avro->Kryo->Storm.
>>
>> The way I am tying all of it together (I would share the code, but it
>> isn't share-ready yet) is kind of a pseudo-stateless manner. I don't
>> know if this will make sense, but here goes:
>>
>> I assume that when a calculation is performed in a bolt, the bolt will
>> not be able to persist its state. So I persist values at the places
>> where I would ack a tuple in Storm. Ultimately, my tuples come from
>> Kafka topics. Therefore, if I don't ack a tuple, it will get replayed
>> in the case of a failure.
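[Editor's note: Craig's "persist where you would ack" discipline can be sketched in plain Java. This is a minimal illustration, not Storm's API: the list stands in for MySQL, the deque for the Kafka spout's unacked tuples, and all class and method names are hypothetical.]

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-Java sketch of "persist where you would ack". The list stands in
// for MySQL and the deque for the Kafka spout's unacked tuples; none of
// this is Storm API, and all names are hypothetical.
public class PersistThenAck {
    static final List<String> store = new ArrayList<>();     // "MySQL"
    static final Deque<String> pending = new ArrayDeque<>(); // unacked tuples

    // Persist the result FIRST, then ack. If the write throws, the tuple
    // is never acked, so Kafka will replay it after the failure.
    static void execute(String tuple) {
        String result = tuple.toUpperCase(); // the bolt's "calculation"
        store.add(result);                   // persist at the ack point
        pending.remove(tuple);               // ack only after the write
    }

    public static void main(String[] args) {
        pending.add("trade-1");
        pending.add("trade-2");
        for (String t : new ArrayList<>(pending)) {
            execute(t);
        }
        System.out.println(store);   // [TRADE-1, TRADE-2]
        System.out.println(pending); // [] -- everything acked
    }
}
```

Because the ack happens strictly after the durable write, a crash between the two steps only causes a replay, never a lost result.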
>> In some places I have a bolt write its output to a Kafka topic as well
>> as to MySQL. This allows me to break a big topology up into smaller
>> topologies that have different performance needs, calculation
>> frequencies, and characteristics, without losing speed and scalability.
>> (Think inbound data cleaning, filtering, and transformation versus
>> complex event processing.)
>>
>> I am still working on elements of the whole solution. However, it all
>> adds up to this: Storm and Kafka are made for each other. I am
>> leveraging Kafka's speed, storage, scalability, etc. to help Storm when
>> something goes wrong. Storm is awesome, but it was built for speed and
>> scalability (which is super-awesome). I just have to remind myself to
>> use it for what I really need, which is to spread many processes across
>> many commodity servers.
>>
>> Craig Charleton
>> [email protected]
>>
>> On Nov 9, 2015, at 8:36 AM, Stephen Powis <[email protected]> wrote:
>>
>> Hey Craig,
>>
>> Just out of curiosity, how are you interacting with MySQL? Via
>> Hibernate or something else?
>>
>> Thanks!
>>
>> On Mon, Nov 9, 2015 at 9:32 PM, <[email protected]> wrote:
>>
>> >> I have been working on a project that requires a lot of calculation
>> >> and retention of values in the bolts, and here are some
>> >> questions/considerations that I think will help you:
>> >>
>> >> - You should read this if you haven't already. I must admit I had to
>> >> read through it many times before I got the concept:
>> >> http://storm.apache.org/documentation/Trident-state.html
>> >>
>> >> - When a bolt goes down, Storm will recover it automatically. Any of
>> >> the in-memory values that have been calculated will be lost unless
>> >> you persist the state using Trident.
>> >>
>> >> - When persisting the state in Trident (saving it somewhere so Storm
>> >> can reconstitute the values when restarting the bolt), you have to
>> >> decide how accurate the values calculated by the bolt need to be.
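[Editor's note: the point about in-memory values being lost on recovery unless the state is persisted can be sketched in plain Java. A HashMap stands in for a MySQL table; the class and method names are hypothetical, not Storm's bolt API.]

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of write-through state: a stateful "bolt" whose running totals
// survive a restart because every update is written to a durable store
// before the tuple would be acked. The HashMap passed in stands in for a
// MySQL table; class and method names are hypothetical, not Storm API.
public class RecoverableBolt {
    private final Map<String, Long> durable;                  // "MySQL"
    private final Map<String, Long> counts = new HashMap<>(); // in-memory state

    RecoverableBolt(Map<String, Long> durable) {
        this.durable = durable;
        counts.putAll(durable); // on (re)start, reconstitute prior state
    }

    void execute(String key) {
        long next = counts.merge(key, 1L, Long::sum);
        durable.put(key, next); // write through before acking the tuple
    }

    long count(String key) {
        return counts.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> mysql = new HashMap<>();
        RecoverableBolt bolt = new RecoverableBolt(mysql);
        bolt.execute("clicks");
        bolt.execute("clicks");

        // Simulate the worker dying and Storm restarting the bolt:
        RecoverableBolt restarted = new RecoverableBolt(mysql);
        System.out.println(restarted.count("clicks")); // 2 -- state survived
    }
}
```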
>> >> This point is not discussed in the information that I found on
>> >> Storm/Trident. Without writing thousands of words: my project
>> >> required that the values calculated in a Trident bolt never be
>> >> incorrect (complex financial calculations). So I had to make sure
>> >> that when Storm obtained the Trident state from a persistent store to
>> >> place into a bolt for recovery, the values it used were
>> >> ACID-compliant. Therefore, I couldn't use Cassandra or any other
>> >> non-ACID-compliant persistent storage, because of the risk (however
>> >> large or small) of the values stored in Cassandra not being
>> >> completely accurate. After a lot of analysis and lost sleep, I
>> >> decided to use MySQL to persist the in-process state of my bolts.
>> >> There are other persistence solutions that will scale better than
>> >> MySQL. However, MySQL is still in use in huge implementations, and I
>> >> estimated that I don't need a solution that can process a million
>> >> events a second, but rather one that will process thousands of events
>> >> a second and ensure that, during start-up and recovery, the values it
>> >> uses reflect all the changes to the data. There are other
>> >> ACID-compliant persistence solutions that claim to be faster than
>> >> MySQL; MemSQL and VoltDB looked promising. However, they are nowhere
>> >> near as mature as MySQL, and I have a lot of MySQL experience.
>> >>
>> >> I would include more links to articles and git repos, but I have to
>> >> take my child to school :-)
>> >>
>> >> Craig Charleton
>> >> [email protected]
>> >>
>> >> On Nov 7, 2015, at 6:27 AM, Miguel Ángel Fernández Fernández
>> >> <[email protected]> wrote:
>> >>
>> >> In a Trident scenario, a realtime operation needs to know the
>> >> previously calculated result.
>> >>
>> >> My current solution is very poor and probably incorrect (a HashMap in
>> >> the bolts). Now I'm thinking of incorporating a cache (Redis,
>> >> Memcached, ...).
>> >> However, I suppose that there is a standard solution for this
>> >> problem in Trident (maybe a special state).
>> >>
>> >> What do you think is the best approach?
>> >>
>> >> Thanks for your time
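[Editor's note: the "special state" Miguel asks about and the accuracy concern Craig raises are what Trident's opaque transactional states address. The update rule described in the Trident-state document (keep txid, a previous value, and a current value per key, and recompute from the previous value when a batch is replayed) can be sketched in plain Java; an in-memory map stands in for the durable store, and all names are hypothetical.]

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the opaque-transactional update rule from the Trident state
// docs. Each key stores (txid, prev, curr); a replayed batch (same txid)
// recomputes curr from prev instead of adding its delta twice, so counts
// stay exact even when Kafka replays a batch. The in-memory map stands in
// for a durable store; all names are hypothetical.
public class OpaqueCounter {
    static final class Stored {
        long txid;
        long prev;
        long curr;
        Stored(long txid, long prev, long curr) {
            this.txid = txid; this.prev = prev; this.curr = curr;
        }
    }

    private final Map<String, Stored> store = new HashMap<>();

    // Apply a batch's partial count for `key` under transaction `txid`.
    void update(String key, long txid, long delta) {
        Stored v = store.get(key);
        if (v == null) {
            store.put(key, new Stored(txid, 0, delta));
        } else if (v.txid == txid) {
            v.curr = v.prev + delta;   // replay: overwrite, never double-count
        } else {
            v.prev = v.curr;           // new batch: roll the window forward
            v.curr = v.curr + delta;
            v.txid = txid;
        }
    }

    long current(String key) {
        Stored v = store.get(key);
        return v == null ? 0 : v.curr;
    }

    public static void main(String[] args) {
        OpaqueCounter c = new OpaqueCounter();
        c.update("trades", 1, 5);
        c.update("trades", 2, 3);
        c.update("trades", 2, 3);  // the same batch, replayed after a failure
        System.out.println(c.current("trades")); // 8 -- not 11
    }
}
```

This is the semantics a Trident `persistentAggregate` over an opaque state gives you for free; the HashMap-in-a-bolt approach loses it because the map vanishes with the worker.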
