Hi Andrew,

Thanks for pointing this out. I agree with your point about bit rot.

However, we had to remove the 0mq transport due to license incompatibilities with Apache, so any kind of release test suite would have to be maintained outside of Apache, since it would likely pull in LGPL-licensed dependencies. So if something like what you’re suggesting could be accomplished in the storm-0mq project, that would be the best option.

I’m open to pull requests, help, contributions, etc. to storm-0mq. It just can’t be part of Apache.

I’ll test out your changes to storm-0mq to see if I can reproduce the issue you’re seeing. As Nathan mentioned, any additional information (thread dumps, etc.) you could provide would help.

Thanks (and sorry for the inconvenience),

Taylor

On Jun 19, 2014, at 6:09 PM, Andrew Montalenti <[email protected]> wrote:

> Another interesting 0.9.2 issue I came across: the IConnection interface has
> changed, meaning any pluggable transports no longer work without a code
> change.
>
> I implemented changes to storm-0mq to get it to be compatible with this
> interface change in my fork here.
>
> https://github.com/Parsely/storm-0mq/compare/ptgoetz:master...master
>
> I tested that and it nominally works in distributed mode with two independent
> workers in my cluster. Don't know what the performance impact is of the
> interface change.
>
> I get that zmq is no longer part of storm core, but maintaining a stable
> interface for pluggable components like this transport is probably something
> that should be in the release test suite. Otherwise bitrot will take its
> toll. I am glad to volunteer help with this.
>
> My team is now debugging an issue where Storm stops asking our spout for next
> tuples after a while of running the topology, causing the tool to basically
> freeze with no errors in the logs. At first blush, seems like a regression
> from 0.9.1. But we'll have more detailed info once we isolate some variables
> soon.
>
> On Jun 18, 2014 4:32 PM, "Andrew Montalenti" <[email protected]> wrote:
> I built the v0.9.2-incubating rc-3 locally and once verifying that it worked
> for our topology, pushed it into our cluster. So far, so good.
>
> One thing for the community to be aware of. If you try to upgrade an existing
> v0.9.1-incubating or 0.8 cluster to v0.9.2-incubating, you may hit exceptions
> upon nimbus/supervisor startup about stormcode.ser/stormconf.ser.
>
> The issue is that the new cluster will try to re-submit the topologies that
> were already running before the upgrade. These will fail because Storm's
> Clojure version has been upgraded from 1.4 -> 1.5, thus the serialization
> formats & IDs have changed. This would be true basically if any class serial
> IDs change that happen to be in these .ser files (stormconf.ser &
> stormcode.ser, as defined in Storm's internal config).
>
> The solution is to clear out the storm data directories on your worker
> nodes/nimbus nodes and restart the cluster.
>
> I have some open source tooling that submits topologies to the nimbus using
> StormSubmitter. This upgrade also made me realize that due to the use of
> serialized Java files, it is very important the StormSubmitter class used for
> submitting and the running Storm cluster be precisely the same version /
> classpath. I describe this more in the GH issue here:
>
> https://github.com/Parsely/streamparse/issues/27
>
> I wonder if maybe it's worth it to consider using a less finicky
> serialization format within Storm itself. Would that change be welcome as a
> pull request?
>
> It would make it easier to script Storm clusters without consideration for
> client/server Storm version mismatches, which I presume was the original
> reasoning behind putting Storm functionality behind a Thrift API anyway. And
> it would prevent crashed topologies during minor Storm version upgrades.
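
For anyone following the thread, the serial-ID failure described in the quoted message can be reproduced in isolation with plain Java serialization. The following is only a minimal sketch and is not Storm code: `SerialUidDemo` and `TopologyConf` are hypothetical names, and the "version change" is simulated by overwriting the serialVersionUID recorded in the byte stream. It assumes the standard stream layout for a single top-level object, where the UID follows the class name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.nio.ByteBuffer;

class SerialUidDemo {
    // Hypothetical stand-in for a class whose instances end up in a .ser file.
    static class TopologyConf implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        TopologyConf(String name) { this.name = name; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Simulates bytes written by a build whose class had a different serial ID.
    // Assumed layout: 4-byte header, TC_OBJECT, TC_CLASSDESC, 2-byte name length,
    // class name (ASCII here), then the 8-byte serialVersionUID.
    static byte[] withDifferentUid(byte[] bytes, Class<?> cls) {
        long uid = ObjectStreamClass.lookup(cls).getSerialVersionUID();
        int offset = 8 + cls.getName().length();
        if (ByteBuffer.wrap(bytes, offset, 8).getLong() != uid) {
            throw new IllegalStateException("unexpected stream layout");
        }
        byte[] copy = bytes.clone();
        ByteBuffer.wrap(copy, offset, 8).putLong(uid + 1);
        return copy;
    }

    // True if an unmodified stream deserializes back to an equal value.
    static boolean roundTripWorks() {
        try {
            TopologyConf back = (TopologyConf) deserialize(serialize(new TopologyConf("demo")));
            return "demo".equals(back.name);
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    // True if a stream carrying a different serial ID is rejected on read.
    static boolean failsAfterUidChange() {
        try {
            byte[] bytes = serialize(new TopologyConf("demo"));
            deserialize(withDifferentUid(bytes, TopologyConf.class));
            return false;
        } catch (InvalidClassException e) {
            return true; // the stream's serial ID does not match the local class
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("same version round trip ok: " + roundTripWorks());
        System.out.println("changed serial ID rejected: " + failsAfterUidChange());
    }
}
```

The point of the sketch is that Java serialization refuses the whole object as soon as one class's serial ID differs between writer and reader, which is consistent with the advice above to clear the data directories and to keep the submitting client and the cluster on the same version and classpath.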
