FYI, the issue happened with both the zmq and netty transports. We will investigate more tomorrow. We think the issue only happens with more than one supervisor and multiple workers.

On Jun 19, 2014 7:32 PM, "P. Taylor Goetz" <[email protected]> wrote:
> Hi Andrew,
>
> Thanks for pointing this out. I agree with your point about bit rot.
>
> However, we had to remove the 0mq transport due to license incompatibilities with Apache, so any kind of release test suite would have to be maintained outside of Apache, since it would likely pull in LGPL-licensed dependencies. So if something like you're suggesting could be accomplished in the storm-0mq project, that would be the best option.
>
> I'm open to pull requests, help, contributions, etc. to storm-0mq. It just can't be part of Apache.
>
> I'll test out your changes to storm-0mq to see if I can reproduce the issue you're seeing. As Nathan mentioned, any additional information (thread dumps, etc.) you could provide would help.
>
> Thanks (and sorry for the inconvenience),
>
> Taylor
>
>
> On Jun 19, 2014, at 6:09 PM, Andrew Montalenti <[email protected]> wrote:
>
> Another interesting 0.9.2 issue I came across: the IConnection interface has changed, meaning any pluggable transports no longer work without a code change.
>
> I implemented changes to storm-0mq in my fork to make it compatible with this interface change:
>
> https://github.com/Parsely/storm-0mq/compare/ptgoetz:master...master
>
> I tested that, and it nominally works in distributed mode with two independent workers in my cluster. I don't know what the performance impact of the interface change is.
>
> I get that zmq is no longer part of storm core, but maintaining a stable interface for pluggable components like this transport is probably something that should be covered in the release test suite. Otherwise, bit rot will take its toll. I am glad to volunteer help with this.
>
> My team is now debugging an issue where Storm stops asking our spout for next tuples after the topology has been running for a while, causing the topology to basically freeze with no errors in the logs. At first blush, it seems like a regression from 0.9.1. We'll have more detailed info once we isolate some variables soon.
>
> On Jun 18, 2014 4:32 PM, "Andrew Montalenti" <[email protected]> wrote:
>
>> I built the v0.9.2-incubating rc-3 locally and, once I verified that it worked for our topology, pushed it into our cluster. So far, so good.
>>
>> One thing for the community to be aware of: if you try to upgrade an existing v0.9.1-incubating or 0.8 cluster to v0.9.2-incubating, you may hit exceptions on nimbus/supervisor startup about stormcode.ser/stormconf.ser.
>>
>> The issue is that the new cluster will try to re-submit the topologies that were already running before the upgrade. These will fail because Storm's Clojure version has been upgraded from 1.4 to 1.5, so the serialization formats & IDs have changed. This would happen whenever the serial IDs change for any classes that end up in these .ser files (stormconf.ser & stormcode.ser, as defined in Storm's internal config <https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/config.clj#L143-L153>).
>>
>> The solution is to clear out the storm data directories on your worker nodes/nimbus nodes and restart the cluster.
>>
>> I have some open source tooling that submits topologies to the nimbus using StormSubmitter. This upgrade also made me realize that, due to the use of serialized Java files <https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/utils/Utils.java#L73-L97>, it is very important that the StormSubmitter class used for submitting and the running Storm cluster be precisely the same version / classpath. I describe this in more detail in the GH issue here:
>>
>> https://github.com/Parsely/streamparse/issues/27
>>
>> I wonder if it's worth considering a less finicky serialization format within Storm itself. Would that change be welcome as a pull request?
>>
>> It would make it easier to script Storm clusters without having to worry about client/server Storm version mismatches, which I presume was the original reasoning behind putting Storm functionality behind a Thrift API anyway. And it would prevent crashed topologies during minor Storm version upgrades.
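For context on the serialized-Java coupling described above: Storm writes topology configuration and code out with plain Java object serialization, so the bytes only deserialize cleanly if the reader's classpath is serialization-compatible with the writer's. Below is a minimal sketch of that failure mode; TopologyConf and SerializationVersionDemo are hypothetical stand-ins for illustration, not actual Storm classes.

import java.io.*;

// Hypothetical stand-in for the kind of object that ends up inside
// stormconf.ser / stormcode.ser; not one of Storm's real classes.
class TopologyConf implements Serializable {
    // If this value (or an implicitly computed serialVersionUID) differs
    // between the JVM that wrote the bytes and the JVM that reads them,
    // ObjectInputStream.readObject() throws java.io.InvalidClassException.
    private static final long serialVersionUID = 1L;
    String name = "wordcount";
    int numWorkers = 2;
}

public class SerializationVersionDemo {
    public static void main(String[] args) throws Exception {
        // Write the object the way plain Java serialization does.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new TopologyConf());
        }

        // Reading the bytes back only works if the class definitions on the
        // reader's classpath are serialization-compatible with the writer's.
        // A dependency bump (e.g. Clojure 1.4 -> 1.5) that changes the classes
        // inside the payload breaks that assumption, which is why a mismatched
        // StormSubmitter / cluster pair, or an in-place upgrade with stale
        // .ser files, can fail on startup.
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            TopologyConf conf = (TopologyConf) in.readObject();
            System.out.println(conf.name + " / " + conf.numWorkers);
        }
    }
}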
