Okay. Keep me posted. I still plan on looking at and testing your patch to storm-0mq, but probably won't get to that until early next week.
-Taylor

> On Jun 19, 2014, at 7:43 PM, Andrew Montalenti <[email protected]> wrote:
>
> FYI, the issue happened with both zmq and netty transports. We will
> investigate more tomorrow. We think the issue only happens with more than
> one supervisor and multiple workers.
>
>> On Jun 19, 2014 7:32 PM, "P. Taylor Goetz" <[email protected]> wrote:
>> Hi Andrew,
>>
>> Thanks for pointing this out. I agree with your point about bit rot.
>>
>> However, we had to remove the 0mq transport due to license
>> incompatibilities with Apache, so any kind of release test suite would
>> have to be maintained outside of Apache, since it would likely pull in
>> LGPL-licensed dependencies. So if something like you’re suggesting could
>> be accomplished in the storm-0mq project, that would be the best option.
>>
>> I’m open to pull requests, help, contributions, etc. to storm-0mq. It
>> just can’t be part of Apache.
>>
>> I’ll test out your changes to storm-0mq to see if I can reproduce the
>> issue you’re seeing. As Nathan mentioned, any additional information
>> (thread dumps, etc.) you could provide would help.
>>
>> Thanks (and sorry for the inconvenience),
>>
>> Taylor
>>
>>
>>> On Jun 19, 2014, at 6:09 PM, Andrew Montalenti <[email protected]> wrote:
>>>
>>> Another interesting 0.9.2 issue I came across: the IConnection interface
>>> has changed, meaning any pluggable transports no longer work without a
>>> code change.
>>>
>>> I implemented changes to storm-0mq to make it compatible with this
>>> interface change in my fork here:
>>>
>>> https://github.com/Parsely/storm-0mq/compare/ptgoetz:master...master
>>>
>>> I tested that, and it nominally works in distributed mode with two
>>> independent workers in my cluster. I don't know what the performance
>>> impact of the interface change is.
>>>
>>> I get that zmq is no longer part of storm core, but maintaining a stable
>>> interface for pluggable components like this transport is probably
>>> something that should be in the release test suite. Otherwise, bit rot
>>> will take its toll. I am glad to volunteer help with this.
>>>
>>> My team is now debugging an issue where Storm stops asking our spout for
>>> next tuples after a while of running the topology, causing the topology
>>> to basically freeze with no errors in the logs. At first blush, it seems
>>> like a regression from 0.9.1. But we'll have more detailed info once we
>>> isolate some variables soon.
>>>
>>>> On Jun 18, 2014 4:32 PM, "Andrew Montalenti" <[email protected]> wrote:
>>>> I built the v0.9.2-incubating rc-3 locally and, after verifying that it
>>>> worked for our topology, pushed it into our cluster. So far, so good.
>>>>
>>>> One thing for the community to be aware of: if you try to upgrade an
>>>> existing v0.9.1-incubating or 0.8 cluster to v0.9.2-incubating, you may
>>>> hit exceptions on nimbus/supervisor startup about
>>>> stormcode.ser/stormconf.ser.
>>>>
>>>> The issue is that the new cluster will try to re-submit the topologies
>>>> that were already running before the upgrade. These will fail because
>>>> Storm's Clojure version has been upgraded from 1.4 to 1.5, so the
>>>> serialization formats & IDs have changed. The same would happen
>>>> whenever the serial ID changes for any class stored in these .ser
>>>> files (stormconf.ser & stormcode.ser, as defined in Storm's internal
>>>> config).
>>>>
>>>> The solution is to clear out the storm data directories on your worker
>>>> nodes/nimbus nodes and restart the cluster.
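>>>>
>>>> As a standalone illustration of the failure mode (the class and field
>>>> names here are hypothetical, not Storm's actual internals): Java
>>>> deserialization throws java.io.InvalidClassException whenever the
>>>> reader's classpath carries a different serialVersionUID for a class
>>>> than the one baked into the .ser file.
>>>>
>>>>     import java.io.*;
>>>>
>>>>     public class SerialIdMismatch {
>>>>
>>>>         // Stands in for a class serialized into stormconf.ser by the
>>>>         // old cluster. Bump serialVersionUID to 2L on the reading side
>>>>         // (which a dependency upgrade effectively does) and reading
>>>>         // old data fails with InvalidClassException.
>>>>         static class TopologyConf implements Serializable {
>>>>             private static final long serialVersionUID = 1L;
>>>>             String name = "my-topology";
>>>>         }
>>>>
>>>>         public static void main(String[] args) throws Exception {
>>>>             // Write the object, as the old cluster did.
>>>>             ByteArrayOutputStream bos = new ByteArrayOutputStream();
>>>>             try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
>>>>                 out.writeObject(new TopologyConf());
>>>>             }
>>>>             // Read it back; this only succeeds because the same
>>>>             // class definition (and serial ID) is on the classpath.
>>>>             try (ObjectInputStream in = new ObjectInputStream(
>>>>                     new ByteArrayInputStream(bos.toByteArray()))) {
>>>>                 TopologyConf conf = (TopologyConf) in.readObject();
>>>>                 System.out.println("Deserialized: " + conf.name);
>>>>             }
>>>>         }
>>>>     }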
>>>>
>>>> I have some open source tooling that submits topologies to the nimbus
>>>> using StormSubmitter. This upgrade also made me realize that, due to
>>>> the use of serialized Java files, it is very important that the
>>>> StormSubmitter class used for submitting and the running Storm cluster
>>>> be precisely the same version / classpath. I describe this more in the
>>>> GH issue here:
>>>>
>>>> https://github.com/Parsely/streamparse/issues/27
>>>>
>>>> I wonder whether it's worth considering a less finicky serialization
>>>> format within Storm itself. Would that change be welcome as a pull
>>>> request?
>>>>
>>>> It would make it easier to script Storm clusters without worrying about
>>>> client/server Storm version mismatches, which I presume was the
>>>> original reasoning behind putting Storm functionality behind a Thrift
>>>> API anyway. And it would prevent crashed topologies during minor Storm
>>>> version upgrades.
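>>>>
>>>> As a rough sketch of what I mean (not Storm's actual API; assumes
>>>> jackson-databind on the classpath), a JSON round-trip of the conf map
>>>> has no per-class serial IDs to mismatch, so the reader and writer
>>>> versions can drift freely:
>>>>
>>>>     import com.fasterxml.jackson.databind.ObjectMapper;
>>>>     import java.util.HashMap;
>>>>     import java.util.Map;
>>>>
>>>>     public class JsonConfRoundTrip {
>>>>         public static void main(String[] args) throws Exception {
>>>>             Map<String, Object> conf = new HashMap<String, Object>();
>>>>             conf.put("topology.workers", 4);
>>>>             conf.put("topology.debug", false);
>>>>
>>>>             ObjectMapper mapper = new ObjectMapper();
>>>>             // What a JSON-based stormconf could store instead of a
>>>>             // Java-serialized stormconf.ser.
>>>>             byte[] stored = mapper.writeValueAsBytes(conf);
>>>>
>>>>             // Any version with a JSON parser can read this back,
>>>>             // regardless of the classes on its classpath.
>>>>             Map restored = mapper.readValue(stored, Map.class);
>>>>             System.out.println(restored);
>>>>         }
>>>>     }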
