A stack dump of all workers would be useful in the case of a topology freeze.
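Something along these lines, wired into each worker as a debug hook, would capture that. This is only a minimal sketch (nothing Storm-specific is assumed); running jstack <worker-pid> on each supervisor host gives the same information from the outside:

    import java.util.Map;

    // Minimal sketch: print every live thread's stack so a frozen worker can
    // show where its executors (and whatever should be calling nextTuple) are stuck.
    public class ThreadDumper {

        public static void dumpAllThreads() {
            for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
                Thread t = entry.getKey();
                System.err.printf("Thread \"%s\" (state=%s)%n", t.getName(), t.getState());
                for (StackTraceElement frame : entry.getValue()) {
                    System.err.println("    at " + frame);
                }
            }
        }

        public static void main(String[] args) {
            dumpAllThreads();
        }
    }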
On Thu, Jun 19, 2014 at 3:52 PM, Nathan Marz <[email protected]> wrote:

> There were a bunch of changes to the internals, so a regression is certainly possible. Let us know as many details as possible if you are able to reproduce it.
>
> On Thu, Jun 19, 2014 at 3:09 PM, Andrew Montalenti <[email protected]> wrote:
>
>> Another interesting 0.9.2 issue I came across: the IConnection interface has changed, meaning any pluggable transports no longer work without a code change.
>>
>> I implemented changes to storm-0mq to make it compatible with this interface change in my fork here:
>>
>> https://github.com/Parsely/storm-0mq/compare/ptgoetz:master...master
>>
>> I tested that and it nominally works in distributed mode with two independent workers in my cluster. I don't know what the performance impact of the interface change is.
>>
>> I get that zmq is no longer part of Storm core, but maintaining a stable interface for pluggable components like this transport is probably something that should be covered by the release test suite; otherwise bitrot will take its toll. I am glad to volunteer help with this.
>>
>> My team is now debugging an issue where Storm stops asking our spout for next tuples after the topology has been running for a while, causing the topology to basically freeze with no errors in the logs. At first blush, it seems like a regression from 0.9.1, but we'll have more detailed info once we isolate some variables soon.
>>
>> On Jun 18, 2014 4:32 PM, "Andrew Montalenti" <[email protected]> wrote:
>>
>>> I built the v0.9.2-incubating rc-3 locally and, once I verified that it worked for our topology, pushed it into our cluster. So far, so good.
>>>
>>> One thing for the community to be aware of: if you try to upgrade an existing v0.9.1-incubating or 0.8 cluster to v0.9.2-incubating, you may hit exceptions on nimbus/supervisor startup about stormcode.ser/stormconf.ser.
>>>
>>> The issue is that the new cluster will try to re-submit the topologies that were already running before the upgrade. These will fail because Storm's Clojure version has been upgraded from 1.4 to 1.5, so the serialization formats and IDs have changed. This would basically be true whenever the serial version IDs change for any classes that end up in these .ser files (stormconf.ser & stormcode.ser, as defined in Storm's internal config <https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/config.clj#L143-L153>).
>>>
>>> The solution is to clear out the Storm data directories on your worker/nimbus nodes and restart the cluster.
>>>
>>> I have some open source tooling that submits topologies to the nimbus using StormSubmitter. This upgrade also made me realize that, due to the use of serialized Java files <https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/utils/Utils.java#L73-L97>, it is very important that the StormSubmitter class used for submitting and the running Storm cluster be precisely the same version / classpath. I describe this more in the GH issue here:
>>>
>>> https://github.com/Parsely/streamparse/issues/27
>>>
>>> I wonder if it's worth considering a less finicky serialization format within Storm itself. Would that change be welcome as a pull request?
>>>
>>> It would make it easier to script Storm clusters without consideration for client/server Storm version mismatches, which I presume was the original reasoning behind putting Storm functionality behind a Thrift API anyway. And it would prevent crashed topologies during minor Storm version upgrades.
>>
>
> --
> Twitter: @nathanmarz
> http://nathanmarz.com

--
Twitter: @nathanmarz
http://nathanmarz.com
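For anyone hitting the stormconf.ser/stormcode.ser failures described above, the round trip involved is plain Java serialization, roughly like the sketch below (class and method names here are illustrative only; Storm's actual helpers are the serialize/deserialize methods in backtype.storm.utils.Utils linked above):

    import java.io.*;
    import java.util.HashMap;

    // Sketch of a plain-Java-serialization round trip, the same mechanism
    // used for stormconf.ser/stormcode.ser. Names are hypothetical.
    public class SerDemo {

        static byte[] serialize(Object obj) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(obj);
            oos.close();
            return bos.toByteArray();
        }

        static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
            ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
            // Throws java.io.InvalidClassException when the serialVersionUID recorded in
            // the stream no longer matches the class on the local classpath -- which is
            // what happens when classes captured in stormconf.ser/stormcode.ser change
            // between releases (e.g. the Clojure 1.4 -> 1.5 bump).
            Object result = ois.readObject();
            ois.close();
            return result;
        }

        public static void main(String[] args) throws Exception {
            HashMap<String, Object> conf = new HashMap<String, Object>();
            conf.put("topology.workers", 2);
            byte[] blob = serialize(conf);          // the bytes that end up on disk / in ZooKeeper
            System.out.println(deserialize(blob));  // only safe when both sides share identical classes
        }
    }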
