A stack dump of all workers would be useful in the case of a topology
freeze.
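
For anyone wanting to collect those dumps, here is a minimal sketch, not an official Storm tool. It assumes the JDK's jps and jstack are on the PATH of each worker node, that it runs as the same user as the worker JVMs, and that workers show up in `jps -l` under the main class backtype.storm.daemon.worker (an assumption about 0.9.x; adjust if your output differs). The helper names worker_pids and dump_workers and the worker-dumps directory are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: Storm 0.9.x worker JVMs appear under this main class in `jps -l`.
WORKER_MAIN = "backtype.storm.daemon.worker"


def worker_pids(jps_output, main_class=WORKER_MAIN):
    """Return the PIDs in `jps -l` output whose main class equals main_class."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split(None, 1)  # "<pid> <main class or jar path>"
        if len(parts) == 2 and parts[1].strip() == main_class:
            pids.append(int(parts[0]))
    return pids


def dump_workers(out_dir="worker-dumps"):
    """Write one jstack thread dump per local worker JVM into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    jps = subprocess.run(["jps", "-l"], capture_output=True, text=True, check=True)
    written = []
    for pid in worker_pids(jps.stdout):
        dump = subprocess.run(["jstack", str(pid)], capture_output=True, text=True)
        path = out / f"worker-{pid}.txt"
        # Keep stderr if jstack produced no dump, so failures are visible.
        path.write_text(dump.stdout or dump.stderr)
        written.append(path)
    return written
```

Calling dump_workers() on each worker node while the topology is frozen should leave one file per worker. Taking two or three dumps a few seconds apart makes it easier to tell a stuck thread from a slow one.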


On Thu, Jun 19, 2014 at 3:52 PM, Nathan Marz <[email protected]> wrote:

> There were a bunch of changes to the internals, so a regression is
> certainly possible. Let us know as many details as possible if you are able
> to reproduce it.
>
>
> On Thu, Jun 19, 2014 at 3:09 PM, Andrew Montalenti <[email protected]>
> wrote:
>
>> Another interesting 0.9.2 issue I came across: the IConnection interface
>> has changed, meaning any pluggable transports no longer work without a code
>> change.
>>
>> I implemented changes to storm-0mq to get it to be compatible with this
>> interface change in my fork here.
>>
>> https://github.com/Parsely/storm-0mq/compare/ptgoetz:master...master
>>
>> I tested that and it nominally works in distributed mode with two
>> independent workers in my cluster. Don't know what the performance impact
>> is of the interface change.
>>
>> I get that zmq is no longer part of storm core, but maintaining a stable
>> interface for pluggable components like this transport is probably
>> something that should be in the release test suite. Otherwise bitrot will
>> take its toll. I am glad to volunteer help with this.
>>
>> My team is now debugging an issue where Storm stops asking our spout for
>> next tuples after the topology has been running for a while, causing the
>> topology to basically freeze with no errors in the logs. At first blush, it
>> seems like a regression from 0.9.1. But we'll have more detailed info once
>> we isolate some variables.
>>  On Jun 18, 2014 4:32 PM, "Andrew Montalenti" <[email protected]> wrote:
>>
>>> I built the v0.9.2-incubating rc-3 locally and, after verifying that it
>>> worked for our topology, pushed it into our cluster. So far, so good.
>>>
>>> One thing for the community to be aware of. If you try to upgrade an
>>> existing v0.9.1-incubating or 0.8 cluster to v0.9.2-incubating, you may hit
>>> exceptions upon nimbus/supervisor startup about stormcode.ser/stormconf.ser.
>>>
>>> The issue is that the new cluster will try to re-submit the topologies
>>> that were already running before the upgrade. These will fail because
>>> Storm's Clojure version has been upgraded from 1.4 to 1.5, so the
>>> serialization formats & IDs have changed. The same failure would occur
>>> whenever the serial ID changes for any class stored in these .ser files
>>> (stormconf.ser & stormcode.ser, as defined in Storm's internal config
>>> <https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/config.clj#L143-L153>
>>> ).
>>>
>>> The solution is to clear out the storm data directories on your worker
>>> nodes/nimbus nodes and restart the cluster.
>>>
>>> I have some open source tooling that submits topologies to the nimbus
>>> using StormSubmitter. This upgrade also made me realize that due to the use
>>> of serialized Java files
>>> <https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/utils/Utils.java#L73-L97>,
>>> it is very important the StormSubmitter class used for submitting and the
>>> running Storm cluster be precisely the same version / classpath. I describe
>>> this more in the GH issue here:
>>>
>>> https://github.com/Parsely/streamparse/issues/27
>>>
>>> I wonder if maybe it's worth it to consider using a less finicky
>>> serialization format within Storm itself. Would that change be welcome as a
>>> pull request?
>>>
>>> It would make it easier to script Storm clusters without consideration
>>> for client/server Storm version mismatches, which I presume was the
>>> original reasoning behind putting Storm functionality behind a Thrift API
>>> anyway. And it would prevent crashed topologies during minor Storm version
>>> upgrades.
>>>
>>
>
>
> --
> Twitter: @nathanmarz
> http://nathanmarz.com
>



-- 
Twitter: @nathanmarz
http://nathanmarz.com
