Okay. Keep me posted. I still plan on looking at and testing your patch to 
storm-0mq, but probably won't get to that until early next week.

-Taylor

> On Jun 19, 2014, at 7:43 PM, Andrew Montalenti <[email protected]> wrote:
> 
> FYI, the issue happened with both zmq and netty transports. We will 
> investigate more tomorrow. We think the issue only happens with more than one 
> supervisor and multiple workers.
> 
>> On Jun 19, 2014 7:32 PM, "P. Taylor Goetz" <[email protected]> wrote:
>> Hi Andrew,
>> 
>> Thanks for pointing this out. I agree with your point about bit rot.
>> 
>> However, we had to remove the 0mq transport due to license 
>> incompatibilities with Apache, so any kind of release test suite would have 
>> to be maintained outside of Apache since it would likely pull in 
>> LGPL-licensed dependencies. So if something like what you’re suggesting 
>> could be accomplished in the storm-0mq project, that would be the best option.
>> 
>> I’m open to pull requests, help, contributions, etc. to storm-0mq. It just 
>> can’t be part of Apache.
>> 
>> I’ll test out your changes to storm-0mq to see if I can reproduce the issue 
>> you’re seeing. As Nathan mentioned, any additional information (thread 
>> dumps, etc.) you could provide would help.
>> 
>> Thanks (and sorry for the inconvenience),
>> 
>> Taylor
>> 
>> 
>>> On Jun 19, 2014, at 6:09 PM, Andrew Montalenti <[email protected]> wrote:
>>> 
>>> Another interesting 0.9.2 issue I came across: the IConnection interface 
>>> has changed, meaning pluggable transports no longer work without code 
>>> changes.
>>> 
>>> I implemented changes to storm-0mq to make it compatible with this 
>>> interface change; my fork is here:
>>> 
>>> https://github.com/Parsely/storm-0mq/compare/ptgoetz:master...master
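>>> 
>>> For anyone else porting a transport, the shape of the change (as I read it 
>>> from the new storm-core; the exact method signatures below are my 
>>> understanding and may be slightly off) is roughly that recv now returns a 
>>> batch and there's a batch send:
>>> 
>>>     import java.util.Iterator;
>>>     import backtype.storm.messaging.IConnection;
>>>     import backtype.storm.messaging.TaskMessage;
>>> 
>>>     // Rough sketch of a transport adapted to the new interface; bodies
>>>     // are stubs showing where the 0mq socket work would go.
>>>     public class ZmqConnection implements IConnection {
>>>         public Iterator<TaskMessage> recv(int flags, int clientId) {
>>>             // drain pending 0mq frames into TaskMessages and return them
>>>             return null;
>>>         }
>>>         public void send(int taskId, byte[] payload) {
>>>             // frame the payload as a TaskMessage and write to the socket
>>>         }
>>>         public void send(Iterator<TaskMessage> msgs) {
>>>             // batch send: delegate each message to the single-send path
>>>             while (msgs.hasNext()) {
>>>                 TaskMessage m = msgs.next();
>>>                 send(m.task(), m.message());
>>>             }
>>>         }
>>>         public void close() {
>>>             // tear down the 0mq socket/context
>>>         }
>>>     }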
>>> 
>>> I tested that and it nominally works in distributed mode with two 
>>> independent workers in my cluster. I don't know what the performance 
>>> impact of the interface change is.
>>> 
>>> I get that zmq is no longer part of storm core, but compatibility with 
>>> pluggable components like this transport is probably something that should 
>>> be covered by the release test suite. Otherwise bit rot will take its 
>>> toll. I am glad to volunteer to help with this.
>>> 
>>> My team is now debugging an issue where Storm stops asking our spout for 
>>> next tuples after the topology has been running for a while, causing the 
>>> topology to basically freeze with no errors in the logs. At first blush, 
>>> it seems like a regression from 0.9.1, but we'll have more detailed info 
>>> once we isolate some variables soon.
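>>> 
>>> In the meantime, since thread dumps were requested, we may wire something 
>>> like this (hypothetical helper, plain JDK API) into the topology so we can 
>>> log what every thread in the worker JVM is doing when it wedges:
>>> 
>>>     import java.util.Map;
>>> 
>>>     public class StackDumper {
>>>         // Hypothetical helper (not in our code yet): call from
>>>         // nextTuple() or a timer thread when the topology looks stuck,
>>>         // to log a full thread dump from inside the worker.
>>>         public static void dumpAll() {
>>>             for (Map.Entry<Thread, StackTraceElement[]> e
>>>                     : Thread.getAllStackTraces().entrySet()) {
>>>                 System.err.println("--- " + e.getKey().getName());
>>>                 for (StackTraceElement frame : e.getValue()) {
>>>                     System.err.println("    at " + frame);
>>>                 }
>>>             }
>>>         }
>>>     }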
>>> 
>>>> On Jun 18, 2014 4:32 PM, "Andrew Montalenti" <[email protected]> wrote:
>>>> I built the v0.9.2-incubating rc-3 locally and, after verifying that it 
>>>> worked for our topology, pushed it into our cluster. So far, so good.
>>>> 
>>>> One thing for the community to be aware of. If you try to upgrade an 
>>>> existing v0.9.1-incubating or 0.8 cluster to v0.9.2-incubating, you may 
>>>> hit exceptions upon nimbus/supervisor startup about 
>>>> stormcode.ser/stormconf.ser.
>>>> 
>>>> The issue is that the new cluster will try to re-submit the topologies 
>>>> that were already running before the upgrade. These will fail because 
>>>> Storm's Clojure version has been upgraded from 1.4 -> 1.5, thus the 
>>>> serialization formats & IDs have changed. The same failure would occur 
>>>> whenever the serial version ID of any class stored in these .ser files 
>>>> changes (stormconf.ser & stormcode.ser, as defined in Storm's internal 
>>>> config).
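>>>> 
>>>> To make the failure mode concrete: anything that reads one of those 
>>>> files with the new jars on the classpath hits an InvalidClassException. 
>>>> A minimal illustration (the path below is hypothetical; adjust to your 
>>>> storm.local.dir layout):
>>>> 
>>>>     import java.io.FileInputStream;
>>>>     import java.io.InvalidClassException;
>>>>     import java.io.ObjectInputStream;
>>>> 
>>>>     public class SerCheck {
>>>>         public static void main(String[] args) throws Exception {
>>>>             // Hypothetical path to a conf serialized by the old version
>>>>             String ser = "/var/storm/nimbus/stormdist/mytopo-1-1/stormconf.ser";
>>>>             try (ObjectInputStream in =
>>>>                     new ObjectInputStream(new FileInputStream(ser))) {
>>>>                 Object conf = in.readObject();
>>>>                 System.out.println("deserialized OK: " + conf.getClass());
>>>>             } catch (InvalidClassException e) {
>>>>                 // The stream's serialVersionUIDs no longer match the
>>>>                 // classes on the new classpath.
>>>>                 System.err.println("serial ID mismatch: " + e.getMessage());
>>>>             }
>>>>         }
>>>>     }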
>>>> 
>>>> The solution is to clear out the storm data directories on your worker 
>>>> nodes/nimbus nodes and restart the cluster.
>>>> 
>>>> I have some open source tooling that submits topologies to the nimbus 
>>>> using StormSubmitter. This upgrade also made me realize that due to the 
>>>> use of serialized Java files, it is very important that the 
>>>> StormSubmitter class used for submitting and the running Storm cluster be 
>>>> precisely the same version/classpath. I describe this in more detail in 
>>>> the GH issue here:
>>>> 
>>>> https://github.com/Parsely/streamparse/issues/27
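>>>> 
>>>> For context, the submission path in that tooling is essentially the 
>>>> stock StormSubmitter call, sketched below; since StormSubmitter 
>>>> serializes the conf and topology with whatever storm-core jar is on the 
>>>> client classpath, that jar has to match the cluster exactly:
>>>> 
>>>>     import backtype.storm.Config;
>>>>     import backtype.storm.StormSubmitter;
>>>>     import backtype.storm.topology.TopologyBuilder;
>>>> 
>>>>     public class Submit {
>>>>         public static void main(String[] args) throws Exception {
>>>>             TopologyBuilder builder = new TopologyBuilder();
>>>>             // builder.setSpout(...)/setBolt(...) as usual; a topology
>>>>             // with no components would be rejected at submit time.
>>>>             Config conf = new Config();
>>>>             conf.setNumWorkers(2);
>>>>             // The conf and topology get serialized here with the
>>>>             // *client's* storm-core classes, which is why the versions
>>>>             // must line up with the cluster.
>>>>             StormSubmitter.submitTopology("my-topology", conf,
>>>>                                           builder.createTopology());
>>>>         }
>>>>     }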
>>>> 
>>>> I wonder if it's worth considering a less finicky serialization format 
>>>> within Storm itself. Would that change be welcome as a pull request?
>>>> 
>>>> It would make it easier to script Storm clusters without consideration for 
>>>> client/server Storm version mismatches, which I presume was the original 
>>>> reasoning behind putting Storm functionality behind a Thrift API anyway. 
>>>> And it would prevent topologies from crashing during minor Storm version 
>>>> upgrades.
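>>>> 
>>>> A sketch of what I have in mind (not a patch, just the idea): the 
>>>> topology conf is a plain Map, so it could round-trip through JSON, which 
>>>> I believe Storm already pulls in via json-simple, with no classpath 
>>>> coupling at all:
>>>> 
>>>>     import java.util.HashMap;
>>>>     import java.util.Map;
>>>>     import org.json.simple.JSONValue;
>>>> 
>>>>     public class ConfJson {
>>>>         public static void main(String[] args) {
>>>>             Map<String, Object> conf = new HashMap<String, Object>();
>>>>             conf.put("topology.workers", 2);
>>>>             // Unlike ObjectOutputStream, this bakes no serialVersionUIDs
>>>>             // into the stored bytes, so any client/server version pair
>>>>             // can read it back.
>>>>             String json = JSONValue.toJSONString(conf);
>>>>             Map<?, ?> back = (Map<?, ?>) JSONValue.parse(json);
>>>>             System.out.println(back.get("topology.workers"));
>>>>         }
>>>>     }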
>> 
