[GitHub] storm issue #2363: STORM-2759: Let users indicate if a blob should restart a...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2363 @revans2 Thanks. I opened https://issues.apache.org/jira/browse/STORM-2809. When I have some time I'll probably look at this too, to see if I can figure out what's going on. Regarding the 3 minutes of doing nothing, the tests also kill the topologies if nothing gets emitted within 180 seconds, so I'm not sure if the supervisor timeout is related. I'll poke at it some more. ---
[GitHub] storm issue #2363: STORM-2759: Let users indicate if a blob should restart a...
Github user revans2 commented on the issue: https://github.com/apache/storm/pull/2363 @srdo 3 mins would correspond to a timeout that we have when the supervisor gets confused. https://github.com/apache/storm/blob/7afd6fbe4603e35114a84e836b02484fe8cda660/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedBlob.java#L238 If you want to file a JIRA and assign it to me, I will take a look at it and see if I can reliably reproduce it. If not, I may ask for your help, because I have not seen this yet. But I also don't typically run the integration tests; I just let Travis do it for me. ---
[GitHub] storm issue #2363: STORM-2759: Let users indicate if a blob should restart a...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2363 @revans2 I think something in this PR is causing topology deployment to either fail or be really slow occasionally. The integration test has been failing fairly consistently since cef450064fa20e2194ef3f51a21c8e6693a285e3. I tried running the test outside a VM with a locally installed Storm setup, and it has failed every time for me. Most runs seem to fail in ways that make it look like the integration test is just flaky (e.g. tuple windows not matching the calculated window), but in at least a few tests I saw the topology get submitted to Nimbus followed by about 3 minutes of nothing happening. The workers never started and the supervisor didn't seem aware of the scheduling. The only evidence that the topology was submitted was in the Nimbus log. This still happens even if the test topologies are killed with a timeout of 0, so there should be slots free for the next test immediately. I tried reverting cef450064fa20e2194ef3f51a21c8e6693a285e3 and it seems to make the integration test pass much more often. Over 5 runs there was still an instance of a supervisor failing to start the workers, but the other 4 passed. ---
[GitHub] storm issue #2363: STORM-2759: Let users indicate if a blob should restart a...
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/2363 OK. Thanks for addressing this quickly. +1 again. ---
[GitHub] storm issue #2363: STORM-2759: Let users indicate if a blob should restart a...
Github user revans2 commented on the issue: https://github.com/apache/storm/pull/2363 @HeartSaVioR I addressed your review comments. I didn't change the name of shouldLogLeader, but I added javadocs to make it clear what it does. ---
[GitHub] storm issue #2363: STORM-2759: Let users indicate if a blob should restart a...
Github user revans2 commented on the issue: https://github.com/apache/storm/pull/2363 @kishorvpatil I addressed your review comments. ---
[GitHub] storm issue #2363: STORM-2759: Let users indicate if a blob should restart a...
Github user revans2 commented on the issue: https://github.com/apache/storm/pull/2363 #2345 was merged, so I rebased to make the new changes clearer. I actually delete code now :). ---