[jira] [Updated] (STORM-2809) Integration test is failing consistently and topologies sometimes fail to start workers

2017-11-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-2809:
--
Labels: pull-request-available  (was: )

> Integration test is failing consistently and topologies sometimes fail to 
> start workers
> ---
>
> Key: STORM-2809
> URL: https://issues.apache.org/jira/browse/STORM-2809
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Stig Rohde Døssing
>Assignee: Robert Joseph Evans
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>
> The integration test has been failing fairly consistently since 
> https://github.com/apache/storm/pull/2363. I tried running the test outside a 
> VM with a locally installed Storm setup, and it has failed every time for me.
> Most runs seem to fail in ways that make it look like the integration test is 
> just flaky (e.g. tuple windows not matching the calculated window), but in at 
> least a few tests I saw the topology get submitted to Nimbus followed by 
> about 3 minutes of nothing happening. The workers never started and the 
> supervisor didn't seem aware of the scheduling. The only evidence that the 
> topology was submitted was in the Nimbus log. This still happens even if the 
> test topologies are killed with a timeout of 0, so there should be slots free 
> for the next test immediately.
> I tried reverting https://github.com/apache/storm/pull/2363 and it seems to 
> make the integration test pass much more often. Over 5 runs there was still 
> an instance of a supervisor failing to start the workers, but the other 4 
> passed.
> We should try to fix whatever is causing the supervisor to fail to start 
> workers, and get the integration test more stable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2809) Integration test is failing consistently and topologies sometimes fail to start workers

2017-11-13 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated STORM-2809:
---
Priority: Blocker  (was: Major)

> Integration test is failing consistently and topologies sometimes fail to 
> start workers
> ---
>
> Key: STORM-2809
> URL: https://issues.apache.org/jira/browse/STORM-2809
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Stig Rohde Døssing
>Assignee: Robert Joseph Evans
>Priority: Blocker
> Fix For: 2.0.0
>
>
> The integration test has been failing fairly consistently since 
> https://github.com/apache/storm/pull/2363. I tried running the test outside a 
> VM with a locally installed Storm setup, and it has failed every time for me.
> Most runs seem to fail in ways that make it look like the integration test is 
> just flaky (e.g. tuple windows not matching the calculated window), but in at 
> least a few tests I saw the topology get submitted to Nimbus followed by 
> about 3 minutes of nothing happening. The workers never started and the 
> supervisor didn't seem aware of the scheduling. The only evidence that the 
> topology was submitted was in the Nimbus log. This still happens even if the 
> test topologies are killed with a timeout of 0, so there should be slots free 
> for the next test immediately.
> I tried reverting https://github.com/apache/storm/pull/2363 and it seems to 
> make the integration test pass much more often. Over 5 runs there was still 
> an instance of a supervisor failing to start the workers, but the other 4 
> passed.
> We should try to fix whatever is causing the supervisor to fail to start 
> workers, and get the integration test more stable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)