[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-08-03 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-410216354
 
 
   Happened again: 
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTests_onBinaries/detail/NightlyTests_onBinaries/102/pipeline


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-07-25 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-407896798
 
 
   Great, thanks a lot for the link Thomas!
   
   Can we bring the tutorial tests into nightly and then close this issue?


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-07-25 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-407868421
 
 
   Oh, I was under the impression that we disabled the test after adding the 
1.1 s delay because it was still failing, and we didn't know why, since the 
delay should have been enough given the 1 s timeout.


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-07-25 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-407835774
 
 
   Okay, that makes sense. Please note that even with the increased delay we 
sometimes experienced this error. 
   
   To reproduce it, please run multiple instances of the tests in parallel and 
see whether the error occurs.
   
   We can easily increase the delay to more than a second if that fixes the 
problem.
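   A minimal local sketch of that reproduction in Python: launch several copies of the tutorial-test command at once and count how many fail with "Address already in use". The command is a hypothetical placeholder (the real invocation is defined in ci/docker/runtime_functions.sh), and, as requested elsewhere in this thread, it should be run locally rather than on CI.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholder: substitute the tutorial-test invocation used in
# ci/docker/runtime_functions.sh. This exact command is an assumption.
TUTORIAL_TEST_CMD = [sys.executable, "-m", "nose", "tests/tutorials"]


def run_once(cmd):
    """Run one instance of `cmd` and return (exit code, combined stdout/stderr)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          universal_newlines=True)
    return proc.returncode, proc.stdout


def count_address_in_use(cmd, instances=4):
    """Start `instances` copies of `cmd` concurrently and return how many of
    them exited non-zero with "Address already in use" in their output."""
    with ThreadPoolExecutor(max_workers=instances) as pool:
        results = list(pool.map(lambda _: run_once(cmd), range(instances)))
    return sum(1 for code, output in results
               if code != 0 and "Address already in use" in output)
```

   For example, count_address_in_use(TUTORIAL_TEST_CMD, instances=4) mirrors the up to 4 parallel containers mentioned later in this thread; calling it in a loop gives a rough failure rate. Threads are sufficient here because each worker only waits on a child process.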


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-07-25 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-407832509
 
 
   The question is whether we are masking a problem with that sleep. It feels 
like fixing a race condition by adding a sleep.


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-07-25 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-407706844
 
 
   Did you remove the thread.sleep you introduced to work around this error?
   
   The environment is the regular CI pipeline. 
   
   I'll leave it up to you, Thomas.


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-07-12 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-404459433
 
 
   They are defined here: 
https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L580


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-06-28 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-400974210
 
 
   Just to be on the safe side: please try what Thomas suggested only locally
   and don't submit these jobs to CI. This test suite causes resource
   exhaustion on our CI, and running so many in parallel would essentially take
   down our instances.
   
   On Thu, 28 June 2018, 11:29, Thomas Delteil wrote:
   
   > To reproduce the setup, I suggest looking at the Jenkins function for
   > tutorial tests.
   >
   > I had put a fix in my last PR before removing them from CI, so the issue
   > might be gone already. Can someone start a few hundred runs of the tutorial
   > tests and see if it still happens? Note that they take ~25 min, so that could
   > take a few days.
   >
   > Actually, commenting out most tests except three very fast ones might be a
   > better idea, since the issue isn't related to a specific test, and one simple
   > test runs in 2-3 s including the Jupyter kernel overhead. To find the fast
   > ones, check the tutorials; some, like the NDArray ones, do not do much.
   >
   > My current best guess is that the issue is related to the fact that the
   > ports used by Jupyter's internal mechanism are chosen randomly, and that
   > linger=1000 is hard-coded somewhere in the Jupyter code, which keeps a port
   > in use for 1 s. For every test there is ~1/1 chance that the same port
   > will be reused (3 ports are picked between 1-10), which makes it 1/300
   > because we have 30 tests and 1/150 because we run on Python 2 and Python 3.
   > That seems roughly consistent with the number of reports we've had, about
   > once every 150 CI runs.
   >
   > There is no easy way to set the ports to fixed, deterministic values. My
   > latest fix added a non-ideal 1.1 s sleep between tests. Let's see if that
   > fixed it. The above explanation might be bogus too.
   >
   > I'm on my phone on a plane and can assist more from Friday onwards.
   >
   > Thanks for looking into it @reminisce  and
   > @access2rohit !
   >
   >
   > On Wed, Jun 27, 2018, 20:45 Anirudh Subramanian 
   > wrote:
   >
   > > assigned to @reminisce  @access2rohit
   > >  is working on this.
   > >
   >
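   Thomas's estimate can be written as a small model. This is a sketch under his stated hypothesis, not a confirmed cause: each kernel draws 3 distinct ports uniformly at random from some range, and the previous kernel's 3 ports are still held by the 1 s linger. Because the exact figures in the quoted estimate did not survive archiving intact, the size of the port range is left as a parameter, and every kernel start is treated as an independent trial, as the quoted estimate does when it multiplies by 30 tests and 2 Python versions.

```python
from math import comb


def per_kernel_collision_prob(port_range, ports_per_kernel=3):
    """Chance that a newly started kernel draws at least one of the ports
    still lingering from the previous kernel, assuming each kernel picks
    `ports_per_kernel` distinct ports uniformly from `port_range` candidates."""
    k = ports_per_kernel
    # Avoiding every lingering port means drawing all k ports from the
    # port_range - k ports that are not lingering.
    p_avoid = comb(port_range - k, k) / comb(port_range, k)
    return 1 - p_avoid


def per_run_failure_prob(port_range, num_tests=30, python_versions=2,
                         ports_per_kernel=3):
    """Chance that at least one kernel start in a CI run collides, treating
    every start as an independent trial (a simplification)."""
    p = per_kernel_collision_prob(port_range, ports_per_kernel)
    trials = num_tests * python_versions
    return 1 - (1 - p) ** trials
```

   Under this model the per-kernel risk is roughly 9/R for a range of R ports, so per_run_failure_prob(R) can be compared with the observed rate of about one failure per 150 CI runs to judge whether the hypothesis is plausible.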
   


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-06-27 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-400852510
 
 
   @ThomasDelteil would you mind assisting here?


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-06-01 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-393971156
 
 
   I think that's what we should do. In general, this is required to prepare 
the path for parallel execution.


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-06-01 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-393956875
 
 
   I'm not a fan of the delay between notebooks anyway, because it masks a 
problem. I'd propose that we now remove the delay entirely and track down all 
the issues that surface as a result. Otherwise, our tests stay flaky and depend on timing.


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-06-01 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-393935633
 
 
   They are running concurrently. We have up to 4 containers running in parallel. Why 
should it matter whether a socket is released or not? First, networking 
is virtualized per container, so there should be no problem on that side. 
Second, we should not require a specific port but simply use a random free one.
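   Using "a random free one" usually means binding to port 0 and letting the operating system choose. A minimal Python sketch with the standard socket module; bind_to_free_port is a hypothetical helper, not part of the MXNet or Jupyter code base. The socket is returned still bound, because if only the number is handed on, the socket has to be closed first and another process can take the port in the meantime.

```python
import socket


def bind_to_free_port(host="127.0.0.1"):
    """Bind a TCP socket to port 0 so the OS assigns a port that is free right now.

    Returns (socket, port). The socket stays bound, so no other process can
    claim the port until the caller closes it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))
    except OSError:
        sock.close()
        raise
    return sock, sock.getsockname()[1]
```

   For example, sock, port = bind_to_free_port() followed by sock.listen() gives a listening socket on whatever port the OS picked; two such sockets bound at the same time always receive different ports.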


[GitHub] marcoabreu commented on issue #11120: Address already in use during tutorial test

2018-06-01 Thread GitBox
marcoabreu commented on issue #11120: Address already in use during tutorial 
test
URL: 
https://github.com/apache/incubator-mxnet/issues/11120#issuecomment-393794611
 
 
   @ThomasDelteil 

