[GitHub] marcoabreu commented on issue #11616: Flaky test test_gluon.test_export

2018-07-11 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-404238313
But yeah, we are regularly testing MXNet CPU on Windows and don't seem to encounter many problems. I also wouldn't know much reason bes…

2018-07-11 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-404237611
I tried it, and what you described matches my observation exactly: you run out of CUDA memory if you run MXNet in parallel. I didn't know we had t…

2018-07-11 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-404230989
Every process can manage its own GPU memory; CUDA supports multi-tenancy out of the box. You're probably only going to run into out o…

2018-07-11 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-404226812
My stance is that our tests should all be completely self-contained. What do you think?
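One way to make a test self-contained, sketched here with only Python's standard `unittest` and `tempfile` modules, is to give every test its own scratch directory instead of a fixed path. `export_artifacts` and `ExportTest` are hypothetical names; `export_artifacts` stands in for whatever the test writes to disk (for example, exported model files) and is not an MXNet API.

```python
import os
import tempfile
import unittest


def export_artifacts(directory):
    """Hypothetical stand-in for code that writes files, e.g. an exported model."""
    path = os.path.join(directory, "model-symbol.json")
    with open(path, "w") as f:
        f.write("{}")
    return path


class ExportTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh, uniquely named directory, so concurrent
        # runs of the suite never write to the same location.
        self._tmp = tempfile.TemporaryDirectory()
        # Registered cleanups run even if the test fails.
        self.addCleanup(self._tmp.cleanup)
        self.workdir = self._tmp.name

    def test_export_writes_file(self):
        path = export_artifacts(self.workdir)
        self.assertTrue(os.path.exists(path))
        self.assertTrue(path.startswith(self.workdir))
```

It can be run with `python -m unittest`; nothing outside the temporary directory is touched, so the test does not depend on state left behind by other runs.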

2018-07-11 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-404226482
Yes, because everything else is undefined behaviour, right? If we design a system to be used only in a specific way, we don't "guarante…

2018-07-11 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-404101435
This is the assumption stated above: "Many design decisions are based on having a single instance of mxnet in a system."

2018-07-11 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-404100607
Considering that MXNet is not thread-safe and that its interface (the C API) is single-threaded (with a sticky thread), together with the as…
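When a library is not thread-safe and expects every call to arrive from the same ("sticky") thread, a common workaround on the caller's side is to funnel all calls through one dedicated worker thread. The sketch below does this with `concurrent.futures.ThreadPoolExecutor(max_workers=1)`; `StickyThreadRunner` and `fake_library_call` are hypothetical names, and this is a generic Python pattern, not MXNet's own mechanism.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StickyThreadRunner:
    """Runs every submitted call on one dedicated thread (hypothetical helper)."""

    def __init__(self):
        # With max_workers=1 the executor creates a single worker thread,
        # which executes all submitted calls one at a time, in order.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def call(self, fn, *args, **kwargs):
        # Block the calling thread until the worker thread has finished fn.
        return self._executor.submit(fn, *args, **kwargs).result()

    def close(self):
        self._executor.shutdown(wait=True)


def fake_library_call(x):
    """Hypothetical non-thread-safe call; also reports which thread ran it."""
    return x * 2, threading.get_ident()


if __name__ == "__main__":
    runner = StickyThreadRunner()
    # Many caller threads, but all library work happens on the one worker.
    with ThreadPoolExecutor(max_workers=8) as callers:
        results = list(callers.map(lambda i: runner.call(fake_library_call, i), range(16)))
    runner.close()
    print({tid for _, tid in results})  # a set containing a single thread id
```

The trade-off is that calls are fully serialized, so throughput is limited to what one thread can do; that is the price of honouring a single-threaded interface.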

2018-07-10 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-403936302
Just to clarify: the problem here seems to be the function under test, python\mxnet\gluon\model_zoo\model_store.py:get_model_file, so it's…
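When several processes populate the same cache directory, one process can read a file that another is still writing. A common mitigation, shown here as a sketch and not as what `get_model_file` actually does, is to write to a uniquely named temporary file in the target directory and then rename it into place with `os.replace`. `get_cached_file` and `fetch_bytes` are hypothetical names; `fetch_bytes` stands in for the download step.

```python
import os
import tempfile


def fetch_bytes(name):
    """Hypothetical download step; returns the file contents for `name`."""
    return ("weights for " + name).encode()


def get_cached_file(name, root):
    """Return a path to `name` under `root`, downloading it if it is missing."""
    os.makedirs(root, exist_ok=True)  # no error if another process created it first
    target = os.path.join(root, name)
    if os.path.exists(target):
        return target
    # Write to a unique temp file in the same directory (same filesystem),
    # so concurrent writers never share a partially written file.
    fd, tmp_path = tempfile.mkstemp(dir=root, prefix=name + ".", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(fetch_bytes(name))
        # Rename into place. On POSIX this is atomic: readers see either no file
        # or the complete file. On Windows, os.replace can raise PermissionError
        # if another process currently has `target` open.
        os.replace(tmp_path, target)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target
```

Two processes may still both download the same file; whichever rename happens last wins, but each of them wrote a complete file, so no reader sees a truncated one.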

2018-07-09 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-403595104
To clarify: do we expect our users to be able to run multiple instances of MXNet on the same host?

2018-07-09 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-403579612
Thanks for elaborating on the home variable. That makes sense because we run as administrator on Windows.
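The point about the home variable can be illustrated with a small sketch of how a per-user cache directory is often resolved: if the location is derived from the user's home directory, every job running under the same account (such as a CI agent running as administrator) resolves to the same directory. `resolve_cache_root` and the `EXAMPLE_CACHE_HOME` override variable are hypothetical names for illustration, not MXNet's actual configuration.

```python
import os


def resolve_cache_root(env=None):
    """Pick a cache directory: an explicit override wins, else the user's home.

    EXAMPLE_CACHE_HOME is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    override = env.get("EXAMPLE_CACHE_HOME")
    if override:
        return override
    # Without an override, every job running as the same user lands here.
    home = env.get("HOME") or env.get("USERPROFILE") or os.path.expanduser("~")
    return os.path.join(home, ".example_cache")
```

Giving each CI job its own override value is one way to keep parallel jobs on the same host from sharing a cache directory.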

2018-07-09 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-403579484
On production systems, while running a service, we can't necessarily assume that every single request is handled in a sandbox. On Ubuntu…

2018-07-09 (GitBox)
marcoabreu commented on issue #11616: Flaky test test_gluon.test_export
URL: https://github.com/apache/incubator-mxnet/issues/11616#issuecomment-403567727
To me, this looks like a test problem: we seem to be hardcoding paths, which causes problems if multiple runs are triggered in parallel…
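The failure mode described above can be reproduced deterministically without real concurrency: when two runs write to the same fixed file name and their steps interleave, one run reads back the other's output. In this sketch, `write_result`, `read_result`, and `interleaved_runs` are hypothetical helpers, and a shared temporary directory stands in for a hardcoded path.

```python
import os
import tempfile


def write_result(directory, run_id):
    """Hypothetical test step: write this run's output under a fixed file name."""
    path = os.path.join(directory, "exported-model.json")
    with open(path, "w") as f:
        f.write(run_id)
    return path


def read_result(path):
    with open(path) as f:
        return f.read()


def interleaved_runs(dir_a, dir_b):
    """Simulate two runs whose steps interleave, as they can when run in parallel."""
    path_a = write_result(dir_a, "run-a")
    path_b = write_result(dir_b, "run-b")  # run B writes before run A reads back
    return read_result(path_a), read_result(path_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared:
        # Both runs use one directory, like a hardcoded path.
        print(interleaved_runs(shared, shared))  # ('run-b', 'run-b')
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        # Each run gets its own directory.
        print(interleaved_runs(a, b))  # ('run-a', 'run-b')
```

With a shared path, run A silently checks run B's file, which is exactly the kind of intermittent failure that makes a test look flaky; per-run directories remove the interference.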