[GitHub] [incubator-mxnet] ma-hei edited a comment on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-07-09 Thread GitBox


ma-hei edited a comment on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-656293010


   Interestingly this test is failing before we see the "hanging": 
   ```
   [2020-07-09T18:17:49.110Z] [gw1] [ 88%] FAILED tests/python/unittest/test_profiler.py::test_profiler
   ```
   Oh right.. @leezu, I just saw your comment above. That's actually happening before test_profiler fails.
   Looking at the newer test run, I notice the following: the timeout you describe above (from the previous run) happens at around 3%, while in the newer run I'm also seeing a timeout, but this time at around 51%. I still think it's related to the dataloader, though: in the newer run we see the same timeout, immediately followed by the log line "PASSED tests/python/unittest/test_gluon_data.py::test_dataloader_context" (so somehow the dataloader test is still passing!?).
   ```
   [2020-07-02T19:44:30.029Z] [gw1] [ 50%] PASSED tests/python/unittest/test_numpy_op.py::test_np_sort[True-int32-shape15-mergesort]
   [2020-07-02T19:44:30.286Z] tests/python/unittest/test_numpy_op.py::test_np_sort[True-int32-shape15-heapsort]
   [2020-07-02T19:44:30.286Z] [gw1] [ 50%] PASSED tests/python/unittest/test_numpy_op.py::test_np_sort[True-int32-shape15-heapsort]
   [2020-07-02T19:44:30.543Z] tests/python/unittest/test_numpy_op.py::test_np_sort[True-int32-shape16-quicksort]
   [2020-07-02T19:44:30.798Z] [gw1] [ 51%] PASSED tests/python/unittest/test_numpy_op.py::test_np_sort[True-int32-shape16-quicksort]
   [2020-07-02T19:44:30.798Z] Timeout (0:20:00)!
   [2020-07-02T19:44:30.798Z] Thread 0x7fd3c0475700 (most recent call first):
   ...
   ...
   ...
   [2020-07-02T19:44:39.208Z] [gw0] [ 51%] PASSED tests/python/unittest/test_gluon_data.py::test_dataloader_context
   ```
   We see that the dataloader test was started much earlier (at around 2%):
   ```
   [2020-07-02T19:24:39.154Z] [gw0] [  2%] PASSED tests/python/unittest/test_gluon_data.py::test_multi_worker
   [2020-07-02T19:24:39.154Z] tests/python/unittest/test_gluon_data.py::test_dataloader_context
   ```
   Maybe I can isolate this timeout somehow.
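   One way I might try to isolate it (a rough sketch, assuming the "Timeout (0:20:00)!" banner comes from Python's faulthandler, whose output format it matches, and using the test paths from the log above):
   ```
   # Hypothetical local repro: run only the gluon data loader tests and dump all
   # thread stacks after 20 minutes, mirroring the CI's timeout budget.
   import faulthandler
   import pytest

   faulthandler.dump_traceback_later(20 * 60, exit=False)  # same 20 min as the CI log
   pytest.main(["tests/python/unittest/test_gluon_data.py", "-v"])
   ```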
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-mxnet] ma-hei edited a comment on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-07-09 Thread GitBox


ma-hei edited a comment on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-656232421


   Seems like the unit tests in the unix-cpu job are failing at this point:
   ```
   [2020-07-02T19:59:32.830Z] tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon
   [2020-07-02T19:59:32.830Z] [gw0] [ 89%] SKIPPED tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon
   [2020-07-02T19:59:32.830Z] tests/python/unittest/test_recordio.py::test_recordio
   [2020-07-02T22:59:39.221Z] Sending interrupt signal to process
   [2020-07-02T22:59:44.185Z] 2020-07-02 22:59:39,244 - root - WARN
   ```
   Trying to reproduce it locally. 
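   A rough sketch of how I might mimic the CI run locally (hypothetical; it assumes pytest-xdist is installed, which the [gw0]/[gw3] worker prefixes in the log indicate the CI uses):
   ```
   # Run the unittest directory with 4 xdist workers and verbose output, so the
   # last test that started before a hang is visible in the local log as well.
   import pytest

   pytest.main(["tests/python/unittest", "-n", "4", "-v"])
   ```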
   Note: comparing this with the same job running on Python 3.6, I see that the same test runs much later there (at 99% completion, compared to 89% in this job):
   ```
   [2020-07-08T22:31:40.815Z] tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon
   [2020-07-08T22:31:40.815Z] [gw3] [ 99%] SKIPPED tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon
   [2020-07-08T22:31:40.815Z] tests/python/unittest/test_recordio.py::test_recordio
   [2020-07-08T22:31:40.815Z] [gw3] [ 99%] PASSED tests/python/unittest/test_recordio.py::test_recordio
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-mxnet] ma-hei edited a comment on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-07-02 Thread GitBox


ma-hei edited a comment on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-653149747


   Thanks @leezu, I think I found the underlying cause of the test failure in unittest/onnx/test_node.py::TestNode::test_import_export. In onnx 1.7, the input signature of the Pad operator has changed; we can see this by comparing https://github.com/onnx/onnx/blob/master/docs/Operators.md#Pad with https://github.com/onnx/onnx/blob/master/docs/Changelog.md#Pad-1. I believe I can fix the test and I'm working on that now, but after the fix the same test will no longer pass with onnx 1.5 (at least we know how to fix it, I guess). I assume the stack trace you posted above from the unrelated cd job probably has a similar root cause. Besides that, I'm trying to make pylint happy, but that's the smaller issue, I think.
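   For reference, here is my reading of that change as a minimal sketch (illustrative only, with made-up node names; this is not MXNet's exporter code): in the old opset the paddings and the constant value are attributes of the node, while from opset 11 on, which is what a model built with onnx 1.7 defaults to, they are passed as extra inputs.
   ```
   # Sketch of the Pad signature change between the two docs linked above.
   import numpy as np
   from onnx import helper, numpy_helper

   # Old style (opset <= 10, accepted by onnx 1.5): pads/value are attributes.
   pad_old = helper.make_node("Pad", inputs=["x"], outputs=["y"],
                              mode="constant", pads=[0, 0, 0, 0, 0, 0, 0, 0], value=0.0)

   # New style (opset >= 11): pads (and the optional constant value) become inputs,
   # typically supplied as initializers.
   pads_init = numpy_helper.from_array(np.zeros(8, dtype=np.int64), name="pads")
   pad_new = helper.make_node("Pad", inputs=["x", "pads"], outputs=["y"], mode="constant")
   ```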



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-mxnet] ma-hei edited a comment on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-24 Thread GitBox


ma-hei edited a comment on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-649099499


   @Roshrini I found an issue when updating onnx from 1.5.0 to 1.7.0. It can be reproduced with Python 3.6; the following code reproduces it. Do you have any idea what's going on?
   ```
   import numpy as np
   from onnx import TensorProto
   from onnx import helper
   from onnx import mapping
   from mxnet.contrib.onnx.onnx2mx.import_onnx import GraphProto
   from mxnet.contrib.onnx.mx2onnx.export_onnx import MXNetGraph
   import mxnet as mx
   
   inputshape = (2, 3, 20, 20)
   input_tensor = [helper.make_tensor_value_info("input1", TensorProto.FLOAT, shape=inputshape)]
   
   outputshape = (2, 3, 17, 16)
   output_tensor = [helper.make_tensor_value_info("output", TensorProto.FLOAT, shape=outputshape)]
   
   onnx_attrs = {'kernel_shape': (4, 5), 'pads': (0, 0), 'strides': (1, 1), 'p': 1}
   nodes = [helper.make_node("LpPool", ["input1"], ["output"], **onnx_attrs)]
   
   graph = helper.make_graph(nodes, "test_lppool1", input_tensor, output_tensor)
   
   onnxmodel = helper.make_model(graph)
   
   graph = GraphProto()
   
   ctx = mx.cpu()
   
   sym, arg_params, aux_params = graph.from_onnx(onnxmodel.graph)
   
   metadata = graph.get_graph_metadata(onnxmodel.graph)
   input_data = metadata['input_tensor_data']
   input_shape = [data[1] for data in input_data]
   
   """ Import ONNX model to mxnet model and then export to ONNX model
   and then import it back to mxnet for verifying the result"""
   
   params = {}
   params.update(arg_params)
   params.update(aux_params)
   converter = MXNetGraph()
   
   graph_proto = converter.create_onnx_graph_proto(sym, params, in_shape=input_shape,
                                                   in_type=mapping.NP_TYPE_TO_TENSOR_TYPE[np.dtype('float32')])
   ```
   The line that is throwing the error is:
   ```
   graph_proto = converter.create_onnx_graph_proto(sym, params, in_shape=input_shape,
                                                   in_type=mapping.NP_TYPE_TO_TENSOR_TYPE[np.dtype('float32')])
   ```
   The error I'm seeing is:
   ```
   File "/opt/anaconda3/envs/p36/lib/python3.6/site-packages/onnx/checker.py", line 54, in checker
     proto.SerializeToString(), ctx)
   onnx.onnx_cpp2py_export.checker.ValidationError: Node (pad0) has input size 1 not in range [min=2, max=3].
   
   ==> Context: Bad node spec: input: "input1" output: "pad0" name: "pad0" op_type: "Pad" attribute { name: "mode" s: "constant" type: STRING } attribute { name: "pads" ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 type: INTS } attribute { name: "value" f: 0 type: FLOAT }
   ```
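   A quick way to confirm that diagnosis locally (a hypothetical diagnostic, not a fix) is to ask the installed onnx for the default Pad schema; on 1.7 it should report the opset-11 form with two to three inputs, which is exactly the range the checker complains about:
   ```
   # Print which Pad schema the installed onnx registers by default.
   import onnx.defs

   pad_schema = onnx.defs.get_schema("Pad")
   print(pad_schema.since_version)                    # 2 on onnx 1.5, 11 on onnx 1.7
   print(pad_schema.min_input, pad_schema.max_input)  # 1 1 vs. 2 3
   ```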



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-mxnet] ma-hei edited a comment on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-14 Thread GitBox


ma-hei edited a comment on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-643807772


   @leezu I think I got to a state where I can run the unit tests with Python 3.8 and reproduce what is described in the issue ticket https://github.com/apache/incubator-mxnet/issues/18380.
   I ignored the lint errors for now by adding disable flags to the pylintrc file.
   
   As described in https://github.com/apache/incubator-mxnet/issues/18380, we're seeing an issue related to the usage of time.clock(), which was removed in Python 3.8.
   
   Besides that, the test tests/python/unittest/onnx/test_node.py::TestNode::test_import_export seems to fail. I don't see the error in the Jenkins job, but when running the test locally with Python 3.8 and onnx 1.7, I'm getting:
   
   ```
   >   bkd_rep = backend.prepare(onnxmodel, operation='export', backend='mxnet')
   
   tests/python/unittest/onnx/test_node.py:164:
   tests/python/unittest/onnx/backend.py:104: in prepare
   sym, arg_params, aux_params = MXNetBackend.perform_import_export(sym, arg_params, aux_params,
   tests/python/unittest/onnx/backend.py:62: in perform_import_export
   graph_proto = converter.create_onnx_graph_proto(sym, params, in_shape=input_shape,
   python/mxnet/contrib/onnx/mx2onnx/export_onnx.py:308: in create_onnx_graph_proto
   ...
   E   onnx.onnx_cpp2py_export.checker.ValidationError: Node (pad0) has input size 1 not in range [min=2, max=3].
   E 
   E   ==> Context: Bad node spec: input: "input1" output: "pad0" name: "pad0" op_type: "Pad" attribute { name: "mode" s: "constant" type: STRING } attribute { name: "pads" ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 type: INTS } attribute { name: "value" f: 0 type: FLOAT }
   
   ../../../Library/Python/3.8/lib/python/site-packages/onnx/checker.py:53: ValidationError
   ```
   
   I believe this is an issue in onnx 1.7 as it looks exactly like 
https://github.com/onnx/onnx/issues/2548.
   
   I also found that the test job Python3: MKL-CPU is not running through, which seems to be due to a timeout. I believe this is happening in tests/python/conftest.py, but the log output is not telling me what exactly goes wrong. Do you have any idea how to reproduce this locally, or how to get better insight into that failure?
   
   I will now look into the following:
   - can I work around the onnx 1.7 related issue?
   - even after aligning pylint and astroid, I'm seeing unexpected linting errors: the linter tells me that ndarray is a bad class name. Why is that happening? (See the small illustration after this list.)
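   (On the second point, my guess at what is going on, sketched below as an illustration: pylint's default class naming check expects CamelCase, so a deliberately numpy-style lowercase class name trips C0103 / invalid-name unless it is whitelisted via good-names/class-rgx in the pylintrc or disabled inline.)
   ```
   # Hypothetical module, not MXNet code: pylint's default class name pattern
   # expects CamelCase, so this lowercase name triggers C0103 (invalid-name)
   # unless whitelisted in the pylintrc or disabled with an inline pragma.
   class ndarray:  # pylint: disable=invalid-name
       """Placeholder class using numpy-style lowercase naming."""
   ```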
   
   Btw. I believe that you and @marcoabreu are getting pinged every time I push a commit to this PR. I don't think this PR is actually in a reviewable state yet; its purpose is more to see what breaks when upgrading to Python 3.8.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-mxnet] ma-hei edited a comment on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-05 Thread GitBox


ma-hei edited a comment on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-639837099


   In the build job ci/jenkins/mxnet-validation/unix-cpu the following command 
was previously failing:
   `
   ci/build.py --docker-registry mxnetci --platform ubuntu_cpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh sanity_check
   `
   I was able to reproduce the issue locally and fixed it by making additional changes to ci/docker/Dockerfile.build.ubuntu and ci/docker/install/requirements. What I did in those files is the following:
   
   - making python3.8 the default python3 binary by creating a symlink (see the change in ci/docker/Dockerfile.build.ubuntu)
   - updating requirement versions in ci/docker/install/requirements so that `python3 -m pip install -r /work/requirements` in Dockerfile.build.ubuntu runs successfully. I needed to update onnx, Cython and Pillow; the previously pinned versions could not be installed under Python 3.8.
   
   I was then able to build the image and run the ci/build.py command mentioned above successfully.
   I'm now wondering whether the failure in "continuous build / macosx-x86_64" that I'm seeing below is already a consequence of the onnx update I made (which is necessary in order to update to Python 3.8, which in turn is the goal of this PR). My question is basically: what does each of the CI jobs below do? In which of them should I be able to observe the test failures? Also let me know if you think this is going down the wrong route and I should try something different.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-mxnet] ma-hei edited a comment on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-01 Thread GitBox


ma-hei edited a comment on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-637192961


   @leezu I was hoping that I could observe the Python 3.8 related test failures in one of the ci/jenkins/mxnet-validation build jobs. I assume those jobs did not run because the ci/jenkins/mxnet-validation/sanity build failed. Does the failure of the sanity build look related to the Python 3.8 update I made in Dockerfile.build.ubuntu? To me it looks like the build stalled at the end and was automatically killed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org