[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-07-15 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-659075560


   @mxnet-bot run ci [centos-cpu]







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-07-13 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-657853880


   @mxnet-bot run ci [unix-cpu]
   







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-07-09 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-656293010


   Interestingly, this test fails before we see the "hanging":
   ```
   [2020-07-09T18:17:49.110Z] [gw1] [ 88%] FAILED tests/python/unittest/test_profiler.py::test_profiler
   ```







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-07-09 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-656232421


   Seems like the unit tests in the unix-cpu job are failing at this point
   ```
   [2020-07-02T19:59:32.830Z] tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon 
   [2020-07-02T19:59:32.830Z] [gw0] [ 89%] SKIPPED tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon 
   [2020-07-02T19:59:32.830Z] tests/python/unittest/test_recordio.py::test_recordio 
   [2020-07-02T22:59:39.221Z] Sending interrupt signal to process
   [2020-07-02T22:59:44.185Z] 2020-07-02 22:59:39,244 - root - WARN
   ```
   Trying to reproduce it locally. 
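   For reference, a rough local reproduction sketch (hypothetical, assuming MXNet is built and importable in a Python 3.8 environment with pytest-xdist installed, mirroring the LoadScheduling workers in the CI log):
   ```python
   # Hypothetical local repro around the point where the CI run stalls; not the
   # exact CI command, just a pytest-xdist invocation similar to what the log shows.
   import pytest

   pytest.main([
       "-n", "4", "--dist", "load", "-v",
       "tests/python/unittest/test_profiler.py",
       "tests/python/unittest/test_recordio.py::test_recordio",
   ])
   ```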







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-07-09 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-656227320


   Looking at the failing job ci/jenkins/mxnet-validation/unix-cpu: it seems that when running with Python 3.8, a larger number of unit tests is being run.
   Looking at the beginning of the unit test output:
   ```
   [2020-07-02T19:24:10.035Z] scheduling tests via LoadScheduling
   [2020-07-02T19:24:10.035Z] 
   [2020-07-02T19:24:10.035Z] tests/python/unittest/test_autograd.py::test_unary_func 
   [2020-07-02T19:24:10.035Z] tests/python/unittest/test_attr.py::test_attr_dict 
   [2020-07-02T19:24:10.035Z] tests/python/unittest/test_attr.py::test_list_attr 
   [2020-07-02T19:24:10.035Z] tests/python/unittest/test_attr.py::test_attr_basic 
   [2020-07-02T19:24:10.035Z] [gw1] [  0%] PASSED tests/python/unittest/test_attr.py::test_list_attr 
   [2020-07-02T19:24:10.035Z] [gw2] [  0%] PASSED tests/python/unittest/test_autograd.py::test_unary_func 
   [2020-07-02T19:24:10.035Z] [gw3] [  0%] PASSED tests/python/unittest/test_attr.py::test_attr_dict 
   [2020-07-02T19:24:10.035Z] [gw0] [  0%] PASSED tests/python/unittest/test_attr.py::test_attr_basic 
   [2020-07-02T19:24:10.035Z] tests/python/unittest/test_autograd.py::test_argnum 
   [2020-07-02T19:24:10.035Z] tests/python/unittest/test_autograd.py::test_out_grads 
   [2020-07-02T19:24:10.035Z] tests/python/unittest/test_autograd.py::test_training 
   [2020-07-02T19:24:10.035Z] [gw2] [  0%] PASSED tests/python/unittest/test_autograd.py::test_out_grads 
   ```
   Comparing this with a passing test run under Python 3.6 (from an unrelated recent PR I picked at random), the output looks as follows:
   ```
   [2020-07-06T17:54:08.924Z] scheduling tests via LoadScheduling
   [2020-07-06T17:54:08.924Z] 
   [2020-07-06T17:54:08.924Z] tests/python/unittest/test_contrib_autograd.py::test_operator_with_state 
   [2020-07-06T17:54:08.924Z] tests/python/unittest/test_attr.py::test_operator 
   [2020-07-06T17:54:08.924Z] tests/python/unittest/test_autograd.py::test_operator_with_state 
   [2020-07-06T17:54:08.924Z] tests/python/unittest/test_operator.py::test_RNN_float64 
   [2020-07-06T17:54:08.924Z] [gw1] [  0%] PASSED tests/python/unittest/test_attr.py::test_operator 
   [2020-07-06T17:54:08.924Z] [gw3] [  0%] PASSED tests/python/unittest/test_contrib_autograd.py::test_operator_with_state 
   [2020-07-06T17:54:08.924Z] [gw0] [  1%] PASSED tests/python/unittest/test_autograd.py::test_operator_with_state 
   [2020-07-06T17:54:08.924Z] [gw2] [  1%] PASSED tests/python/unittest/test_operator.py::test_RNN_float64 
   ```
   Possibly that's the reason for the timeout; looking into it.
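   One way to check this (a hedged sketch, hypothetical paths): collect the test IDs in both environments without running them and diff the two lists.
   ```python
   # Hypothetical check: list the collected test IDs under Python 3.6 and under
   # Python 3.8 (run once in each environment), then diff the two outputs.
   import pytest

   pytest.main(["--collect-only", "-q", "tests/python/unittest/"])
   ```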







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-07-02 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-653323761


   I see that the discussion above regarding the failing test unittest/onnx/test_node.py::TestNode::test_import_export is now obsolete, since this test was removed in commit fb73de7582de4e622299a4ad045e25f771568193.







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-07-02 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-653149747


   Thanks @leezu, I think I found the underlying cause of the test failure in unittest/onnx/test_node.py::TestNode::test_import_export. In onnx 1.7 the signature of the Pad operator has changed: what used to be attributes are now passed as inputs. We can see this by comparing https://github.com/onnx/onnx/blob/master/docs/Operators.md#Pad with https://github.com/onnx/onnx/blob/master/docs/Changelog.md#Pad-1. I believe I can fix this test and I'm working on that now. However, after that change the same test will no longer pass with onnx 1.5 (but at least we know how to fix it, I guess). I assume the stack trace you posted above from the unrelated CD job probably has a similar cause.







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-30 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-651616085


   Here's what's going on with onnx 1.7: https://github.com/onnx/onnx/issues/2865
   We just need to use the newer way of instantiating a Pad node.
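   Roughly, the difference looks like this (a minimal sketch of the opset-2 vs. opset-11 style with hypothetical tensor names, not the actual change needed in the mx2onnx exporter):
   ```python
   # Old style (opset < 11): `pads` and `value` are attributes of the Pad node.
   # New style (opset >= 11): the padding and constant value are passed as inputs.
   import numpy as np
   import onnx
   from onnx import TensorProto, helper, numpy_helper

   # old, attribute-based Pad node
   old_pad = helper.make_node(
       "Pad", inputs=["x"], outputs=["y"],
       mode="constant", pads=[0, 0, 1, 1, 0, 0, 1, 1], value=0.0)

   # new, input-based Pad node: pads/value become initializer tensors fed as inputs
   pads = numpy_helper.from_array(
       np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=np.int64), name="pads")
   value = numpy_helper.from_array(np.array(0.0, dtype=np.float32), name="value")
   new_pad = helper.make_node(
       "Pad", inputs=["x", "pads", "value"], outputs=["y"], mode="constant")

   graph = helper.make_graph(
       [new_pad], "pad_example",
       [helper.make_tensor_value_info("x", TensorProto.FLOAT, (1, 1, 2, 2))],
       [helper.make_tensor_value_info("y", TensorProto.FLOAT, (1, 1, 4, 4))],
       initializer=[pads, value])
   onnx.checker.check_model(helper.make_model(graph))  # validates under onnx 1.7
   ```
   Under onnx 1.5 (default opset 10) only the attribute-based form validates, which is presumably why a fixed test would no longer pass there.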







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-24 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-649099499


   @Roshrini I found an issue when updating onnx from 1.5.0 to 1.7.0. The issue can be reproduced with Python 3.6. The following code reproduces it. Do you have any idea what's going on?
   ```python
   import numpy as np
   from onnx import TensorProto
   from onnx import helper
   from onnx import mapping
   from mxnet.contrib.onnx.onnx2mx.import_onnx import GraphProto
   from mxnet.contrib.onnx.mx2onnx.export_onnx import MXNetGraph
   import mxnet as mx

   # build a minimal ONNX model containing a single LpPool node
   inputshape = (2, 3, 20, 20)
   input_tensor = [helper.make_tensor_value_info("input1", TensorProto.FLOAT, shape=inputshape)]

   outputshape = (2, 3, 17, 16)
   output_tensor = [helper.make_tensor_value_info("output", TensorProto.FLOAT, shape=outputshape)]

   onnx_attrs = {'kernel_shape': (4, 5), 'pads': (0, 0), 'strides': (1, 1), 'p': 1}
   nodes = [helper.make_node("LpPool", ["input1"], ["output"], **onnx_attrs)]

   graph = helper.make_graph(nodes, "test_lppool1", input_tensor, output_tensor)

   onnxmodel = helper.make_model(graph)

   # import the ONNX model into MXNet
   graph = GraphProto()

   ctx = mx.cpu()

   sym, arg_params, aux_params = graph.from_onnx(onnxmodel.graph)

   metadata = graph.get_graph_metadata(onnxmodel.graph)
   input_data = metadata['input_tensor_data']
   input_shape = [data[1] for data in input_data]

   # Import the ONNX model into MXNet, then export it back to ONNX
   # (and import again) to verify the round trip.
   params = {}
   params.update(arg_params)
   params.update(aux_params)
   converter = MXNetGraph()

   # exporting the MXNet symbol back to ONNX is where the error is raised
   graph_proto = converter.create_onnx_graph_proto(sym, params, in_shape=input_shape, in_type=mapping.NP_TYPE_TO_TENSOR_TYPE[np.dtype('float32')])
   ```
   The line that is throwing the error is:
   ```python
   graph_proto = converter.create_onnx_graph_proto(sym, params, in_shape=input_shape, in_type=mapping.NP_TYPE_TO_TENSOR_TYPE[np.dtype('float32')])
   ```
   The error I'm seeing is:
   ```
     File "/opt/anaconda3/envs/p36/lib/python3.6/site-packages/onnx/checker.py", line 54, in checker
       proto.SerializeToString(), ctx)
   onnx.onnx_cpp2py_export.checker.ValidationError: Node (pad0) has input size 1 not in range [min=2, max=3].

   ==> Context: Bad node spec: input: "input1" output: "pad0" name: "pad0" op_type: "Pad" attribute { name: "mode" s: "constant" type: STRING } attribute { name: "pads" ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 type: INTS } attribute { name: "value" f: 0 type: FLOAT }
   ```
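   For what it's worth, my current understanding (a hedged guess, not verified against the exporter code): helper.make_model() without explicit opset_imports targets the default opset of the installed onnx package, so under onnx 1.7 the checker applies the opset-11+ Pad schema, which no longer accepts the attribute-style pad0 node that the exporter emits.
   ```python
   # Hypothetical diagnostic, assuming onnx 1.7 is installed: its default opset
   # is 12, where Pad expects `pads` (and optionally a constant value) as inputs
   # rather than attributes, hence "input size 1 not in range [min=2, max=3]".
   import onnx
   from onnx import defs

   print(onnx.__version__)           # e.g. '1.7.0'
   print(defs.onnx_opset_version())  # e.g. 12
   ```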







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-14 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-643807772


   @leezu I think I got to a state where I can run the unit tests with Python 3.8 and reproduce what is described in the issue ticket https://github.com/apache/incubator-mxnet/issues/18380. I ignored the lint errors for now by adding disable flags to the pylintrc file.
   
   As described in https://github.com/apache/incubator-mxnet/issues/18380, we're seeing an issue related to the usage of time.clock() (removed in Python 3.8; time.perf_counter() or time.process_time() are the usual replacements). Besides that, I found the following issue:
   
   The test 
tests/python/unittest/onnx/test_node.py::TestNode::test_import_export seems to 
fail. In the Jenkins job I don't see the error, but when running the test 
locally with python3.8 and onnx 1.7, I'm getting:
   
   ```
   >   bkd_rep = backend.prepare(onnxmodel, operation='export', backend='mxnet')

   tests/python/unittest/onnx/test_node.py:164:
   tests/python/unittest/onnx/backend.py:104: in prepare
       sym, arg_params, aux_params = MXNetBackend.perform_import_export(sym, arg_params, aux_params,
   tests/python/unittest/onnx/backend.py:62: in perform_import_export
       graph_proto = converter.create_onnx_graph_proto(sym, params, in_shape=input_shape,
   python/mxnet/contrib/onnx/mx2onnx/export_onnx.py:308: in create_onnx_graph_proto
   ...
   E   onnx.onnx_cpp2py_export.checker.ValidationError: Node (pad0) has input size 1 not in range [min=2, max=3].
   E 
   E   ==> Context: Bad node spec: input: "input1" output: "pad0" name: "pad0" op_type: "Pad" attribute { name: "mode" s: "constant" type: STRING } attribute { name: "pads" ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 type: INTS } attribute { name: "value" f: 0 type: FLOAT }

   ../../../Library/Python/3.8/lib/python/site-packages/onnx/checker.py:53: ValidationError
   ```
   
   I believe this is an issue in onnx 1.7 as it looks exactly like 
https://github.com/onnx/onnx/issues/2548.
   
   I also found that the test job Python3: MKL-CPU does not run through, which seems to be due to a timeout. I believe this is happening in tests/python/conftest.py, but the log output does not tell me which test is failing, and I can run the test successfully locally. Do you have any idea how to reproduce this locally, or how to get better insight into that failure?
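   One hedged idea for getting more insight locally (hypothetical invocation, assuming the pytest-timeout plugin is available; this is not the CI's actual command):
   ```python
   # Hypothetical local run to surface a hanging test: enforce a per-test time
   # limit (needs the pytest-timeout plugin) and report the slowest tests at the end.
   import pytest

   pytest.main([
       "-v", "--timeout=600", "--durations=25",
       "tests/python/unittest/",
   ])
   ```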
   
   I will now look into the following:
   - can I work around the onnx 1.7 related issue?
   - even after aligning pylint and astroid, I'm seeing unexpected linting 
errors. The linter is telling me that ndarray is a bad class name. Why is that 
happening?
   
   







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-13 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-643709568


   @mxnet-bot run ci [all]







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-05 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-639837099


   In the build job ci/jenkins/mxnet-validation/unix-cpu the following command 
was previously failing:
   ```
   ci/build.py --docker-registry mxnetci --platform ubuntu_cpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh sanity_check
   ```
   I was able to reproduce the issue locally and I fixed it by making 
additional changes to ci/docker/Dockerfile.build.ubuntu and 
ci/docker/install/requirements. What I did in those files is the following:
   
   - Making python3.8 the default python3 binary by creating a symlink (see the change in ci/docker/Dockerfile.build.ubuntu).
   - Updating requirement versions in ci/docker/install/requirements so that `python3 -m pip install -r /work/requirements` in Dockerfile.build.ubuntu can run successfully. I needed to update onnx, Cython and Pillow; the previously pinned versions could not be installed under python3.8.
   
   I was then able to build the image and successfully run the ci/build.py command mentioned above.
   I'm now wondering whether the failure in "continuous build / macosx-x86_64" that I'm seeing above is already a consequence of the onnx update I made (which is necessary in order to update to Python 3.8, which in turn is the goal of this PR). My question is basically: what does each of the jobs above do? In which of the above jobs should I be able to observe the test failures? Also let me know if you think this is going down the wrong route and I should try something different.
   







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-01 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-637192961


   @leezu I was hoping that I could observe the test failures in one of the ci/jenkins/mxnet-validation build jobs. I assume those jobs did not run because the ci/jenkins/mxnet-validation/sanity build failed. Does the failure of the sanity build look to you like it's related to the Python 3.8 update I made in Dockerfile.build.ubuntu? To me it looks like the build stalled at the end and was automatically killed.







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-01 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-637189523


   @mxnet-bot run ci [all]







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-01 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-637189325


   @mxnet-bot run ci all







[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8

2020-06-01 Thread GitBox


ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-636997952


   @leezu When using Ubuntu 20.04 as the base image of Dockerfile.build.ubuntu, I found some issues with the apt-get installation of packages such as clang-10 and doxygen. To solve the problem at hand, I went back to Ubuntu 18.04, but instead of installing python3 (in Dockerfile.build.ubuntu), I'm installing python3.8. The image builds successfully locally. I have two questions:
   - Will the CI tests of this pull request start automatically after some time?
   - After fixing the issues with python 3.8, do we actually want to upgrade to 
Ubuntu 20.04 in the base image or should this be a separate pull request? 
   


