[GitHub] [incubator-mxnet] ma-hei commented on pull request #18445: updating ubuntu_cpu base image to 20.04 to observe failing tests regarding Python 3.8
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-659075560

@mxnet-bot run ci [centos-cpu]

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-657853880

@mxnet-bot run ci [unix-cpu]
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-656293010

Interestingly, this test fails before we see the "hanging":

```
[2020-07-09T18:17:49.110Z] [gw1] [ 88%] FAILED tests/python/unittest/test_profiler.py::test_profiler
```
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-656232421

It seems the unit tests in the unix-cpu job are failing at this point:

```
[2020-07-02T19:59:32.830Z] tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon
[2020-07-02T19:59:32.830Z] [gw0] [ 89%] SKIPPED tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon
[2020-07-02T19:59:32.830Z] tests/python/unittest/test_recordio.py::test_recordio
[2020-07-02T22:59:39.221Z] Sending interrupt signal to process
[2020-07-02T22:59:44.185Z] 2020-07-02 22:59:39,244 - root - WARN
```

Trying to reproduce it locally.
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-656227320

Looking at the failing job ci/jenkins/mxnet-validation/unix-cpu: it seems that when running with Python 3.8, a larger number of unit tests is scheduled. The beginning of the unit test output looks like this:

```
[2020-07-02T19:24:10.035Z] scheduling tests via LoadScheduling
[2020-07-02T19:24:10.035Z]
[2020-07-02T19:24:10.035Z] tests/python/unittest/test_autograd.py::test_unary_func
[2020-07-02T19:24:10.035Z] tests/python/unittest/test_attr.py::test_attr_dict
[2020-07-02T19:24:10.035Z] tests/python/unittest/test_attr.py::test_list_attr
[2020-07-02T19:24:10.035Z] tests/python/unittest/test_attr.py::test_attr_basic
[2020-07-02T19:24:10.035Z] [gw1] [ 0%] PASSED tests/python/unittest/test_attr.py::test_list_attr
[2020-07-02T19:24:10.035Z] [gw2] [ 0%] PASSED tests/python/unittest/test_autograd.py::test_unary_func
[2020-07-02T19:24:10.035Z] [gw3] [ 0%] PASSED tests/python/unittest/test_attr.py::test_attr_dict
[2020-07-02T19:24:10.035Z] [gw0] [ 0%] PASSED tests/python/unittest/test_attr.py::test_attr_basic
[2020-07-02T19:24:10.035Z] tests/python/unittest/test_autograd.py::test_argnum
[2020-07-02T19:24:10.035Z] tests/python/unittest/test_autograd.py::test_out_grads
[2020-07-02T19:24:10.035Z] tests/python/unittest/test_autograd.py::test_training
[2020-07-02T19:24:10.035Z] [gw2] [ 0%] PASSED tests/python/unittest/test_autograd.py::test_out_grads
```

Comparing this with a passing Python 3.6 run (from a recent unrelated PR I picked at random), the output looks as follows:

```
[2020-07-06T17:54:08.924Z] scheduling tests via LoadScheduling
[2020-07-06T17:54:08.924Z]
[2020-07-06T17:54:08.924Z] tests/python/unittest/test_contrib_autograd.py::test_operator_with_state
[2020-07-06T17:54:08.924Z] tests/python/unittest/test_attr.py::test_operator
[2020-07-06T17:54:08.924Z] tests/python/unittest/test_autograd.py::test_operator_with_state
[2020-07-06T17:54:08.924Z] tests/python/unittest/test_operator.py::test_RNN_float64
[2020-07-06T17:54:08.924Z] [gw1] [ 0%] PASSED tests/python/unittest/test_attr.py::test_operator
[2020-07-06T17:54:08.924Z] [gw3] [ 0%] PASSED tests/python/unittest/test_contrib_autograd.py::test_operator_with_state
[2020-07-06T17:54:08.924Z] [gw0] [ 1%] PASSED tests/python/unittest/test_autograd.py::test_operator_with_state
[2020-07-06T17:54:08.924Z] [gw2] [ 1%] PASSED tests/python/unittest/test_operator.py::test_RNN_float64
```

Possibly that's the reason for the timeout; looking into it.
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-653323761

The discussion above regarding the failing test unittest/onnx/test_node.py::TestNode::test_import_export is now obsolete, since this test was removed with commit fb73de7582de4e622299a4ad045e25f771568193.
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-653149747

Thanks @leezu, I think I found the underlying cause of the test failure in unittest/onnx/test_node.py::TestNode::test_import_export. In onnx 1.7, the inputs of the Pad operator changed. This can be seen by comparing https://github.com/onnx/onnx/blob/master/docs/Operators.md#Pad with https://github.com/onnx/onnx/blob/master/docs/Changelog.md#Pad-1. I believe I can fix this test and I'm working on that now. However, after that fix the same test will no longer pass with onnx 1.5 (but at least we know how to fix it, I guess). I assume the stacktrace you posted above from the unrelated cd job has a similar cause.
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-651616085

Here's what's going on with onnx 1.7: https://github.com/onnx/onnx/issues/2865. We just need to use the newer way of instantiating a Pad node.
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-649099499

@Roshrini I found an issue when updating onnx from 1.5.0 to 1.7.0. The issue can be reproduced with Python 3.6. The following code reproduces it. Do you have any idea what's going on?

```python
import numpy as np
from onnx import TensorProto
from onnx import helper
from onnx import mapping
from mxnet.contrib.onnx.onnx2mx.import_onnx import GraphProto
from mxnet.contrib.onnx.mx2onnx.export_onnx import MXNetGraph
import mxnet as mx

inputshape = (2, 3, 20, 20)
input_tensor = [helper.make_tensor_value_info("input1", TensorProto.FLOAT, shape=inputshape)]
outputshape = (2, 3, 17, 16)
output_tensor = [helper.make_tensor_value_info("output", TensorProto.FLOAT, shape=outputshape)]

onnx_attrs = {'kernel_shape': (4, 5), 'pads': (0, 0), 'strides': (1, 1), 'p': 1}
nodes = [helper.make_node("LpPool", ["input1"], ["output"], **onnx_attrs)]
graph = helper.make_graph(nodes, "test_lppool1", input_tensor, output_tensor)
onnxmodel = helper.make_model(graph)

graph = GraphProto()
ctx = mx.cpu()
sym, arg_params, aux_params = graph.from_onnx(onnxmodel.graph)
metadata = graph.get_graph_metadata(onnxmodel.graph)
input_data = metadata['input_tensor_data']
input_shape = [data[1] for data in input_data]

# Import the ONNX model into an MXNet model, then export it back to ONNX
# and import it into MXNet again to verify the result.
params = {}
params.update(arg_params)
params.update(aux_params)
converter = MXNetGraph()
graph_proto = converter.create_onnx_graph_proto(sym, params, in_shape=input_shape,
                                                in_type=mapping.NP_TYPE_TO_TENSOR_TYPE[np.dtype('float32')])
```

The line that throws the error is:

```python
graph_proto = converter.create_onnx_graph_proto(sym, params, in_shape=input_shape,
                                                in_type=mapping.NP_TYPE_TO_TENSOR_TYPE[np.dtype('float32')])
```

The error I'm seeing is:

```
File "/opt/anaconda3/envs/p36/lib/python3.6/site-packages/onnx/checker.py", line 54, in checker
    proto.SerializeToString(), ctx)
onnx.onnx_cpp2py_export.checker.ValidationError: Node (pad0) has input size 1 not in range [min=2, max=3].

==> Context: Bad node spec: input: "input1" output: "pad0" name: "pad0" op_type: "Pad" attribute { name: "mode" s: "constant" type: STRING } attribute { name: "pads" ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 type: INTS } attribute { name: "value" f: 0 type: FLOAT }
```
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-643807772

@leezu I think I got to a state where I can run the unit tests with Python 3.8 and reproduce what is described in the issue ticket https://github.com/apache/incubator-mxnet/issues/18380. I ignored the lint errors for now by adding disable flags to the pylintrc file. As described in https://github.com/apache/incubator-mxnet/issues/18380, we're seeing an issue related to the usage of time.clock().

Besides that, I found the following issue: the test tests/python/unittest/onnx/test_node.py::TestNode::test_import_export seems to fail. In the Jenkins job I don't see the error, but when running the test locally with Python 3.8 and onnx 1.7, I'm getting:

```
> bkd_rep = backend.prepare(onnxmodel, operation='export', backend='mxnet')
tests/python/unittest/onnx/test_node.py:164:
tests/python/unittest/onnx/backend.py:104: in prepare
    sym, arg_params, aux_params = MXNetBackend.perform_import_export(sym, arg_params, aux_params,
tests/python/unittest/onnx/backend.py:62: in perform_import_export
    graph_proto = converter.create_onnx_graph_proto(sym, params, in_shape=input_shape,
python/mxnet/contrib/onnx/mx2onnx/export_onnx.py:308: in create_onnx_graph_proto
...
E   onnx.onnx_cpp2py_export.checker.ValidationError: Node (pad0) has input size 1 not in range [min=2, max=3].
E
E   ==> Context: Bad node spec: input: "input1" output: "pad0" name: "pad0" op_type: "Pad" attribute { name: "mode" s: "constant" type: STRING } attribute { name: "pads" ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 type: INTS } attribute { name: "value" f: 0 type: FLOAT }
../../../Library/Python/3.8/lib/python/site-packages/onnx/checker.py:53: ValidationError
```

I believe this is an issue in onnx 1.7, as it looks exactly like https://github.com/onnx/onnx/issues/2548. I also found that the test job Python3: MKL-CPU is not running through, which seems to be due to a timeout. I believe this is happening in tests/python/conftest.py, but the log output doesn't tell me which test is failing, and I can run the test successfully locally. Do you have any idea how to reproduce this locally, or how to get better insight into that failure?

I will now look into the following:

- Can I work around the onnx 1.7 related issue?
- Even after aligning pylint and astroid, I'm seeing unexpected linting errors. The linter is telling me that ndarray is a bad class name. Why is that happening?
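For context on the time.clock() issue: time.clock() had been deprecated since Python 3.3 and was removed in Python 3.8, so any code still calling it fails there. A minimal sketch of the usual replacement, using only the standard library:

```python
import sys
import time

# time.clock() was removed in Python 3.8; on 3.8+ the attribute is gone
# entirely, so calls to it raise AttributeError.
if sys.version_info >= (3, 8):
    assert not hasattr(time, "clock")

# time.perf_counter() is the usual replacement for measuring elapsed
# wall-clock intervals; time.process_time() measures CPU time instead.
start = time.perf_counter()
time.sleep(0.01)
elapsed = time.perf_counter() - start
print(elapsed > 0)  # True
```

Swapping time.clock() for time.perf_counter() (or process_time(), depending on what the test intended to measure) is the usual fix for this class of failure.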
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-643709568

@mxnet-bot run ci [all]
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-639837099

In the build job ci/jenkins/mxnet-validation/unix-cpu, the following command was previously failing:

```
ci/build.py --docker-registry mxnetci --platform ubuntu_cpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh sanity_check
```

I was able to reproduce the issue locally and fixed it by making additional changes to ci/docker/Dockerfile.build.ubuntu and ci/docker/install/requirements:

- Making python3.8 the default python3 binary by creating a symlink (see the change in ci/docker/Dockerfile.build.ubuntu).
- Updating requirement versions in ci/docker/install/requirements so that `python3 -m pip install -r /work/requirements` in Dockerfile.build.ubuntu runs successfully. I needed to update onnx, Cython and Pillow; the previously pinned versions were not installable with python3.8.

I was then able to build the image successfully and to run the ci/build.py command mentioned above. I'm now wondering whether the failure in "continuous build / macosx-x86_64" that I'm seeing above is already a consequence of the onnx update I made (which is necessary in order to update to python 3.8, which in turn is the goal of this PR). My question is basically: what does each of the jobs above do? In which of them should I be able to observe the test failures? Also let me know if you think this is going down the wrong route and I should try something different.
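The symlink change can be sketched roughly as follows. This is a hypothetical fragment, not the exact diff in ci/docker/Dockerfile.build.ubuntu; the package names and paths are assumptions:

```shell
# Hypothetical Dockerfile.build.ubuntu fragment: install Python 3.8 on
# Ubuntu 18.04 and make it the default python3 via a symlink.
apt-get update
apt-get install -y python3.8 python3.8-dev python3.8-distutils

# /usr/local/bin precedes /usr/bin on PATH, so this shadows the
# distribution's python3 without touching system packages.
ln -sf /usr/bin/python3.8 /usr/local/bin/python3

# With python3 now resolving to 3.8, the updated requirements install cleanly.
python3 -m pip install -r /work/requirements
```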
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-637192961

@leezu I was hoping to observe the test failures in one of the ci/jenkins/mxnet-validation build jobs. I assume those jobs did not run because the ci/jenkins/mxnet-validation/sanity build failed. Does the failure of the sanity build look related to the python3.8 update I made in Dockerfile.build.ubuntu? To me it looks like the build stalled at the end and was automatically killed.
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-637189523

@mxnet-bot run ci [all]
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-637189325

@mxnet-bot run ci all
ma-hei commented on pull request #18445:
URL: https://github.com/apache/incubator-mxnet/pull/18445#issuecomment-636997952

@leezu When using Ubuntu 20.04 as the base image of Dockerfile.build.ubuntu, I ran into issues with the apt-get installation of packages such as clang-10 and doxygen. To solve the problem at hand, I went back to Ubuntu 18.04, but instead of installing python3 (in Dockerfile.build.ubuntu), I'm installing python3.8. The image builds successfully locally. I have two questions:

- Will the CI tests of this pull request start automatically after some time?
- After fixing the issues with python 3.8, do we actually want to upgrade the base image to Ubuntu 20.04, or should that be a separate pull request?