[GitHub] [incubator-mxnet] szha commented on pull request #18197: [Numpy] Add qr backward part 2 for wide matrices with m < n
szha commented on pull request #18197: URL: https://github.com/apache/incubator-mxnet/pull/18197#issuecomment-660592664 @D-Roberts we will likely need to automate it so that stale CI checks are invalidated. In the meantime, if the PR sits for a long time, feel free to ping me or any other committer to get more attention on it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-mxnet] TristonC edited a comment on issue #18751: gluon.nn.BatchNorm seems to swap updated values of moving_mean and moving_var on GPU.
TristonC edited a comment on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-660583673 Great repro script @gilbertfrancois. The CPU result seems wrong, while the GPU result seems reasonable. On your plot, the GPU running mean actually goes to 1 and running_var goes to 0 during iterations as expected. On GPU, the first running mean is 0, while the following 3 running means are 0.1, 0.19 and 0.271, which can be explained as `running_mean = 0.1 * running_mean + 0.9 * previous running_mean`. Not sure why the running mean does not change in the CPU context. We need to figure it out.
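The sequence quoted above (0, 0.1, 0.19, 0.271) is what an exponential moving average produces when every batch has mean 1 and the momentum is 0.9. A minimal sketch under that reading, which is an assumption: the first term of the quoted formula is taken to be the current batch mean rather than the running mean itself. The helper name `update_running_mean` is illustrative and not part of MXNet.

```python
def update_running_mean(previous, batch_mean, momentum=0.9):
    """Exponential moving average: keep `momentum` of the old value."""
    return momentum * previous + (1.0 - momentum) * batch_mean


running_mean = 0.0  # initial running mean, as reported for the GPU run
history = [running_mean]
for _ in range(3):
    # Assumption: each batch has mean 1, matching the reported drift toward 1.
    running_mean = update_running_mean(running_mean, batch_mean=1.0)
    history.append(round(running_mean, 6))

print(history)  # [0.0, 0.1, 0.19, 0.271]
```

Under this reading the GPU numbers are the expected ones, which is consistent with the comment's view that the CPU result is the suspect one.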
[GitHub] [incubator-mxnet] TristonC removed a comment on issue #18751: gluon.nn.BatchNorm seems to swap updated values of moving_mean and moving_var on GPU.
TristonC removed a comment on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-660586280 On GPU, the first running mean is 0, while the following 3 running means are 0.1, 0.19 and 0.271, which can be explained as `running_mean = 0.1 * running_mean + 0.9 * previous running_mean`. Not sure why the running mean does not change in the CPU context. We need to figure it out.
[GitHub] [incubator-mxnet] TristonC commented on issue #18751: gluon.nn.BatchNorm seems to swap updated values of moving_mean and moving_var on GPU.
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-660586280 On GPU, the first running mean is 0, while the following 3 running means are 0.1, 0.19 and 0.271, which can be explained as `running_mean = 0.1 * running_mean + 0.9 * previous running_mean`. Not sure why the running mean does not change in the CPU context. We need to figure it out.
[GitHub] [incubator-mxnet] TristonC edited a comment on issue #18751: gluon.nn.BatchNorm seems to swap updated values of moving_mean and moving_var on GPU.
TristonC edited a comment on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-660583673 Great repro script @gilbertfrancois. The CPU result seems wrong, while the GPU result seems reasonable. On your plot, the GPU running mean actually goes to 1 and running_var goes to 0 during iterations as expected.
[GitHub] [incubator-mxnet] TristonC edited a comment on issue #18751: gluon.nn.BatchNorm seems to swap updated values of moving_mean and moving_var on GPU.
TristonC edited a comment on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-660583673 Great repro script @gilbertfrancois. The CPU result seems wrong, while the GPU result seems reasonable. On your plot, the GPU running mean actually goes to 1 and running_var goes to 1 during iterations as expected.
[GitHub] [incubator-mxnet] TristonC edited a comment on issue #18751: gluon.nn.BatchNorm seems to swap updated values of moving_mean and moving_var on GPU.
TristonC edited a comment on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-660583673 Great repro script @gilbertfrancois. The CPU result seems wrong, while the GPU result seems reasonable.
[GitHub] [incubator-mxnet] TristonC commented on issue #18751: gluon.nn.BatchNorm seems to swap updated values of moving_mean and moving_var on GPU.
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-660583673 The CPU result seems wrong, while the GPU result seems more reasonable.
[GitHub] [incubator-mxnet] DickJC123 opened a new issue #18756: pytest worker crash seen on newly introduced unittest test_profiler_gpu.py::test_aggregate_duplication
DickJC123 opened a new issue #18756: URL: https://github.com/apache/incubator-mxnet/issues/18756

## Description
tests/python/gpu/test_profiler_gpu.py has recently started importing unittests/test_profiler.py, and so now runs those tests for the first time with a gpu default context. With that change, I have seen on centos-gpu:
```
worker 'gw3' crashed while running 'tests/python/gpu/test_profiler_gpu.py::test_aggregate_duplication'
```
## Occurrences
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-18694/11/pipeline
@leezu You might want to look into this or watch for more occurrences.
## What have you tried to solve it?
1. I've bypassed this test in my current PR with a `del test_aggregate_duplication` after the newly introduced import.
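A minimal sketch of why the `del` workaround hides the test, assuming a collector that picks up `test_*` names from a module's namespace (pytest's default behaviour for functions). The two modules below are in-memory stand-ins, and `test_profiler_basic` is a made-up name; neither reflects the real contents of the MXNet test files.

```python
import types

# In-memory stand-in for the shared unittest module (hypothetical contents).
shared = types.ModuleType("test_profiler")
exec(
    "def test_profiler_basic():\n    pass\n"
    "def test_aggregate_duplication():\n    pass\n",
    shared.__dict__,
)

# Stand-in for the GPU test module; the update mimics `from test_profiler import *`.
gpu = types.ModuleType("test_profiler_gpu")
gpu.__dict__.update(
    {name: obj for name, obj in vars(shared).items() if not name.startswith("_")}
)

# The workaround from the issue: drop the name so it is no longer collected.
del gpu.test_aggregate_duplication

collected = sorted(name for name in vars(gpu) if name.startswith("test_"))
print(collected)  # ['test_profiler_basic']
```

The test still exists in the shared module, so it keeps running in the CPU suite; only the GPU module stops exposing it.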
[GitHub] [incubator-mxnet] szha commented on pull request #18749: Refactor Gluon parameter serialization format
szha commented on pull request #18749: URL: https://github.com/apache/incubator-mxnet/pull/18749#issuecomment-660577794
> their use is not yet defined

Frontend for JVM is planned in #17783. We will at least need to support inference with backend API which will require loading the parameters. MXNet will not be a Python-only framework and the design decision now on the serialization affects other frontends, so it must be taken with care.
[GitHub] [incubator-mxnet] szha commented on pull request #18694: Unittest tolerance handling improvements
szha commented on pull request #18694: URL: https://github.com/apache/incubator-mxnet/pull/18694#issuecomment-660577521 @DickJC123 thanks for the change and for fixing the issues you found along the way. Remember to mark the issues this PR resolves in the description.
[GitHub] [incubator-mxnet] xidulu commented on issue #18755: test_gluon_probability_v2.py::test_gluon_kl and test_gluon_probability_v1.py::test_gluon_kl_v1 are flaky
xidulu commented on issue #18755: URL: https://github.com/apache/incubator-mxnet/issues/18755#issuecomment-660574101 @DickJC123 Thanks for discovering this issue, I will take a look at your fix.
[GitHub] [incubator-mxnet] ChaiBapchya commented on pull request #18608: Cherry-pick #18310 #18355
ChaiBapchya commented on pull request #18608: URL: https://github.com/apache/incubator-mxnet/pull/18608#issuecomment-660573558 @sandeep-krishnamurthy @leezu @ciyongch This one fixes MKLDNN missing headers. Please help review/merge.
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository.

aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
new 03bc4e1 Bump the publish timestamp.
03bc4e1 is described below

commit 03bc4e1aefac00beb1bdc7f78fbe48812cb8b4ba
Author: mxnet-ci
AuthorDate: Sun Jul 19 00:42:34 2020 +

Bump the publish timestamp.
---
date.txt | 1 +
1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..1ea3a99
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Sun Jul 19 00:42:34 UTC 2020
[incubator-mxnet-site] branch asf-site updated: Publish triggered by CI
This is an automated email from the ASF dual-hosted git repository. aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new 43c16e1 Publish triggered by CI 43c16e1 is described below commit 43c16e149f8373684c29ac9131dc44d2f8752047 Author: mxnet-ci AuthorDate: Sun Jul 19 00:42:28 2020 + Publish triggered by CI --- api/python/docs/_modules/mxnet/util.html | 82 date.txt | 1 - feed.xml | 2 +- 3 files changed, 42 insertions(+), 43 deletions(-) diff --git a/api/python/docs/_modules/mxnet/util.html b/api/python/docs/_modules/mxnet/util.html index 42e8ea84..b0c301a 100644 --- a/api/python/docs/_modules/mxnet/util.html +++ b/api/python/docs/_modules/mxnet/util.html @@ -815,7 +815,7 @@ return free_mem.value, total_mem.value -def set_np_shape(active): +[docs]def set_np_shape(active): Turns on/off NumPy shape semantics, in which `()` represents the shape of scalar tensors, and tuples with `0` elements, for example, `(0,)`, `(1, 0, 2)`, represent the shapes of zero-size tensors. This is turned off by default for keeping backward compatibility. @@ -859,10 +859,10 @@ deactivate both of them.) prev = ctypes.c_int() check_call(_LIB.MXSetIsNumpyShape(ctypes.c_int(active), ctypes.byref(prev))) -return bool(prev.value) +return bool(prev.value) -def is_np_shape(): +[docs]def is_np_shape(): Checks whether the NumPy shape semantics is currently turned on. 
In NumPy shape semantics, `()` represents the shape of scalar tensors, and tuples with `0` elements, for example, `(0,)`, `(1, 0, 2)`, represent @@ -893,7 +893,7 @@ curr = ctypes.c_bool() check_call(_LIB.MXIsNumpyShape(ctypes.byref(curr))) -return curr.value +return curr.value class _NumpyShapeScope(object): @@ -924,7 +924,7 @@ set_np_shape(self._prev_is_np_shape) -def np_shape(active=True): +[docs]def np_shape(active=True): Returns an activated/deactivated NumPy shape scope to be used in with statement and captures code that needs the NumPy shape semantics, i.e. support of scalar and zero-size tensors. @@ -990,10 +990,10 @@ assert arg_shapes[0] == () assert out_shapes[0] == () -return _NumpyShapeScope(active) +return _NumpyShapeScope(active) -def use_np_shape(func): +[docs]def use_np_shape(func): A decorator wrapping a function or class with activated NumPy-shape semantics. When `func` is a function, this ensures that the execution of the function is scoped with NumPy shape semantics, such as the support for zero-dim and zero size tensors. When @@ -1064,7 +1064,7 @@ return _with_np_shape else: raise TypeError(use_np_shape can only decorate classes and callable objects, -while received a {}.format(str(type(func +while received a {}.format(str(type(func def _sanity_check_params(func_name, unsupported_params, param_dict): @@ -1074,7 +1074,7 @@ .format(func_name, param_name)) -def set_module(module): +[docs]def set_module(module): Decorator for overriding __module__ on a function or class. Example usage:: @@ -1089,7 +1089,7 @@ if module is not None: func.__module__ = module return func -return decorator +return decorator class _NumpyArrayScope(object): @@ -1117,7 +1117,7 @@ _NumpyArrayScope._current.value = self._old_scope -def np_array(active=True): +[docs]def np_array(active=True): Returns an activated/deactivated NumPy-array scope to be used in with statement and captures code that needs the NumPy-array semantics. 
@@ -1143,10 +1143,10 @@ _NumpyShapeScope A scope object for wrapping the code w/ or w/o NumPy-shape semantics. -return _NumpyArrayScope(active) +return _NumpyArrayScope(active) -[docs]def is_np_array(): +[docs]def is_np_array(): Checks whether the NumPy-array semantics is currently turned on. This is currently used in Gluon for checking whether an array of type `mxnet.numpy.ndarray` or `mx.nd.NDArray` should be created. For example, at the time when a parameter @@ -1169,7 +1169,7 @@ _NumpyArrayScope._current, value) else False -def use_np_array(func): +[docs]def use_np_array(func): A decorator wrapping Gluon `Block`s and all its methods, properties, and static functions with the semantics of NumPy-array, which means that where ndarrays are created, `mxnet.numpy.ndarray`s should be created, instead of legacy ndarrays of type `mx.nd.NDArray`. @@ -1248,10 +1248,10 @@ return _with_np_array
[GitHub] [incubator-mxnet] ChaiBapchya commented on pull request #18608: Cherry-pick #18310 #18355
ChaiBapchya commented on pull request #18608: URL: https://github.com/apache/incubator-mxnet/pull/18608#issuecomment-660554703 @MoisesHer no point in retriggering the edge pipeline. It's not a flaky issue. I saw the same issue in my cherry-pick PR: https://github.com/apache/incubator-mxnet/pull/18742 Please update the cmakevar: @leezu https://github.com/apache/incubator-mxnet/pull/18713 That should resolve the edge issue. I cherry-picked it in 1.x. You should do that for 1.7.x. Thanks.
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18608: Cherry-pick #18310 #18355
mxnet-bot commented on pull request #18608: URL: https://github.com/apache/incubator-mxnet/pull/18608#issuecomment-660535624 Jenkins CI successfully triggered : [edge, unix-gpu]
[GitHub] [incubator-mxnet] MoisesHer commented on pull request #18608: Cherry-pick #18310 #18355
MoisesHer commented on pull request #18608: URL: https://github.com/apache/incubator-mxnet/pull/18608#issuecomment-660535605 @mxnet-bot run ci [edge, unix-gpu]
[GitHub] [incubator-mxnet] DickJC123 opened a new issue #18755: test_gluon_probability_v2.py::test_gluon_kl and test_gluon_probability_v1.py::test_gluon_kl_v1 are flaky
DickJC123 opened a new issue #18755: URL: https://github.com/apache/incubator-mxnet/issues/18755

## Description
I encountered a failure in one of my CI runs, and have supplied a fix in PR https://github.com/apache/incubator-mxnet/pull/18694 in commit https://github.com/apache/incubator-mxnet/pull/18694/commits/3e05edb3ff4e166831779d438c4537550986b2a0. The fix was developed by printing out the distribution params and noting that the failures occurred when the parameters of the Geometric distribution (being selected from a [0,1] uniform distribution) were in the range 1e-4 to 1e-3. The fix was to select the parameters from np.random.uniform(size=shape, low=1e-3). This is consistent with approaches taken elsewhere in the tests, where the parameters of the binomial distribution are taken from np.random.uniform(low=0.1, size=shape). After the fix, both tests passed 3000 trials without error. Tagging test creator @xidulu.
## Repro with
MXNET_TEST_SEED=1633210984 pytest --verbose -s tests/python/unittest/test_gluon_probability_v2.py::test_gluon_kl
MXNET_TEST_SEED=702820740 pytest --verbose -s tests/python/unittest/test_gluon_probability_v1.py::test_gluon_kl_v1
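The fix described above amounts to raising the lower bound of the uniform draw. A sketch using only the standard library so that it stays self-contained; the actual test draws from `np.random.uniform(size=shape, low=1e-3)`, and `sample_geometric_params` is a made-up helper name used here for illustration.

```python
import random


def sample_geometric_params(count, low=1e-3, high=1.0, seed=None):
    """Draw `count` success probabilities uniformly from [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(count)]


params = sample_geometric_params(10_000, seed=0)
# No draw can fall in the 1e-4 to 1e-3 range where the failures were observed.
print(min(params) >= 1e-3)  # True
```

Bounding the parameter away from zero trades a little test coverage near the edge of the domain for stability, the same choice the issue notes for the binomial tests.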
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository.

aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
new 71c1a62 Bump the publish timestamp.
71c1a62 is described below

commit 71c1a628161fba6fbdf98533de2ab954fb80cf8c
Author: mxnet-ci
AuthorDate: Sat Jul 18 18:42:04 2020 +

Bump the publish timestamp.
---
date.txt | 1 +
1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..951ee20
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Sat Jul 18 18:42:04 UTC 2020
[incubator-mxnet-site] branch asf-site updated: Publish triggered by CI
This is an automated email from the ASF dual-hosted git repository. aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new 8694042 Publish triggered by CI 8694042 is described below commit 8694042ed087cd414898f6c00985cfe7d1c59eb1 Author: mxnet-ci AuthorDate: Sat Jul 18 18:41:52 2020 + Publish triggered by CI --- date.txt | 1 - feed.xml | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/date.txt b/date.txt deleted file mode 100644 index d0bf04e..000 --- a/date.txt +++ /dev/null @@ -1 +0,0 @@ -Sat Jul 18 12:42:10 UTC 2020 diff --git a/feed.xml b/feed.xml index a9a7cfa..ba84bd1 100644 --- a/feed.xml +++ b/feed.xml @@ -1 +1 @@ -http://www.w3.org/2005/Atom; >https://jekyllrb.com/; version="4.0.0">Jekyllhttps://mxnet.apache.org/feed.xml; rel="self" type="application/atom+xml" />https://mxnet.apache.org/; rel="alternate" type="text/html" />2020-07-18T12:29:57+00:00https://mxnet.apache.org/feed.xmlApache MXNetA flexible and efficient library for deep [...] \ No newline at end of file +http://www.w3.org/2005/Atom; >https://jekyllrb.com/; version="4.0.0">Jekyllhttps://mxnet.apache.org/feed.xml; rel="self" type="application/atom+xml" />https://mxnet.apache.org/; rel="alternate" type="text/html" />2020-07-18T18:30:04+00:00https://mxnet.apache.org/feed.xmlApache MXNetA flexible and efficient library for deep [...] \ No newline at end of file
[GitHub] [incubator-mxnet] D-Roberts commented on pull request #18197: [Numpy] Add qr backward part 2 for wide matrices with m < n
D-Roberts commented on pull request #18197: URL: https://github.com/apache/incubator-mxnet/pull/18197#issuecomment-660501841 @hzfan Thank you for your prompt assistance, I appreciate it. @leezu @szha @DickJC123 I will resubmit a separate PR. For my future reference - what are your recommendations to avoid the "stale PR" situation? CI passed when first submitted about 3 months ago and I rebased and CI passed about 2 months ago when the PR was reviewed. All along I followed up on the PR every 2-3 weeks or so.
[GitHub] [incubator-mxnet] ptrendx commented on issue #17913: ModuleNotFoundError: No module named 'mxnet.contrib.amp'
ptrendx commented on issue #17913: URL: https://github.com/apache/incubator-mxnet/issues/17913#issuecomment-660498934 AMP was introduced in MXNet 1.5. Could you try that version (or newer)?
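A quick way to check the point above before importing `mxnet.contrib.amp` is to compare the installed version against 1.5. A minimal sketch: `parse_version` and `has_amp` are hypothetical helpers rather than MXNet APIs, and in practice the version string would come from `mxnet.__version__`.

```python
def parse_version(version):
    """Turn '1.4.0' into (1, 4, 0); trailing suffixes such as 'b20200716' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def has_amp(version, minimum=(1, 5, 0)):
    # Assumption taken from the comment above: AMP first shipped in MXNet 1.5.
    return parse_version(version) >= minimum


print(has_amp("1.4.0"))  # False
print(has_amp("1.6.0"))  # True
```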
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository.

aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
new b252ad2 Bump the publish timestamp.
b252ad2 is described below

commit b252ad270ef0e65caf0545d74d78872afc092df4
Author: mxnet-ci
AuthorDate: Sat Jul 18 12:42:11 2020 +

Bump the publish timestamp.
---
date.txt | 1 +
1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..d0bf04e
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Sat Jul 18 12:42:10 UTC 2020
[incubator-mxnet-site] branch asf-site updated: Publish triggered by CI
This is an automated email from the ASF dual-hosted git repository. aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new 0cb0f3a Publish triggered by CI 0cb0f3a is described below commit 0cb0f3a71641ccead61d42295b87dfc57a6905a9 Author: mxnet-ci AuthorDate: Sat Jul 18 12:42:00 2020 + Publish triggered by CI --- date.txt | 1 - feed.xml | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/date.txt b/date.txt deleted file mode 100644 index 701debd..000 --- a/date.txt +++ /dev/null @@ -1 +0,0 @@ -Sat Jul 18 06:43:22 UTC 2020 diff --git a/feed.xml b/feed.xml index dab3675..a9a7cfa 100644 --- a/feed.xml +++ b/feed.xml @@ -1 +1 @@ -http://www.w3.org/2005/Atom; >https://jekyllrb.com/; version="4.0.0">Jekyllhttps://mxnet.apache.org/feed.xml; rel="self" type="application/atom+xml" />https://mxnet.apache.org/; rel="alternate" type="text/html" />2020-07-18T06:30:16+00:00https://mxnet.apache.org/feed.xmlApache MXNetA flexible and efficient library for deep [...] \ No newline at end of file +http://www.w3.org/2005/Atom; >https://jekyllrb.com/; version="4.0.0">Jekyllhttps://mxnet.apache.org/feed.xml; rel="self" type="application/atom+xml" />https://mxnet.apache.org/; rel="alternate" type="text/html" />2020-07-18T12:29:57+00:00https://mxnet.apache.org/feed.xmlApache MXNetA flexible and efficient library for deep [...] \ No newline at end of file
[GitHub] [incubator-mxnet] gilbertfrancois commented on issue #18751: gluon.nn.BatchNorm seems to swap updated values of moving_mean and moving_var on GPU.
gilbertfrancois commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-660447079 @szha I've just tested it against **mxnet-cu102 v2.0.0b20200716** and it has the same problem. See below:
```
gamma on CPU and GPU are (almost) equal: True, err: 0.0+-0.0
beta on CPU and GPU are (almost) equal: True, err: 0.0+-0.0
running_mean on CPU and GPU are (almost) equal: False, err: 0.90099+-0.20569
running_var on CPU and GPU are (almost) equal: False, err: 0.90099+-0.20569
```
[GitHub] [incubator-mxnet] davidhewitt opened a new issue #18754: Python 3.8 wheel appears to be slightly corrupted
davidhewitt opened a new issue #18754: URL: https://github.com/apache/incubator-mxnet/issues/18754

## Description
I'm attempting to debug a crash observed by a user of PyO3 (https://github.com/PyO3/pyo3/issues/1044) which occurs when `mxnet` is imported. Attempting to use `gdb` (via `rust-gdb` wrapper script) suggests that `mxnet.so` is partially corrupted. `readelf -a` also emits some warnings. Both are pasted below.
### Error Message
Errors seen from `readelf -a path/to/libmxnet.so | grep -i warning`:
```
readelf: Warning: Section 0 has an out of range sh_link value of 1179403647
readelf: Warning: Section 1 has an out of range sh_link value of 2381354254
readelf: Warning: Section 2 has an out of range sh_link value of 3825592164
readelf: Warning: Section 3 has an out of range sh_link value of 149717832
readelf: Warning: Section 3 has an out of range sh_info value of 3171257160
readelf: Warning: Section 4 has an out of range sh_link value of 2781517102
readelf: Warning: Section 4 has an out of range sh_info value of 3221398224
readelf: Warning: Section 5 has an out of range sh_link value of 222961676
readelf: Warning: Section 5 has an out of range sh_info value of 2840782096
readelf: Warning: Section 6 has an out of range sh_link value of 536469743
readelf: Warning: Section 7 has an out of range sh_link value of 548555395
readelf: Warning: Section 7 has an out of range sh_info value of 564797439
readelf: Warning: [ 0]: Unexpected value (65794) in info field.
readelf: Warning: Size of section 1 is larger than the entire file!
readelf: Warning: [ 3]: Expected link to another section in info field
readelf: Warning: [ 4]: Expected link to another section in info field
readelf: Warning: [ 5]: Expected link to another section in info field
readelf: Warning: Size of section 6 is larger than the entire file!
readelf: Warning: [ 7]: Expected link to another section in info field
readelf: Error: no .dynamic section in the dynamic segment
```
Errors seen from `gdb`:
```
BFD: warning: /home/david/dev/pyo3-scratch/.direnv/python-3.8.2/lib/python3.8/site-packages/mxnet/libmxnet.so has a corrupt section with a size (ff20fb3700 08) larger than the file size
BFD: warning: /home/david/dev/pyo3-scratch/.direnv/python-3.8.2/lib/python3.8/site-packages/mxnet/libmxnet.so has a corrupt section with a size (ff20fb3700 08) larger than the file size
Error while mapping shared library sections: `/home/david/dev/pyo3-scratch/.direnv/python-3.8.2/lib/python3.8/site-packages/mxnet/libmxnet.so': not in executable format: file format not recognized
```
## To Reproduce
Run `readelf -a path/to/libmxnet.so | grep -i warning`. Alternatively, on request I can write a tutorial on how to install and run the linked Rust code under rust-lldb.
## Environment
I'm using Ubuntu 20.04 on WSL2. According to pip, `mxnet` was installed via the wheel `mxnet-1.6.0-py2.py3-none-any.whl`.
``` curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python --Python Info-- Version : 3.8.2 Compiler : GCC 9.3.0 Build: ('default', 'Apr 27 2020 15:53:34') Arch : ('64bit', 'ELF') Pip Info--- Version : 20.0.2 Directory: /home/david/dev/pyo3-scratch/.direnv/python-3.8.2/lib/python3.8/site-packages/pip --MXNet Info--- Version : 1.6.0 Directory: /home/david/dev/pyo3-scratch/.direnv/python-3.8.2/lib/python3.8/site-packages/mxnet Num GPUs : 0 Commit Hash : 6eec9da55c5096079355d1f1a5fa58dcf35d6752 --System Info-- Platform : Linux-4.19.104-microsoft-standard-x86_64-with-glibc2.29 system : Linux node : david-laptop release : 4.19.104-microsoft-standard version : #1 SMP Wed Feb 19 06:37:35 UTC 2020 --Hardware Info-- machine : x86_64 processor: x86_64 Architecture:x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 12 On-line CPU(s) list: 0-11 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 165 Model name: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz Stepping:2 CPU MHz: 2592.007 BogoMIPS:5184.01 Hypervisor vendor: Microsoft Virtualization type: full L1d cache: 192 KiB L1i cache:
[GitHub] [incubator-mxnet] Dragas commented on issue #17913: ModuleNotFoundError: No module named 'mxnet.contrib.amp'
Dragas commented on issue #17913: URL: https://github.com/apache/incubator-mxnet/issues/17913#issuecomment-660444070 Seems to be an issue when running an `mxnet/python:1.4.0_cpu_py3` docker image.
[GitHub] [incubator-mxnet] leezu edited a comment on pull request #18749: Refactor Gluon parameter serialization format
leezu edited a comment on pull request #18749: URL: https://github.com/apache/incubator-mxnet/pull/18749#issuecomment-660439938 I think we can delete `mx.npx.load` (or make it private to have a transition period). This PR marks it as deprecated. Do you see any problems?
[GitHub] [incubator-mxnet] leezu commented on pull request #18749: Refactor Gluon parameter serialization format
leezu commented on pull request #18749: URL: https://github.com/apache/incubator-mxnet/pull/18749#issuecomment-660439938 I think we can delete `mx.npx.load` (or make it private to have a transition period)
[GitHub] [incubator-mxnet] ZheyuYe commented on pull request #18749: Refactor Gluon parameter serialization format
ZheyuYe commented on pull request #18749: URL: https://github.com/apache/incubator-mxnet/pull/18749#issuecomment-660439727 Yes, this PR does solve #18717. We might also need to refactor `mx.npx.load` to load 'npz' and return a parameter dict. What do you think?
[GitHub] [incubator-mxnet] leezu edited a comment on pull request #18749: Refactor Gluon parameter serialization format
leezu edited a comment on pull request #18749: URL: https://github.com/apache/incubator-mxnet/pull/18749#issuecomment-660434713 There's no code in the backend that interacts with parameters. You can search for "MXNDArrayLoad" and you will see that all code invoking this API has been removed. As the backend does not interact with the format, there is no point in adding APIs here, as their use is not yet defined. The zipfile and numpy formats are trivial to work with, so there's no issue in adding the support at any time. > Please add test for backward compatibility The old format is faulty and I don't see a strong reason to provide backwards compatibility throughout the 2.x series. We may remove the C API for loading the old format in a later PR (for example when removing the ndarray operators). As of this PR, backwards compatibility is provided, as there is no change to the C APIs and the Python code falls back to using the C API if the input is not in the new format. It's already tested via unit tests and integration tests (via the model-zoo API). Unittest: https://github.com/apache/incubator-mxnet/blob/a4ea4a8330251dd244947074cf4ddb875e611dd5/tests/python/unittest/test_ndarray.py#L1988-L2001
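The fallback behaviour described above (use the new loader when the file is in the new format, otherwise hand the file to the old loader) can be sketched with the standard library alone. This is a hedged illustration, not the code in this PR: it assumes the new format can be recognised as a zip archive (the container that NumPy's `.npz` files use), and `load_new` and `load_legacy` are hypothetical callables standing in for the Python reader and the C API path.

```python
import os
import tempfile
import zipfile


def is_new_format(path):
    """Return True if `path` is a zip archive, the container .npz files use."""
    return zipfile.is_zipfile(path)


def load_parameters(path, load_new, load_legacy):
    """Call `load_new` for zip-based files, otherwise fall back to `load_legacy`."""
    if is_new_format(path):
        return load_new(path)
    return load_legacy(path)


def demo():
    """Write one file of each kind and report which loader handled it."""
    with tempfile.TemporaryDirectory() as tmp:
        new_path = os.path.join(tmp, "new.params")
        old_path = os.path.join(tmp, "old.params")
        # A zip archive stands in for a file in the new format.
        with zipfile.ZipFile(new_path, "w") as archive:
            archive.writestr("weight.npy", b"placeholder bytes")
        # Arbitrary non-zip bytes stand in for a file in the old format.
        with open(old_path, "wb") as handle:
            handle.write(b"\x12\x01\x00\x00" + b"legacy binary blob " * 4)
        return [
            load_parameters(p, lambda _: "new", lambda _: "legacy")
            for p in (new_path, old_path)
        ]
```

Dispatching on the file's content rather than its extension means existing files with old names keep loading without any change on the caller's side.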
[incubator-mxnet-site] branch asf-site updated: Publish triggered by CI
This is an automated email from the ASF dual-hosted git repository. aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new ef9eee3 Publish triggered by CI ef9eee3 is described below commit ef9eee300264401711f8d99880f9208d1d7ad91d Author: mxnet-ci AuthorDate: Sat Jul 18 06:43:16 2020 + Publish triggered by CI --- api/python/docs/_modules/mxnet/gluon/block.html| 116 ++--- .../docs/_modules/mxnet/gluon/parameter.html | 60 +-- api/python/docs/_modules/mxnet/util.html | 82 +++ date.txt | 1 - feed.xml | 2 +- 5 files changed, 130 insertions(+), 131 deletions(-) diff --git a/api/python/docs/_modules/mxnet/gluon/block.html b/api/python/docs/_modules/mxnet/gluon/block.html index fd90da6..1717fc8 100644 --- a/api/python/docs/_modules/mxnet/gluon/block.html +++ b/api/python/docs/_modules/mxnet/gluon/block.html @@ -963,7 +963,7 @@ return _merger(args, fmt)[0] -class Block: +[docs]class Block: Base class for all neural network layers and models. Your models should subclass this class. @@ -1060,7 +1060,7 @@ childrens parameters). return self._reg_params -def collect_params(self, select=None): +[docs] def collect_params(self, select=None): Returns a :py:class:`Dict` containing this :py:class:`Block` and all of its childrens Parameters(default), also can returns the select :py:class:`Dict` which match some given regular expressions. @@ -1086,7 +1086,7 @@ # We need to check here because blocks inside containers are not supported. 
self._check_container_with_block() -return self._collect_params_with_prefix(select=select) +return self._collect_params_with_prefix(select=select) def _collect_params_with_prefix(self, prefix=, select=None): if prefix: @@ -1101,7 +1101,7 @@ ret.update(child()._collect_params_with_prefix(prefix + name, select)) return ret -def save_parameters(self, filename, deduplicate=False): +[docs] def save_parameters(self, filename, deduplicate=False): Save parameters to file. Saved parameters can only be loaded with `load_parameters`. Note that this @@ -1135,9 +1135,9 @@ arg_dict = {key: val._reduce() for key, val in params.items()} save_fn = _mx_npx.save if is_np_array() else ndarray.save -save_fn(filename, arg_dict) +save_fn(filename, arg_dict) -def load_parameters(self, filename, ctx=None, allow_missing=False, +[docs] def load_parameters(self, filename, ctx=None, all [...] ignore_extra=False, cast_dtype=False, dtype_source=current): Load parameters from file previously saved by `save_parameters`. @@ -1190,9 +1190,9 @@ if not loaded: return full_dict = {params: loaded, filename: filename} -self.load_dict(full_dict, ctx, allow_missing, ignore_extra, cast_dtype, dtype_source) +self.load_dict(full_dict, ctx, allow_missing, ignore_extra, cast_dtype, dtype_source) -def load_dict(self, param_dict, ctx=None, allow_missing=False, +[docs] def load_dict(self, param_dict, ctx=None, allow_missingignore_extra=False, cast_dtype=False, dtype_source=current): Load parameters from dict @@ -1249,16 +1249,16 @@ param = loaded[name] if isinstance(param, np.ndarray): param = _mx_np.array(param) if is_np_array() else nd.array(param) -params[name]._load_init(param, ctx, cast_dtype=cast_dtype, dtype_source=params[name]._load_init(param, ctx, cast_dtype=cast_dtype, dtype_source=def register_child(self, block, name=None): +[docs] def register_child(self, block, name=None): Registers block as a child of self. :py:class:`Block` s assigned to self as attributes will be registered automatically. 
if name is None: name = str(len(self._children)) -self._children[name] = weakref.ref(block) +self._children[name] = weakref.ref(block) -def register_forward_pre_hook(self, hook): +[docs] def register_forward_pre_hook(self, hook): rRegisters a forward pre-hook on the block. The hook function is called immediately before :func:`forward`. @@ -1275,9 +1275,9 @@ handle = HookHandle() handle.attach(self._forward_pre_hooks, hook) -return handle +return handle -def register_forward_hook(self, hook): +[docs] def register_forward_hook(self, hook): rRegisters a forward hook on the block. The hook function is called immediately after :func:`forward`. @@ -1294,9
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository. aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new b433e0e Bump the publish timestamp. b433e0e is described below commit b433e0efaf72d96ce6287beacea56a8c81329791 Author: mxnet-ci AuthorDate: Sat Jul 18 06:43:22 2020 + Bump the publish timestamp. --- date.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/date.txt b/date.txt new file mode 100644 index 000..701debd --- /dev/null +++ b/date.txt @@ -0,0 +1 @@ +Sat Jul 18 06:43:22 UTC 2020
[GitHub] [incubator-mxnet] leezu commented on pull request #18749: Refactor Gluon parameter serialization format
leezu commented on pull request #18749: URL: https://github.com/apache/incubator-mxnet/pull/18749#issuecomment-660434713 There's no code in the backend that interacts with parameters. You can search for "MXNDArrayLoad" and you will see that all code invoking this API has been removed. As the backend does not interact with the format, there is no point in adding APIs here, as their use is not yet defined. The zipfile and numpy formats are trivial to work with, so there's no issue in adding the support at any time. > Please add test for backward compatibility The old format is faulty and I don't see a strong reason to provide backwards compatibility throughout the 2.x series. We may remove the C API for loading the old format in a later PR (for example when removing the ndarray operators). As of this PR, backwards compatibility is provided, as there is no change to the C APIs and the Python code falls back to using the C API if the input is not in the new format. It's tested via the model-zoo API.
[incubator-mxnet] branch leezu-patch-3 updated (620e070 -> 4e0fce4)
This is an automated email from the ASF dual-hosted git repository. lausen pushed a change to branch leezu-patch-3 in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git. from 620e070 Fix metric API page add 4e0fce4 Update index.rst No new revisions were added by this update. Summary of changes: docs/python_docs/python/api/gluon/metric/index.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)