[GitHub] zheng-da commented on issue #9844: Flaky test_operator_gpu.test_binary_op @ Python3: MKLDNN-GPU
zheng-da commented on issue #9844: Flaky test_operator_gpu.test_binary_op @ Python3: MKLDNN-GPU URL: https://github.com/apache/incubator-mxnet/issues/9844#issuecomment-369151122 This isn't flaky. This bug is caused by the implementation of bmod. We can easily reproduce the error as below.
```
ubuntu@ip-172-31-7-213:~/incubator-mxnet$ export MXNET_TEST_SEED=1138777814
ubuntu@ip-172-31-7-213:~/incubator-mxnet$ for i in {1..10}; do nosetests -v tests/python/gpu/test_operator_gpu.py:test_binary_op; done
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=2101295148 to reproduce.
[WARNING] *** test-level seed set: all "@with_seed()" tests run deterministically ***
test_operator_gpu.test_binary_op ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1138777814 to reproduce.
FAIL
==
FAIL: test_operator_gpu.test_binary_op
--
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/lib/python2.7/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/home/ubuntu/incubator-mxnet/tests/python/gpu/../unittest/common.py", line 155, in test_new
    orig_test(*args, **kwargs)
  File "/home/ubuntu/incubator-mxnet/tests/python/gpu/../unittest/test_operator.py", line 1377, in test_binary_op
    test_bmod(a, b)
  File "/home/ubuntu/incubator-mxnet/tests/python/gpu/../unittest/test_operator.py", line 1353, in test_bmod
    lambda g_out, a, b: (g_out, - g_out * (np.float32(a) // np.float32(b))), gen_binary_data)
  File "/home/ubuntu/incubator-mxnet/tests/python/gpu/../unittest/test_operator.py", line 1319, in check_binary_op_backward
    assert_allclose(y_2.asnumpy(), x_2, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1395, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 778, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.001, atol=1e-05
(mismatch 0.%)
 x: array([ -0.00e+00, -0.00e+00, -0.00e+00],
       [ -6.009688e-01, -0.00e+00, -1.463857e+00]], ...
 y: array([ -0.00e+00, -0.00e+00, -0.00e+00],
       [ -6.009688e-01, -0.00e+00, -1.463857e+00]], ...
>> begin captured logging <<
common: INFO: Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=2101295148 to reproduce.
common: WARNING: *** test-level seed set: all "@with_seed()" tests run deterministically ***
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1138777814 to reproduce.
>> end captured logging <<
--
Ran 1 test in 4.466s

FAILED (failures=1)
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1174927805 to reproduce.
[WARNING] *** test-level seed set: all "@with_seed()" tests run deterministically ***
test_operator_gpu.test_binary_op ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1138777814 to reproduce.
FAIL
==
FAIL: test_operator_gpu.test_binary_op
--
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/lib/python2.7/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/home/ubuntu/incubator-mxnet/tests/python/gpu/../unittest/common.py", line 155, in test_new
    orig_test(*args, **kwargs)
  File "/home/ubuntu/incubator-mxnet/tests/python/gpu/../unittest/test_operator.py", line 1377, in test_binary_op
    test_bmod(a, b)
  File "/home/ubuntu/incubator-mxnet/tests/python/gpu/../unittest/test_operator.py", line 1353, in test_bmod
    lambda g_out, a, b: (g_out, - g_out * (np.float32(a) // np.float32(b))), gen_binary_data)
  File "/home/ubuntu/incubator-mxnet/tests/python/gpu/../unittest/test_operator.py", line 1319, in check_binary_op_backward
    assert_allclose(y_2.asnumpy(), x_2, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1395, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 778, in
```
[GitHub] TaoLv commented on issue #9828: Building with MKL fails on OSX
TaoLv commented on issue #9828: Building with MKL fails on OSX URL: https://github.com/apache/incubator-mxnet/issues/9828#issuecomment-369150613 @sbodenstein Please update your mkldnn to the latest version to see if the compilation issue is addressed. If so, we will submit a separate PR to update the mkldnn version in mxnet. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] xinyu-intel commented on issue #9828: Building with MKL fails on OSX
xinyu-intel commented on issue #9828: Building with MKL fails on OSX URL: https://github.com/apache/incubator-mxnet/issues/9828#issuecomment-369149314 @sbodenstein You can manually update the mkldnn submodule to the newest one and then build again.
[GitHub] yzzymt commented on issue #8671: Discussion and troubleshooting on PyPI (pip) installation
yzzymt commented on issue #8671: Discussion and troubleshooting on PyPI (pip) installation URL: https://github.com/apache/incubator-mxnet/issues/8671#issuecomment-369148753 What is the release date of the cu91 build for Windows? Thanks.
[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-369142927 @marcoabreu Reorder2Default and MKLDNNDataReorder shouldn't be called frequently. They are not in the critical path. The whole point of this PR is to further remove the invocation of these two methods. Creating temporary arrays isn't in the critical path either. It's used only in a very special case: copying MKLDNN data to GPU memory.
[GitHub] zheng-da commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN.
zheng-da commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#discussion_r171157571

## File path: src/ndarray/ndarray.cc

```
@@ -375,7 +375,45 @@ void NDArray::Chunk::Reorder2Default() {
   CheckAndAlloc(def_pd.get_size());
   // TODO(zhengda) We need to avoid memory copy here.
   memcpy(shandle.dptr, def_mem->get_data_handle(), def_pd.get_size());
-  mkl_mem_.reset(new mkldnn::memory(def_pd, shandle.dptr));
+  mkl_mem_ = nullptr;
+}
+
+void NDArray::Chunk::MKLDNNDataReorder(const mkldnn::memory::primitive_desc &pd) {
+  // If the memory already uses the specified layout, don't do anything.
+  if (mkl_mem_ != nullptr && mkl_mem_->get_primitive_desc() == pd)
+    return;
+  auto _pd = pd;
+  auto _desc = _pd.desc();
+  auto def_format = GetDefaultFormat(_desc);
+  // If the memory is default, don't do anything.
+  if (def_format == _desc.data.format && IsDefault())
+    return;
+  // If the specified layout is default, we should use Reorder2Default.
+  if (def_format == _desc.data.format) {
+    Reorder2Default();
+    return;
+  }
+
+  std::shared_ptr<mkldnn::memory> new_mem(new mkldnn::memory(pd));
+  std::shared_ptr<mkldnn::memory> old_mem;
+  if (IsDefault()) {
+    auto def_pd = GetPrimitiveDesc(pd, def_format);
+    old_mem.reset(new mkldnn::memory(def_pd, shandle.dptr));
+  } else {
+    old_mem = this->mkl_mem_;
+  }
+  CHECK(old_mem->get_primitive_desc().desc().data.ndims == _desc.data.ndims);
+
+  // This may be called in MKLDNN operators. We can't use MKLDNNStream here.
+  std::vector<mkldnn::primitive> net;
+  net.push_back(mkldnn::reorder(*old_mem, *new_mem));
+  mkldnn::stream(mkldnn::stream::kind::eager).submit(net).wait();
+
+  CHECK(shandle.size >= pd.get_size());
+  CheckAndAlloc(pd.get_size());
+  // TODO(zhengda) We need to avoid memory copy here.
```

Review comment: This is from the previous PR. I just moved the code here.
[GitHub] jeremiedb commented on issue #9625: sparse regression operators
jeremiedb commented on issue #9625: sparse regression operators URL: https://github.com/apache/incubator-mxnet/pull/9625#issuecomment-369116598 I agree that the `data`, `label` or other naming assumptions aren't ideal. If I'm not mistaken, it also concerns the Module API, which likewise relies on the fact that an argument ending with `label` will be silently created if no `label` argument is passed to the final loss operator. Forcing an explicit `label` argument whose name matches what is fed by the iterator would, however, create backward-compatibility issues. I think that just adding the `input.names` and `output.names` to the `fixed.params` argument of `mx.simple.bind` should solve this PR's test issue and maintain backward compatibility. I've integrated that fix in #9803.
[GitHub] tornadomeet commented on issue #9880: TVM bridge support to JIT NDArray Function by TVM
tornadomeet commented on issue #9880: TVM bridge support to JIT NDArray Function by TVM URL: https://github.com/apache/incubator-mxnet/pull/9880#issuecomment-369141651 Building with commit ```48749a5d43864a41653ccd8746cdccf1477b2ae4```, the build errors out during make:
```shell
tvm/runtiime/packed_func.h, No such file or directory
```
[GitHub] chengdazhi opened a new issue #9914: Gluon speed issues when input size varies across batches
chengdazhi opened a new issue #9914: Gluon speed issues when input size varies across batches URL: https://github.com/apache/incubator-mxnet/issues/9914 Hi. I have noticed that the Gluon framework has speed issues when the input spatial size varies across batches. It causes an approximate **2x slowdown** on a single GPU, and leaves **multiple GPUs with little speedup**. The framework discards previous GPU memory blocks when it starts processing a new batch, which leads to violent GPU memory fluctuations. This problem is absent in previous non-Gluon training setups.
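A common mitigation for this class of problem (my suggestion, not something proposed in the issue) is to pad each batch's spatial size up to a small set of fixed "buckets", so the allocator only ever sees a few distinct shapes instead of a new one every batch. A minimal sketch of the bucket choice, with illustrative sizes:

```python
# Illustrative bucket sizes; real values depend on the dataset.
BUCKETS = (224, 256, 288, 320)

def bucket_size(side):
    """Return the smallest bucket that fits the given spatial size."""
    for b in BUCKETS:
        if side <= b:
            return b
    return BUCKETS[-1]  # clamp anything larger to the biggest bucket

# An input of 230 px would be padded to 256; 224 stays at 224.
print(bucket_size(230), bucket_size(224))
```

With only a handful of distinct shapes, memory blocks from earlier batches can be reused rather than discarded.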
[GitHub] XiaotaoChen commented on issue #9611: program can't finish normally in dist_sync mode
XiaotaoChen commented on issue #9611: program can't finish normally in dist_sync mode URL: https://github.com/apache/incubator-mxnet/issues/9611#issuecomment-369137381 Can you tell me the details of setting epoch-size? @feiyuvl I passed epoch-size to train_imagenet.py, and it reports: train_imagenet.py: error: unrecognized arguments: --epoch-size 320. [These docs](http://newdocs.readthedocs.io/en/latest/distributed_training.html) say it's better to set epoch_size explicitly in dist_sync mode, but they don't say how to set it.
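For what it's worth, epoch_size in dist_sync mode is conventionally the number of batches each worker sees per epoch, i.e. the total number of examples divided by the global batch size. A hedged sketch of the arithmetic with illustrative numbers (none of them taken from the thread):

```python
# All three numbers below are assumptions for illustration only.
num_examples = 1281167   # e.g. ImageNet-1k training set size
batch_size = 256         # per-worker batch size
num_workers = 4          # dist_sync workers

# Each worker processes num_examples / (batch_size * num_workers) batches.
epoch_size = num_examples // (batch_size * num_workers)
print(epoch_size)
```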
[GitHub] iblis17 commented on issue #8727: jenkins: julia build script
iblis17 commented on issue #8727: jenkins: julia build script URL: https://github.com/apache/incubator-mxnet/pull/8727#issuecomment-369135435

> Would you be fine with that repo being a mirror of the Apache repository?

Yeah, mirroring to that repo sounds good to me.

> We make use of the tagging feature and label each issue according to the language binding. In future (see the current vote thread on dev@), we will make use of Jira. This should give you all the tools and overview you need.

:+1:

> From a first glance it seems like it is not too complicated to migrate to a Sphinx-compatible layout, but I could underestimate the required effort. Do you think this would be an issue?

Well, I don't know the details of Sphinx at all. Julia's doc system is `Documenter.jl`, which launches the Julia compiler, collects docstrings from the package, and renders static HTML as output. If Sphinx can accept extra HTML files from an external source, I guess most of the work is done.

> Generally, this sounds like a discussion we should involve the community in. Would you mind creating a thread on dev@? In the meantime, I will check back about the committership.

OK, will do.
[GitHub] anirudh2290 commented on issue #9913: TODO list for Exception Handling Support
anirudh2290 commented on issue #9913: TODO list for Exception Handling Support URL: https://github.com/apache/incubator-mxnet/issues/9913#issuecomment-369133258 Can a committer please add the "Call for Contribution" tag?
[GitHub] anirudh2290 opened a new issue #9913: TODO list for Exception Handling Support
anirudh2290 opened a new issue #9913: TODO list for Exception Handling Support URL: https://github.com/apache/incubator-mxnet/issues/9913

# Exception Handling Phase 2

- [ ] Improved Exception Types for the Backend
- [ ] Improved Exception Types for the Frontend Language Bindings (Python, Scala, Perl, ...)
- [ ] Support for handling exceptions thrown from consumed libraries

Please see https://cwiki.apache.org/confluence/display/MXNET/Improved+Exception+Handling+in+MXNet+-+Phase+2 for additional details on the tasks.
[GitHub] anirudh2290 commented on issue #9869: Exception handling documentation
anirudh2290 commented on issue #9869: Exception handling documentation URL: https://github.com/apache/incubator-mxnet/pull/9869#issuecomment-369132282 @piiswrong Doesn't it make sense to document that the rethrow happens during the blocking calls, i.e. WaitToRead, and also the limitations, like WaitAll?
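A pure-Python analogy for the behavior being discussed (no MXNet involved, just to illustrate the point): with an asynchronous engine, the exception surfaces at the blocking wait rather than at the call site, much like `concurrent.futures` rethrows inside `result()`.

```python
from concurrent.futures import ThreadPoolExecutor

def bad_op():
    # Stands in for an operation that fails inside the async engine.
    raise ValueError("operator failed")

with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(bad_op)   # enqueue: no exception raised here
    try:
        fut.result()            # the blocking wait: exception rethrown here
    except ValueError as err:
        print("caught at the blocking call:", err)
```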
[GitHub] piiswrong commented on issue #9869: Exception handling documentation
piiswrong commented on issue #9869: Exception handling documentation URL: https://github.com/apache/incubator-mxnet/pull/9869#issuecomment-369131495 I don't think it's worth making a tutorial for this. Ideally the async rethrow mechanism should work the same way native Python code works.
[GitHub] Rikorose closed issue #9531: KeyError: in mx.nd.array.empty()
Rikorose closed issue #9531: KeyError: in mx.nd.array.empty() URL: https://github.com/apache/incubator-mxnet/issues/9531
[GitHub] Rikorose commented on issue #9531: KeyError: in mx.nd.array.empty()
Rikorose commented on issue #9531: KeyError: in mx.nd.array.empty() URL: https://github.com/apache/incubator-mxnet/issues/9531#issuecomment-369129910 No, I'll close this for now. I'll
[GitHub] dimon777 opened a new issue #9912: No training happening when CSVIter is used.
dimon777 opened a new issue #9912: No training happening when CSVIter is used. URL: https://github.com/apache/incubator-mxnet/issues/9912

## Description
It appears to me that CSVIter (or something else in MXNet) is broken, which makes it impossible to train a model from CSVIter feeds. I have a reproducible example with the CSV MNIST dataset (from here: https://pjreddie.com/projects/mnist-in-csv/).

## Environment info (Required)
```
What to do:
1. Download the diagnosis script from https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py
2. Run the script using `python diagnose.py` and paste its output here.

$ python3 diagnose.py
--Python Info--
Version      : 3.6.4
Compiler     : GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)
Build        : ('default', 'Feb 18 2018 11:42:51')
Arch         : ('64bit', '')
--Pip Info---
Version      : 9.0.1
Directory    : /usr/local/homebrew/lib/python3.6/site-packages/pip
--MXNet Info---
Version      : 1.1.0
Directory    : /usr/local/homebrew/lib/python3.6/site-packages/mxnet
Commit Hash  : 07a83a0325a3d782513a04f47d711710972cb144
--System Info--
Platform     : Darwin-16.7.0-x86_64-i386-64bit
system       : Darwin
node         : MAC-DBuzolin
release      : 16.7.0
version      : Darwin Kernel Version 16.7.0: Thu Jan 11 22:59:40 PST 2018; root:xnu-3789.73.8~1/RELEASE_X86_64
--Hardware Info--
machine      : x86_64
processor    : i386
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT RDTSCP TSCI'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID FPU_CSDS'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz'
--Network Test--
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0266 sec, LOAD: 0.5727 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0413 sec, LOAD: 0.1074 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1497 sec, LOAD: 0.9886 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0516 sec, LOAD: 0.8435 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0285 sec, LOAD: 0.1718 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0313 sec, LOAD: 0.1717 sec.
```
Package used (Python/R/Scala/Julia): Python

## Error Message:
No error message, but training converges to "nan".

## Minimum reproducible example
```
from __future__ import print_function
import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon
import matplotlib.pyplot as plt
from numpy import genfromtxt

mx.random.seed(1)
data_ctx = mx.cpu()
model_ctx = mx.cpu()

num_inputs = 784
data_shape = (num_inputs,)
label_shape = (1,)
num_outputs = 10
batch_size = 32

train_data = mx.io.CSVIter(data_csv="./data/mnist/mnist_iter_train_data.csv", data_shape=data_shape,
                           label_csv="./data/mnist/mnist_iter_train_label.csv", label_shape=label_shape,
                           batch_size=batch_size, round_batch=False)
test_data = mx.io.CSVIter(data_csv="./data/mnist/mnist_iter_test_data.csv", data_shape=data_shape,
                          label_csv="./data/mnist/mnist_iter_test_label.csv", label_shape=label_shape,
                          batch_size=batch_size, round_batch=False)

net = gluon.nn.Dense(num_outputs)
net.collect_params().initialize(mx.init.Normal(sigma=.1), ctx=model_ctx)
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()

def evaluate_accuracy(data_iterator, net):
    acc = mx.metric.Accuracy()
    for i, batch in enumerate(data_iterator):
        data = batch.data[0].as_in_context(model_ctx)/255  #.reshape((-1,num_inputs))
        label = batch.label[0].as_in_context(model_ctx)
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]

epochs = 10
moving_loss = 0.
num_examples = 6
loss_sequence = []
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

for e in range(epochs):
    cumulative_loss = 0
    for i, batch in enumerate(train_data):
        data = batch.data[0].as_in_context(model_ctx)/255  #.reshape((-1,num_inputs))
        label =
```
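For reference, CSVIter reads data and labels from two separate CSV files, while the pjreddie dataset keeps the label in column 0 of a single file. A hedged sketch of the split (it uses a small synthetic stand-in array; a real conversion would `np.genfromtxt` the downloaded mnist_train.csv instead):

```python
import numpy as np

# Synthetic stand-in for the single-file MNIST CSV: label in column 0,
# 784 pixel values in the remaining columns.
rng = np.random.default_rng(0)
raw = np.hstack([rng.integers(0, 10, size=(4, 1)),
                 rng.integers(0, 256, size=(4, 784))])

labels, pixels = raw[:, :1], raw[:, 1:]
np.savetxt("mnist_iter_train_label.csv", labels, fmt="%d", delimiter=",")
np.savetxt("mnist_iter_train_data.csv", pixels, fmt="%d", delimiter=",")
print(pixels.shape)   # one 784-wide row per image
```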
[GitHub] sxjscience closed issue #9866: The default weight initialization strategy makes the VGG network difficult to converge when utilizing examples under 'example/image-classification'
sxjscience closed issue #9866: The default weight initialization strategy makes the VGG network difficult to converge when utilizing examples under 'example/image-classification' URL: https://github.com/apache/incubator-mxnet/issues/9866
[GitHub] sxjscience commented on issue #9866: The default weight initialization strategy makes the VGG network difficult to converge when utilizing examples under 'example/image-classification'
sxjscience commented on issue #9866: The default weight initialization strategy makes the VGG network difficult to converge when utilizing examples under 'example/image-classification' URL: https://github.com/apache/incubator-mxnet/issues/9866#issuecomment-369122573 Closed by https://github.com/apache/incubator-mxnet/pull/9867
[GitHub] jeremiedb commented on issue #9358: Why does running 1 round of an MXNET model training produce Train-mse=NaN?
jeremiedb commented on issue #9358: Why does running 1 round of an MXNET model training produce Train-mse=NaN? URL: https://github.com/apache/incubator-mxnet/issues/9358#issuecomment-369122384 Bug to be fixed by this open PR #9803
[incubator-mxnet] branch master updated: Using "uniform" Xavier strategy to initialize the weight for VGG network (a trial solution to issue#9866) (#9867)
This is an automated email from the ASF dual-hosted git repository. sxjscience pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

The following commit(s) were added to refs/heads/master by this push:
new 17a9c6a Using "uniform" Xavier strategy to initialize the weight for VGG network (a trial solution to issue#9866) (#9867)

commit 17a9c6ad440139d3f87924a8e989d4da252504be
Author: Shufan <33112206+juliusshu...@users.noreply.github.com>
AuthorDate: Wed Feb 28 13:01:34 2018 +0800

Using "uniform" Xavier strategy to initialize the weight for VGG network (a trial solution to issue#9866) (#9867)

* Enable the reporting of cross-entropy or nll loss value during training
* Set the default value of loss as '' to avoid a Python runtime issue when the loss argument is not set
* Apply Xavier with "uniform" type to initialize weights when the network is VGG

example/image-classification/common/fit.py | 3 +++
1 file changed, 3 insertions(+)

```diff
diff --git a/example/image-classification/common/fit.py b/example/image-classification/common/fit.py
index 0e0cd52..9412b6f 100755
--- a/example/image-classification/common/fit.py
+++ b/example/image-classification/common/fit.py
@@ -237,6 +237,9 @@ def fit(args, network, data_loader, **kwargs):
         if args.network == 'alexnet':
             # AlexNet will not converge using Xavier
             initializer = mx.init.Normal()
+        # VGG will not trend to converge using Xavier-Gaussian
+        elif 'vgg' in args.network:
+            initializer = mx.init.Xavier()
         else:
             initializer = mx.init.Xavier(
                 rnd_type='gaussian', factor_type="in", magnitude=2)
```

To stop receiving notification emails like this one, please contact sxjscie...@apache.org.
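For context on what the one-line change above switches between: `mx.init.Xavier()` with no arguments draws from a uniform distribution, while the previous default used a gaussian with `factor_type="in", magnitude=2`. The scale computation can be sketched in NumPy, based on the documented defaults of `mx.init.Xavier` (`rnd_type='uniform'`, `factor_type='avg'`, `magnitude=3`) — a sketch of the formula only, not MXNet's actual implementation:

```python
import numpy as np

def xavier_scale(fan_in, fan_out, factor_type="avg", magnitude=3.0):
    # Factor selection mirrors the factor_type argument of mx.init.Xavier.
    factor = {"in": fan_in, "out": fan_out, "avg": (fan_in + fan_out) / 2.0}[factor_type]
    return float(np.sqrt(magnitude / factor))

def xavier_init(shape, rnd_type="uniform", factor_type="avg", magnitude=3.0, rng=None):
    rng = rng or np.random.default_rng(0)
    fan_out, fan_in = shape[0], int(np.prod(shape[1:]))
    scale = xavier_scale(fan_in, fan_out, factor_type, magnitude)
    if rnd_type == "uniform":
        return rng.uniform(-scale, scale, size=shape)  # what the PR switches VGG to
    return rng.normal(0.0, scale, size=shape)          # gaussian variant

# Gaussian variant used before this commit: factor_type="in", magnitude=2
w_gauss = xavier_init((1024, 1024), rnd_type="gaussian", factor_type="in", magnitude=2.0)
# Uniform variant the PR selects for VGG (mx.init.Xavier() defaults)
w_unif = xavier_init((1024, 1024), rnd_type="uniform")
```

The practical difference is bounded support: the uniform draw is clipped to [-scale, scale], while the gaussian has unbounded tails, which the issue reports as hurting VGG convergence.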
[GitHub] sxjscience closed pull request #9867: Using "uniform" Xavier strategy to initialize the weight for VGG network (a trial solution to issue#9866)
sxjscience closed pull request #9867: Using "uniform" Xavier strategy to initialize the weight for VGG network (a trial solution to issue#9866) URL: https://github.com/apache/incubator-mxnet/pull/9867 This PR was merged from a forked repository; since GitHub hides the original diff on merge, the diff (identical to the one in the merge commit above) is preserved in the notification for provenance.
[GitHub] sxjscience commented on issue #9866: The default weight initialization strategy makes the VGG network difficult to converge when utilizing examples under 'example/image-classification'
sxjscience commented on issue #9866: The default weight initialization strategy makes the VGG network difficult to converge when utilizing examples under 'example/image-classification' URL: https://github.com/apache/incubator-mxnet/issues/9866#issuecomment-369121694 I think it's a nice catch and should be merged.
[GitHub] juliusshufan commented on issue #9867: Using "uniform" Xavier strategy to initialize the weight for VGG network (a trial solution to issue#9866)
juliusshufan commented on issue #9867: Using "uniform" Xavier strategy to initialize the weight for VGG network (a trial solution to issue#9866) URL: https://github.com/apache/incubator-mxnet/pull/9867#issuecomment-369120230 @szha May I have review comments from you or another domain owner? I understand that normally it is up to the user to decide the weight initialization method. In this case, since the current implementation of the example already uses a different initialization method for AlexNet to avoid a convergence issue, it might be possible to follow a similar approach for VGG. What do you think? (For a description of the issue, see https://github.com/apache/incubator-mxnet/pull/9867.) Thanks for your time. BR, Shufan
[GitHub] juliusshufan commented on issue #9866: The default weight initialization strategy makes the VGG network difficult to converge when utilizing examples under 'example/image-classification'
juliusshufan commented on issue #9866: The default weight initialization strategy makes the VGG network difficult to converge when utilizing examples under 'example/image-classification' URL: https://github.com/apache/incubator-mxnet/issues/9866#issuecomment-369119883 @sxjscience May I have your comments on this issue? I understand that normally it is up to the user to decide the weight initialization method. In this case, since the current implementation of the example explicitly uses a different initialization method for AlexNet to avoid a convergence issue, it might be possible to follow a similar approach for VGG. What do you think? Thanks.
[GitHub] jeremiedb commented on issue #9625: sparse regression operators
jeremiedb commented on issue #9625: sparse regression operators URL: https://github.com/apache/incubator-mxnet/pull/9625#issuecomment-369116598 I agree that `data`, `label` or other naming assumptions aren't ideal. If I'm not mistaken, it also concerns the Module API, which relies on the fact that an argument ending with `label` will be silently created if no `label` argument is passed to the final loss operator. Forcing an explicit `label` argument whose name matches what is fed by the iterator would, however, create backward-compatibility issues. I think that just adding the `input.names` and `output.names` to the `fixed.params` argument of `mx.simple.bind` should solve this PR's test issue and maintain backward compatibility. Should I open a new PR for this fix?
[GitHub] CoinCheung commented on issue #9909: mx.random.seed(seed) does not work for mx.io.ImageRecordIter() ?
CoinCheung commented on issue #9909: mx.random.seed(seed) does not work for mx.io.ImageRecordIter() ? URL: https://github.com/apache/incubator-mxnet/issues/9909#issuecomment-369116553 Yes, but what if I would like to change the seed after fetching three batches? The only moment I can set the seed is when I define the iterator, so once it is defined everything is fixed and I have no way to add more randomness to my sample batches. It is said that I could reset the seed by adding a line mx.random.seed(4) where I need it; however, from my observation it does not work in the code I provided above.
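The reseeding behavior the poster expects can be illustrated with plain NumPy — a sketch of reseeding semantics for a Python-side RNG only, not of `ImageRecordIter`, whose augmentation RNGs live in C++ worker threads and (per the discussion below) are seeded once from the iterator's `seed` parameter at construction:

```python
import numpy as np

np.random.seed(1)
first = np.random.rand(3)         # three "batches" drawn under seed 1

np.random.seed(4)                 # reseed mid-stream...
a = np.random.rand(3)
np.random.seed(4)
b = np.random.rand(3)             # ...the same seed reproduces the same draws

assert np.allclose(a, b)          # reseeding works for this RNG
assert not np.allclose(first, a)  # and changes the stream from then on
```

`mx.random.seed` controls MXNet's own device RNGs; it does not reach into an already-constructed data iterator, which is why calling it between batches appears to have no effect here.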
[GitHub] huyangc opened a new issue #9911: The difference between rescale_grad on the optimizer and normalization in the loss layer.
huyangc opened a new issue #9911: The difference between rescale_grad on the optimizer and normalization in the loss layer. URL: https://github.com/apache/incubator-mxnet/issues/9911 When I normalize the gradient of SoftmaxOutput using ``normalization=valid`` and also set ``rescale_grad=1/batchsize``, it seems that the gradient is rescaled twice: first by the number of valid labels and then by the batch size?
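The double-scaling concern can be checked with a toy gradient — a NumPy sketch of the arithmetic only, not of SoftmaxOutput itself; all names here are illustrative:

```python
import numpy as np

batch_size = 8
num_valid = 5                     # labels that are not ignored
raw_grad = np.ones(batch_size)    # toy per-sample gradient

# normalization='valid' divides the loss gradient by the number of valid labels
grad_after_loss = raw_grad / num_valid
# rescale_grad=1/batch_size then divides again inside the optimizer
grad_after_opt = grad_after_loss * (1.0 / batch_size)

# The gradient ends up scaled by 1/(num_valid * batch_size): divided twice,
# which is exactly what the issue suspects.
assert np.allclose(grad_after_opt, raw_grad / (num_valid * batch_size))
```

With both settings active, only one of the two divisors should normally be used.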
[GitHub] anirudhacharya commented on a change in pull request #9892: [WIP] Serde Module for Import/Export of models between Onnx and Mxnet
anirudhacharya commented on a change in pull request #9892: [WIP] Serde Module for Import/Export of models between Onnx and Mxnet URL: https://github.com/apache/incubator-mxnet/pull/9892#discussion_r171135726

## File path: python/mxnet/contrib/serde/_import/import_onnx.py ##

@@ -0,0 +1,328 @@
```python
# Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#     http://www.apache.org/licenses/LICENSE-2.0
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.

# Derived from Apache 2.0 licensed onnx.py file from DMLC NNVM:
# https://github.com/dmlc/nnvm/blob/3da53e46db57c438b05fbebe8aa332ee8c5994d1/python/nnvm/frontend/onnx.py

# coding: utf-8
# pylint: disable=invalid-name,too-many-locals,no-self-use
"""Support import export formats."""
from __future__ import absolute_import as _abs
from .... import symbol
from .... import ndarray as nd
from onnx_mxnet.import_helper import _identity_list, _convert_map, _pad_sequence_fix


def _convert_operator(op_name, attrs, identity_list=None, convert_map=None):
    """Convert from onnx operator to mxnet operator.
    The converter must specify conversions explicitly for incompatible name, and
    apply handlers to operator attributes.

    Parameters
    ----------
    op_name : str
        Operator name, such as Convolution, FullyConnected
    attrs : dict
        Dict of operator attributes
    identity_list : list
        List of operators that don't require conversion
    convert_map : dict
        Dict of name : callable, where name is the op's name that
        require conversion to mxnet, callable are functions which
        take attrs and return (new_op_name, new_attrs)

    Returns
    -------
    (op_name, attrs)
        Converted (op_name, attrs) for mxnet.
    """
    identity_list = identity_list if identity_list else _identity_list
    convert_map = convert_map if convert_map else _convert_map
    if op_name in identity_list:
        pass
    elif op_name in convert_map:
        op_name, attrs = convert_map[op_name](attrs)
    else:
        raise NotImplementedError("Operator {} not implemented.".format(op_name))
    op = getattr(symbol, op_name, None)
    if not op:
        raise RuntimeError("Unable to map op_name {} to sym".format(op_name))
    return op, attrs


class GraphProto(object):
    """A helper class for handling mxnet symbol copying from pb2.GraphProto.
    Definition: https://github.com/onnx/onnx/blob/master/onnx/onnx.proto
    """
    def __init__(self):
        self._nodes = {}
        self._params = {}
        self._renames = {}
        self._num_input = 0
        self._num_param = 0

    def from_onnx(self, graph):
        """Construct symbol from onnx graph.
        The inputs from onnx graph is vague, only providing "1", "2"...
        For convenience, we rename the `real` input names to "input_0",
        "input_1"... And renaming parameters to "param_0", "param_1"...

        Parameters
        ----------
        graph : onnx protobuf object
            The loaded onnx graph

        Returns
        -------
        sym : symbol.Symbol
            The returned mxnet symbol
        params : dict
            A dict of name: nd.array pairs, used as pretrained weights
        """
        # parse network inputs, aka parameters
        for init_tensor in graph.initializer:
            if not init_tensor.name.strip():
                raise ValueError("Tensor's name is required.")
            self._params[init_tensor.name] = self._parse_array(init_tensor)

        # converting GraphProto message
        for i in graph.input:
            if i.name in self._params:
                # i is a param instead of input
                name_param = 'param_{}'.format(self._num_param)
                self._num_param += 1
                self._params[name_param] = self._params.pop(i.name)
                self._nodes[name_param] = symbol.Variable(name=name_param,
                                                          shape=self._params[name_param].shape)
                self._renames[i.name] = name_param
            else:
                name_input = 'input_{}'.format(self._num_input)
                self._num_input += 1
                self._nodes[name_input] = symbol.Variable(name=name_input)
                self._renames[i.name] = name_input

        # constructing nodes, nodes are stored as directed acyclic graph
        # converting NodeProto message
        for node in graph.node:
            op_name = node.op_type
```
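The dispatch in `_convert_operator` above can be exercised standalone — a self-contained sketch with toy tables; the real `_identity_list` and `_convert_map` live in `import_helper`, and the table contents below are illustrative, not ONNX's actual op mappings:

```python
# Toy stand-ins for the tables imported from onnx_mxnet.import_helper.
identity_list = ["Reshape"]                      # ops that map 1:1 by name
convert_map = {                                  # ops needing rename/attr rewrite
    "Conv": lambda attrs: ("Convolution", {"kernel": attrs["kernel_shape"]}),
}

def convert_operator(op_name, attrs):
    # Mirrors the branching in _convert_operator, minus the mxnet symbol lookup.
    if op_name in identity_list:
        return op_name, attrs
    if op_name in convert_map:
        return convert_map[op_name](attrs)
    raise NotImplementedError("Operator {} not implemented.".format(op_name))

assert convert_operator("Reshape", {"shape": (2, 3)}) == ("Reshape", {"shape": (2, 3)})
assert convert_operator("Conv", {"kernel_shape": (3, 3)}) == ("Convolution", {"kernel": (3, 3)})
```

The real function additionally resolves the converted name against `mxnet.symbol` with `getattr`, raising if no such operator exists.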
[GitHub] anirudhacharya commented on a change in pull request #9892: [WIP] Serde Module for Import/Export of models between Onnx and Mxnet
anirudhacharya commented on a change in pull request #9892: [WIP] Serde Module for Import/Export of models between Onnx and Mxnet URL: https://github.com/apache/incubator-mxnet/pull/9892#discussion_r171135598

## File path: python/mxnet/contrib/serde/_import/tests/onnx_backend_test.py ##

@@ -0,0 +1,73 @@
```python
# Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#     http://www.apache.org/licenses/LICENSE-2.0
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""onnx test backend wrapper"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import unittest

import onnx.backend.test
from onnx_mxnet import backend as mxnet_backend

# This is a pytest magic variable to load extra plugins
pytest_plugins = 'onnx.backend.test.report'
```

Review comment: will change it.
[GitHub] anirudhacharya commented on a change in pull request #9892: [WIP] Serde Module for Import/Export of models between Onnx and Mxnet
anirudhacharya commented on a change in pull request #9892: [WIP] Serde Module for Import/Export of models between Onnx and Mxnet URL: https://github.com/apache/incubator-mxnet/pull/9892#discussion_r171135545

## File path: python/mxnet/contrib/serde/_import/backend.py ##

@@ -0,0 +1,131 @@
```python
# Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#     http://www.apache.org/licenses/LICENSE-2.0
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.

# coding: utf-8
# pylint: disable=too-many-locals,invalid-name
"""backend wrapper for onnx test infrastructure"""
from collections import namedtuple
from onnx.backend.base import Backend
from .import_onnx import GraphProto
from .backend_rep import MXNetBackendRep
from .... import context
from .... import module
from .... import ndarray as nd

# Using these functions for onnx test infrastructure.
# Implemented by following onnx docs guide:
# https://github.com/onnx/onnx/blob/master/docs/Implementing%20an%20ONNX%20backend.md
# MXNetBackend class will take an ONNX model with inputs, perform a computation,
# and then return the output.


class MXNetBackend(Backend):
    """MXNet backend for ONNX"""
```

Review comment: This class is used by the onnx backend test framework here - https://github.com/anirudhacharya/incubator-mxnet/blob/serde/python/mxnet/contrib/serde/_import/tests/onnx_backend_test.py For testing the import functionality, we intend to use ONNX's backend test framework as described here - https://github.com/onnx/onnx/blob/master/docs/OnnxBackendTest.md This is done to ensure that tests will be shared across different frameworks and that our module will stay in sync with onnx's standards and definitions for various operators. This class will be used by the test framework to run operators on the mxnet backend.
[GitHub] anirudhacharya commented on a change in pull request #9892: [WIP] Serde Module for Import/Export of models between Onnx and Mxnet
anirudhacharya commented on a change in pull request #9892: [WIP] Serde Module for Import/Export of models between Onnx and Mxnet URL: https://github.com/apache/incubator-mxnet/pull/9892#discussion_r171134425

## File path: python/mxnet/contrib/__init__.py ##

```diff
@@ -28,3 +28,4 @@
 from . import tensorboard
 from . import text
+from . import serde
```

Review comment: Will change the module name from serde to onnx.
[GitHub] anirudhacharya commented on a change in pull request #9892: [WIP] Serde Module for Import/Export of models between Onnx and Mxnet
anirudhacharya commented on a change in pull request #9892: [WIP] Serde Module for Import/Export of models between Onnx and Mxnet URL: https://github.com/apache/incubator-mxnet/pull/9892#discussion_r171134319

## File path: python/mxnet/contrib/serde/_export/__init__.py ##

@@ -0,0 +1,4 @@
```python
import onnx

def export_model(sym, params):
    pass
```

Review comment: Will delete this module for now.
[GitHub] DickJC123 commented on issue #9909: mx.random.seed(seed) does not work for mx.io.ImageRecordIter() ?
DickJC123 commented on issue #9909: mx.random.seed(seed) does not work for mx.io.ImageRecordIter() ? URL: https://github.com/apache/incubator-mxnet/issues/9909#issuecomment-369108436 Aren't you always creating a dataiter with an internal rng seeded by seed=1?
[GitHub] eric-haibin-lin commented on a change in pull request #9882: Add force_deterministic option for sparse embedding
eric-haibin-lin commented on a change in pull request #9882: Add force_deterministic option for sparse embedding URL: https://github.com/apache/incubator-mxnet/pull/9882#discussion_r171134195

## File path: src/operator/tensor/indexing_op.cu ##

@@ -60,6 +60,75 @@ struct AddTakeGradRspGPUKernel {
```c++
/*
 * \brief kernel for backward computation for take, executed with deterministic order
 * \param thread_id the thread id
 * \param out the output gradient data
 * \param lookup_table the table to lookup the position of an id in gradient array
 * \param sorted_data the sorted data input
 * \param original_idx the original indices of the sorted data input
 * \param ograd head gradient
 * \param row_length the output dimension
 * \param num_threads_per_row the number of threads to process a row together
 * \param SZ the number of features a thread is responsible for
 */
template<int SZ>
struct AddTakeGradRspDeterministicKernel {
  template<typename DType>
  __device__ __forceinline__ static void Map(int thread_id,
                                             DType* out,
                                             const nnvm::dim_t* lookup_table,
                                             const nnvm::dim_t* sorted_data,
                                             const nnvm::dim_t data_size,
                                             const nnvm::dim_t* original_idx,
                                             const DType* ograd,
                                             const nnvm::dim_t row_length,
                                             const nnvm::dim_t num_threads_per_row) {
    using nnvm::dim_t;
    int tid = thread_id / num_threads_per_row;
    const int feature_start = thread_id % num_threads_per_row * SZ;
    int num_features = SZ;
    if (feature_start + num_features > row_length) {
      num_features = row_length - feature_start;
    }
    if (tid == 0 || sorted_data[tid - 1] != sorted_data[tid]) {
      DType acc[SZ];
      #pragma unroll
      for (int i = 0; i < SZ; i++) {
        acc[i] = 0;
      }
      const dim_t data = sorted_data[tid];
      const dim_t row_id = lookup_table[data];
      const dim_t out_offset = row_id * row_length + feature_start;
      do {
        const dim_t idx = original_idx[tid];
        const dim_t ograd_offset = idx * row_length + feature_start;
        for (int i = 0; i < num_features; i++) {
          acc[i] += ograd[ograd_offset + i];
        }
        tid++;
      } while (tid < data_size && sorted_data[tid - 1] == sorted_data[tid]);
      for (int i = 0; i < num_features; i++) {
        out[out_offset + i] = acc[i];
```

Review comment: should be += instead
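The kernel's sort-then-accumulate strategy — and the bug the reviewer flags, that the final write should accumulate with `+=` rather than overwrite — can be emulated sequentially in NumPy. This is a sketch of the idea with illustrative names, not the CUDA code:

```python
import numpy as np

def add_take_grad_deterministic(out, data, ograd):
    """Accumulate ograd rows into out[data[i]], in a fixed (sorted) order."""
    order = np.argsort(data, kind="stable")   # plays the role of original_idx
    sorted_data = data[order]
    for start in range(len(sorted_data)):
        # only the first position of each run of equal ids does the accumulation,
        # mirroring the (tid == 0 || sorted_data[tid-1] != sorted_data[tid]) guard
        if start == 0 or sorted_data[start - 1] != sorted_data[start]:
            acc = np.zeros(out.shape[1])
            tid = start
            while tid < len(sorted_data) and sorted_data[tid] == sorted_data[start]:
                acc += ograd[order[tid]]
                tid += 1
            out[sorted_data[start]] += acc    # '+=' here is the fix the review asks for

out = np.zeros((4, 2))
data = np.array([2, 0, 2, 2])
ograd = np.ones((4, 2))
add_take_grad_deterministic(out, data, ograd)
assert np.allclose(out[2], [3, 3]) and np.allclose(out[0], [1, 1])
```

Because the summation order is fixed by the sort rather than by thread scheduling, repeated runs produce bit-identical gradients, which is the point of the `force_deterministic` option.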
[GitHub] KeyKy closed issue #9680: Is there a video classification distributed training example (demo)?
KeyKy closed issue #9680: Is there a video classification distributed training example (demo)? URL: https://github.com/apache/incubator-mxnet/issues/9680
[GitHub] eric-haibin-lin commented on a change in pull request #9882: Add force_deterministic option for sparse embedding
eric-haibin-lin commented on a change in pull request #9882: Add force_deterministic option for sparse embedding URL: https://github.com/apache/incubator-mxnet/pull/9882#discussion_r171132851

## File path: src/operator/tensor/indexing_op.cu ##

@@ -103,13 +172,136 @@ void SparseEmbeddingOpForwardRspImpl(const OpContext& ctx,
```c++
template<typename IType, typename DType, typename RType>
void SparseEmbeddingDeterministicKernelLaunch(const OpContext& ctx,
                                              const TBlob& ograd,
                                              const TBlob& data,
                                              const OpReqType req,
                                              const NDArray& output) {
  using namespace mshadow;
  using namespace mxnet_op;
  using namespace expr;
  using namespace rowsparse;
  using nnvm::dim_t;
  mshadow::Stream<gpu> *s = ctx.get_stream<gpu>();
  const dim_t num_rows = output.shape()[0];
  const dim_t row_length = output.shape()[1];
  const dim_t data_size = static_cast<dim_t>(data.shape_.Size());
  // temp resource declarations
  dim_t* lookup_table = NULL;
  void* temp_storage = NULL;
  dim_t* sorted_data = NULL;
  dim_t* original_idx = NULL;
  // calculate number of bytes for temp resources
  size_t lookup_table_bytes = num_rows * sizeof(dim_t);
  size_t sorted_data_storage_bytes = data_size * sizeof(dim_t);
  size_t original_idx_storage_bytes = data_size * sizeof(dim_t);
  size_t sort_workspace_size = SortByKeyWorkspaceSize<dim_t, dim_t, gpu>(data_size);
  size_t unique_workspace_bytes = 0;
  // estimate unique temp space
  IType* data_ptr = data.dptr<IType>();
  size_t *null_ptr = nullptr;
  cub::DeviceSelect::Unique(NULL, unique_workspace_bytes, data_ptr, data_ptr,
                            null_ptr, data_size, Stream<gpu>::GetStream(s));
  // One more space reserved for unique count
  size_t temp_workspace_bytes = std::max(unique_workspace_bytes,
                                         sort_workspace_size);
  size_t total_storage_bytes = lookup_table_bytes + sorted_data_storage_bytes +
                               original_idx_storage_bytes + temp_workspace_bytes;

  // request resource and split it. layout is:
  // lookup_table, sorted_data, original_idx, temp_storage
  Tensor<gpu, 1, char> workspace = ctx.requested[0]
      .get_space_typed<gpu, 1, char>(Shape1(total_storage_bytes), s);
  lookup_table = reinterpret_cast<dim_t*>(workspace.dptr_);
  sorted_data = reinterpret_cast<dim_t*>(workspace.dptr_ + lookup_table_bytes);
  original_idx = reinterpret_cast<dim_t*>(workspace.dptr_ + lookup_table_bytes +
                                          sorted_data_storage_bytes);
  temp_storage = workspace.dptr_ + total_storage_bytes - temp_workspace_bytes;

  // make a copy of the data, to be sorted
  TBlob sorted_data_blob(sorted_data, Shape1(data_size), gpu::kDevMask);
  auto sorted_data_tensor = sorted_data_blob.FlatTo1D<gpu, dim_t>(s);
  mxnet_op::copy(s, sorted_data_blob, data);

  // generate original idx
  Tensor<gpu, 1, dim_t> original_idx_tensor(original_idx, Shape1(data_size), s);
  Kernel<range_fwd, gpu>::Launch(s, data_size, 1, static_cast<dim_t>(0),
                                 static_cast<dim_t>(1), kWriteTo, original_idx);
  // sort data with its original idx
  int num_bits = ilog2(num_rows - 1);
  char* temp_storage_ptr = reinterpret_cast<char*>(temp_storage);
  Tensor<gpu, 1, char> temp_storage_tensor(temp_storage_ptr,
                                           Shape1(sort_workspace_size), s);
  SortByKey(sorted_data_tensor, original_idx_tensor, true,
            &temp_storage_tensor, 0, num_bits);

  // compute unique row ids based on sorted values.
  output.CheckAndAllocAuxData(kIdx, Shape1(data_size + 1));

  // fill row_idx array of output matrix, using the row_flg values
  RType* grad_row_idx = output.aux_data(kIdx).dptr<RType>();
  cub::DeviceSelect::Unique(temp_storage_ptr, unique_workspace_bytes, sorted_data,
                            grad_row_idx, grad_row_idx + data_size, data_size,
                            Stream<gpu>::GetStream(s));

  dim_t nnr = 0;
  CUDA_CALL(cudaMemcpy(&nnr, grad_row_idx + data_size, sizeof(RType),
                       cudaMemcpyDeviceToHost));
  CHECK_EQ(output.shape().ndim(), 2) << "Unexcepted ndim";
  output.CheckAndAllocData(Shape2(nnr, output.shape()[1]));
  output.set_aux_shape(kIdx, Shape1(nnr));

  // generate lookup table
  Kernel ::Launch(s, nnr, lookup_table, grad_row_idx);

  // accumulate gradients
  DType* grad_data = output.data().dptr<DType>();
```

Review comment: Yes. I should not have removed it. Will update
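The pipeline above — sort the embedding ids, run `Unique` to obtain the sparse gradient's row ids, then build a lookup table from row id to compacted row position — can be sketched host-side in NumPy. Names and values below are illustrative; the real code does this with mshadow `SortByKey` and `cub::DeviceSelect::Unique` on the GPU:

```python
import numpy as np

data = np.array([7, 1, 7, 3, 1])          # embedding indices in a batch
num_rows = 8                              # vocabulary size

sorted_data = np.sort(data)               # SortByKey step
grad_row_idx = np.unique(sorted_data)     # cub::DeviceSelect::Unique step
nnr = len(grad_row_idx)                   # number of non-zero rows in the gradient

# lookup_table[id] -> position of that id's row in the compacted gradient array
lookup_table = np.full(num_rows, -1)
lookup_table[grad_row_idx] = np.arange(nnr)

assert nnr == 3                           # only rows 1, 3, 7 get gradient storage
```

Allocating only `nnr` gradient rows (instead of `num_rows`) is what makes the output a row-sparse NDArray; the lookup table lets the accumulation kernel find each id's compacted row.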
[GitHub] pharish93 commented on issue #9690: Possible memory leak with de-convolution operator in CPU mode
pharish93 commented on issue #9690: Possible memory leak with de-convolution operator in CPU mode URL: https://github.com/apache/incubator-mxnet/issues/9690#issuecomment-369105917
1. I couldn't reproduce the issue after changing the workspace size to 4096; I have run it for about 1.5 days now.
2. https://github.com/pharish93/FaceDetection/tree/code_restructure is the code I was working on; Face_3D_Models/face_3d/symbol/ contains the symbol files.
[GitHub] pharish93 closed issue #9690: Possible memory leak with de-convolution operator in CPU mode
pharish93 closed issue #9690: Possible memory leak with de-convolution operator in CPU mode URL: https://github.com/apache/incubator-mxnet/issues/9690
[GitHub] eric-haibin-lin commented on issue #9881: Inconsistent weight decay logics in multiple optimizers
eric-haibin-lin commented on issue #9881: Inconsistent weight decay logics in multiple optimizers URL: https://github.com/apache/incubator-mxnet/issues/9881#issuecomment-369105116 @sxjscience supposedly it's used in all optimizers. AdaDelta doesn't multiply wd and lr.
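The inconsistency being discussed can be made concrete with a toy parameter update — a sketch of the two weight-decay conventions only, not the optimizers' actual code:

```python
w, grad, lr, wd = 1.0, 0.1, 0.5, 0.01

# Convention A (typical SGD-style): wd is folded into the gradient, so the
# decay that actually hits the weight is scaled by the learning rate.
decay_a = lr * wd * w            # w_new = w - lr * (grad + wd * w)

# Convention B (AdaDelta, per the comment above): wd is applied to the weight
# directly, without the lr multiplier.
decay_b = wd * w                 # w_new = w - update - wd * w

# Same wd hyperparameter, different effective shrinkage per step.
assert decay_a == 0.005 and decay_b == 0.01
```

With the same `wd` setting, the two conventions shrink the weight at different rates, which is the inconsistency the issue title refers to.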
[GitHub] szha commented on issue #9910: bug fix for contrib wikitext2
szha commented on issue #9910: bug fix for contrib wikitext2 URL: https://github.com/apache/incubator-mxnet/pull/9910#issuecomment-369103006 Thanks to @cgraywang for reporting the issue.
[GitHub] szha opened a new pull request #9910: bug fix for contrib wikitext2
szha opened a new pull request #9910: bug fix for contrib wikitext2 URL: https://github.com/apache/incubator-mxnet/pull/9910

## Description ##
Fix a bug that causes the wikitext corpus to have wrong iteration order.

## Checklist ##
### Essentials ###
- [x] Passed code style checking (`make lint`)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [x] Code is well-documented:
  - For user-facing API changes, API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set and a reference to the original paper if applicable
- [x] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [x] Fix wikitext2 sample iteration order
[GitHub] sandeep-krishnamurthy commented on issue #9680: Is there a video classification distributed training example (demo)?
sandeep-krishnamurthy commented on issue #9680: Is there a video classification distributed training example (demo)? URL: https://github.com/apache/incubator-mxnet/issues/9680#issuecomment-369096496 https://discuss.mxnet.io/
[GitHub] KeyKy commented on issue #9680: Is there a video classification distributed training example (demo)?
KeyKy commented on issue #9680: Is there a video classification distributed training example (demo)? URL: https://github.com/apache/incubator-mxnet/issues/9680#issuecomment-369095929 https://github.com/apache/incubator-mxnet/issues/discuss.mxnet.io -> returns not found
[GitHub] sxjscience commented on a change in pull request #9904: Improve workspace in convolution/deconvolution
sxjscience commented on a change in pull request #9904: Improve workspace in convolution/deconvolution URL: https://github.com/apache/incubator-mxnet/pull/9904#discussion_r171124416

## File path: src/operator/convolution_v1-inl.h ##

```diff
@@ -344,9 +348,7 @@ class ConvolutionV1Op : public Operator {
                          shape_dstunit_[1], shape_dstunit_[2] * nstep_);
     index_t required_size = scol.Size() + sdst.Size();
-    CHECK_GE(param_.workspace, required_size)
-      << "\nMinimum workspace size: " << required_size * sizeof(DType) << " Bytes\n"
-      << "Given: " << param_.workspace * sizeof(DType) << " Bytes";
+    param_.workspace = required_size;
```

Review comment: No need because it's deprecated.
[GitHub] sxjscience commented on a change in pull request #9904: Improve workspace in convolution/deconvolution
sxjscience commented on a change in pull request #9904: Improve workspace in convolution/deconvolution URL: https://github.com/apache/incubator-mxnet/pull/9904#discussion_r171124525

## File path: src/operator/convolution_v1-inl.h ##

```diff
@@ -344,9 +348,7 @@ class ConvolutionV1Op : public Operator {
                          shape_dstunit_[1], shape_dstunit_[2] * nstep_);
     index_t required_size = scol.Size() + sdst.Size();
-    CHECK_GE(param_.workspace, required_size)
-      << "\nMinimum workspace size: " << required_size * sizeof(DType) << " Bytes\n"
-      << "Given: " << param_.workspace * sizeof(DType) << " Bytes";
+    param_.workspace = required_size;
```

Review comment: I changed it because I'm doing a global search for `param_.workspace` and changing all the occurrences.
[GitHub] piiswrong commented on a change in pull request #9904: Improve workspace in convolution/deconvolution
piiswrong commented on a change in pull request #9904: Improve workspace in convolution/deconvolution URL: https://github.com/apache/incubator-mxnet/pull/9904#discussion_r171123906

## File path: src/operator/convolution_v1-inl.h ##

```diff
@@ -344,9 +348,7 @@ class ConvolutionV1Op : public Operator {
                          shape_dstunit_[1], shape_dstunit_[2] * nstep_);
     index_t required_size = scol.Size() + sdst.Size();
-    CHECK_GE(param_.workspace, required_size)
-      << "\nMinimum workspace size: " << required_size * sizeof(DType) << " Bytes\n"
-      << "Given: " << param_.workspace * sizeof(DType) << " Bytes";
+    param_.workspace = required_size;
```

Review comment: do you actually need to change this?
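The review thread above is about replacing a hard CHECK on the user-supplied workspace with a plain assignment. A Python analogy of that semantic change (illustrative only; the real code is the C++ shown in the diff, and `old_workspace`/`new_workspace` are made-up names):

```python
def old_workspace(requested, required):
    # Old behavior: abort when the user-specified workspace is too small,
    # mirroring CHECK_GE(param_.workspace, required_size).
    if requested < required:
        raise RuntimeError("Minimum workspace size: %d, given: %d" % (required, requested))
    return requested

def new_workspace(requested, required):
    # New behavior: param_.workspace = required_size; the deprecated
    # convolution_v1 operator simply takes whatever im2col needs.
    return required
```

Under the old check, a 1024-element workspace against a 4096-element requirement aborts; under the new behavior the workspace is silently sized to 4096, which is why the reviewers note the operator is deprecated and no longer treats the value as a user-enforced limit.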
[GitHub] marcoabreu commented on issue #9909: mx.random.seed(seed) does not work for mx.io.ImageRecordIter() ?
marcoabreu commented on issue #9909: mx.random.seed(seed) does not work for mx.io.ImageRecordIter() ? URL: https://github.com/apache/incubator-mxnet/issues/9909#issuecomment-369092457 @DickJC123
[GitHub] CoinCheung opened a new issue #9909: mx.random.seed(seed) does not work for mx.io.ImageRecordIter() ?
CoinCheung opened a new issue #9909: mx.random.seed(seed) does not work for mx.io.ImageRecordIter() ? URL: https://github.com/apache/incubator-mxnet/issues/9909

## Description
The random seed of mx.io.ImageRecordIter() cannot be changed with mx.random.seed(seed).

## Environment info (Required)
```
Version : 3.6.4 Compiler : GCC 7.2.1 20171224 Build: ('default', 'Jan 5 2018 02:35:40') Arch : ('64bit', '')
Pip Info--- Version : 9.0.1 Directory: /usr/lib/python3.6/site-packages/pip
--MXNet Info--- Version : 1.1.0 Directory: /home/coin/.local/lib/python3.6/site-packages/mxnet Commit Hash : 07a83a0325a3d782513a04f47d711710972cb144
--System Info-- Platform : Linux-4.14.15-1-ARCH-x86_64-with-arch system : Linux node : Arch-R720 release : 4.14.15-1-ARCH version : #1 SMP PREEMPT Tue Jan 23 21:49:25 UTC 2018
--Hardware Info-- machine : x86_64 processor: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 158 Model name: Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz Stepping: 9 CPU MHz: 900.142 CPU max MHz: 3500. CPU min MHz: 800. BogoMIPS: 4993.00 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K NUMA node0 CPU(s): 0-3 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti retpoline rsb_ctxsw tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
```

Package used (Python/R/Scala/Julia): (I'm using python)

## Minimum reproducible example
(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)
```python
import mxnet as mx
import core.io as io
import random

seed = 1
data_record = './datasets/train_list.rec'
shape = (3, 30, 100)
label_width = 4
batch_size = 128

dataiter = mx.io.ImageRecordIter(
    path_imgrec=data_record,
    data_shape=shape,
    label_width=label_width,
    shuffle=True,
    seed=seed,
    batch_size=batch_size
)

for i in range(3):
    batch = dataiter.next()
    # here set seed each time executing
    seed = random.randint(0, 5000)
    print(seed)
    mx.random.seed(seed)
    batch = dataiter.next()
    # on my platform, the printed number stays same each time
    print(batch.data[0][20][2][15][50])
```

## Steps to reproduce
(Paste the commands you ran that produced the error.)
1. Just change the parameters in the code to define a dataiter with some .rec file, and save the python script as xxx.py
2. Run the python script (python xxx.py) several times; the printed seed changes each time while the printed element in the batch stays the same each time.

## What have you tried to solve it?
1. use reset() method after each
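The behavior reported above can be mimicked in pure Python (this is an analogy, not MXNet code): ImageRecordIter's shuffle order is driven by the `seed` parameter captured when the iterator is constructed in the backend, so calling mx.random.seed() afterwards has no effect on an existing iterator; constructing a new iterator with a different seed is what changes the order. The class name below is invented for illustration.

```python
import random

class RecordIterAnalogy:
    """Stand-in for an iterator whose RNG is fixed at construction time."""
    def __init__(self, records, seed):
        self._rng = random.Random(seed)   # private RNG seeded at construction
        self._records = list(records)
        self._rng.shuffle(self._records)

    def order(self):
        return list(self._records)

a = RecordIterAnalogy(range(20), seed=1)
random.seed(9999)                  # reseeding the *global* RNG afterwards...
b = RecordIterAnalogy(range(20), seed=1)
assert a.order() == b.order()      # ...does not change the iterator's order
```

In MXNet terms: pass a different `seed` to `mx.io.ImageRecordIter(...)` (and rebuild the iterator) rather than calling `mx.random.seed()` after construction.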
[GitHub] sxjscience commented on issue #9904: Improve workspace in convolution/deconvolution
sxjscience commented on issue #9904: Improve workspace in convolution/deconvolution URL: https://github.com/apache/incubator-mxnet/pull/9904#issuecomment-369091667 I think currently there's no special test on that and we will directly raise a runtime OOM error. :sweat_smile:
[GitHub] marcoabreu commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN.
marcoabreu commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#discussion_r171123124

## File path: src/ndarray/ndarray.cc ##

```diff
@@ -375,7 +375,45 @@ void NDArray::Chunk::Reorder2Default() {
   CheckAndAlloc(def_pd.get_size());
   // TODO(zhengda) We need to avoid memory copy here.
   memcpy(shandle.dptr, def_mem->get_data_handle(), def_pd.get_size());
-  mkl_mem_.reset(new mkldnn::memory(def_pd, shandle.dptr));
+  mkl_mem_ = nullptr;
+}
+
+void NDArray::Chunk::MKLDNNDataReorder(const mkldnn::memory::primitive_desc &pd) {
+  // If the memory already uses the specified layout, don't do anything.
+  if (mkl_mem_ != nullptr && mkl_mem_->get_primitive_desc() == pd)
+    return;
+  auto _pd = pd;
+  auto _desc = _pd.desc();
+  auto def_format = GetDefaultFormat(_desc);
+  // If the memory is default, don't do anything.
+  if (def_format == _desc.data.format && IsDefault())
+    return;
+  // If the specified layout is default, we should use Reorder2Default.
+  if (def_format == _desc.data.format) {
+    Reorder2Default();
+    return;
+  }
+
+  std::shared_ptr<mkldnn::memory> new_mem(new mkldnn::memory(pd));
+  std::shared_ptr<mkldnn::memory> old_mem;
+  if (IsDefault()) {
+    auto def_pd = GetPrimitiveDesc(pd, def_format);
+    old_mem.reset(new mkldnn::memory(def_pd, shandle.dptr));
+  } else {
+    old_mem = this->mkl_mem_;
+  }
+  CHECK(old_mem->get_primitive_desc().desc().data.ndims == _desc.data.ndims);
+
+  // This may be called in MKLDNN operators. We can't use MKLDNNStream here.
+  std::vector<mkldnn::primitive> net;
+  net.push_back(mkldnn::reorder(*old_mem, *new_mem));
+  mkldnn::stream(mkldnn::stream::kind::eager).submit(net).wait();
+
+  CHECK(shandle.size >= pd.get_size());
+  CheckAndAlloc(pd.get_size());
+  // TODO(zhengda) We need to avoid memory copy here.
```

Review comment: Open TODO
[GitHub] marcoabreu commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN.
marcoabreu commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#discussion_r171123210

## File path: src/ndarray/ndarray.cc ##

```diff
@@ -1017,6 +1017,7 @@ inline void CopyFromToDnsImpl(const NDArray& from, const NDArray& to, RunContext
   // with Copy().
   NDArray tmp_from = from;
   if (tmp_from.IsMKLDNNData()) {
+    // TODO(zhengda) tmp_from should be cached.
```

Review comment: Open TODO
[incubator-mxnet] branch master updated: workaround for install page display issue (#9902)
This is an automated email from the ASF dual-hosted git repository. marcoabreu pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new 48749a5 workaround for install page display issue (#9902) 48749a5 is described below

commit 48749a5d43864a41653ccd8746cdccf1477b2ae4
Author: Aaron Markham
AuthorDate: Tue Feb 27 17:41:34 2018 -0800

    workaround for install page display issue (#9902)
---
 docs/build_version_doc/AddVersion.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

```diff
diff --git a/docs/build_version_doc/AddVersion.py b/docs/build_version_doc/AddVersion.py
index 2c9ee22..c4d088a 100755
--- a/docs/build_version_doc/AddVersion.py
+++ b/docs/build_version_doc/AddVersion.py
@@ -57,6 +57,9 @@ if __name__ == '__main__':
         for name in files:
             if not name.endswith('.html'):
                 continue
+            if 'install' in path:
+                print("Skipping this path: {}".format(path))
+                continue
             with open(os.path.join(path, name), 'r') as html_file:
                 content = bs(html_file, 'html.parser')
             navbar = content.find(id="main-nav")
@@ -74,7 +77,7 @@ if __name__ == '__main__':
             outstr = str(content).replace('&lt;', '<').replace('&gt;', '>')
             # Fix link
             if args.current_version == tag_list[0]:
-                print("Fixing" + os.path.join(path, name))
+                print("Fixing " + os.path.join(path, name))
                 outstr = outstr.replace('https://mxnet.io', 'https://mxnet.incubator.apache.org')
                 outstr = outstr.replace('http://mxnet.io', 'https://mxnet.incubator.apache.org')
             else:
```

-- To stop receiving notification emails like this one, please contact marcoab...@apache.org.
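The workaround's filter in the commit above is a plain substring test on the directory path, so any path containing "install" is skipped, including versioned copies of the install page. A minimal sketch (the `should_skip` helper is hypothetical, not part of AddVersion.py):

```python
def should_skip(path):
    # Mirrors the `if 'install' in path` guard added in the commit above:
    # a substring match, not an exact directory-name match.
    return 'install' in path

assert should_skip('build/html/install')
assert should_skip('build/html/versions/1.0.0/install')
assert not should_skip('build/html/tutorials')
```

The breadth of the substring match is what makes this a workaround rather than a proper fix: it also skips any unrelated path that happens to contain "install".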
[GitHub] marcoabreu closed pull request #9902: workaround for install page display issue
marcoabreu closed pull request #9902: workaround for install page display issue URL: https://github.com/apache/incubator-mxnet/pull/9902

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

```diff
diff --git a/docs/build_version_doc/AddVersion.py b/docs/build_version_doc/AddVersion.py
index 2c9ee22bf4..c4d088a4b0 100755
--- a/docs/build_version_doc/AddVersion.py
+++ b/docs/build_version_doc/AddVersion.py
@@ -57,6 +57,9 @@
         for name in files:
             if not name.endswith('.html'):
                 continue
+            if 'install' in path:
+                print("Skipping this path: {}".format(path))
+                continue
             with open(os.path.join(path, name), 'r') as html_file:
                 content = bs(html_file, 'html.parser')
             navbar = content.find(id="main-nav")
@@ -74,7 +77,7 @@
             outstr = str(content).replace('&lt;', '<').replace('&gt;', '>')
             # Fix link
             if args.current_version == tag_list[0]:
-                print("Fixing" + os.path.join(path, name))
+                print("Fixing " + os.path.join(path, name))
                 outstr = outstr.replace('https://mxnet.io', 'https://mxnet.incubator.apache.org')
                 outstr = outstr.replace('http://mxnet.io', 'https://mxnet.incubator.apache.org')
             else:
```
[GitHub] marcoabreu commented on issue #9902: workaround for install page display issue
marcoabreu commented on issue #9902: workaround for install page display issue URL: https://github.com/apache/incubator-mxnet/pull/9902#issuecomment-369091077 Merging as a workaround. A proper fix should be submitted asap
[GitHub] marcoabreu commented on issue #9904: Improve workspace in convolution/deconvolution
marcoabreu commented on issue #9904: Improve workspace in convolution/deconvolution URL: https://github.com/apache/incubator-mxnet/pull/9904#issuecomment-369090964 Out of curiosity, do we have any tests to verify that we're actually staying inside these bounds? This could be quite interesting for edge devices since OutOfMemory issues are quite present there.
[GitHub] marcoabreu commented on issue #9908: Update build status link to new CI in the README.md file
marcoabreu commented on issue #9908: Update build status link to new CI in the README.md file URL: https://github.com/apache/incubator-mxnet/pull/9908#issuecomment-369090682 Ah yeah I didn't add the Jenkins build status plugin yet. This is on my ToDo list but moved to later since it's a config change.
[GitHub] sampathchanda commented on issue #9907: Error while building Mxnet from source on MacOS Sierra
sampathchanda commented on issue #9907: Error while building Mxnet from source on MacOS Sierra URL: https://github.com/apache/incubator-mxnet/issues/9907#issuecomment-369090300 @anirudhacharya Refer to #9903 for getting a fix to this issue. I will try to submit a PR for the same.
[GitHub] mbaijal opened a new pull request #9908: Update build status link to new CI in the README.md file
mbaijal opened a new pull request #9908: Update build status link to new CI in the README.md file URL: https://github.com/apache/incubator-mxnet/pull/9908

## Description ##
The top-level README file still points to the old CI (builds.apache.org) status page. I have updated this to the new CI. Note: This README.md still contains two more broken links to builds.apache.org which should be updated. (These are logos from the Apache build page, but I do not know what they used to be.)

## Checklist ##
### Essentials ###
- [ ] Passed code style checking (`make lint`)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionality and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [ ] Updated one link in the README.md file

## Comments ##
Two more broken links exist in this file. If I am unable to fix them as part of this PR, I will create a GitHub issue for it.
[GitHub] eric-haibin-lin closed pull request #9895: updated version to 1.1.0
eric-haibin-lin closed pull request #9895: updated version to 1.1.0 URL: https://github.com/apache/incubator-mxnet/pull/9895

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

```diff
diff --git a/docs/_static/mxnet-theme/index.html b/docs/_static/mxnet-theme/index.html
index d22e2541903..3b48832a03c 100644
--- a/docs/_static/mxnet-theme/index.html
+++ b/docs/_static/mxnet-theme/index.html
@@ -21,9 +21,9 @@
-        Apache MXNet 1.0 Released
-        We're excited to announce the release of MXNet 1.0! Check out the release notes for latest updates.
-        <a href="https://github.com/apache/incubator-mxnet/releases/tag/1.0.0">Learn More</a>
+        Apache MXNet 1.1.0 Released
+        We're excited to announce the release of MXNet 1.1.0! Check out the release notes for latest updates.
+        <a href="https://github.com/apache/incubator-mxnet/releases/tag/1.1.0">Learn More</a>
         MXNet Model Server
```
[incubator-mxnet] branch master updated: updated version to 1.1.0 (#9895)
This is an automated email from the ASF dual-hosted git repository. haibin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new 9761f21 updated version to 1.1.0 (#9895) 9761f21 is described below

commit 9761f212788455429e9110847ca8d0d1c0f34164
Author: thinksanky <31976455+thinksa...@users.noreply.github.com>
AuthorDate: Tue Feb 27 17:34:28 2018 -0800

    updated version to 1.1.0 (#9895)
---
 docs/_static/mxnet-theme/index.html | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

```diff
diff --git a/docs/_static/mxnet-theme/index.html b/docs/_static/mxnet-theme/index.html
index d22e254..3b48832 100644
--- a/docs/_static/mxnet-theme/index.html
+++ b/docs/_static/mxnet-theme/index.html
@@ -21,9 +21,9 @@
-        Apache MXNet 1.0 Released
-        We're excited to announce the release of MXNet 1.0! Check out the release notes for latest updates.
-        <a href="https://github.com/apache/incubator-mxnet/releases/tag/1.0.0">Learn More</a>
+        Apache MXNet 1.1.0 Released
+        We're excited to announce the release of MXNet 1.1.0! Check out the release notes for latest updates.
+        <a href="https://github.com/apache/incubator-mxnet/releases/tag/1.1.0">Learn More</a>
         MXNet Model Server
```
[GitHub] marcoabreu commented on issue #9841: Update versions of python dependencies
marcoabreu commented on issue #9841: Update versions of python dependencies URL: https://github.com/apache/incubator-mxnet/pull/9841#issuecomment-369089102 LGTM. I have retriggered CI
[GitHub] marcoabreu commented on issue #9263: Fixes #9210: Cosine Loss Formula
marcoabreu commented on issue #9263: Fixes #9210: Cosine Loss Formula URL: https://github.com/apache/incubator-mxnet/pull/9263#issuecomment-369088699 Ups, sorry.
[GitHub] marcoabreu commented on issue #9888: get runtime error when compile and install
marcoabreu commented on issue #9888: get runtime error when compile and install URL: https://github.com/apache/incubator-mxnet/issues/9888#issuecomment-368584661 Hm I'm not very familiar with Numpy, so I'm afraid I can't help you here.
[GitHub] marcoabreu commented on issue #9263: Fixes #9210: Cosine Loss Formula
marcoabreu commented on issue #9263: Fixes #9210: Cosine Loss Formula URL: https://github.com/apache/incubator-mxnet/pull/9263#issuecomment-369088740 So are we good to merge then?
[GitHub] marcoabreu closed issue #9408: [CI] Merging is not possible because you have unmerged files.
marcoabreu closed issue #9408: [CI] Merging is not possible because you have unmerged files. URL: https://github.com/apache/incubator-mxnet/issues/9408
[GitHub] marcoabreu commented on issue #9408: [CI] Merging is not possible because you have unmerged files.
marcoabreu commented on issue #9408: [CI] Merging is not possible because you have unmerged files. URL: https://github.com/apache/incubator-mxnet/issues/9408#issuecomment-369088529 Thank you for reminding me. This was due to a permission issue on CI side.
[GitHub] ykim362 commented on issue #9906: Add CPU optimized docker with MKL-DNN
ykim362 commented on issue #9906: Add CPU optimized docker with MKL-DNN URL: https://github.com/apache/incubator-mxnet/pull/9906#issuecomment-369081547 @kimjanik @ashokei
[GitHub] anirudhacharya opened a new issue #9907: Error while building Mxnet from source on MacOS Sierra
anirudhacharya opened a new issue #9907: Error while building Mxnet from source on MacOS Sierra URL: https://github.com/apache/incubator-mxnet/issues/9907

## Description
I am unable to build mxnet from source in a new conda environment on my Mac based on this documentation - https://mxnet.incubator.apache.org/install/index.html
While building, the script errors out in the middle with the error message - Error: homebrew/science was deprecated. This tap is now empty as all its formulae were migrated. The source repo for homebrew/science has been deprecated (see https://github.com/Homebrew/homebrew-science/issues/6365), but the script in the mxnet documentation still points to it. We need an alternate source to fetch all the libraries and other packages that we were previously fetching from homebrew/science.

## Environment info (Required)
--Python Info-- ('Version :', '2.7.14') ('Compiler :', 'GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)') ('Build:', ('default', 'Dec 25 2017 01:18:54')) ('Arch :', ('64bit', ''))
Pip Info--- ('Version :', '9.0.1') ('Directory:', '/Users/aanirud/anaconda2/envs/onnx/lib/python2.7/site-packages/pip')
--MXNet Info--- ('Version :', '1.0.0') ('Directory:', '/Users/aanirud/anaconda2/envs/onnx/lib/python2.7/site-packages/mxnet') ('Commit Hash :', '25720d0e3c29232a37e2650f3ba3a2454f9367bb')
--System Info-- ('Platform :', 'Darwin-16.7.0-x86_64-i386-64bit') ('system :', 'Darwin') ('node :', '8c85904b0bf4.ant.amazon.com') ('release :', '16.7.0') ('version :', 'Darwin Kernel Version 16.7.0: Thu Jan 11 22:59:40 PST 2018; root:xnu-3789.73.8~1/RELEASE_X86_64')
--Hardware Info-- ('machine :', 'x86_64') ('processor:', 'i386') machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz
--Network Test-- Setting timeout: 10 Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0091 sec, LOAD: 0.4961 sec. Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0095 sec, LOAD: 0.2989 sec. Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0197 sec, LOAD: 0.1999 sec. Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0166 sec, LOAD: 0.0561 sec. Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0100 sec, LOAD: 0.0339 sec. Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0574 sec, LOAD: 0.1784 sec.

Package used (Python/R/Scala/Julia): I am using Python 2.7.14

## Build info (Required if built from source)
Failing while trying to build from source.
Compiler (gcc/clang/mingw/visual studio): Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 9.0.0 (clang-900.0.39.2) Target: x86_64-apple-darwin16.7.0 Thread model: posix InstalledDir: /Library/Developer/CommandLineTools/usr/bin
MXNet commit hash: b8ae967b3c7b34f0e4b7cb8ac651ae5b282c43e2
Build config: (Paste the content of config.mk, or the build command.)

## Error Message:
Error: homebrew/science was deprecated. This tap is now empty as all its formulae were migrated.

## Minimum reproducible example
1. $ curl -O https://raw.githubusercontent.com/dmlc/mxnet/master/setup-utils/install-mxnet-osx-python.sh
2. chmod 744 install-mxnet-osx-python.sh
3. bash install-mxnet-osx-python.sh

## Steps to reproduce
1. Run the above set of commands

## What have you tried to solve it?
1. As mentioned in the source repo for homebrew/science here - https://github.com/Homebrew/homebrew-science/issues/6365 - the package has been deprecated, but the script in the mxnet documentation still points to it. We need an alternate source to fetch all the libraries and other packages that we were previously fetching from homebrew/science.
2. I had also started a thread on the discussion forum here, but got no reply - https://discuss.mxnet.io/t/mxnet-source-build-on-macos-sierra/670
[GitHub] ykim362 opened a new pull request #9906: Add CPU optimized docker with MKL-DNN
ykim362 opened a new pull request #9906: Add CPU optimized docker with MKL-DNN URL: https://github.com/apache/incubator-mxnet/pull/9906

## Description ##
Adds a new Docker input file (mkl) used to build a CPU-optimized Docker image.

## Checklist ##
### Essentials ###
- [ ] Passed code style checking (`make lint`)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionality and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [ ] Added a new Docker input file (docker/Dockerfiles/Dockerfile.in.lib.mkl). Ran the command `./tool.sh build python mkl`

## Comments ##
- This is backward compatible.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] anirudh2290 commented on issue #9475: OpenCV Error: Assertion failed (dst.cols < SHRT_MAX && dst.rows < SHRT_MAX && src.cols < SHRT_MAX && src.rows < SHRT_MAX) in remap, file /home/travis/bui
anirudh2290 commented on issue #9475: OpenCV Error: Assertion failed (dst.cols < SHRT_MAX && dst.rows < SHRT_MAX && src.cols < SHRT_MAX && src.rows < SHRT_MAX) in remap, file /home/travis/build/dmlc/mxnet-distro/deps/opencv-3.3.0/modules/imgproc/src/imgwarp.cpp, line 4944 terminate called after throwing an instance of 'cv::Exception' URL: https://github.com/apache/incubator-mxnet/issues/9475#issuecomment-369077045 Currently, MXNet only catches dmlc::Error; for other exceptions the process is terminated. Catching exceptions from dependent libraries will require more work, as it means changing the C API guard code and the exception mapping, and testing across the different front-ends.
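The guard pattern described above can be sketched generically in plain Python (`api_guard` and `imresize` are illustrative names, not MXNet's actual C API guard): exceptions raised by a dependent library are caught at the API boundary and re-raised as the framework's own error type, rather than escaping and terminating the process.

```python
class MXNetError(Exception):
    """Stand-in for the single error type the framework already catches."""

def api_guard(func):
    """Boundary wrapper: map foreign exceptions into MXNetError."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except MXNetError:
            raise                      # already the expected type
        except Exception as exc:       # e.g. an error from an imaging library
            # Re-raise as the framework's error instead of terminating.
            raise MXNetError(f"{type(exc).__name__}: {exc}") from exc
    return wrapper

@api_guard
def imresize(rows, cols):
    # Hypothetical operation mimicking the OpenCV assertion in the report.
    SHRT_MAX = 32767
    if rows >= SHRT_MAX or cols >= SHRT_MAX:
        raise ValueError("dst.cols/rows must be < SHRT_MAX")
    return (rows, cols)
```

With this wrapper, the caller only ever sees `MXNetError`, which a front-end can translate into its own exception type.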
[GitHub] zhaodongsun commented on issue #9713: a fatal error occurred in asynchronous engine operation
zhaodongsun commented on issue #9713: a fatal error occurred in asynchronous engine operation URL: https://github.com/apache/incubator-mxnet/issues/9713#issuecomment-369076650 @Roshrini The issue was solved with a smaller batch size.
[GitHub] zhaodongsun closed issue #9713: a fatal error occurred in asynchronous engine operation
zhaodongsun closed issue #9713: a fatal error occurred in asynchronous engine operation URL: https://github.com/apache/incubator-mxnet/issues/9713
[GitHub] ehsanmok opened a new issue #9905: Add DePool/UpPool for Gluon
ehsanmok opened a new issue #9905: Add DePool/UpPool for Gluon URL: https://github.com/apache/incubator-mxnet/issues/9905 There is no UpPooling layer in Gluon MXNet, which is mostly needed after Conv1DTranspose; its absence makes implementing various convolutional autoencoders difficult. It'd be great if you could add it soon.
[GitHub] sxjscience commented on issue #9904: Improve workspace in convolution/deconvolution
sxjscience commented on issue #9904: Improve workspace in convolution/deconvolution URL: https://github.com/apache/incubator-mxnet/pull/9904#issuecomment-369071591 @pharish93 Would you like to check whether this patch solves your problem? This PR automatically enlarges the workspace to make sure that deconvolution/convolution can run with batch_size=1.
[GitHub] sampathchanda commented on issue #9226: Deferred Initialization Error after a forward pass
sampathchanda commented on issue #9226: Deferred Initialization Error after a forward pass URL: https://github.com/apache/incubator-mxnet/issues/9226#issuecomment-369070930 Turned out that I was not using some layers in the forward function that were already defined under the block's scope. Fixed now!
[GitHub] sxjscience opened a new pull request #9904: Improve workspace in convolution/deconvolution
sxjscience opened a new pull request #9904: Improve workspace in convolution/deconvolution URL: https://github.com/apache/incubator-mxnet/pull/9904

## Description ##
Revise the description of the workspace parameter. Also, refine the workspace after the effective batch size is determined. Should fix https://github.com/apache/incubator-mxnet/issues/9690

## Checklist ##
### Essentials ###
- [ ] Passed code style checking (`make lint`)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [x] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionality and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
- [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [x] Set the workspace to be the same as the required size
- [x] Revise doc
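The sizing rule this PR describes can be sketched roughly as follows (illustrative Python, not the PR's actual C++ code; the per-sample buffer model is a simplification): whatever workspace limit the user configures is enlarged, if necessary, to the minimum a single sample requires, so the operator can always fall back to an effective batch size of 1.

```python
def conv_workspace_bytes(out_h, out_w, kernel_h, kernel_w, in_channels,
                         dtype_bytes=4):
    """Rough per-sample im2col buffer size for a convolution:
    one column of kernel_h*kernel_w*in_channels values per output pixel."""
    return out_h * out_w * kernel_h * kernel_w * in_channels * dtype_bytes

def effective_workspace(user_limit_bytes, per_sample_bytes):
    """The workspace actually used: the user's limit, enlarged to the
    single-sample requirement when the limit is too small to run at all."""
    return max(user_limit_bytes, per_sample_bytes)
```

Under this rule a too-small user limit no longer makes the operator fail; it only caps how many samples can be processed per internal step.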
[GitHub] sandeep-krishnamurthy closed issue #9274: Is compilation on 32 bit supported?
sandeep-krishnamurthy closed issue #9274: Is compilation on 32 bit supported? URL: https://github.com/apache/incubator-mxnet/issues/9274
[GitHub] piiswrong commented on issue #9543: Variable Length Support for cuDNN RNN
piiswrong commented on issue #9543: Variable Length Support for cuDNN RNN URL: https://github.com/apache/incubator-mxnet/issues/9543#issuecomment-369065780 I don't think anyone is working on this. This can be added as an option of sym.RNN: when use_mask=True, RNN can take an extra argument. ping @DickJC123 @ptrendx again
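The use_mask option proposed above does not exist yet; this NumPy sketch only illustrates what masking a variable-length batch would mean: time steps past each sequence's true length are zeroed so that padded steps contribute nothing downstream.

```python
import numpy as np

def mask_outputs(outputs, lengths):
    """Zero out padded time steps of an RNN output.

    outputs: array of shape (seq_len, batch, hidden)
    lengths: per-sequence true lengths, shape (batch,)
    """
    seq_len = outputs.shape[0]
    # mask[t, b] is True while t < lengths[b], False for padded steps.
    mask = np.arange(seq_len)[:, None] < np.asarray(lengths)[None, :]
    return outputs * mask[:, :, None]
```

A masked sym.RNN could apply the same idea internally, and the backward pass would likewise zero gradients flowing out of padded steps.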
[GitHub] sandeep-krishnamurthy closed issue #9465: Package actualization mxnetR for windows
sandeep-krishnamurthy closed issue #9465: Package actualization mxnetR for windows URL: https://github.com/apache/incubator-mxnet/issues/9465
[GitHub] sandeep-krishnamurthy closed issue #9455: An error occurred while calculating the square of ndarray by using gpu context
sandeep-krishnamurthy closed issue #9455: An error occurred while calculating the square of ndarray by using gpu context URL: https://github.com/apache/incubator-mxnet/issues/9455
[GitHub] vdantu commented on issue #9274: Is compilation on 32 bit supported?
vdantu commented on issue #9274: Is compilation on 32 bit supported? URL: https://github.com/apache/incubator-mxnet/issues/9274#issuecomment-369064351 @sandeep-krishnamurthy : Sorry for the above. Please label this "Build" and "Question", then close this.
[GitHub] vdantu commented on issue #9408: [CI] Merging is not possible because you have unmerged files.
vdantu commented on issue #9408: [CI] Merging is not possible because you have unmerged files. URL: https://github.com/apache/incubator-mxnet/issues/9408#issuecomment-369064131 @marcoabreu : Are you still seeing these conflicts? @sandeep-krishnamurthy : Please label it as "CI".
[GitHub] sandeep-krishnamurthy closed issue #9357: can group2ctx be used in multi-machine model parallel situation?
sandeep-krishnamurthy closed issue #9357: can group2ctx be used in multi-machine model parallel situation? URL: https://github.com/apache/incubator-mxnet/issues/9357
[GitHub] sandeep-krishnamurthy closed issue #9507: Segmentation Fault
sandeep-krishnamurthy closed issue #9507: Segmentation Fault URL: https://github.com/apache/incubator-mxnet/issues/9507
[GitHub] piiswrong commented on issue #9842: Custom Function Shape Inference
piiswrong commented on issue #9842: Custom Function Shape Inference URL: https://github.com/apache/incubator-mxnet/issues/9842#issuecomment-369063273 Actually, this is not an MXNet bug, although the error message is not clear. What happens is that the Conv2D block relies on the mx.sym.Convolution operator to figure out the weight shape from the data, and adding a custom op on the weight blocks that shape-inference path. You can solve this by specifying the in_channels argument for Conv2D. We should improve the error message and report "Deferred initialization failed because xx's shape cannot be inferred" @sxjscience
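A minimal simulation of the failure mode described above (plain Python, no MXNet; all names are illustrative): deferred initialization walks the path from the data to the convolution to infer the weight's channel count, a custom op without shape inference breaks that walk, and an explicit in_channels hint bypasses it.

```python
class BuiltinOp:
    """Stand-in for a built-in operator: shape information propagates."""
    def infer_channels(self, c):
        return c

class CustomOp:
    """Stand-in for a custom op with no shape-inference function."""
    def infer_channels(self, c):
        return None                     # channel count is lost here

def infer_weight_channels(path, data_channels, in_channels=None):
    """Return the channel count the conv weight needs, or raise when the
    ops on the path block inference and no explicit hint was given."""
    if in_channels is not None:
        return in_channels              # explicit hint skips inference
    c = data_channels
    for op in path:
        c = op.infer_channels(c)
        if c is None:
            raise RuntimeError(
                "Deferred initialization failed because the weight's "
                "shape cannot be inferred")
    return c
```

This mirrors the suggested fix: passing in_channels to Conv2D supplies the value that the blocked inference path can no longer provide.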
[GitHub] sandeep-krishnamurthy closed issue #9509: SphereFace
sandeep-krishnamurthy closed issue #9509: SphereFace URL: https://github.com/apache/incubator-mxnet/issues/9509
[GitHub] sandeep-krishnamurthy commented on issue #9325: AttributeError: function 'MXGetLastError' not found
sandeep-krishnamurthy commented on issue #9325: AttributeError: function 'MXGetLastError' not found URL: https://github.com/apache/incubator-mxnet/issues/9325#issuecomment-369062655 This is the right install guide - https://mxnet.incubator.apache.org/install/index.html
[GitHub] sxjscience closed issue #9317: why's the function asnumpy() so slow?
sxjscience closed issue #9317: why's the function asnumpy() so slow? URL: https://github.com/apache/incubator-mxnet/issues/9317
[GitHub] vdantu commented on issue #9274: Is compilation on 32 bit supported?
vdantu commented on issue #9274: Is compilation on 32 bit supported? URL: https://github.com/apache/incubator-mxnet/issues/9274#issuecomment-369061792 @nehaljwani : Does this solve your issue? @sandeep-krishnamurthy : Please label this as "Compilation Errors" and close this.