[GitHub] asitstands commented on issue #11268: A binary RBM example
asitstands commented on issue #11268: A binary RBM example URL: https://github.com/apache/incubator-mxnet/pull/11268#issuecomment-408584999 Now the log-likelihoods of the test and training data are reported at the completion of each epoch. They are estimated using AIS (annealed importance sampling). README shows some samples generated from the RBM.
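For context on that estimator: AIS approximates the RBM's intractable partition function $Z$ by running $M$ annealing chains through intermediate unnormalized distributions $p^*_0, \dots, p^*_K$ and averaging the importance weights. As a sketch of the standard method (not necessarily the PR's exact implementation):

$$\hat{Z} \approx Z_0 \cdot \frac{1}{M} \sum_{i=1}^{M} \prod_{k=1}^{K} \frac{p^*_k(v^{(i)}_k)}{p^*_{k-1}(v^{(i)}_k)}$$

The reported log-likelihood then follows from $\log p(v) = \log p^*(v) - \log \hat{Z}$, where $Z_0$ is the partition function of the tractable base-rate model.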
[GitHub] szha commented on issue #11268: A binary RBM example
szha commented on issue #11268: A binary RBM example URL: https://github.com/apache/incubator-mxnet/pull/11268#issuecomment-408584838 @asitstands thanks for updating the PR. @yifeim would you mind taking another pass at this PR?
[incubator-mxnet] branch v1.2.0 updated: update 1.2.0. announcement (#11917)
This is an automated email from the ASF dual-hosted git repository. anirudh2290 pushed a commit to branch v1.2.0 in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

The following commit(s) were added to refs/heads/v1.2.0 by this push:
     new d8a4f5a  update 1.2.0. announcement (#11917)

d8a4f5a is described below

commit d8a4f5aceefe4adf1ff092981fee505678fdac3d
Author: Aaron Markham
AuthorDate: Fri Jul 27 20:59:53 2018 -0700

    update 1.2.0. announcement (#11917)
---
 docs/_static/mxnet-theme/index.html | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/_static/mxnet-theme/index.html b/docs/_static/mxnet-theme/index.html
index 005bc88..3647e23 100644
--- a/docs/_static/mxnet-theme/index.html
+++ b/docs/_static/mxnet-theme/index.html
@@ -9,7 +9,7 @@
 Install
-Learn More
+Learn More
@@ -26,9 +26,9 @@
 <a href="http://gluon-crash-course.mxnet.io/">Learn More</a>
-MXNet 1.2.0.rc0 Released
-We're excited to announce the release of MXNet 1.2.0.rc0! Check out the release notes for latest updates.
-<a href="https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes">Learn More</a>
+MXNet 1.2.1 Released
+We're excited to announce the release of MXNet 1.2.1! Check out the release notes for latest updates.
+<a href="https://github.com/apache/incubator-mxnet/releases/tag/1.2.1">Learn More</a>
 Introducing the Scala Inference API
[GitHub] anirudh2290 closed pull request #11917: update home page for 1.2.1 announcement
anirudh2290 closed pull request #11917: update home page for 1.2.1 announcement URL: https://github.com/apache/incubator-mxnet/pull/11917 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/docs/_static/mxnet-theme/index.html b/docs/_static/mxnet-theme/index.html
index 005bc88f255..3647e23a736 100644
--- a/docs/_static/mxnet-theme/index.html
+++ b/docs/_static/mxnet-theme/index.html
@@ -9,7 +9,7 @@
 Install
-Learn More
+Learn More
@@ -26,9 +26,9 @@
 A 60-minute Gluon Crash Course
 <a href="http://gluon-crash-course.mxnet.io/">Learn More</a>
-MXNet 1.2.0.rc0 Released
-We're excited to announce the release of MXNet 1.2.0.rc0! Check out the release notes for latest updates.
-<a href="https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes">Learn More</a>
+MXNet 1.2.1 Released
+We're excited to announce the release of MXNet 1.2.1! Check out the release notes for latest updates.
+<a href="https://github.com/apache/incubator-mxnet/releases/tag/1.2.1">Learn More</a>
 Introducing the Scala Inference API
[GitHub] szha commented on issue #9388: Official version of pretrained MobileNet/ShuffleNet/NASNet is available?
szha commented on issue #9388: Official version of pretrained MobileNet/ShuffleNet/NASNet is available? URL: https://github.com/apache/incubator-mxnet/issues/9388#issuecomment-408571404 @jmnie it's included in the latest versions of mxnet.
```
from mxnet import gluon
net = gluon.model_zoo.vision.resnet50_v1(pretrained=True)
```
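The issue asks specifically about MobileNet; assuming the same model-zoo pattern applies to it (a sketch: `mobilenet1_0` is the MobileNet v1, width-multiplier-1.0 entry point in recent releases):
```
import mxnet as mx
from mxnet import gluon

# Downloads the pretrained ImageNet weights on first use.
net = gluon.model_zoo.vision.mobilenet1_0(pretrained=True)

# Forward a dummy 224x224 RGB batch to verify the model loads and runs.
x = mx.nd.random.uniform(shape=(1, 3, 224, 224))
print(net(x).shape)  # (1, 1000): ImageNet class scores
```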
[GitHub] zheng-da commented on issue #11325: [MXNET-703] TensorRT runtime integration
zheng-da commented on issue #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#issuecomment-408570652 I understand this is an experimental integration. It changes the way MXNet is used (users have to pass parameters via `shared_buffer` when binding the executor, and it doesn't support Module or Gluon hybridize). If these problems are fixed in follow-up PRs, this PR looks fine to me.
[GitHub] anirudh2290 closed pull request #11630: Fix flaky test test_deconvolution
anirudh2290 closed pull request #11630: Fix flaky test test_deconvolution URL: https://github.com/apache/incubator-mxnet/pull/11630 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/src/operator/linalg_impl.h b/src/operator/linalg_impl.h
index 08d2add28eb..c0ae97ad3a4 100644
--- a/src/operator/linalg_impl.h
+++ b/src/operator/linalg_impl.h
@@ -169,23 +169,52 @@ void linalg_gemm(const Tensor
-#define LINALG_GPU_GEMM(fname, DType) \
-template<> inline \
-void linalg_gemm<gpu, DType>(const Tensor<gpu, 2, DType>& A, const Tensor<gpu, 2, DType>& B, \
-                             const Tensor<gpu, 2, DType>& C, DType alpha, DType beta, \
-                             bool tA, bool tB, Stream<gpu> *s) { \
-  using namespace mxnet; \
-  using mshadow::gpu; \
-  CHECK_NOTNULL(s); \
-  check_gemm(A, B, C, alpha, beta, tA, tB); \
-  CUBLAS_CALL(cublas##fname(Stream<gpu>::GetBlasHandle(s), \
-                            (tB ? CUBLAS_OP_T : CUBLAS_OP_N), \
-                            (tA ? CUBLAS_OP_T : CUBLAS_OP_N), \
-                            C.size(1), C.size(0), (tB ? B.size(1) : B.size(0)), \
-                            &alpha, B.dptr_, B.stride_, A.dptr_, A.stride_, \
-                            &beta, C.dptr_, C.stride_)) \
-}
+#define LINALG_GPU_GEMM(fname, DType) \
+  template <> \
+  inline void linalg_gemm<gpu, DType>( \
+      const Tensor<gpu, 2, DType>& A, const Tensor<gpu, 2, DType>& B, \
+      const Tensor<gpu, 2, DType>& C, DType alpha, DType beta, bool tA, \
+      bool tB, Stream<gpu>* s) { \
+    using namespace mxnet; \
+    using mshadow::gpu; \
+    CHECK_NOTNULL(s); \
+    check_gemm(A, B, C, alpha, beta, tA, tB); \
+    CUBLAS_CALL(cublas##fname( \
+        Stream<gpu>::GetBlasHandle(s), (tB ? CUBLAS_OP_T : CUBLAS_OP_N), \
+        (tA ? CUBLAS_OP_T : CUBLAS_OP_N), C.size(1), C.size(0), \
+        (tB ? B.size(1) : B.size(0)), &alpha, B.dptr_, B.stride_, A.dptr_, \
+        A.stride_, &beta, C.dptr_, C.stride_)) \
+  }
+
+// Use cublasSgemmEx when it is available (CUDA >= 7.5). Resolves precision issues with
+// cublasSgemm. Please see https://github.com/apache/incubator-mxnet/pull/11630
+#if CUDA_VERSION >= 7050
+template <>
+inline void linalg_gemm<gpu, float>(const Tensor<gpu, 2, float>& A,
+                                    const Tensor<gpu, 2, float>& B,
+                                    const Tensor<gpu, 2, float>& C, float alpha,
+                                    float beta, bool tA, bool tB,
+                                    Stream<gpu>* s) {
+  using namespace mxnet;
+  using mshadow::gpu;
+  CHECK_NOTNULL(s);
+  check_gemm(A, B, C, alpha, beta, tA, tB);
+#if CUDA_VERSION >= 8000
+  cudaDataType_t full_datatype = CUDA_R_32F;
+#else
+  cublasDataType_t full_datatype = CUBLAS_DATA_FULL;
+#endif
+  CUBLAS_CALL(cublasSgemmEx(
+      Stream<gpu>::GetBlasHandle(s), (tB ? CUBLAS_OP_T : CUBLAS_OP_N),
+      (tA ? CUBLAS_OP_T : CUBLAS_OP_N), C.size(1), C.size(0),
+      (tB ? B.size(1) : B.size(0)), &alpha, B.dptr_, full_datatype, B.stride_,
+      A.dptr_, full_datatype, A.stride_, &beta, C.dptr_, full_datatype,
+      C.stride_))
+}
+
+#else
 LINALG_GPU_GEMM(Sgemm, float)
+#endif
 LINALG_GPU_GEMM(Dgemm, double)

 // Version where matrix rows are given by first axis.
[GitHub] rahul003 edited a comment on issue #11913: Unexpectedly poor copy() performance
rahul003 edited a comment on issue #11913: Unexpectedly poor copy() performance URL: https://github.com/apache/incubator-mxnet/issues/11913#issuecomment-408567946 Just going by the script, could you put a waitall before your first time() call to ensure we don't factor in time to create the array?
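The `waitall` suggestion matters because MXNet's engine is asynchronous: calls return as soon as work is enqueued, so a naive timer can fold in earlier, still-pending operations such as the array creation. A minimal sketch of the corrected measurement (a hypothetical benchmark, not the reporter's actual script):
```
import time
import mxnet as mx

a = mx.nd.random.uniform(shape=(4096, 4096))
mx.nd.waitall()              # drain pending work so creation time is excluded

start = time.time()
b = a.copy()
b.wait_to_read()             # block until the copy has actually finished
print("copy took %.4f s" % (time.time() - start))
```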
[GitHub] rahul003 commented on issue #11919: Accuracy changes with number of GPUs
rahul003 commented on issue #11919: Accuracy changes with number of GPUs URL: https://github.com/apache/incubator-mxnet/issues/11919#issuecomment-408570462 You would need to change your learning rate based on the total batch size, generally proportional to the batch size (as the number of steps the training takes halves in the latter case). Try using lr 0.02 for the latter case.
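The rule of thumb being applied is linear learning-rate scaling: keep lr divided by the effective batch size roughly constant as GPUs are added. A sketch (the per-GPU batch size is a placeholder; the suggested 0.02 implies the 4-GPU run used lr 0.01):
```
base_lr = 0.01           # lr tuned for the p3.8xlarge run (4 GPUs)
base_batch = 4 * 256     # hypothetical per-GPU batch of 256 on 4 GPUs
new_batch = 8 * 256      # same per-GPU batch on 8 GPUs (p3.16xlarge)

# Linear scaling rule: lr grows in proportion to the effective batch size.
new_lr = base_lr * new_batch / base_batch
print(new_lr)            # 0.02
```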
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository. zhasheng pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 6d5f070  Bump the publish timestamp.

6d5f070 is described below

commit 6d5f070f89b5aa347b057ddae4ab51432f973b86
Author: mxnet-ci
AuthorDate: Sat Jul 28 00:45:46 2018 +0000

    Bump the publish timestamp.
---
 date.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 0000000..b304434
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Sat Jul 28 00:45:46 UTC 2018
[GitHub] mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration
mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#issuecomment-408569907 @piiswrong Sorry, I meant [this update](https://github.com/mkolod/incubator-mxnet/commit/2a114665ce9342dbb808d9de63cda99fe209a415).
[GitHub] andrewfayres commented on issue #11885: Fix JNI custom op code from deregistering the operator fixes #10438
andrewfayres commented on issue #11885: Fix JNI custom op code from deregistering the operator fixes #10438 URL: https://github.com/apache/incubator-mxnet/pull/11885#issuecomment-408569886 It's more a question of exactly what you want to test. We've got tests for custom operators already and there's already work going on to verify model backward compatibility.
[GitHub] mkolod removed a comment on issue #11325: [MXNET-703] TensorRT runtime integration
mkolod removed a comment on issue #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#issuecomment-408569730 @piiswrong Sorry, I meant [this update](https://github.com/mkolod/incubator-mxnet/commit/84015a5be82b9097aaed94bac7b74efb177be26f), not the one above.
[GitHub] mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration
mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#issuecomment-408569730 @piiswrong Sorry, I meant [this update](https://github.com/mkolod/incubator-mxnet/commit/84015a5be82b9097aaed94bac7b74efb177be26f), not the one above.
[GitHub] mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration
mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#issuecomment-408569285 @piiswrong [Here](https://github.com/mkolod/incubator-mxnet/commit/84015a5be82b9097aaed94bac7b74efb177be26f) is the update. Now, when `MXNET_USE_TENSORRT=1` and `grad_req != 'null'`, a user will get a warning, and execution will proceed without TensorRT. The TensorRT pass will only run if both `MXNET_USE_TENSORRT=1` and `grad_req='null'`.
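A sketch of what that contract looks like from the Python side, assembled from the constraints named in this thread (the checkpoint prefix and input shape are placeholders, and the exact API surface may differ in the merged PR):
```
import os
import mxnet as mx

os.environ["MXNET_USE_TENSORRT"] = "1"      # enable the TensorRT graph pass

# Hypothetical checkpoint; load_checkpoint returns (symbol, args, aux).
sym, arg_params, aux_params = mx.model.load_checkpoint("resnet-50", 0)

# TensorRT bakes weights into its engine at bind time, hence shared_buffer;
# grad_req='null' marks the graph as inference-only so the TRT pass runs.
all_params = dict(arg_params, **aux_params)
executor = sym.simple_bind(ctx=mx.gpu(0), grad_req="null",
                           shared_buffer=all_params,
                           data=(1, 3, 224, 224))
```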
[GitHub] rahul003 commented on issue #11913: Unexpectedly poor copy() performance
rahul003 commented on issue #11913: Unexpectedly poor copy() performance URL: https://github.com/apache/incubator-mxnet/issues/11913#issuecomment-408567946 Just going by the script, could you put a waitall before your first time() call to ensure we don't factor in time to create the array.
[GitHub] kpmurali opened a new pull request #11921: [MXNET-711] Added updated logos to the powered by page
kpmurali opened a new pull request #11921: [MXNET-711] Added updated logos to the powered by page URL: https://github.com/apache/incubator-mxnet/pull/11921

## Description ##
Updating logos on the powered-by page

## Checklist ##
### Changes ###
- [x] Added updated logos to the powered by page
[GitHub] rahul003 commented on issue #11855: Distributed learning with Async update does not work.
rahul003 commented on issue #11855: Distributed learning with Async update does not work. URL: https://github.com/apache/incubator-mxnet/issues/11855#issuecomment-408565519 What optimizer are you using?
[GitHub] rahul003 commented on a change in pull request #11910: Improving documentation and error messages for Async distributed training with Gluon
rahul003 commented on a change in pull request #11910: Improving documentation and error messages for Async distributed training with Gluon URL: https://github.com/apache/incubator-mxnet/pull/11910#discussion_r205923132

## File path: docs/faq/distributed_training.md ##

@@ -73,6 +73,13 @@ These can be passed as arguments to the iterator. You can look at [example/gluon/image_classification.py](https://github.com/apache/incubator-mxnet/blob/master/example/gluon/image_classification.py) to see an example usage.
+### Updating weights
+The KVStore server supports two modes: one in which it aggregates the gradients and updates the weights using them, and a second in which it only aggregates gradients. In the latter case, when a worker process pulls from the kvstore it gets the aggregated gradients, which it then uses to update the weights locally.
+
+When using Gluon, you can choose between these modes by passing the `update_on_kvstore` argument when you create the [Trainer](https://mxnet.incubator.apache.org/versions/master/api/python/gluon/gluon.html#mxnet.gluon.Trainer) object.

Review comment: ok
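A sketch of choosing the mode from Gluon (assumes `net` is an already-initialized Block, and the optimizer settings are placeholders):
```
import mxnet as mx
from mxnet import gluon

kv = mx.kv.create("dist_sync")   # or "dist_async"
trainer = gluon.Trainer(net.collect_params(), "sgd",
                        {"learning_rate": 0.1},
                        kvstore=kv,
                        update_on_kvstore=True)  # server applies the updates;
                                                 # False = workers update locally
```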
[GitHub] rahul003 commented on a change in pull request #11910: Improving documentation and error messages for Async distributed training with Gluon
rahul003 commented on a change in pull request #11910: Improving documentation and error messages for Async distributed training with Gluon URL: https://github.com/apache/incubator-mxnet/pull/11910#discussion_r205921780

## File path: python/mxnet/gluon/trainer.py ##

@@ -187,6 +187,11 @@ def _init_kvstore(self):
             arg_arrays = {param.name: param.data(self._contexts[0]) for param in self._params}
             kvstore, update_on_kvstore = _create_kvstore(config['kvstore'], len(self._contexts),
                                                          arg_arrays)
+            if kvstore and 'async' in kvstore.type and config['update_on_kvstore'] is not None\

Review comment: If the user does not set that variable explicitly (the default), then I set it to the right value. If the user explicitly sets it to False, then the error is raised.
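In rough pseudocode, the behaviour described in that reply (a sketch of the intent, not the PR's exact diff):
```
# Inside Trainer._init_kvstore (sketch): async kvstore only works with
# update_on_kvstore=True, so only an explicit opt-out is an error.
if kvstore and "async" in kvstore.type:
    if config["update_on_kvstore"] is False:   # user explicitly set False
        raise ValueError("Please set update_on_kvstore to true "
                         "when training in async mode.")
    update_on_kvstore = True                   # default: pick the valid mode
```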
[GitHub] szha commented on a change in pull request #11920: Remove fixed seed for test_sparse_nd_save_load
szha commented on a change in pull request #11920: Remove fixed seed for test_sparse_nd_save_load URL: https://github.com/apache/incubator-mxnet/pull/11920#discussion_r205921827

## File path: tests/python/unittest/test_sparse_ndarray.py ##

@@ -534,7 +534,9 @@ def test_sparse_nd_pickle():
     assert same(a.asnumpy(), b.asnumpy())

-@with_seed(0)
+# @kalyc: Getting rid of fixed seed as flakiness could not be reproduced
+# tracked at https://github.com/apache/incubator-mxnet/issues/11741
+@with_seed()

Review comment: I see
[GitHub] szha commented on issue #11919: Accuracy changes with number of GPUs
szha commented on issue #11919: Accuracy changes with number of GPUs URL: https://github.com/apache/incubator-mxnet/issues/11919#issuecomment-408563667 Since the actual batch size differs by 2x it's not surprising that accuracy can be different.
[GitHub] kalyc commented on a change in pull request #11920: Remove fixed seed for test_sparse_nd_save_load
kalyc commented on a change in pull request #11920: Remove fixed seed for test_sparse_nd_save_load URL: https://github.com/apache/incubator-mxnet/pull/11920#discussion_r205921622

## File path: tests/python/unittest/test_sparse_ndarray.py ##

@@ -534,7 +534,9 @@ def test_sparse_nd_pickle():
     assert same(a.asnumpy(), b.asnumpy())

-@with_seed(0)
+# @kalyc: Getting rid of fixed seed as flakiness could not be reproduced
+# tracked at https://github.com/apache/incubator-mxnet/issues/11741
+@with_seed()

Review comment: See comments by @haojin2 above
[GitHub] szha commented on a change in pull request #11920: Remove fixed seed for test_sparse_nd_save_load
szha commented on a change in pull request #11920: Remove fixed seed for test_sparse_nd_save_load URL: https://github.com/apache/incubator-mxnet/pull/11920#discussion_r205921457

## File path: tests/python/unittest/test_sparse_ndarray.py ##

@@ -534,7 +534,9 @@ def test_sparse_nd_pickle():
     assert same(a.asnumpy(), b.asnumpy())

-@with_seed(0)
+# @kalyc: Getting rid of fixed seed as flakiness could not be reproduced
+# tracked at https://github.com/apache/incubator-mxnet/issues/11741
+@with_seed()

Review comment: no need to add comment if not flaky.
[GitHub] apeforest commented on issue #11841: All the tests in tools/coreml package are failing
apeforest commented on issue #11841: All the tests in tools/coreml package are failing URL: https://github.com/apache/incubator-mxnet/issues/11841#issuecomment-408562366 Added tests to only load the CoreML model without running prediction.
[GitHub] nswamy closed issue #8545: Incorrect results from R 3.4.2 in MNIST
nswamy closed issue #8545: Incorrect results from R 3.4.2 in MNIST URL: https://github.com/apache/incubator-mxnet/issues/8545
[GitHub] nswamy closed issue #10147: Mxnet R Package Installation Doc Bug
nswamy closed issue #10147: Mxnet R Package Installation Doc Bug URL: https://github.com/apache/incubator-mxnet/issues/10147
[GitHub] harryprince opened a new issue #10147: Mxnet R Package Installation Doc Bug
harryprince opened a new issue #10147: Mxnet R Package Installation Doc Bug URL: https://github.com/apache/incubator-mxnet/issues/10147

Wrong:
```
# current repo doc: https://github.com/apache/incubator-mxnet/tree/master/R-package
cran <- getOption("repos")
cran["dmlc"] <- "https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/CRAN/"
options(repos = cran)
install.packages("mxnet", dependencies = T)
```
Correct:
```
cran <- getOption("repos")
cran["dmlc"] <- "https://s3-us-west-2.amazonaws.com/apache-mxnet/R/CRAN/"
options(repos = cran)
install.packages("mxnet", dependencies = T)
```
## Reference
https://stackoverflow.com/questions/43872455/mxnet-package-installation-in-r
[GitHub] nswamy closed issue #10928: Optimizers memory usage
nswamy closed issue #10928: Optimizers memory usage URL: https://github.com/apache/incubator-mxnet/issues/10928
[GitHub] nswamy commented on issue #10928: Optimizers memory usage
nswamy commented on issue #10928: Optimizers memory usage URL: https://github.com/apache/incubator-mxnet/issues/10928#issuecomment-408561315 closing issue as the referenced PR seems to be resolving it, feel free to open a new issue if you still find problems.
[GitHub] nswamy commented on issue #11822: Install from pre-built binaries failing
nswamy commented on issue #11822: Install from pre-built binaries failing URL: https://github.com/apache/incubator-mxnet/issues/11822#issuecomment-408560973 closing this issue as it seems to be resolved.
[GitHub] nswamy closed issue #11822: Install from pre-built binaries failing
nswamy closed issue #11822: Install from pre-built binaries failing URL: https://github.com/apache/incubator-mxnet/issues/11822
[GitHub] kalyc commented on issue #11741: test_sparse_ndarray.test_sparse_nd_save_load has fixed seed that can mask flakiness
kalyc commented on issue #11741: test_sparse_ndarray.test_sparse_nd_save_load has fixed seed that can mask flakiness URL: https://github.com/apache/incubator-mxnet/issues/11741#issuecomment-408560463 For reference, the seed was initially set to 0.
[GitHub] nswamy closed issue #8792: Different training performance between mxnet v. 0.11.0 and v. 0.12.1
nswamy closed issue #8792: Different training performance between mxnet v. 0.11.0 and v. 0.12.1 URL: https://github.com/apache/incubator-mxnet/issues/8792
[GitHub] nswamy commented on issue #8792: Different training performance between mxnet v. 0.11.0 and v. 0.12.1
nswamy commented on issue #8792: Different training performance between mxnet v. 0.11.0 and v. 0.12.1 URL: https://github.com/apache/incubator-mxnet/issues/8792#issuecomment-408560141 @VGalata Closing this issue. I think @anirudhacharya meant he tried from master (v1.3 is coming out soon). Please create a new issue if you find further problems.
[GitHub] nswamy closed issue #7196: [R] make sure all optimizers work
nswamy closed issue #7196: [R] make sure all optimizers work URL: https://github.com/apache/incubator-mxnet/issues/7196
[GitHub] anirudhacharya commented on issue #8792: Different training performance between mxnet v. 0.11.0 and v. 0.12.1
anirudhacharya commented on issue #8792: Different training performance between mxnet v. 0.11.0 and v. 0.12.1 URL: https://github.com/apache/incubator-mxnet/issues/8792#issuecomment-408559047 @VGalata I tried with the latest v1.3 (source build) and was not able to reproduce this issue with the example you provided. Can you please verify and reopen the issue if the problem persists. @nswamy please close this issue.
[GitHub] haojin2 commented on issue #11920: Remove fixed seed for test_sparse_nd_save_load
haojin2 commented on issue #11920: Remove fixed seed for test_sparse_nd_save_load URL: https://github.com/apache/incubator-mxnet/pull/11920#issuecomment-408558498 For this kind of case please refer to my changes in #11888 and add the link to the tracking issue in the code so that we can backtrack if it happens to fail again.
[GitHub] anirudhacharya commented on issue #7196: [R] make sure all optimizers work
anirudhacharya commented on issue #7196: [R] make sure all optimizers work URL: https://github.com/apache/incubator-mxnet/issues/7196#issuecomment-408557567 fixed in this - https://github.com/apache/incubator-mxnet/pull/11374 @nswamy please close.
[GitHub] kalyc commented on issue #11741: test_sparse_ndarray.test_sparse_nd_save_load has fixed seed that can mask flakiness
kalyc commented on issue #11741: test_sparse_ndarray.test_sparse_nd_save_load has fixed seed that can mask flakiness URL: https://github.com/apache/incubator-mxnet/issues/11741#issuecomment-408557099 Unable to reproduce issue with 1 runs, opened PR - https://github.com/apache/incubator-mxnet/pull/11920
[GitHub] kalyc opened a new pull request #11920: Remove fixed seed for test_sparse_nd_save_load
kalyc opened a new pull request #11920: Remove fixed seed for test_sparse_nd_save_load URL: https://github.com/apache/incubator-mxnet/pull/11920

## Description ##
Remove fixed seed for test_sparse_nd_save_load. Unable to reproduce flakiness of test - ran 1 times.

## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [X] Changes are complete (i.e. I finished coding on this PR)
- [X] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [X] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [X] Remove fixed seed for test_sparse_ndarray:test_sparse_nd_save_load

## Comments ##
- Related issue - https://github.com/apache/incubator-mxnet/issues/11741
[GitHub] abhinavs95 opened a new issue #11919: Accuracy changes with number of GPUs
abhinavs95 opened a new issue #11919: Accuracy changes with number of GPUs URL: https://github.com/apache/incubator-mxnet/issues/11919

## Description
I trained the same SqueezeNet model with the same hyper-parameters and dataset on p3.8xlarge and p3.16xlarge with the same AMI, but got ~3% lower accuracy on p3.16xlarge. I used the same batch size per GPU, but the effective batch size is 2x on p3.16xlarge due to 2x the number of GPUs.

## Environment info (Required)
p3.8xlarge
```
----------Python Info----------
Version      : 3.6.6
Compiler     : GCC 7.2.0
Build        : ('default', 'Jun 28 2018 17:14:51')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 10.0.1
Directory    : /home/ubuntu/anaconda3/envs/gln/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.3.0
Directory    : /home/ubuntu/anaconda3/envs/gln/lib/python3.6/site-packages/mxnet
Commit Hash  : 65fee984437dcca3516912417e9430cf34ba7313
----------System Info----------
Platform     : Linux-4.4.0-1062-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-78-153
release      : 4.4.0-1062-aws
version      : #71-Ubuntu SMP Fri Jun 15 10:07:39 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               1972.070
CPU max MHz:           3000.
CPU min MHz:           1200.
BogoMIPS:              4600.11
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0032 sec, LOAD: 0.3394 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1960 sec, LOAD: 0.3322 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1599 sec, LOAD: 0.5460 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0474 sec, LOAD: 0.7632 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0039 sec, LOAD: 0.1079 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0043 sec, LOAD: 0.0531 sec.
```
p3.16xlarge
```
----------Python Info----------
Version      : 3.6.6
Compiler     : GCC 7.2.0
Build        : ('default', 'Jun 28 2018 17:14:51')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 10.0.1
Directory    : /home/ubuntu/anaconda3/envs/gluon/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.3.0
Directory    : /home/ubuntu/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet
Commit Hash  : 3051c49e3454df3b5f8909d3d76c6213d13539ad
----------System Info----------
Platform     : Linux-4.4.0-1062-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-45-182
release      : 4.4.0-1062-aws
version      : #71-Ubuntu SMP Fri Jun 15 10:07:39 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               1581.609
CPU max MHz:           3000.
CPU min MHz:           1200.
BogoMIPS:              4600.07
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i
```
[GitHub] haojin2 commented on issue #11867: we need to update the doc of scatter_nd
haojin2 commented on issue #11867: we need to update the doc of scatter_nd URL: https://github.com/apache/incubator-mxnet/issues/11867#issuecomment-408550838 @zheng-da Fix is in #11918
[GitHub] ptrendx commented on issue #11886: Improve error message of cudnn operators
ptrendx commented on issue #11886: Improve error message of cudnn operators URL: https://github.com/apache/incubator-mxnet/pull/11886#issuecomment-408548357 Sounds good.
[GitHub] haojin2 commented on issue #11886: Improve error message of cudnn operators
haojin2 commented on issue #11886: Improve error message of cudnn operators URL: https://github.com/apache/incubator-mxnet/pull/11886#issuecomment-408545460 @ptrendx Okay, so how does:
```
N algorithms with minimum memory requirement M bytes have been tried. Workspace size is set to X bytes, please consider reducing the batch/model size or increasing workspace size.
```
look to you?
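For context, the workspace this message refers to is the per-operator cuDNN scratch-space cap, which MXNet exposes in megabytes on convolution-type operators; the user-side mitigation looks roughly like this (layer settings are placeholders):
```
import mxnet as mx

data = mx.sym.Variable("data")
# Raise the cuDNN workspace cap (in MB) so the autotuner may pick a faster
# algorithm that needs more scratch memory; lower it to save GPU memory.
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                          workspace=1024, name="conv0")
```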
[GitHub] mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration
mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#issuecomment-408543792 @piiswrong Sounds good, I'll address this right away.
[incubator-mxnet] branch master updated: make skiptest work (#11889)
This is an automated email from the ASF dual-hosted git repository. nswamy pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

The following commit(s) were added to refs/heads/master by this push:
     new a8c8737  make skiptest work (#11889)

a8c8737 is described below

commit a8c873742c25a6cd4b78c6a4d8e1026378fda77d
Author: Lanking
AuthorDate: Fri Jul 27 14:24:15 2018 -0700

    make skiptest work (#11889)
---
 Makefile                       | 10 +-
 scala-package/core/pom.xml     |  6 +++---
 scala-package/examples/pom.xml |  6 +++---
 scala-package/infer/pom.xml    |  6 +++---
 scala-package/pom.xml          |  1 +
 5 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/Makefile b/Makefile
index 88f7dd9..18661aa 100644
--- a/Makefile
+++ b/Makefile
@@ -608,7 +608,7 @@ scalaintegrationtest:
 scalainstall:
 	(cd $(ROOTDIR)/scala-package; \
-	mvn install -P$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) -DskipTests -Dcxx="$(CXX)" \
+	mvn install -P$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) -DskipTests=true -Dcxx="$(CXX)" \
 	-Dbuild.platform="$(SCALA_PKG_PROFILE)" \
 	-Dcflags="$(CFLAGS)" -Dldflags="$(LDFLAGS)" \
 	-Dlddeps="$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a")
@@ -617,23 +617,23 @@ scalarelease-dryrun:
 	(cd $(ROOTDIR)/scala-package; \
 	mvn release:clean release:prepare -DdryRun=true -DautoVersionSubmodules=true \
 	-Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \
-	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
+	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests=true\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
 scalarelease-prepare:
 	(cd $(ROOTDIR)/scala-package; \
 	mvn release:clean release:prepare -DautoVersionSubmodules=true \
 	-Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \
-	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
+	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests=true\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
 scalarelease-perform:
 	(cd $(ROOTDIR)/scala-package; \
 	mvn release:perform -DautoVersionSubmodules=true \
 	-Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \
-	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
+	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests=true\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
 scaladeploy:
 	(cd $(ROOTDIR)/scala-package; \
-	mvn deploy -Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \-DskipTests -Dcxx="$(CXX)" \
+	mvn deploy -Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \-DskipTests=true -Dcxx="$(CXX)" \
 	-Dbuild.platform="$(SCALA_PKG_PROFILE)" \
 	-Dcflags="$(CFLAGS)" -Dldflags="$(LDFLAGS)" \
 	-Dlddeps="$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a")
diff --git a/scala-package/core/pom.xml b/scala-package/core/pom.xml
index 134e0a5..1606197 100644
--- a/scala-package/core/pom.xml
+++ b/scala-package/core/pom.xml
@@ -17,13 +17,13 @@
       unittest
-      <skiptest>false</skiptest>
+      <skipTests>false</skipTests>
       integrationtest
-      <skiptest>true</skiptest>
+      <skipTests>true</skipTests>
@@ -74,7 +74,7 @@
         org.scalatest
         scalatest-maven-plugin
-        ${skiptest}
+        ${skipTests}
         -Djava.library.path=${project.parent.basedir}/native/${platform}/target \
         -Dlog4j.configuration=file://${project.basedir}/src/test/resources/log4j.properties
diff --git a/scala-package/examples/pom.xml b/scala-package/examples/pom.xml
index 9a98f74..d24785b 100644
--- a/scala-package/examples/pom.xml
+++ b/scala-package/examples/pom.xml
@@ -17,13 +17,13 @@
       unittest
-      <skiptest>true</skiptest>
+      <skipTests>true</skipTests>
       integrationtest
-      <skiptest>false</skiptest>
+      <skipTests>false</skipTests>
@@ -134,7 +134,7 @@
         org.scalatest
[GitHub] nswamy closed pull request #11889: [MXNET-319] make skiptest work for Scala
nswamy closed pull request #11889: [MXNET-319] make skiptest work for Scala URL: https://github.com/apache/incubator-mxnet/pull/11889 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/Makefile b/Makefile
index 88f7dd9278c..18661aa6984 100644
--- a/Makefile
+++ b/Makefile
@@ -608,7 +608,7 @@ scalaintegrationtest:
 scalainstall:
 	(cd $(ROOTDIR)/scala-package; \
-	mvn install -P$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) -DskipTests -Dcxx="$(CXX)" \
+	mvn install -P$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) -DskipTests=true -Dcxx="$(CXX)" \
 	-Dbuild.platform="$(SCALA_PKG_PROFILE)" \
 	-Dcflags="$(CFLAGS)" -Dldflags="$(LDFLAGS)" \
 	-Dlddeps="$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a")
@@ -617,23 +617,23 @@ scalarelease-dryrun:
 	(cd $(ROOTDIR)/scala-package; \
 	mvn release:clean release:prepare -DdryRun=true -DautoVersionSubmodules=true \
 	-Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \
-	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
+	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests=true\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
 scalarelease-prepare:
 	(cd $(ROOTDIR)/scala-package; \
 	mvn release:clean release:prepare -DautoVersionSubmodules=true \
 	-Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \
-	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
+	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests=true\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
 scalarelease-perform:
 	(cd $(ROOTDIR)/scala-package; \
 	mvn release:perform -DautoVersionSubmodules=true \
 	-Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \
-	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
+	-Darguments=""-Dbuild\.platform=\""$(SCALA_PKG_PROFILE)\""\ -DskipTests=true\ -Dcflags=\""$(CFLAGS)\""\ -Dcxx=\""$(CXX)\""\ -Dldflags=\""$(LDFLAGS)\""\ -Dlddeps=\""$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a\)
 scaladeploy:
 	(cd $(ROOTDIR)/scala-package; \
-	mvn deploy -Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \-DskipTests -Dcxx="$(CXX)" \
+	mvn deploy -Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \-DskipTests=true -Dcxx="$(CXX)" \
 	-Dbuild.platform="$(SCALA_PKG_PROFILE)" \
 	-Dcflags="$(CFLAGS)" -Dldflags="$(LDFLAGS)" \
 	-Dlddeps="$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a")
diff --git a/scala-package/core/pom.xml b/scala-package/core/pom.xml
index 134e0a59da1..16061979f7c 100644
--- a/scala-package/core/pom.xml
+++ b/scala-package/core/pom.xml
@@ -17,13 +17,13 @@
       unittest
-      <skiptest>false</skiptest>
+      <skipTests>false</skipTests>
       integrationtest
-      <skiptest>true</skiptest>
+      <skipTests>true</skipTests>
@@ -74,7 +74,7 @@
         org.scalatest
         scalatest-maven-plugin
-        ${skiptest}
+        ${skipTests}
         -Djava.library.path=${project.parent.basedir}/native/${platform}/target \
         -Dlog4j.configuration=file://${project.basedir}/src/test/resources/log4j.properties
diff --git a/scala-package/examples/pom.xml b/scala-package/examples/pom.xml
index 9a98f74e4e2..d24785b0e87 100644
--- a/scala-package/examples/pom.xml
+++ b/scala-package/examples/pom.xml
@@ -17,13 +17,13 @@
       unittest
-      <skiptest>true</skiptest>
+      <skipTests>true</skipTests>
       integrationtest
-      <skiptest>false</skiptest>
+      <skipTests>false</skipTests>
@@ -134,7 +134,7 @@
         org.scalatest
         scalatest-maven-plugin
-        ${skiptest}
+        ${skipTests}
         -Djava.library.path=${project.parent.basedir}/native/${platform}/target \
         -Dlog4j.configuration=file://${project.basedir}/src/test/resources/log4j.properties
diff --git
[GitHub] nswamy commented on issue #11885: Fix JNI custom op code from deregistering the operator fixes #10438
nswamy commented on issue #11885: Fix JNI custom op code from deregistering the operator fixes #10438 URL: https://github.com/apache/incubator-mxnet/pull/11885#issuecomment-408542060 @andrewfayres is it possible to add some testing to this?
[GitHub] absalama commented on issue #11855: Distributed learning with Async update does not work.
absalama commented on issue #11855: Distributed learning with Async update does not work. URL: https://github.com/apache/incubator-mxnet/issues/11855#issuecomment-408541838 In trainer.py I changed the default value of **update_on_kvstore** from None to True in the `__init__` method. image_classification.py initialises the trainer object, so I assume that if **update_on_kvstore** is True in `__init__` then it should be set? We are using a slurm cluster, and all nodes share the same folder where the mxnet source resides, so any change should be seen by all workers. Do I need to set something extra in the slurm configuration?
[GitHub] piiswrong commented on issue #11325: [MXNET-703] TensorRT runtime integration
piiswrong commented on issue #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#issuecomment-408537847 yes
[GitHub] azai91 commented on issue #11896: subgraph TODO
azai91 commented on issue #11896: subgraph TODO URL: https://github.com/apache/incubator-mxnet/issues/11896#issuecomment-408535506 for task 2, what is the optimal layout for the weights?
[GitHub] mkolod commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration
mkolod commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#discussion_r205894176

## File path: src/executor/graph_executor.cc ##

@@ -941,6 +970,114 @@ void GraphExecutor::FinishInitGraph(nnvm::Symbol symbol,
   this->InitOpSegs();
 }

+/*!
+ * \brief This function is triggered after each tensorrt subgraph replacement pass.
+ * Reset arguments of GraphExecutor::Init(...) as some variables (weights and biases)
+ * are absorbed into the TRT engine; it also reruns attribute inference according
+ * to the new topology.
+ */
+Graph GraphExecutor::ReinitGraph(Graph&& g, const Context &default_ctx,
+                                 const std::map<std::string, Context> &ctx_map,
+                                 std::vector<Context> *in_arg_ctxes,
+                                 std::vector<Context> *arg_grad_ctxes,
+                                 std::vector<Context> *aux_state_ctxes,
+                                 std::vector<OpReqType> *grad_req_types,
+                                 std::unordered_map<std::string, TShape> *arg_shape_map,
+                                 std::unordered_map<std::string, int> *arg_dtype_map,
+                                 std::unordered_map<std::string, int> *arg_stype_map,
+                                 std::unordered_map<std::string, NDArray> *params_map) {
+  std::unordered_set<std::string> to_remove_params;
+  for (auto& el : *params_map) {
+    to_remove_params.insert(el.first);
+  }
+
+  DFSVisit(g.outputs, [&to_remove_params](const nnvm::NodePtr n) {
+    to_remove_params.erase(n->attrs.name);
+  });
+
+  for (auto& el : to_remove_params) {
+    params_map->erase(el);
+    arg_shape_map->erase(el);
+    arg_dtype_map->erase(el);
+    arg_stype_map->erase(el);
+  }
+  const auto &idx = g.indexed_graph();
+  num_forward_inputs_ = idx.input_nodes().size();
+  in_arg_ctxes->resize(num_forward_inputs_ - idx.mutable_input_nodes().size());

Review comment: @zheng-da I think it can, but we couldn't get it to work so far, due to the bind() method for Module not taking in the shared_buffer, which is necessary for the TensorRT engine builder to bake in the weights, something that TensorRT requires. Regarding the graph rewrite, note that this takes place very early in the bind process. Shape inference happens before the rewrite, but no memory allocation, etc., so from a data-parallel perspective it should work, because resource allocation isn't done before the rewrite but after. Also, after the graph rewrite, shapes are determined again, so the bind process continues after the rewrite as if there had been no rewrite.
[GitHub] mkolod commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration
mkolod commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#discussion_r205894970

## File path: src/executor/graph_executor.cc ##

@@ -1018,20 +1156,49 @@
       g.GetAttr<StorageTypeVector>("storage_type"));
   }

+  if (use_tensorrt_) {
+  #if MXNET_USE_TENSORRT
+    // check that this graph is inference-only
+    if (std::any_of(grad_req_types->begin(), grad_req_types->end(),
+                    [](const OpReqType& op) { return op != kNullOp; })) {
+      LOG(FATAL) << "MXNET_USE_TENSORRT set but graph is not inference-only. "
+                 << "If it is an inference graph, set grad_req to null during simple_bind call. "
+                 << "If it is a training graph, unset the MXNET_USE_TENSORRT env variable";
+    }
+    if (shared_buffer->empty()) {
+      LOG(FATAL) << "MXNET_USE_TENSORRT = 1 but shared_buffer is empty. "
+                 << "Please provide weights and other parameters, such as "
+                 << "BatchNorm moments, via the shared_buffer, during simple bind call.";
+    }
+    auto trt_groups = GetTrtCompatibleSubsets(g, shared_buffer);
+    for (auto trt_group : trt_groups) {
+      if (trt_group.size() > 1) {
+        g = ReplaceSubgraph(std::move(g), trt_group, shared_buffer);
+        g = ReinitGraph(std::move(g), default_ctx, ctx_map, in_arg_ctxes, arg_grad_ctxes,
+                        aux_state_ctxes, grad_req_types, arg_shape_map, arg_dtype_map,
+                        arg_stype_map, shared_buffer);

Review comment: @Caenorst could you reply to @zheng-da's question above? Thanks!
[GitHub] mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration
mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#issuecomment-408533758 @piiswrong You mean to basically issue a warning and bypass instead of throwing an exception in other cases? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] piiswrong commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration
piiswrong commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#discussion_r205892192 ## File path: include/mxnet/executor.h ## @@ -152,14 +152,14 @@ class Executor { static Executor* SimpleBind(nnvm::Symbol symbol, Review comment: Also I think it's better to name the functions as InitTensorRT rather than reinitgraph This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] piiswrong commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration
piiswrong commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#discussion_r205892000 ## File path: include/mxnet/executor.h ## @@ -152,14 +152,14 @@ class Executor { static Executor* SimpleBind(nnvm::Symbol symbol, Review comment: why not pass by value? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] piiswrong commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration
piiswrong commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#discussion_r205891103 ## File path: include/mxnet/c_api.h ## @@ -1714,6 +1714,13 @@ MXNET_DLL int MXExecutorReshape(int partial_shaping, NDArrayHandle** aux_states, ExecutorHandle shared_exec, ExecutorHandle *out); + +/*! + * \brief get optimized graph from graph executor + */ +MXNET_DLL int MXExecutorGetOptimizedSymbol(ExecutorHandle handle, Review comment: I think it's better to expose it as a private member of executor This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] szha commented on issue #11592: Flaky Test Issue of GPU Operator
szha commented on issue #11592: Flaky Test Issue of GPU Operator URL: https://github.com/apache/incubator-mxnet/issues/11592#issuecomment-408525537 I just had the exact same error in another PR. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11482/17/pipeline/859#step-1530-log-1018 @zhanghang1989 what did you do to resolve the problem? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[incubator-mxnet] branch master updated: [MXNET-344] Add more operators to onnx import (#11856)
This is an automated email from the ASF dual-hosted git repository. zhreshold pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new 4bbf15c [MXNET-344] Add more operators to onnx import (#11856) 4bbf15c is described below commit 4bbf15c85d300801f6f880f7abe4628e68ced2f7 Author: Anirudh AuthorDate: Fri Jul 27 13:02:44 2018 -0700 [MXNET-344] Add more operators to onnx import (#11856) * add more ops * use dict.get * add list comprehensive * retrigger CI due to unrelated flaky test failure --- .../mxnet/contrib/onnx/onnx2mx/_import_helper.py | 26 ++-- .../mxnet/contrib/onnx/onnx2mx/_op_translations.py | 73 +- tests/python-pytest/onnx/import/test_cases.py | 39 +++- 3 files changed, 116 insertions(+), 22 deletions(-) diff --git a/python/mxnet/contrib/onnx/onnx2mx/_import_helper.py b/python/mxnet/contrib/onnx/onnx2mx/_import_helper.py index c19f0f2..c44403d 100644 --- a/python/mxnet/contrib/onnx/onnx2mx/_import_helper.py +++ b/python/mxnet/contrib/onnx/onnx2mx/_import_helper.py @@ -20,8 +20,9 @@ """Operator attributes conversion""" from ._op_translations import identity, random_uniform, random_normal from ._op_translations import add, subtract, multiply, divide, absolute, negative, add_n -from ._op_translations import tanh -from ._op_translations import ceil, floor +from ._op_translations import tanh, arccos, arcsin, arctan, _cos, _sin, _tan +from ._op_translations import softplus, shape, gather, lp_pooling +from ._op_translations import ceil, floor, hardsigmoid, global_lppooling from ._op_translations import concat from ._op_translations import leaky_relu, _elu, _prelu, softmax, fully_connected from ._op_translations import global_avgpooling, global_maxpooling, linalg_gemm @@ -30,12 +31,13 @@ from ._op_translations import dropout, local_response_norm, conv, deconv from ._op_translations import reshape, cast, split, _slice, transpose, squeeze, flatten from ._op_translations import reciprocal, squareroot, power, exponent, _log, unsqueeze from ._op_translations import reduce_max, reduce_mean, reduce_min, reduce_sum -from ._op_translations import reduce_prod, avg_pooling, max_pooling +from ._op_translations import reduce_prod, avg_pooling, max_pooling, instance_norm from ._op_translations import argmax, argmin, maximum, minimum from ._op_translations import clip, reduce_log_sum, reduce_log_sum_exp -from ._op_translations import reduce_sum_square, reduce_l2, max_roi_pooling, instance_norm +from ._op_translations import reduce_sum_square, reduce_l1, reduce_l2, max_roi_pooling from ._op_translations import log_softmax, softsign, lesser, greater, equal from ._op_translations import logical_and, logical_or, logical_xor, logical_not +from ._op_translations import mean # convert_map defines maps of ONNX operator names to converter functor(callable) # defined in the op_translations module. 
@@ -77,6 +79,7 @@ _convert_map = { 'FC': fully_connected, 'GlobalAveragePool' : global_avgpooling, 'GlobalMaxPool' : global_maxpooling, +'GlobalLpPool' : global_lppooling, 'Gemm' : linalg_gemm, 'LRN' : local_response_norm, 'Dropout' : dropout, @@ -113,6 +116,7 @@ _convert_map = { 'ReduceLogSum' : reduce_log_sum, 'ReduceLogSumExp' : reduce_log_sum_exp, 'ReduceSumSquare' : reduce_sum_square, +'ReduceL1' : reduce_l1, 'ReduceL2' : reduce_l2, 'MaxRoiPool': max_roi_pooling, 'InstanceNormalization' : instance_norm, @@ -124,5 +128,17 @@ _convert_map = { 'And' : logical_and, 'Xor' : logical_xor, 'Not' : logical_not, -'Or': logical_or +'Or': logical_or, +'Mean' : mean, +'Acos' : arccos, +'Asin' : arcsin, +'Atan' : arctan, +'Cos' : _cos, +'Sin' : _sin, +'Softplus' : softplus, +'Tan' : _tan, +'Shape' : shape, +'Gather': gather, +'HardSigmoid' : hardsigmoid, +'LpPool': lp_pooling } diff --git a/python/mxnet/contrib/onnx/onnx2mx/_op_translations.py b/python/mxnet/contrib/onnx/onnx2mx/_op_translations.py index aa37856..4d1e956 100644 --- a/python/mxnet/contrib/onnx/onnx2mx/_op_translations.py +++ b/python/mxnet/contrib/onnx/onnx2mx/_op_translations.py @@ -80,6 +80,13 @@ def divide(attrs, inputs, proto_obj): return op_value, new_attr, inputs return 'broadcast_div', new_attr, inputs +def mean(attrs, inputs, proto_obj): +"""Mean of all the input tensors.""" +concat_input = [symbol.expand_dims(op_input, axis=0) for op_input in inputs] +
[GitHub] zhreshold closed pull request #11856: [MXNET-344] Add more operators to onnx import
zhreshold closed pull request #11856: [MXNET-344] Add more operators to onnx import URL: https://github.com/apache/incubator-mxnet/pull/11856 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/python/mxnet/contrib/onnx/onnx2mx/_import_helper.py b/python/mxnet/contrib/onnx/onnx2mx/_import_helper.py index c19f0f2cb24..c44403d4992 100644 --- a/python/mxnet/contrib/onnx/onnx2mx/_import_helper.py +++ b/python/mxnet/contrib/onnx/onnx2mx/_import_helper.py @@ -20,8 +20,9 @@ """Operator attributes conversion""" from ._op_translations import identity, random_uniform, random_normal from ._op_translations import add, subtract, multiply, divide, absolute, negative, add_n -from ._op_translations import tanh -from ._op_translations import ceil, floor +from ._op_translations import tanh, arccos, arcsin, arctan, _cos, _sin, _tan +from ._op_translations import softplus, shape, gather, lp_pooling +from ._op_translations import ceil, floor, hardsigmoid, global_lppooling from ._op_translations import concat from ._op_translations import leaky_relu, _elu, _prelu, softmax, fully_connected from ._op_translations import global_avgpooling, global_maxpooling, linalg_gemm @@ -30,12 +31,13 @@ from ._op_translations import reshape, cast, split, _slice, transpose, squeeze, flatten from ._op_translations import reciprocal, squareroot, power, exponent, _log, unsqueeze from ._op_translations import reduce_max, reduce_mean, reduce_min, reduce_sum -from ._op_translations import reduce_prod, avg_pooling, max_pooling +from ._op_translations import reduce_prod, avg_pooling, max_pooling, instance_norm from ._op_translations import argmax, argmin, maximum, minimum from ._op_translations import clip, reduce_log_sum, reduce_log_sum_exp -from ._op_translations import reduce_sum_square, reduce_l2, max_roi_pooling, instance_norm +from ._op_translations import reduce_sum_square, reduce_l1, reduce_l2, max_roi_pooling from ._op_translations import log_softmax, softsign, lesser, greater, equal from ._op_translations import logical_and, logical_or, logical_xor, logical_not +from ._op_translations import mean # convert_map defines maps of ONNX operator names to converter functor(callable) # defined in the op_translations module. 
@@ -77,6 +79,7 @@ 'FC': fully_connected, 'GlobalAveragePool' : global_avgpooling, 'GlobalMaxPool' : global_maxpooling, +'GlobalLpPool' : global_lppooling, 'Gemm' : linalg_gemm, 'LRN' : local_response_norm, 'Dropout' : dropout, @@ -113,6 +116,7 @@ 'ReduceLogSum' : reduce_log_sum, 'ReduceLogSumExp' : reduce_log_sum_exp, 'ReduceSumSquare' : reduce_sum_square, +'ReduceL1' : reduce_l1, 'ReduceL2' : reduce_l2, 'MaxRoiPool': max_roi_pooling, 'InstanceNormalization' : instance_norm, @@ -124,5 +128,17 @@ 'And' : logical_and, 'Xor' : logical_xor, 'Not' : logical_not, -'Or': logical_or +'Or': logical_or, +'Mean' : mean, +'Acos' : arccos, +'Asin' : arcsin, +'Atan' : arctan, +'Cos' : _cos, +'Sin' : _sin, +'Softplus' : softplus, +'Tan' : _tan, +'Shape' : shape, +'Gather': gather, +'HardSigmoid' : hardsigmoid, +'LpPool': lp_pooling } diff --git a/python/mxnet/contrib/onnx/onnx2mx/_op_translations.py b/python/mxnet/contrib/onnx/onnx2mx/_op_translations.py index aa37856ffad..4d1e9561230 100644 --- a/python/mxnet/contrib/onnx/onnx2mx/_op_translations.py +++ b/python/mxnet/contrib/onnx/onnx2mx/_op_translations.py @@ -80,6 +80,13 @@ def divide(attrs, inputs, proto_obj): return op_value, new_attr, inputs return 'broadcast_div', new_attr, inputs +def mean(attrs, inputs, proto_obj): +"""Mean of all the input tensors.""" +concat_input = [symbol.expand_dims(op_input, axis=0) for op_input in inputs] +concat_sym = symbol.concat(*concat_input, dim=0) +mean_sym = symbol.mean(concat_sym, axis=0) +return mean_sym, attrs, inputs + def logical_and(attrs, inputs, proto_obj): """Logical and of two input arrays.""" return 'broadcast_logical_and', attrs, inputs @@ -186,6 +193,10 @@ def sigmoid(attrs, inputs, proto_obj): """Computes elementwise sigmoid of the input array""" return 'sigmoid', attrs, inputs +def hardsigmoid(attrs, inputs, proto_obj): +"""Computes elementwise hard sigmoid of the input array""" +return 'hard_sigmoid', attrs, inputs + def relu(attrs, inputs, proto_obj):
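As a usage note, the new converters are exercised through the regular ONNX import entry point; a minimal sketch (the model file name is hypothetical):

```
import mxnet.contrib.onnx as onnx_mxnet

# Importing an ONNX graph that uses a newly mapped op (e.g. Mean or HardSigmoid)
# now succeeds instead of raising an unsupported-operator error.
sym, arg_params, aux_params = onnx_mxnet.import_model("model.onnx")
```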
[GitHub] ssttevee edited a comment on issue #11914: NDArray.asscalar(): CUDA an illegal memory access was encountered
ssttevee edited a comment on issue #11914: NDArray.asscalar(): CUDA an illegal memory access was encountered URL: https://github.com/apache/incubator-mxnet/issues/11914#issuecomment-408487363 Sorry, I meant to say that it doesn't crash with a higher data length like `--data_length=100 --batch_size=2`. The crashes seem to happen at arbitrary data length and batch size values. It wouldn't make much sense for it to be a memory limit anyways, since none of the parameters gets anywhere close to 1 GB of memory, whereas both my GPUs have well over 1 GB of memory. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] szha commented on issue #11834: Fix mxnet ctc_loss bug
szha commented on issue #11834: Fix mxnet ctc_loss bug URL: https://github.com/apache/incubator-mxnet/pull/11834#issuecomment-408521183 @HawkAaron thanks for the fix, and @Jerryzcn thanks for the review This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[incubator-mxnet] branch master updated: Fix mxnet ctc_loss bug (#11834)
This is an automated email from the ASF dual-hosted git repository. zhasheng pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new 2bddf6f Fix mxnet ctc_loss bug (#11834) 2bddf6f is described below

commit 2bddf6f039e94506d11a6539b0e921e5440e09eb
Author: Mingkun Huang
AuthorDate: Sat Jul 28 03:46:35 2018 +0800

    Fix mxnet ctc_loss bug (#11834)

    * fix ctc_loss GPU bug

    * add blank_label parameter for CTCLoss

    * Revert "add blank_label parameter for CTCLoss"

      This reverts commit aab11f7575580f88f5f27be14466d0deb4b4c456.
---
 src/operator/contrib/ctc_include/detail/gpu_ctc.h | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/src/operator/contrib/ctc_include/detail/gpu_ctc.h b/src/operator/contrib/ctc_include/detail/gpu_ctc.h
index 8015b39..2c521b5 100644
--- a/src/operator/contrib/ctc_include/detail/gpu_ctc.h
+++ b/src/operator/contrib/ctc_include/detail/gpu_ctc.h
@@ -411,12 +411,7 @@ GpuCTC<ProbT>::compute_log_probs(const ProbT* const activations) {
                        denoms_, out_dim_, num_elements);

     // compute denominators for softmax
-    denoms_handle = reduce_with_axis(
-        F(
-            log_probs_handle -
-            broadcast<0>(reduce_with_axis(log_probs_handle, 1),
-                         log_probs_handle.shape_)),
-        1);
+    denoms_handle = reduce_with_axis(F(log_probs_handle), 1);

     // Kernel launch to calculate probabilities
     compute_log_probs_kernel<<>>
[GitHub] szha closed pull request #11834: Fix mxnet ctc_loss bug
szha closed pull request #11834: Fix mxnet ctc_loss bug URL: https://github.com/apache/incubator-mxnet/pull/11834 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/src/operator/contrib/ctc_include/detail/gpu_ctc.h b/src/operator/contrib/ctc_include/detail/gpu_ctc.h
index 8015b39c437..2c521b5abb5 100644
--- a/src/operator/contrib/ctc_include/detail/gpu_ctc.h
+++ b/src/operator/contrib/ctc_include/detail/gpu_ctc.h
@@ -411,12 +411,7 @@ GpuCTC<ProbT>::compute_log_probs(const ProbT* const activations) {
                        denoms_, out_dim_, num_elements);

     // compute denominators for softmax
-    denoms_handle = reduce_with_axis(
-        F(
-            log_probs_handle -
-            broadcast<0>(reduce_with_axis(log_probs_handle, 1),
-                         log_probs_handle.shape_)),
-        1);
+    denoms_handle = reduce_with_axis(F(log_probs_handle), 1);

     // Kernel launch to calculate probabilities
     compute_log_probs_kernel<<>>

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
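As a sanity check on the fix, a quick GPU smoke test can be run along these lines; a minimal sketch, with illustrative shapes and labels, assuming Gluon's CTCLoss and its default -1 label padding:

```
import mxnet as mx
from mxnet.gluon.loss import CTCLoss

ctx = mx.gpu(0)  # the bug above was in the GPU softmax denominators
loss_fn = CTCLoss(layout='TNC', label_layout='NT')
pred = mx.nd.random.uniform(shape=(20, 2, 5), ctx=ctx)         # (seq, batch, alphabet)
label = mx.nd.array([[1, 2, -1, -1], [2, 3, 1, -1]], ctx=ctx)  # -1 = padding
print(loss_fn(pred, label).asnumpy())  # should print finite loss values
```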
[GitHub] haojin2 opened a new pull request #11918: Improve scatter_nd doc
haojin2 opened a new pull request #11918: Improve scatter_nd doc URL: https://github.com/apache/incubator-mxnet/pull/11918

## Description ##
Address #11867

## Checklist ##
### Essentials ###
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [x] Code is well-documented:
  - For user-facing API changes, API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [x] Improve doc for scatter_nd (using the TF version as reference)

## Comments ##
@zheng-da

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
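For reference, a small example of the operator whose documentation this PR improves; `indices[:, i]` gives the output coordinates that `data[i]` is scattered to:

```
import mxnet as mx

data = mx.nd.array([2, 3, 0])
indices = mx.nd.array([[1, 1, 0], [0, 1, 0]])
# data[0] -> out[1, 0], data[1] -> out[1, 1], data[2] -> out[0, 0]
out = mx.nd.scatter_nd(data, indices, shape=(2, 2))
print(out.asnumpy())  # [[0. 0.] [2. 3.]]
```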
[GitHub] zheng-da commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration
zheng-da commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#discussion_r205874962

## File path: src/executor/graph_executor.cc ##

@@ -941,6 +970,114 @@ void GraphExecutor::FinishInitGraph(nnvm::Symbol symbol,
   this->InitOpSegs();
 }

+/*!
+ * \brief This function is triggered after each TensorRT subgraph replacement pass.
+ * It resets the arguments of GraphExecutor::Init(...), as some variables (weights
+ * and biases) are absorbed into the TRT engine, and it also reruns attribute
+ * inference to match the new topology.
+ */
+Graph GraphExecutor::ReinitGraph(Graph&& g, const Context &default_ctx,
+                                 const std::map<std::string, Context> &ctx_map,
+                                 std::vector<Context> *in_arg_ctxes,
+                                 std::vector<Context> *arg_grad_ctxes,
+                                 std::vector<Context> *aux_state_ctxes,
+                                 std::vector<OpReqType> *grad_req_types,
+                                 std::unordered_map<std::string, TShape> *arg_shape_map,
+                                 std::unordered_map<std::string, int> *arg_dtype_map,
+                                 std::unordered_map<std::string, int> *arg_stype_map,
+                                 std::unordered_map<std::string, NDArray> *params_map) {
+  std::unordered_set<std::string> to_remove_params;
+  for (auto& el : *params_map) {
+    to_remove_params.insert(el.first);
+  }
+
+  DFSVisit(g.outputs, [&to_remove_params](const nnvm::NodePtr n) {
+    to_remove_params.erase(n->attrs.name);
+  });
+
+  for (auto& el : to_remove_params) {
+    params_map->erase(el);
+    arg_shape_map->erase(el);
+    arg_dtype_map->erase(el);
+    arg_stype_map->erase(el);
+  }
+  const auto &idx = g.indexed_graph();
+  num_forward_inputs_ = idx.input_nodes().size();
+  in_arg_ctxes->resize(num_forward_inputs_ - idx.mutable_input_nodes().size());

Review comment: If you change the graph like that, does it work with the MXNet Module API? I checked your tests; they all test with symbols. Can you test with Module? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
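For context on the Module question, a sketch of the path being asked about, assuming a saved symbol (the file name is hypothetical). Module.bind exposes a shared_module argument but no shared_buffer, which is the limitation mkolod describes elsewhere in this review:

```
import mxnet as mx

sym = mx.sym.load("model-symbol.json")  # hypothetical saved symbol
mod = mx.mod.Module(symbol=sym, context=mx.gpu(0), label_names=None)
# bind() has no shared_buffer parameter, so the TensorRT engine builder
# cannot receive the weights through this path -- only simple_bind can.
mod.bind(data_shapes=[("data", (1, 3, 224, 224))], for_training=False)
```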
[GitHub] sandeep-krishnamurthy commented on a change in pull request #11910: Improving documentation and error messages for Async distributed training with Gluon
sandeep-krishnamurthy commented on a change in pull request #11910: Improving documentation and error messages for Async distributed training with Gluon URL: https://github.com/apache/incubator-mxnet/pull/11910#discussion_r205874324

## File path: docs/faq/distributed_training.md ##

@@ -73,6 +73,13 @@ These can be passed as arguments to the iterator. You can look at [example/gluon/image_classification.py](https://github.com/apache/incubator-mxnet/blob/master/example/gluon/image_classification.py) to see an example usage.

+### Updating weights
+The KVStore server supports two modes: one in which the server aggregates the gradients and updates the weights using them, and a second in which the server only aggregates gradients. In the latter case, when a worker process pulls from the kvstore, it gets the aggregated gradients; the worker then uses these gradients to update the weights locally.
+
+When using Gluon, you can choose between these modes by passing the `update_on_kvstore` argument when you create the [Trainer](https://mxnet.incubator.apache.org/versions/master/api/python/gluon/gluon.html#mxnet.gluon.Trainer) object.

Review comment: An example code snippet will make this very easy for the reader here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
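Per the review comment, a snippet makes this concrete; a minimal sketch, assuming the process is started through MXNet's distributed launcher so a `dist_async` kvstore can connect:

```
import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(10)
net.initialize()

kv = mx.kv.create("dist_async")  # asynchronous distributed kvstore
trainer = gluon.Trainer(net.collect_params(), "sgd",
                        {"learning_rate": 0.1},
                        kvstore=kv,
                        update_on_kvstore=True)  # let the server apply updates
```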
[GitHub] sandeep-krishnamurthy commented on a change in pull request #11910: Improving documentation and error messages for Async distributed training with Gluon
sandeep-krishnamurthy commented on a change in pull request #11910: Improving documentation and error messages for Async distributed training with Gluon URL: https://github.com/apache/incubator-mxnet/pull/11910#discussion_r205874742 ## File path: python/mxnet/gluon/trainer.py ## @@ -187,6 +187,11 @@ def _init_kvstore(self): arg_arrays = {param.name: param.data(self._contexts[0]) for param in self._params} kvstore, update_on_kvstore = _create_kvstore(config['kvstore'], len(self._contexts), arg_arrays) +if kvstore and 'async' in kvstore.type and config['update_on_kvstore'] is not None\ Review comment: If we are forcing the user to set this param, why don't we set it inside the function itself as default value? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository. zhasheng pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new b42dda6 Bump the publish timestamp. b42dda6 is described below commit b42dda620cb6d8befa972b10ae069d14ce272862 Author: mxnet-ci AuthorDate: Fri Jul 27 19:03:13 2018 + Bump the publish timestamp. --- date.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/date.txt b/date.txt new file mode 100644 index 000..af0375a --- /dev/null +++ b/date.txt @@ -0,0 +1 @@ +Fri Jul 27 19:03:13 UTC 2018
[GitHub] zhreshold commented on issue #11872: "socket.error: [Errno 111] Connection refused" while training with multiple workers
zhreshold commented on issue #11872: "socket.error: [Errno 111] Connection refused" while training with multiple workers URL: https://github.com/apache/incubator-mxnet/issues/11872#issuecomment-408510384 I have figured out that the pre-fetch strategy for the data loader is too aggressive, which might cause the related issue with shared memory. The fix is included in https://github.com/apache/incubator-mxnet/pull/11908 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] haojin2 commented on a change in pull request #11873: [MXNET-582] Fix flaky test test_operator_gpu.test_batchnorm_with_type (follow-up)
haojin2 commented on a change in pull request #11873: [MXNET-582] Fix flaky test test_operator_gpu.test_batchnorm_with_type (follow-up) URL: https://github.com/apache/incubator-mxnet/pull/11873#discussion_r205865680 ## File path: tests/python/gpu/test_operator_gpu.py ## @@ -290,12 +289,12 @@ def test_batchnorm_with_type(): ] ctx_list_v2_3D = [ -{'ctx': mx.cpu(0), 'norm_data': (4, 2, 3, 5, 5), 'type_dict': {'norm_data': np.float16}}, -{'ctx': mx.cpu(0), 'norm_data': (4, 2, 3, 5, 5), 'type_dict': {'norm_data': np.float32}}, -{'ctx': mx.cpu(0), 'norm_data': (4, 2, 3, 5, 5), 'type_dict': {'norm_data': np.float64}}, -{'ctx': mx.gpu(0), 'norm_data': (4, 2, 3, 5, 5), 'type_dict': {'norm_data': np.float16}}, -{'ctx': mx.gpu(0), 'norm_data': (4, 2, 3, 5, 5), 'type_dict': {'norm_data': np.float32}}, -{'ctx': mx.gpu(0), 'norm_data': (4, 2, 3, 5, 5), 'type_dict': {'norm_data': np.float64}} +{'ctx': mx.cpu(0), 'norm_data': (3, 2, 3, 2, 3), 'type_dict': {'norm_data': np.float16}}, +{'ctx': mx.cpu(0), 'norm_data': (3, 2, 3, 2, 3), 'type_dict': {'norm_data': np.float32}}, +{'ctx': mx.cpu(0), 'norm_data': (3, 2, 3, 2, 3), 'type_dict': {'norm_data': np.float64}}, +{'ctx': mx.gpu(0), 'norm_data': (3, 2, 3, 2, 3), 'type_dict': {'norm_data': np.float16}}, +{'ctx': mx.gpu(0), 'norm_data': (3, 2, 3, 2, 3), 'type_dict': {'norm_data': np.float32}}, +{'ctx': mx.gpu(0), 'norm_data': (3, 2, 3, 2, 3), 'type_dict': {'norm_data': np.float64}} Review comment: Just got a chance to take a look at your reply, I'll dig into the 2 links you provided. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] mkolod commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration
mkolod commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#discussion_r205860313

## File path: src/executor/graph_executor.cc ##

@@ -941,6 +970,114 @@ void GraphExecutor::FinishInitGraph(nnvm::Symbol symbol,
   this->InitOpSegs();
 }

+/*!
+ * \brief This function is triggered after each TensorRT subgraph replacement pass.
+ * It resets the arguments of GraphExecutor::Init(...), as some variables (weights
+ * and biases) are absorbed into the TRT engine, and it also reruns attribute
+ * inference to match the new topology.
+ */
+Graph GraphExecutor::ReinitGraph(Graph&& g, const Context &default_ctx,
+                                 const std::map<std::string, Context> &ctx_map,
+                                 std::vector<Context> *in_arg_ctxes,
+                                 std::vector<Context> *arg_grad_ctxes,
+                                 std::vector<Context> *aux_state_ctxes,
+                                 std::vector<OpReqType> *grad_req_types,
+                                 std::unordered_map<std::string, TShape> *arg_shape_map,
+                                 std::unordered_map<std::string, int> *arg_dtype_map,
+                                 std::unordered_map<std::string, int> *arg_stype_map,
+                                 std::unordered_map<std::string, NDArray> *params_map) {
+  std::unordered_set<std::string> to_remove_params;
+  for (auto& el : *params_map) {
+    to_remove_params.insert(el.first);
+  }
+
+  DFSVisit(g.outputs, [&to_remove_params](const nnvm::NodePtr n) {
+    to_remove_params.erase(n->attrs.name);
+  });
+
+  for (auto& el : to_remove_params) {
+    params_map->erase(el);
+    arg_shape_map->erase(el);
+    arg_dtype_map->erase(el);
+    arg_stype_map->erase(el);
+  }
+  const auto &idx = g.indexed_graph();
+  num_forward_inputs_ = idx.input_nodes().size();
+  in_arg_ctxes->resize(num_forward_inputs_ - idx.mutable_input_nodes().size());

Review comment: @zheng-da Consider any network, such as VGG, ResNet, etc. For any subgraph that is extracted by the TensorRT pass, the weights need to be provided to TensorRT at TensorRT engine construction time. These weights then become "baked into" the engine. Once the subgraph is substituted by a TensorRT node, these graph inputs become part of the TensorRT engine and are no longer used by the NNVM graph explicitly. Hence, they need to be removed, in order not to waste memory, and to prevent the confusion where some inputs still exist in the NNVM graph but are not used anymore. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kalyc commented on issue #11685: test_executor.test_bind has fixed seed that can mask flakiness
kalyc commented on issue #11685: test_executor.test_bind has fixed seed that can mask flakiness URL: https://github.com/apache/incubator-mxnet/issues/11685#issuecomment-408499091 Able to reproduce the flaky test error:
```
def test_bind():
    def check_bind(disable_bulk_exec):
        if disable_bulk_exec:
            prev_bulk_inf_val = mx.test_utils.set_env_var("MXNET_EXEC_BULK_EXEC_INFERENCE", "0", "1")
            prev_bulk_train_val = mx.test_utils.set_env_var("MXNET_EXEC_BULK_EXEC_TRAIN", "0", "1")

        nrepeat = 10
        maxdim = 4
        for repeat in range(nrepeat):
            for dim in range(1, maxdim):
                check_bind_with_uniform(lambda x, y: x + y,
                                        lambda g, x, y: (g, g),
                                        dim)
                check_bind_with_uniform(lambda x, y: x - y,
                                        lambda g, x, y: (g, -g),
                                        dim)
                check_bind_with_uniform(lambda x, y: x * y,
                                        lambda g, x, y: (y * g, x * g),
                                        dim)
                check_bind_with_uniform(lambda x, y: x / y,
                                        lambda g, x, y: (g / y, -x * g / (y**2)),
                                        dim)
                check_bind_with_uniform(lambda x, y: np.maximum(x, y),
                                        lambda g, x, y: (g * (x>y), g * (y>x)),
                                        dim,
                                        sf=mx.symbol.maximum)
                check_bind_with_uniform(lambda x, y: np.minimum(x, y),
                                        lambda g, x, y: (g * (x<y), g * (y<x)),
                                        dim,
                                        sf=mx.symbol.minimum)

>       check_bind(True)

test_executor.py:117:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_executor.py:108: in check_bind
    sf=mx.symbol.maximum)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

uf = <function <lambda> at 0x1078889b0>, gf = <function <lambda> at 0x10e2852a8>, dim = 3, sf = <function maximum>, lshape = (3, 1, 1), rshape = (3, 1, 1)

    def check_bind_with_uniform(uf, gf, dim, sf=None, lshape=None, rshape=None):
        """check function consistency with uniform random numbers"""
        shape = tuple(np.random.randint(1, int(1000**(1.0/dim)), size=dim))
        lhs = mx.symbol.Variable('lhs')
        rhs = mx.symbol.Variable('rhs')
        if sf is not None:
            ret = sf(lhs, rhs)
        else:
            ret = uf(lhs, rhs)

        assert ret.list_arguments() == ['lhs', 'rhs']
        lshape = shape if lshape is None else lshape
        rshape = shape if rshape is None else rshape

        lhs_arr = mx.nd.array(np.random.uniform(-1, 1, lshape))
        rhs_arr = mx.nd.array(np.random.uniform(-1, 1, rshape))
        lhs_grad = mx.nd.empty(lshape)
        rhs_grad = mx.nd.empty(rshape)

        executor = ret.bind(mx.Context('cpu'),
                            args=[lhs_arr, rhs_arr],
                            args_grad=[lhs_grad, rhs_grad])
        exec3 = ret.bind(mx.Context('cpu'),
                         args=[lhs_arr, rhs_arr])
        exec4 = ret.bind(mx.Context('cpu'),
                         args={'rhs': rhs_arr, 'lhs': lhs_arr},
                         args_grad={'lhs': lhs_grad, 'rhs': rhs_grad})

        executor.forward()
        exec3.forward()
        exec4.forward()

        out2 = executor.outputs[0].asnumpy()
        out1 = uf(lhs_arr.asnumpy(), rhs_arr.asnumpy())
        out3 = exec3.outputs[0].asnumpy()
        out4 = exec4.outputs[0].asnumpy()

        assert reldiff(out1, out2) < 1e-6
        assert reldiff(out1, out3) < 1e-6
        assert reldiff(out1, out4) < 1e-6

        # test gradient
        out_grad = mx.nd.array(np.ones(out2.shape))
        lhs_grad2, rhs_grad2 = gf(out_grad.asnumpy(),
                                  lhs_arr.asnumpy(),
                                  rhs_arr.asnumpy())
        executor.backward([out_grad])

>       assert reldiff(lhs_grad.asnumpy(), lhs_grad2) < 1e-6
E       assert nan < 1e-06
E        +  where nan = reldiff(array([[[ 0.]],\n\n [[ 0.]],\n\n [[ 0.]]], dtype=float32), array([[[ 0.]],\n\n [[ 0.]],\n\n [[ 0.]]], dtype=float32))
E        +    where array([[[ 0.]],\n\n [[ 0.]],\n\n [[ 0.]]], dtype=float32) = <bound method NDArray.asnumpy>()
E
```
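One plausible reading of the nan, assuming a reldiff helper of the usual sum|a-b| / sum|a+b| form (hypothetical here; the real helper may guard this case differently): when both gradients are exactly zero, the ratio is 0/0 and the assertion sees nan rather than 0.

```
import numpy as np

def reldiff(a, b):
    # hypothetical helper of the usual form; shown only to illustrate the 0/0 case
    return np.sum(np.abs(a - b)) / np.sum(np.abs(a + b))

z = np.zeros((3, 1, 1), dtype=np.float32)
print(reldiff(z, z))  # nan -- matches "assert nan < 1e-06" in the log above
```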
[GitHub] safrooze commented on issue #11906: support 1D and 3D arrays in MKLDNN.
safrooze commented on issue #11906: support 1D and 3D arrays in MKLDNN. URL: https://github.com/apache/incubator-mxnet/issues/11906#issuecomment-408498543 3D tensors are common in audio signals, e.g. WaveNet. I specifically ran into this issue with WaveNet inference on CPU. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] zheng-da commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration
zheng-da commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#discussion_r205852327

## File path: src/executor/graph_executor.cc ##

@@ -1018,20 +1156,49 @@ void GraphExecutor::Init(nnvm::Symbol symbol,
                          g.GetAttr<StorageTypeVector>("storage_type"));
   }

+  if (use_tensorrt_) {
+  #if MXNET_USE_TENSORRT
+    // check that this graph is inference-only
+    if (std::any_of(grad_req_types->begin(), grad_req_types->end(),
+                    [](const OpReqType& op) { return op != kNullOp; })) {
+      LOG(FATAL) << "MXNET_USE_TENSORRT set but graph is not inference-only. "
+                 << "If it is an inference graph, set grad_req to null during simple_bind call. "
+                 << "If it is a training graph, unset the MXNET_USE_TENSORRT env variable";
+    }
+    if (shared_buffer->empty()) {
+      LOG(FATAL) << "MXNET_USE_TENSORRT = 1 but shared_buffer is empty. "
+                 << "Please provide weights and other parameters, such as "
+                 << "BatchNorm moments, via the shared_buffer, during simple bind call.";
+    }
+    auto trt_groups = GetTrtCompatibleSubsets(g, shared_buffer);
+    for (auto trt_group : trt_groups) {
+      if (trt_group.size() > 1) {
+        g = ReplaceSubgraph(std::move(g), trt_group, shared_buffer);
+        g = ReinitGraph(std::move(g), default_ctx, ctx_map, in_arg_ctxes, arg_grad_ctxes,
+                        aux_state_ctxes, grad_req_types, arg_shape_map, arg_dtype_map,
+                        arg_stype_map, shared_buffer);

Review comment: Why do you need to reinit the graph whenever a subgraph is replaced? Can you reinit outside the loop? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] zheng-da commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration
zheng-da commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#discussion_r205851779

## File path: src/executor/graph_executor.cc ##

@@ -941,6 +970,114 @@ void GraphExecutor::FinishInitGraph(nnvm::Symbol symbol,
   this->InitOpSegs();
 }

+/*!
+ * \brief This function is triggered after each TensorRT subgraph replacement pass.
+ * It resets the arguments of GraphExecutor::Init(...), as some variables (weights
+ * and biases) are absorbed into the TRT engine, and it also reruns attribute
+ * inference to match the new topology.
+ */
+Graph GraphExecutor::ReinitGraph(Graph&& g, const Context &default_ctx,
+                                 const std::map<std::string, Context> &ctx_map,
+                                 std::vector<Context> *in_arg_ctxes,
+                                 std::vector<Context> *arg_grad_ctxes,
+                                 std::vector<Context> *aux_state_ctxes,
+                                 std::vector<OpReqType> *grad_req_types,
+                                 std::unordered_map<std::string, TShape> *arg_shape_map,
+                                 std::unordered_map<std::string, int> *arg_dtype_map,
+                                 std::unordered_map<std::string, int> *arg_stype_map,
+                                 std::unordered_map<std::string, NDArray> *params_map) {
+  std::unordered_set<std::string> to_remove_params;
+  for (auto& el : *params_map) {
+    to_remove_params.insert(el.first);
+  }
+
+  DFSVisit(g.outputs, [&to_remove_params](const nnvm::NodePtr n) {
+    to_remove_params.erase(n->attrs.name);
+  });
+
+  for (auto& el : to_remove_params) {
+    params_map->erase(el);
+    arg_shape_map->erase(el);
+    arg_dtype_map->erase(el);
+    arg_stype_map->erase(el);
+  }
+  const auto &idx = g.indexed_graph();
+  num_forward_inputs_ = idx.input_nodes().size();
+  in_arg_ctxes->resize(num_forward_inputs_ - idx.mutable_input_nodes().size());

Review comment: Why does the number of inputs to a graph change? When partitioning a graph, do you put some inputs inside a subgraph so that they are no longer exposed in the main graph? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] aaronmarkham opened a new pull request #11917: update home page for 1.2.1 announcement
aaronmarkham opened a new pull request #11917: update home page for 1.2.1 announcement URL: https://github.com/apache/incubator-mxnet/pull/11917 ## Description ## Modifies the 1.2.0 branch to have the updated announcement. Also there's a url fix for 'why_mxnet' in there too. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] zheng-da commented on issue #11664: Fall back when sparse arrays are passed to MKLDNN-enabled operators
zheng-da commented on issue #11664: Fall back when sparse arrays are passed to MKLDNN-enabled operators URL: https://github.com/apache/incubator-mxnet/pull/11664#issuecomment-408488135 Please benchmark the performance with this modification to make sure there isn't a performance regression. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] zheng-da commented on a change in pull request #11664: Fall back when sparse arrays are passed to MKLDNN-enabled operators
zheng-da commented on a change in pull request #11664: Fall back when sparse arrays are passed to MKLDNN-enabled operators URL: https://github.com/apache/incubator-mxnet/pull/11664#discussion_r205847059

## File path: src/operator/nn/activation.cc ##

@@ -128,17 +130,25 @@ inline static bool BackwardActStorageType(const nnvm::NodeAttrs& attrs,
   bool ret = false;
   const ActivationParam& param = nnvm::get<ActivationParam>(attrs.parsed);
 #if (MXNET_USE_CUDNN == 1 || MXNET_USE_MKLDNN == 1)
-  if (param.act_type != activation::kReLU) {
-    CHECK_EQ(in_attrs->size(), 3U);
-    ret = ElemwiseStorageType<3, 1, false, false, false>(attrs, dev_mask,
-                                                         dispatch_mode,
-                                                         in_attrs, out_attrs);
+  bool should_continue = true;
+#if MXNET_USE_MKLDNN == 1
+  if (!(dev_mask == mshadow::cpu::kDevMask && SupportMKLDNNAct(param))) {
+    should_continue = false;
+  }
+#endif
+  if (should_continue) {
+    if (param.act_type != activation::kReLU) {
+      CHECK_EQ(in_attrs->size(), 3U);
+      ret = ElemwiseStorageType<3, 1, false, false, false>(
+          attrs, dev_mask, dispatch_mode, in_attrs, out_attrs);

Review comment: I still have the same question here. Can dispatch_mode be kFComputeEx? ElemwiseStorageType only uses kFComputeEx for sparse storage types. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ssttevee commented on issue #11914: NDArray.asscalar(): CUDA an illegal memory access was encountered
ssttevee commented on issue #11914: NDArray.asscalar(): CUDA an illegal memory access was encountered URL: https://github.com/apache/incubator-mxnet/issues/11914#issuecomment-408487363 Sorry, I meant to say that it doesn't crash with a higher data length like `--data_length=100 --batch_size=2`. The crashes don't seem to correlate directly with the actual data length. It wouldn't make much sense for it to be a memory limit anyways, since none of the parameters gets anywhere close to 1 GB of memory, whereas both my GPUs have well over 1 GB of memory. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] zheng-da commented on issue #11906: support 1D and 3D arrays in MKLDNN.
zheng-da commented on issue #11906: support 1D and 3D arrays in MKLDNN. URL: https://github.com/apache/incubator-mxnet/issues/11906#issuecomment-408485145 I know mkldnn has 3D arrays, but not all mkldnn operators support 3D arrays. I believe the current implementation of the mkldnn integration only allows 2D and 4D arrays. For example, mkldnn convolution only supports 2D kernels on 4D arrays. Currently, 1D convolution actually calls the native implementation. But if we add a fake dim to turn a 1D conv into a 2D conv, we can get a substantial speedup, as sketched below. I think this applies to many other operators. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
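A minimal sketch of the fake-dimension trick described above, assuming the Gluon API (shapes illustrative): reshape the 1D input (N, C, W) to (N, C, 1, W) and use a (1, k) kernel, which routes the work through the 4D code path:

```
import mxnet as mx
from mxnet.gluon import nn

x1d = mx.nd.random.uniform(shape=(8, 16, 100))       # (N, C, W)
x2d = x1d.expand_dims(axis=2)                        # (N, C, 1, W)

conv2d = nn.Conv2D(channels=32, kernel_size=(1, 3))  # emulates a kernel-3 Conv1D
conv2d.initialize()
y = conv2d(x2d).reshape((0, 0, -1))                  # back to (N, C_out, W_out)
print(y.shape)  # (8, 32, 98)
```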
[GitHub] ptrendx commented on issue #11886: Improve error message of cudnn operators
ptrendx commented on issue #11886: Improve error message of cudnn operators URL: https://github.com/apache/incubator-mxnet/pull/11886#issuecomment-408484982 The thing is, by default there is a limit on the workspace that a convolution may take (see the `workspace` parameter here: http://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.Convolution; not sure how it is specified in Gluon), and sometimes you may have enough GPU memory but the workspace limit prevents choosing the algo anyway. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
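For the symbolic API, the limit is set per layer; a minimal sketch of raising it (values illustrative; `workspace` is in MB and defaults to 1024):

```
import mxnet as mx

data = mx.sym.Variable("data")
# workspace caps the cuDNN scratch space for this layer; raising it lets
# cuDNN consider faster algorithms that need more temporary memory.
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                          workspace=2048, name="conv0")
```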
[GitHub] szha commented on issue #11916: [MXNET-371] Sphinx error reduction
szha commented on issue #11916: [MXNET-371] Sphinx error reduction URL: https://github.com/apache/incubator-mxnet/pull/11916#issuecomment-408484837 AWESOME! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration
mkolod commented on issue #11325: [MXNET-703] TensorRT runtime integration URL: https://github.com/apache/incubator-mxnet/pull/11325#issuecomment-408482195 @Roshrini It is in my opinion, but whether it is according to the committers, I assume that depends on what @piiswrong thinks, right? He asked the following question 2 days ago [here](https://github.com/apache/incubator-mxnet/pull/11325#pullrequestreview-140440732), and I answered it [here](https://github.com/apache/incubator-mxnet/pull/11325#issuecomment-408247485) in English and [here](https://github.com/mkolod/incubator-mxnet/commit/eebd373f0a8c863b96e7211311e50f6aa2ce9f13) in code. Whether this satisfies the MXNet committers is something I cannot answer, please check with @piiswrong. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] aaronmarkham commented on issue #11916: [MXNET-371] Sphinx error reduction
aaronmarkham commented on issue #11916: [MXNET-371] Sphinx error reduction URL: https://github.com/apache/incubator-mxnet/pull/11916#issuecomment-408479831 @ThomasDelteil @thomelane @sandeep-krishnamurthy @kevinthesun @piiswrong @mli @nswamy @marcoabreu - you all use this, or have used it... please let me know if you have any suggestions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] haojin2 commented on issue #11886: Improve error message of cudnn operators
haojin2 commented on issue #11886: Improve error message of cudnn operators URL: https://github.com/apache/incubator-mxnet/pull/11886#issuecomment-408479199 @ptrendx How does the following message look to you?
```
N algorithms with a minimum memory requirement of M bytes have been tried. There are only X bytes of workspace available on your GPU; please consider reducing the batch size or the model size.
```
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] piyushghai edited a comment on issue #11626: [MXNET-651] MXNet Model Backwards Compatibility Checker
piyushghai edited a comment on issue #11626: [MXNET-651] MXNet Model Backwards Compatibility Checker URL: https://github.com/apache/incubator-mxnet/pull/11626#issuecomment-408271045 @marcoabreu The Jenkins CI [1] build is giving an error on the import statement for MXNet under the Inference Stage of the JenkinsFileForMBCC. I'm not able to figure out why that's happening. Can you have a look at 69843fbe4d6669c135d3ae85aa56df144bc6c076 and give a second opinion? [1]: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/test-backwards-compatibility-checker/detail/test-backwards-compatibility-checker/12/pipeline/4/ Edit --> fae44fe22e322d928fa735968476d38cfbf26e62 seems to have fixed the issue. [2] @marcoabreu We now need to add the IAM user policy for S3 bucket access. [2]: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/test-backwards-compatibility-checker/detail/test-backwards-compatibility-checker/13/pipeline/4 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sandeep-krishnamurthy closed issue #7725: accuracy of cpp example is a constant value when training, no matter how many epochs trained!
sandeep-krishnamurthy closed issue #7725: accuracy of cpp example is a constant value when training, no matter how many epochs trained! URL: https://github.com/apache/incubator-mxnet/issues/7725
[GitHub] sandeep-krishnamurthy commented on issue #7725: accuracy of cpp example is a constant value when training, no matter how many epochs trained!
sandeep-krishnamurthy commented on issue #7725: accuracy of cpp example is a constant value when training, no matter how many epochs trained! URL: https://github.com/apache/incubator-mxnet/issues/7725#issuecomment-408477173 Resolving in favor of #8551. Please reopen if the issue persists.
[GitHub] sandeep-krishnamurthy closed issue #11911: Illegal instruction (core dumped)
sandeep-krishnamurthy closed issue #11911: Illegal instruction (core dumped) URL: https://github.com/apache/incubator-mxnet/issues/11911
[GitHub] sandeep-krishnamurthy commented on issue #11914: NDArray.asscalar(): CUDA an illegal memory access was encountered
sandeep-krishnamurthy commented on issue #11914: NDArray.asscalar(): CUDA an illegal memory access was encountered URL: https://github.com/apache/incubator-mxnet/issues/11914#issuecomment-408476538 Since the failure correlates directly with the data length, it looks like the GPU ran out of memory?
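One detail worth keeping in mind when reading such reports: MXNet executes GPU operations asynchronously, so an illegal memory access (or an out-of-memory condition) is often reported at the first synchronization point, such as `asscalar()`, rather than at the operation that caused it. A minimal sketch of how to localize the failing step, with illustrative shapes rather than the reporter's code:
```
import mxnet as mx

ctx = mx.gpu(0)  # assumes a CUDA-capable GPU is available

x = mx.nd.random.uniform(shape=(1024, 1024), ctx=ctx)
y = mx.nd.dot(x, x)

# Force all pending GPU work to complete here; if an asynchronous CUDA
# error occurred in the operations above, it surfaces at this call
# instead of at a later asscalar()/asnumpy() on an unrelated array.
mx.nd.waitall()

print(y.sum().asscalar())
```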
[GitHub] sandeep-krishnamurthy closed issue #11915: The behavior of np.zeros_like(x) if x is a NDArray is unexpected.
sandeep-krishnamurthy closed issue #11915: The behavior of np.zeros_like(x) if x is a NDArray is unexpected. URL: https://github.com/apache/incubator-mxnet/issues/11915
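Since NumPy does not know about MXNet's NDArray type, handing one to `np.zeros_like` is exactly the kind of call the closed issue flags as unexpected. A minimal sketch of the two unambiguous alternatives, written for illustration rather than taken from the issue:
```
import numpy as np
import mxnet as mx

x = mx.nd.array([[1, 2], [3, 4]])

# Stay in MXNet: an NDArray of zeros with x's shape, dtype and context.
z_nd = mx.nd.zeros_like(x)

# Cross over to NumPy explicitly first: a numpy.ndarray of zeros.
z_np = np.zeros_like(x.asnumpy())

print(type(z_nd), type(z_np))
```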
[GitHub] haojin2 commented on issue #11900: Re-enabling randomized test_l2_normalization
haojin2 commented on issue #11900: Re-enabling randomized test_l2_normalization URL: https://github.com/apache/incubator-mxnet/pull/11900#issuecomment-408474433 @rahul003 It was added before my PR adding fp16 support was merged.