[GitHub] marcoabreu commented on issue #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests.
marcoabreu commented on issue #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests. URL: https://github.com/apache/incubator-mxnet/pull/10827#issuecomment-396406745 @ThomasDelteil you can add them after the nightly tests have been migrated This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] azai91 opened a new pull request #11232: [MXNET-498] Test MKLDNN backward operators
azai91 opened a new pull request #11232: [MXNET-498] Test MKLDNN backward operators URL: https://github.com/apache/incubator-mxnet/pull/11232

## Description ##
Add backward C++ unit tests for MKLDNN operators.

## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [ ] add unit test for backwards copy
- [ ] add unit test for backwards act

## Comments ##
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
[GitHub] anirudh2290 opened a new pull request #11233: [MXNET-491] Use depthwise convolution by cuDNNv7 if available, updated version #11076
anirudh2290 opened a new pull request #11233: [MXNET-491] Use depthwise convolution by cuDNNv7 if available, updated version #11076 URL: https://github.com/apache/incubator-mxnet/pull/11233

Cherry-picking the depthwise convolution perf improvement to the 1.2 branch. @haojin2 ran experiments in https://github.com/apache/incubator-mxnet/pull/11076 with MobileNet on the ImageNet dataset for multi-precision training; throughput should improve by about 10x.
[GitHub] kalyc commented on issue #11222: Bug in gluon.block.Block.LoadParams
kalyc commented on issue #11222: Bug in gluon.block.Block.LoadParams URL: https://github.com/apache/incubator-mxnet/issues/11222#issuecomment-396417348 @rocketbear Thanks for opening this issue. I'm labeling it so MXNet community members can help resolve it. @sandeep-krishnamurthy could you add label "bug" to this?
[GitHub] lupesko commented on issue #8014: cross compile mxnet for android, without using Amalgamation?
lupesko commented on issue #8014: cross compile mxnet for android, without using Amalgamation? URL: https://github.com/apache/incubator-mxnet/issues/8014#issuecomment-396420268 @sandeep-krishnamurthy can you please add "Android" and "Build" labels, and remove the "Need Triage" label?
[GitHub] zhreshold opened a new pull request #11235: fix loading params if ignore_extra is set
zhreshold opened a new pull request #11235: fix loading params if ignore_extra is set URL: https://github.com/apache/incubator-mxnet/pull/11235

## Description ##
Minor fix.
[GitHub] piiswrong opened a new pull request #11236: Improve hybridblock doc
piiswrong opened a new pull request #11236: Improve hybridblock doc URL: https://github.com/apache/incubator-mxnet/pull/11236
[GitHub] ctcyang opened a new pull request #11237: Bring back MXNET_GPU_COPY_NTHREADS env variable
ctcyang opened a new pull request #11237: Bring back MXNET_GPU_COPY_NTHREADS env variable URL: https://github.com/apache/incubator-mxnet/pull/11237

## Checklist ##
### Essentials ###
- [x] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [x] Code is well-documented:
  - For user-facing API changes, API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [x] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [x] Restores the MXNET_GPU_COPY_NTHREADS environment variable
- [x] This variable has been set to MXNET_GPU_WORKER_NTHREADS (default = 2) since MXNet v0.10.0, so the env_var doc is changed to reflect that

## Comments ##
- The MXNET_GPU_COPY_NTHREADS environment variable was (accidentally) removed in commit 3b517f1, taking the value of MXNET_GPU_WORKER_NTHREADS. Then, shortly afterwards, the member variable was itself removed in this PR: https://github.com/apache/incubator-mxnet/pull/6773. Since then, it has been set to MXNET_GPU_WORKER_NTHREADS, which is 2.
- Since this variable controls how many CUDA streams are launched to copy from GPU to GPU, it might be good to bring it back so the user has another knob they can use to tune single-machine multi-GPU training.
- Some experimental evidence that 2 may be better than 1, from running `tools/bandwidth/measure.py` on ResNet-50:

```
export MXNET_GPU_COPY_NTHREADS=1
INFO:root:iter 1, 0.033930 sec, 5.261591 GB/sec per gpu, error 0.00
INFO:root:iter 2, 0.033478 sec, 5.332636 GB/sec per gpu, error 0.00
INFO:root:iter 3, 0.044507 sec, 4.011239 GB/sec per gpu, error 0.00
INFO:root:iter 4, 0.034598 sec, 5.159997 GB/sec per gpu, error 0.00
INFO:root:iter 5, 0.038231 sec, 4.669686 GB/sec per gpu, error 0.00

export MXNET_GPU_COPY_NTHREADS=2
INFO:root:iter 1, 0.031341 sec, 5.696318 GB/sec per gpu, error 0.00
INFO:root:iter 2, 0.031575 sec, 5.654080 GB/sec per gpu, error 0.00
INFO:root:iter 3, 0.035124 sec, 5.082800 GB/sec per gpu, error 0.00
INFO:root:iter 4, 0.030341 sec, 5.884048 GB/sec per gpu, error 0.00
INFO:root:iter 5, 0.032198 sec, 5.544681 GB/sec per gpu, error 0.00

export MXNET_GPU_COPY_NTHREADS=4
INFO:root:iter 1, 0.046252 sec, 3.859864 GB/sec per gpu, error 0.00
INFO:root:iter 2, 0.034504 sec, 5.174116 GB/sec per gpu, error 0.00
INFO:root:iter 3, 0.039854 sec, 4.479529 GB/sec per gpu, error 0.00
INFO:root:iter 4, 0.034815 sec, 5.127841 GB/sec per gpu, error 0.00
INFO:root:iter 5, 0.034434 sec, 5.184613 GB/sec per gpu, error 0.00
```
[GitHub] piiswrong commented on issue #10931: [MXNET-349] Histogram Operator
piiswrong commented on issue #10931: [MXNET-349] Histogram Operator URL: https://github.com/apache/incubator-mxnet/pull/10931#issuecomment-396427405 So symbolic is not supported?
[GitHub] mbaijal commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests.
mbaijal commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests. URL: https://github.com/apache/incubator-mxnet/pull/10827#discussion_r194585947

## File path: tests/jenkins/run_test_installation_docs.sh ##

```
@@ -299,28 +317,53 @@ LINUX_PYTHON_GPU_END_LINENO=$(grep -n "END - Linux Python GPU Installation Instr
 set_instruction_set ${LINUX_PYTHON_GPU_START_LINENO} ${LINUX_PYTHON_GPU_END_LINENO}
-# mxnet/base-cuda9 is a simple Docker Image with 'nvidia/cuda:9.0-cudnn7-devel' and 'apt-get install sudo'.
+ubuntu_python_gpu_virtualenv()
+{
+#$WORDTOREMOVE
+echo
+echo "### Testing Virtualenv ###"
+echo "${virtualenv_commands}"
+echo
+#virtualenv_commands=${virtualenv_commands//$WORDTOREMOVE/}
```

Review comment: Can you elaborate please? Update: did you just mean the commented line?
[GitHub] jeremiedb commented on issue #10928: Optimizers memory usage
jeremiedb commented on issue #10928: Optimizers memory usage URL: https://github.com/apache/incubator-mxnet/issues/10928#issuecomment-396429822

As an additional pointer, the issue seems tied to R's NDArray being set to read-only after an initial mutation: https://github.com/apache/incubator-mxnet/blob/master/R-package/src/ndarray.h#L86 @tqchen ?

A basic comparative example between R and Python.

R:
```
> state <- mx.nd.array(c(1,2,3,3))
> grad <- mx.nd.array(1:4)
> weight <- mx.nd.array(1:4)
> mx.nd.sgd.mom.update(weight, grad, state, lr = 0.1, momentum = 0.5, out = weight)
[1] 1.4 2.8 4.2 5.1
> mx.nd.sgd.mom.update(weight, grad, state, lr = 0.1, momentum = 0.5, out = weight)
Error in mx.nd.sgd.mom.update(weight, grad, state, lr = 0.1, momentum = 0.5, :
  ./ndarray.h:87: RCheck failed: ptr_->writable && !ptr_->moved Passing a read only NDArray to mutate function
```

Python:
```
state = mx.nd.array([1, 2, 3, 3])
grad = mx.nd.array([1,2,3,4])
weight = mx.nd.array([1,2,3,4])
print(nd.sgd_mom_update(weight, grad, state, momentum = 0.5, lr=0.1, out=weight))
print(nd.sgd_mom_update(weight, grad, state, momentum = 0.5, lr=0.1, out=weight))

[1.4 2.8 4.2 5.1]
[1.5 3. 4.5 5.25]
```
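The R error above comes from a guard in `R-package/src/ndarray.h` (`ptr_->writable && !ptr_->moved`) that refuses to mutate a handle once it is marked read-only. A minimal Python analogue of that failure mode (illustrative only — the `Handle` class is hypothetical, not MXNet code):

```python
# Hypothetical sketch of a writable guard like the one in ndarray.h:
# in-place updates succeed while the handle is writable, and fail with
# the same kind of error once it is marked read-only.

class Handle:
    def __init__(self, values):
        self.values = list(values)
        self.writable = True   # analogue of ptr_->writable
        self.moved = False     # analogue of ptr_->moved

    def mutate(self, fn):
        if not self.writable or self.moved:
            raise RuntimeError("Passing a read only NDArray to mutate function")
        self.values = [fn(v) for v in self.values]

h = Handle([1, 2, 3, 4])
h.mutate(lambda v: v * 2)      # first in-place update works
h.writable = False             # the R binding flips this after a mutation
try:
    h.mutate(lambda v: v * 2)  # second update is rejected, as in the R session
except RuntimeError as err:
    print("rejected:", err)
```

This is only a sketch of the symptom: R ends up with a non-writable handle after the first `out = weight` update, while Python's binding keeps the array writable.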
[GitHub] marcoabreu commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests.
marcoabreu commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests. URL: https://github.com/apache/incubator-mxnet/pull/10827#discussion_r194587572

## File path: tests/jenkins/run_test_installation_docs.sh ##

```
@@ -299,28 +317,53 @@ LINUX_PYTHON_GPU_END_LINENO=$(grep -n "END - Linux Python GPU Installation Instr
 set_instruction_set ${LINUX_PYTHON_GPU_START_LINENO} ${LINUX_PYTHON_GPU_END_LINENO}
-# mxnet/base-cuda9 is a simple Docker Image with 'nvidia/cuda:9.0-cudnn7-devel' and 'apt-get install sudo'.
+ubuntu_python_gpu_virtualenv()
+{
+#$WORDTOREMOVE
+echo
+echo "### Testing Virtualenv ###"
+echo "${virtualenv_commands}"
+echo
+#virtualenv_commands=${virtualenv_commands//$WORDTOREMOVE/}
```

Review comment: It's commented out, so we probably don't need it anymore, right?
[GitHub] marcoabreu commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests.
marcoabreu commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests. URL: https://github.com/apache/incubator-mxnet/pull/10827#discussion_r194587514

## File path: ci/docker/runtime_functions.sh ##

```
@@ -591,6 +591,65 @@ build_docs() {
 popd
 }
+
+# Functions that run the nightly Tests:
+
+#Runs Apache RAT Check on MXNet Source for License Headers
+nightly_test_rat_check() {
+set -ex
+#This Test fails without changing permissions
```

Review comment: This means that the file has not been committed with the correct file permissions. Try `chmod 0755`.
[GitHub] piiswrong commented on a change in pull request #11027: Add standard ResNet data augmentation for ImageRecordIter
piiswrong commented on a change in pull request #11027: Add standard ResNet data augmentation for ImageRecordIter URL: https://github.com/apache/incubator-mxnet/pull/11027#discussion_r194592109

## File path: src/io/image_aug_default.cc ##

```
@@ -296,7 +436,27 @@ class DefaultImageAugmenter : public ImageAugmenter {
 param_.data_shape[2], param_.data_shape[1], prnd);
 cv::resize(res(roi), res, cv::Size(param_.data_shape[2], param_.data_shape[1]), 0, 0, interpolation_method);
-} else {
+} else if (!random_resized_crop_exec) {
+  if (res.rows < param_.data_shape[1]) {
+    index_t new_cols = static_cast<index_t>(static_cast<float>(param_.data_shape[1]) /
+                                            static_cast<float>(res.rows) *
+                                            static_cast<float>(res.cols));
+    int interpolation_method = GetInterMethod(param_.inter_method, res.cols, res.rows,
```

Review comment: Looks like the three calls to GetInterMethod can be extracted outside.
[GitHub] larroy commented on a change in pull request #11220: [MXNET-244][MXNET-523][ARM] improvements to ARMv7 based builds.
larroy commented on a change in pull request #11220: [MXNET-244][MXNET-523][ARM] improvements to ARMv7 based builds. URL: https://github.com/apache/incubator-mxnet/pull/11220#discussion_r194592412

## File path: ci/docker/Dockerfile.build.armv6 ##

```
@@ -30,17 +30,24 @@ FROM dockcross/linux-armv6
 # extract ccache binary into latest context
 COPY --from=ccachebuilder /usr/local/bin/ccache /usr/local/bin/ccache
+RUN apt-get update
+RUN apt-get install -y unzip
+
 ENV ARCH armv6l
+ENV FC=/usr/bin/${CROSS_TRIPLE}-gfortran
```

Review comment: ok
[GitHub] larroy commented on a change in pull request #11220: [MXNET-244][MXNET-523][ARM] improvements to ARMv7 based builds.
larroy commented on a change in pull request #11220: [MXNET-244][MXNET-523][ARM] improvements to ARMv7 based builds. URL: https://github.com/apache/incubator-mxnet/pull/11220#discussion_r194592405

## File path: ci/docker/Dockerfile.build.armv7 ##

```
@@ -18,25 +18,30 @@ #
 # Dockerfile to build MXNet for Android ARMv7
-FROM ubuntu:16.04 as ccachebuilder
+#FROM ubuntu:16.04 as ccachebuilder

-COPY install/ubuntu_core.sh /work/
-RUN /work/ubuntu_core.sh
-COPY install/ubuntu_ccache.sh /work/
-RUN /work/ubuntu_ccache.sh
+#COPY install/ubuntu_core.sh /work/
+#RUN /work/ubuntu_core.sh
+#COPY install/ubuntu_ccache.sh /work/
+#RUN /work/ubuntu_ccache.sh

 FROM dockcross/linux-armv7

-# extract ccache binary into latest context
-COPY --from=ccachebuilder /usr/local/bin/ccache /usr/local/bin/ccache
+ENV ARCH armv7l
+ENV HOSTCC gcc
+ENV TARGET ARMV7
+ENV FC /usr/bin/${CROSS_TRIPLE}-gfortran
```

Review comment: fixed
[GitHub] hetong007 commented on a change in pull request #11027: Add standard ResNet data augmentation for ImageRecordIter
hetong007 commented on a change in pull request #11027: Add standard ResNet data augmentation for ImageRecordIter URL: https://github.com/apache/incubator-mxnet/pull/11027#discussion_r194592398

## File path: src/io/image_aug_default.cc ##

```
@@ -218,10 +265,96 @@ class DefaultImageAugmenter : public ImageAugmenter {
 res = src;
 }
+if (param_.random_resized_crop) {
+  // random resize crop
+  CHECK(param_.min_random_scale == 1.0f &&
```

Review comment: @piiswrong do you think this check is good enough?
[GitHub] solin319 commented on issue #11165: ImageIter much slower than ImageRecordIter
solin319 commented on issue #11165: ImageIter much slower than ImageRecordIter URL: https://github.com/apache/incubator-mxnet/issues/11165#issuecomment-396436698 Yes, and I think some users may be used to using raw JPEG data, like me. But I can't find a performant way to read it.
[GitHub] hcho3 commented on issue #11209: [WIP] implement var operator
hcho3 commented on issue #11209: [WIP] implement var operator URL: https://github.com/apache/incubator-mxnet/pull/11209#issuecomment-396444124 @piiswrong I've added tests. The `var` operator works, but the accuracy is not that good. I suspect that squaring small floating-point values causes precision loss. Let me come up with an alternative implementation that does not involve squaring.
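The precision loss described here is characteristic of the naive E[x²] − E[x]² formula, which cancels catastrophically when the mean is large relative to the spread; a one-pass method such as Welford's algorithm sidesteps the squaring. A small sketch of the difference (illustrative only, not the PR's actual implementation):

```python
# Naive variance: squares raw values, then subtracts two nearly equal
# large numbers -- catastrophic cancellation when mean >> spread.
def var_naive(xs):
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

# Welford's one-pass algorithm: accumulates squared deviations from a
# running mean, so no raw value is ever squared.
def var_welford(xs):
    mean, m2 = 0.0, 0.0
    for i, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / i
        m2 += delta * (x - mean)
    return m2 / len(xs)

data = [1e8 + d for d in (0.0, 0.1, 0.2, 0.3)]  # true variance is 0.0125
print(var_naive(data))    # far off: cancellation destroys the result
print(var_welford(data))  # close to 0.0125
```

The same idea applies in float32, where the cancellation threshold is hit far sooner than in the float64 example above.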
[GitHub] HuichuanLiu commented on issue #11149: Unreasonable performance of resnext models provided in model_zoo, evaluated by score.py
HuichuanLiu commented on issue #11149: Unreasonable performance of resnext models provided in model_zoo, evaluated by score.py URL: https://github.com/apache/incubator-mxnet/issues/11149#issuecomment-396461022

Thanks @lanking520 And here are some updates:

1. My experiments show that resnet-152 restored from the gluon model_zoo and from the module symbol files requires different preprocessing. I didn't find any clear description of this in the mxnet docs; it would be nice if you could add it, as it's quite confusing for newcomers like me.
2. I got a higher accuracy from the gluon model compared to [these statistics](https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/README.md). Is this another inconsistency between the module and the gluon model?

Details: I replaced resnext-101 with resnet-152 in score.py and received acc=~0.765, exactly the same as the [doc shows](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py). Then I repeated the same procedure, i.e. the same data and the same mx.io.RecordIter setting, but loaded the resnet-152 model with the gluon API instead of the default module symbol files:

```
from mxnet.gluon.model_zoo.vision.resnet import get_resnet
net = get_resnet(version=2, num_layers=152, pretrained=True, root='./', ctx=ctx[1])
```

This led to broken predictions: it gave 916 after argmax for all samples, because of unnormalized input. Next I added a standard preprocess according to the [gluon model docs](http://mxnet.incubator.apache.org/versions/1.2.0/api/python/gluon/model_zoo.html):

> All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (N x 3 x H x W), where N is the batch size, and H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. The transformation should preferably happen at preprocessing.

This takes the model to acc=0.773, about 0.012 higher than the [doc claims](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py).
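The normalization step quoted above can be sketched in plain Python (illustrative only; `normalize_pixel` is a made-up helper — a real pipeline would apply the same math with NDArray ops over the whole batch):

```python
# Per-channel ImageNet normalization as described in the gluon model zoo
# docs: scale 8-bit RGB values into [0, 1], subtract the channel mean,
# divide by the channel std.

MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Normalize one (R, G, B) pixel given as 0-255 integers."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))

# Skipping this step feeds raw [0, 255] values to a network trained on
# normalized inputs -- the "broken predictions" failure mode above.
print(normalize_pixel((128, 128, 128)))
```

The module-style symbol files mentioned in the comment typically expect raw or mean-subtracted [0, 255] input instead, which is why the two model formats need different preprocessing.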
[GitHub] kalyc commented on issue #8434: 'nnvm/c_api.h' file not found #include "nnvm/c_api.h"
kalyc commented on issue #8434: 'nnvm/c_api.h' file not found #include "nnvm/c_api.h" URL: https://github.com/apache/incubator-mxnet/issues/8434#issuecomment-396414016 Hello! Following up on this thread - @jinfagang has this issue been resolved?
[GitHub] zheng-da commented on a change in pull request #10433: [MXNET-290] MKLDNN support for model quantization
zheng-da commented on a change in pull request #10433: [MXNET-290] MKLDNN support for model quantization URL: https://github.com/apache/incubator-mxnet/pull/10433#discussion_r194565318

## File path: ci/docker/runtime_functions.sh ##

```
@@ -382,7 +381,6 @@ unittest_ubuntu_python3_cpu() {
 #export MXNET_MKLDNN_DEBUG=1 # Ignored if not present
 export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
 nosetests-3.4 --verbose tests/python/unittest
-nosetests-3.4 --verbose tests/python/quantization
```

Review comment: I'm not sure if we should expose the backend to the users. Ideally, we should switch backends directly. One thing we can do is to use the context to switch backends.
[GitHub] larroy closed pull request #11186: Devel arm
larroy closed pull request #11186: Devel arm URL: https://github.com/apache/incubator-mxnet/pull/11186 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/.gitignore b/.gitignore index d585672ab7d..416741a5e70 100644 --- a/.gitignore +++ b/.gitignore @@ -166,3 +166,7 @@ python/.eggs *DartConfiguration.tcl tests/Makefile tests/mxnet_unit_tests + +# generated wrappers for ccache +cc +cxx diff --git a/CMakeLists.txt b/CMakeLists.txt index e57c00b69e9..8a1765a0e67 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -321,14 +321,15 @@ endif() # ---[ OpenCV if(USE_OPENCV) - find_package(OpenCV QUIET COMPONENTS core highgui imgproc imgcodecs) + find_package(OpenCV COMPONENTS core highgui imgproc imgcodecs) if(NOT OpenCV_FOUND) # if not OpenCV 3.x, then imgcodecs are not found +message(STATUS "OpenCV imgcodecs missing") find_package(OpenCV REQUIRED COMPONENTS core highgui imgproc) endif() include_directories(SYSTEM ${OpenCV_INCLUDE_DIRS}) list(APPEND mxnet_LINKER_LIBS ${OpenCV_LIBS}) message(STATUS " OpenCV_LIBS=${OpenCV_LIBS}") - message(STATUS "OpenCV found (${OpenCV_CONFIG_PATH})") + message(STATUS "OpenCV ${OpenCV_VERSION} found (${OpenCV_CONFIG_PATH})") add_definitions(-DMXNET_USE_OPENCV=1) else(USE_OPENCV) message(STATUS "OpenCV Disabled") @@ -340,7 +341,11 @@ if(USE_OPENMP) find_package(OpenMP REQUIRED) # This should build on Windows, but there's some problem and I don't have a Windows box, so # could a Windows user please fix? 
- if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt AND SYSTEM_ARCHITECTURE STREQUAL "x86_64" AND NOT MSVC) + if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt + AND SYSTEM_ARCHITECTURE STREQUAL "x86_64" + AND NOT MSVC + AND NOT CMAKE_CROSSCOMPILING) + # Intel/llvm OpenMP: https://github.com/llvm-mirror/openmp set(OPENMP_STANDALONE_BUILD TRUE) set(LIBOMP_ENABLE_SHARED TRUE) @@ -648,7 +653,7 @@ if(USE_PLUGINS_WARPCTC) endif() -if(USE_OPENCV) +if(USE_OPENCV AND OpenCV_VERSION_MAJOR GREATER 2) add_executable(im2rec "tools/im2rec.cc") if(MSVC) target_link_libraries(im2rec mxnet) @@ -662,6 +667,9 @@ if(USE_OPENCV) ${nnvm_LINKER_LIBS} ${pslite_LINKER_LIBS} ) +else() +message(WARNING "OpenCV_VERSION_MAJOR: ${OpenCV_VERSION_MAJOR}, version 3 with imgcodecs \ +is required for im2rec, im2rec will not be available") endif() target_link_libraries(mxnet PUBLIC dmlc) diff --git a/Makefile b/Makefile index 03212841fa3..ff4446ab80c 100644 --- a/Makefile +++ b/Makefile @@ -477,7 +477,7 @@ endif $(PS_PATH)/build/libps.a: PSLITE PSLITE: - $(MAKE) CXX=$(CXX) DEPS_PATH=$(DEPS_PATH) -C $(PS_PATH) ps + $(MAKE) CXX="$(CXX)" DEPS_PATH="$(DEPS_PATH)" -C $(PS_PATH) ps $(DMLC_CORE)/libdmlc.a: DMLCCORE diff --git a/ci/README.md b/ci/README.md index 1c59a3af7c8..ca46434a30f 100644 --- a/ci/README.md +++ b/ci/README.md @@ -54,7 +54,7 @@ The artifacts are located in the build/ directory in the project root. In case ## Add a platform -To add a platform, you should add the appropiate dockerfile in +To add a platform, you should add the appropriate dockerfile in docker/Dockerfile.build. and add a shell function named build_ to the file docker/runtime_functions.sh with build instructions for that platform. @@ -63,3 +63,9 @@ instructions for that platform. Due to current limitations of the CMake build system creating artifacts in the source 3rdparty folder of the parent mxnet sources concurrent builds of different platforms is NOT SUPPORTED. 
+ +## ccache +For all builds a directory from the host system is mapped where ccache will store cached +compiled object files (defaults to /tmp/ci_ccache). This will speed up rebuilds +significantly. You can set this directory explicitly by setting CCACHE_DIR environment +variable. All ccache instances are currently set to be 10 Gigabytes max in size. diff --git a/ci/build.py b/ci/build.py index e52fa794bc9..f4c1a3e8d99 100755 --- a/ci/build.py +++ b/ci/build.py @@ -33,13 +33,15 @@ import shutil import subprocess import sys +import tempfile from copy import deepcopy from itertools import chain from subprocess import call, check_call from typing import * +CCACHE_MAXSIZE = '10G' -def get_platforms(path: Optional[str]="docker"): +def get_platforms(path: Optional[str] = "docker"): """Get a list of architectures given our dockerfiles""" dockerfiles = glob.glob(os.path.join(path, "Dockerfile.build.*")) dockerfiles = list(filter(lambda x: x[-1] != '~', dockerfiles)) @@ -72,11 +74,11 @@ def build_docker(platform: str, docker_binary: str, registry: str) -> None: tag = get_docker_tag(platform=platform, registry=registry)
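The ccache behaviour described in the README above can be sketched in Python. `default_ccache_dir` is a hypothetical helper name (the actual function in `ci/build.py` may be organized differently), but the fallback logic follows the README text: honor an explicit `CCACHE_DIR` from the host environment, otherwise fall back to a shared temp location.

```python
import os
import tempfile

CCACHE_MAXSIZE = '10G'  # matches the 10 Gigabyte cap mentioned in the README

def default_ccache_dir():
    # Honor an explicit CCACHE_DIR from the host environment; otherwise fall
    # back to a temp directory (the README mentions /tmp/ci_ccache as default).
    return os.environ.get('CCACHE_DIR',
                          os.path.join(tempfile.gettempdir(), 'ci_ccache'))

print(default_ccache_dir())
```

With `CCACHE_DIR` unset this resolves to a `ci_ccache` directory under the system temp path; setting the variable overrides it, which is the behaviour the README promises.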
[GitHub] zheng-da commented on issue #11028: Pre-trained Shufflenet model fails during inference on mxnet-mkl==1.2.0
zheng-da commented on issue #11028: Pre-trained Shufflenet model fails during inference on mxnet-mkl==1.2.0 URL: https://github.com/apache/incubator-mxnet/issues/11028#issuecomment-396416255 @vrakesh could you check again if https://github.com/apache/incubator-mxnet/pull/11212 fixes the bug? I tried and it seems to work fine.
[incubator-mxnet] branch master updated: [MXNET-62] add test against spark integration (#10462)
This is an automated email from the ASF dual-hosted git repository. liuyizhi pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new ed80ff2 [MXNET-62] add test against spark integration (#10462) ed80ff2 is described below commit ed80ff2c01ff54e82215bf03e8df942ea729a15e Author: Nan Zhu AuthorDate: Mon Jun 11 18:22:59 2018 -0700 [MXNET-62] add test against spark integration (#10462) * fix bug * temp * temp * temp * update * update * update * remove debugging stubs * remove unused * stylistic fix * fix typo * Pulled down update to submodule_dir * add test * retrigger it * sync 3rd party --- 3rdparty/ps-lite | 2 +- include/mxnet/kvstore.h| 1 + .../scala/org/apache/mxnet/optimizer/SGD.scala | 7 +- scala-package/pom.xml | 2 +- scala-package/spark/bin/run-mnist-example.sh | 9 +- scala-package/spark/pom.xml| 39 +- .../main/scala/org/apache/mxnet/spark/MXNet.scala | 7 +- .../scala/org/apache/mxnet/spark/MXNetParams.scala | 6 +- .../org/apache/mxnet/spark/ParameterServer.scala | 6 +- .../spark/example/ClassificationExample.scala | 1 + .../org/apache/mxnet/spark/MXNetGeneralSuite.scala | 69 ++ .../apache/mxnet/spark/SharedSparkContext.scala| 146 + src/kvstore/kvstore_dist.h | 3 +- src/kvstore/kvstore_dist_server.h | 3 +- 14 files changed, 282 insertions(+), 19 deletions(-) diff --git a/3rdparty/ps-lite b/3rdparty/ps-lite index a6dda54..8a76389 16 --- a/3rdparty/ps-lite +++ b/3rdparty/ps-lite @@ -1 +1 @@ -Subproject commit a6dda54604a07d1fb21b016ed1e3f4246b08222a +Subproject commit 8a763892a973afc1acd3d4b469d05bb338a83a6e diff --git a/include/mxnet/kvstore.h b/include/mxnet/kvstore.h index 4e99a9c..9e92207 100644 --- a/include/mxnet/kvstore.h +++ b/include/mxnet/kvstore.h @@ -229,6 +229,7 @@ class KVStore { CHECK(updater) << "invalid updater"; updater_ = updater; } + /*! 
* \brief set an updater with string keys * diff --git a/scala-package/core/src/main/scala/org/apache/mxnet/optimizer/SGD.scala b/scala-package/core/src/main/scala/org/apache/mxnet/optimizer/SGD.scala index c1b7259..e228e72 100644 --- a/scala-package/core/src/main/scala/org/apache/mxnet/optimizer/SGD.scala +++ b/scala-package/core/src/main/scala/org/apache/mxnet/optimizer/SGD.scala @@ -41,14 +41,15 @@ class SGD(val learningRate: Float = 0.01f, momentum: Float = 0.0f, */ override def update(index: Int, weight: NDArray, grad: NDArray, state: AnyRef): Unit = { // TODO(bing) implement wd_bias, wd_gamma, wd_beta (copy from python package) -var lr = - (if (lrScheduler != null) { +var lr = { + if (lrScheduler != null) { val scheduledLr = lrScheduler(numUpdate) updateCount(index) scheduledLr } else { this.learningRate - }) + } +} lr = getLr(index, lr) val wd = getWd(index, this.wd) diff --git a/scala-package/pom.xml b/scala-package/pom.xml index 9dcfa7c..cd5dba8 100644 --- a/scala-package/pom.xml +++ b/scala-package/pom.xml @@ -242,7 +242,7 @@ org.apache.maven.plugins maven-surefire-plugin -2.7 +2.19 true diff --git a/scala-package/spark/bin/run-mnist-example.sh b/scala-package/spark/bin/run-mnist-example.sh index 962c337..392d6c6 100755 --- a/scala-package/spark/bin/run-mnist-example.sh +++ b/scala-package/spark/bin/run-mnist-example.sh @@ -17,6 +17,8 @@ # specific language governing permissions and limitations # under the License. 
+set -x + CURR_DIR=$(cd `dirname $0`; pwd) SPARK_MODULE_DIR=$(cd $CURR_DIR/../; pwd) SCALA_PKG_DIR=$(cd $CURR_DIR/../../; pwd) @@ -35,10 +37,7 @@ SPARK_JAR=`find ${SPARK_MODULE_DIR}/target -name "*.jar" -type f -exec ls "{}" + SCALA_JAR=`find ${SCALA_PKG_DIR}/assembly/$OS/target -maxdepth 1 -name "*.jar" -type f -exec ls "{}" + | grep -v -E '(javadoc|sources)'` SPARK_OPTS+=" --name mxnet-spark-mnist" -SPARK_OPTS+=" --driver-memory 1g" -SPARK_OPTS+=" --executor-memory 1g" -SPARK_OPTS+=" --num-executors 2" -SPARK_OPTS+=" --executor-cores 1" +SPARK_OPTS+=" --driver-memory 2g" SPARK_OPTS+=" --jars ${SCALA_JAR}" # Download training and test set @@ -72,7 +71,7 @@ fi HOST=`hostname` -$SPARK_HOME/bin/spark-submit --master spark://$HOST:7077 \ +$SPARK_HOME/bin/spark-submit --master local[*] \ --class org.apache.mxnet.spark.example.ClassificationExample \ ${SPARK_OPTS} \ ${SPARK_JAR} \ diff --git a/scala-package/spark/pom.xml b/scala-package/spark/pom.xml index 281fad4..43ff1f7 100644 ---
[GitHub] larroy commented on issue #11226: Fix build.py when CCACHE_DIR is set.
larroy commented on issue #11226: Fix build.py when CCACHE_DIR is set. URL: https://github.com/apache/incubator-mxnet/pull/11226#issuecomment-396436042 @marcoabreu check it out for yourself; the caches are being rebuilt all the time: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11220/7/pipeline I re-enabled cache-from in this PR and it rebuilds everything, just as it does for me locally. I only get the expected behaviour with this flag disabled. I don't think this flag is well documented in Docker, so I can't tell for sure whether we need it, but since you introduced it I have never hit the cache.
[GitHub] szha commented on a change in pull request #11236: Improve hybridblock doc
szha commented on a change in pull request #11236: Improve hybridblock doc URL: https://github.com/apache/incubator-mxnet/pull/11236#discussion_r194606630 ## File path: docs/tutorials/gluon/hybrid.md ## @@ -105,23 +105,35 @@ Hybridize will speed up execution and save memory. If the top level layer is not a `HybridBlock`, you can still call `.hybridize()` on it and Gluon will try to hybridize its children layers instead. +`hybridize` also accepts many options for performance tuning. For example, you Review comment: many -> several?
[GitHub] larroy commented on issue #9844: Faulty bmod implementation
larroy commented on issue #9844: Faulty bmod implementation URL: https://github.com/apache/incubator-mxnet/issues/9844#issuecomment-396411300 duplicate of: https://github.com/apache/incubator-mxnet/issues/9853
[GitHub] anirudh2290 closed pull request #11142: [MXNET-408] inplace ReLU activation (#10847)
anirudh2290 closed pull request #11142: [MXNET-408] inplace ReLU activation (#10847) URL: https://github.com/apache/incubator-mxnet/pull/11142 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/include/mxnet/base.h b/include/mxnet/base.h index 783002e6fa4..2619064a263 100644 --- a/include/mxnet/base.h +++ b/include/mxnet/base.h @@ -104,7 +104,7 @@ /*! \brief minor version */ #define MXNET_MINOR 2 /*! \brief patch version */ -#define MXNET_PATCH 0 +#define MXNET_PATCH 1 /*! \brief mxnet version */ #define MXNET_VERSION (MXNET_MAJOR*1 + MXNET_MINOR*100 + MXNET_PATCH) /*! \brief helper for making version number */ diff --git a/python/mxnet/libinfo.py b/python/mxnet/libinfo.py index 3220b5a3352..d8057c047a7 100644 --- a/python/mxnet/libinfo.py +++ b/python/mxnet/libinfo.py @@ -73,4 +73,4 @@ def find_lib_path(): # current version -__version__ = "1.2.0" +__version__ = "1.2.1" diff --git a/snapcraft.yaml b/snapcraft.yaml index 6aca20a4ebb..c10d54944b0 100644 --- a/snapcraft.yaml +++ b/snapcraft.yaml @@ -1,5 +1,5 @@ name: mxnet -version: '1.2.0' +version: '1.2.1' summary: MXNet is a deep learning framework designed for efficiency and flexibility. 
description: | MXNet is a deep learning framework designed for both efficiency and diff --git a/src/operator/nn/activation-inl.h b/src/operator/nn/activation-inl.h index 32a7a5ad617..a9f6dbeda89 100644 --- a/src/operator/nn/activation-inl.h +++ b/src/operator/nn/activation-inl.h @@ -83,7 +83,7 @@ struct hash { namespace mxnet { namespace op { -template +template void ActivationForward(const OpContext , const TBlob _data, const OpReqType , const TBlob _data) { using namespace mshadow; @@ -91,16 +91,16 @@ void ActivationForward(const OpContext , const TBlob _data, Stream *s = ctx.get_stream(); const size_t sz = in_data.shape_.Size(); if (sz) { -MXNET_ASSIGN_REQ_SWITCH(req, Req, { - mxnet_op::Kernel, xpu>::Launch( -s, sz, -out_data.dptr(), -in_data.dptr()); +MSHADOW_REAL_TYPE_SWITCH(in_data.type_flag_, DType, { + MXNET_ASSIGN_REQ_SWITCH(req, Req, { +mxnet_op::Kernel, xpu>::Launch( + s, sz, out_data.dptr(), in_data.dptr()); + }); }); } } -template +template void ActivationBackward(const OpContext , const TBlob _grad, const TBlob _data, const OpReqType , const TBlob _grad) { @@ -109,13 +109,12 @@ void ActivationBackward(const OpContext , const TBlob _grad, Stream *s = ctx.get_stream(); const size_t sz = out_data.shape_.Size(); if (sz) { -MXNET_ASSIGN_REQ_SWITCH(req, Req, { - mxnet_op::Kernel, Req>, xpu>::Launch( -s, sz, -in_grad.dptr(), -out_grad.dptr(), -out_data.dptr()); +MSHADOW_REAL_TYPE_SWITCH(out_grad.type_flag_, DType, { + MXNET_ASSIGN_REQ_SWITCH(req, Req, { +mxnet_op::Kernel, Req>, xpu>::Launch( +s, sz, in_grad.dptr(), out_grad.dptr(), out_data.dptr()); + }); }); } } @@ -123,72 +122,68 @@ void ActivationBackward(const OpContext , const TBlob _grad, template void ActivationComputeImpl(const ActivationParam , const OpContext , const TBlob , OpReqType req, const TBlob ) { - MSHADOW_REAL_TYPE_SWITCH(input.type_flag_, DType, { -switch (param.act_type) { - case activation::kReLU: -ActivationForward( -ctx, input, req, output); -break; - case activation::kSigmoid: 
-ActivationForward( -ctx, input, req, output); -break; - case activation::kTanh: -ActivationForward( -ctx, input, req, output); -break; - case activation::kSoftReLU: -ActivationForward( -ctx, input, req, output); -break; - case activation::kSoftSign: -ActivationForward( -ctx, input, req, output); -break; - default: -LOG(FATAL) << "unknown activation type"; -} - }); + switch (param.act_type) { +case activation::kReLU: + ActivationForward( + ctx, input, req, output); + break; +case activation::kSigmoid: + ActivationForward( + ctx, input, req, output); + break; +case activation::kTanh: + ActivationForward( + ctx, input, req, output); + break; +case activation::kSoftReLU: + ActivationForward( + ctx, input, req, output); + break; +case activation::kSoftSign: + ActivationForward( + ctx, input, req, output); + break; +default: + LOG(FATAL) << "unknown activation type"; + } } template void ActivationGradComputeImpl(const ActivationParam , const
[GitHub] zheng-da commented on issue #11212: cherry-pick bug fixes in MKLDNN for v1.2.0
zheng-da commented on issue #11212: cherry-pick bug fixes in MKLDNN for v1.2.0 URL: https://github.com/apache/incubator-mxnet/pull/11212#issuecomment-396416387 @anirudh2290 I have removed https://github.com/apache/incubator-mxnet/pull/10578 @pengzhao-intel it should work fine for https://github.com/apache/incubator-mxnet/issues/11028
[GitHub] mbaijal opened a new pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests.
mbaijal opened a new pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests. URL: https://github.com/apache/incubator-mxnet/pull/10827 ## Description ## @marcoabreu @gautamkmr Can you please review! This PR does the following: 1. Added 2 new Jenkins pipeline jobs - one which runs tests on MXNet Source and the other which runs tests on MXNet Binaries. 2. Several tests were running nightly on a local Jenkins setup. I have modified these tests to the new format so they now run as a part of the new Jenkinsfiles on the official CI. 3. I have not modified any tests themselves but have only modified the design/format so that they run as a pipeline. 4. This provides an easy mechanism for anyone to add long-running tests to run as a part of the nightly pipelines. While reviewing this code, you can look at this link where these pipelines are currently running on my local repo: http://jenkins.mxnet-ci.amazon-ml.com/view/Nightly%20Tests/ ## Checklist ## ### Essentials ### Please feel free to remove inapplicable items for your PR. - [x] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes) - [ ] Changes are complete (i.e. I finished coding on this PR) - [x] All changes have test coverage: - Unit tests are added for small changes to verify correctness (e.g. adding a new operator) - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore) - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL) - [x] Code is well-documented: - For user-facing API changes, API doc string has been updated. - For new C++ functions in header files, their functionalities and arguments are documented.
- For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set and a reference to the original paper if applicable - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html - [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change ### Changes ### - [x] Created a new Jenkinsfile to run nightly tests on source - [x] Created a new Jenkinsfile to run nightly tests on mxnet binaries - [x] Made some changes to a few test scripts to make them compatible with the new CI design (Image classification, compilation warnings, Installation guide etc) - [x] Added RAT check as a nightly test with a post run check on the logs - [x] Added new dockerfiles for some tests as needed (for example the JS test and Install guide) - [x] Added appropriate install scripts where needed for the newly added dockerfiles - [ ] Added Readmes/comments where necessary - [x] Fix Pip Install test - [ ] Add an instance with 4 GPUs and then uncomment the KVstore test ## Comments ## To the best of my knowledge, the 3 tests that I have changed are not being run elsewhere / standalone and hence should not cause any regressions. To Do: 1. Need to remove the pip test from the source pipeline - it is deprecated 2. There is a potential install issue that I need to test (build from source on a CPU-only instance)
[GitHub] sandeep-krishnamurthy commented on issue #11222: Bug in gluon.block.Block.LoadParams
sandeep-krishnamurthy commented on issue #11222: Bug in gluon.block.Block.LoadParams URL: https://github.com/apache/incubator-mxnet/issues/11222#issuecomment-396423636 Just summarizing the issue: when a name is not in params and ignore_extra=True, this line is executed, although it should not be: `params[name]._load_init(loaded[name], ctx)`
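The fix the comment points at amounts to skipping extra keys before touching `params[name]`. A minimal sketch of the intended control flow (a hypothetical standalone function for illustration, not the actual Gluon source):

```python
def load_params_filtered(params, loaded, ignore_extra=False):
    """Copy loaded values into params, honoring ignore_extra.

    Extra keys in `loaded` are skipped when ignore_extra=True instead of
    being looked up in `params` (the bug described in the issue).
    """
    for name, value in loaded.items():
        if name not in params:
            if ignore_extra:
                continue  # do NOT execute the load for unknown names
            raise KeyError(f"Parameter {name!r} is missing from the block")
        params[name] = value
    return params

print(load_params_filtered({'w': 0}, {'w': 1, 'extra': 2}, ignore_extra=True))
# → {'w': 1}
```

The key point is that the `ignore_extra` branch must `continue` before any access to `params[name]`, which is exactly the ordering the reported bug violates.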
[GitHub] rahul003 commented on a change in pull request #11237: Bring back MXNET_GPU_COPY_NTHREADS env variable
rahul003 commented on a change in pull request #11237: Bring back MXNET_GPU_COPY_NTHREADS env variable URL: https://github.com/apache/incubator-mxnet/pull/11237#discussion_r194585918 ## File path: src/engine/threaded_engine_perdevice.cc ## @@ -194,6 +196,8 @@ class ThreadedEnginePerDevice : public ThreadedEngine { size_t cpu_worker_nthreads_; /*! \brief number of concurrent thread each gpu worker uses */ size_t gpu_worker_nthreads_; + /*! \brief number of concurrent thread each gpu copy worker uses */ + int gpu_copy_nthreads_; Review comment: Make this size_t as well to maintain uniformity
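For reference, the environment variable being restored follows the usual lookup-with-default pattern. `get_gpu_copy_nthreads` below is a hypothetical Python illustration of that pattern (the default value of 2 is an assumption, not necessarily MXNet's actual default):

```python
import os

def get_gpu_copy_nthreads(default=2):
    # Mirror the C++ engine's pattern: read MXNET_GPU_COPY_NTHREADS from the
    # environment, falling back to a default when it is unset.
    return int(os.environ.get('MXNET_GPU_COPY_NTHREADS', default))

os.environ['MXNET_GPU_COPY_NTHREADS'] = '4'
print(get_gpu_copy_nthreads())  # → 4
```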
[GitHub] haojin2 commented on issue #10931: [MXNET-349] Histogram Operator
haojin2 commented on issue #10931: [MXNET-349] Histogram Operator URL: https://github.com/apache/incubator-mxnet/pull/10931#issuecomment-396427807 @piiswrong There's no backward pass for this function, I wonder if symbolic is necessary in this case?
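For readers unfamiliar with the operator, its forward-only nature mirrors a plain histogram computation, where counting admits no useful gradient. NumPy's version shows the expected semantics (an assumption for illustration; the MXNet operator's exact signature may differ):

```python
import numpy as np

data = np.array([0.1, 0.4, 0.8])
counts, edges = np.histogram(data, bins=2, range=(0.0, 1.0))
print(counts)  # → [2 1]: two values fall in [0, 0.5), one in [0.5, 1.0]
print(edges)   # → [0.  0.5 1. ]
```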
[GitHub] mbaijal commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests.
mbaijal commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests. URL: https://github.com/apache/incubator-mxnet/pull/10827#discussion_r194585947 ## File path: tests/jenkins/run_test_installation_docs.sh ## @@ -299,28 +317,53 @@ LINUX_PYTHON_GPU_END_LINENO=$(grep -n "END - Linux Python GPU Installation Instr set_instruction_set ${LINUX_PYTHON_GPU_START_LINENO} ${LINUX_PYTHON_GPU_END_LINENO} -# mxnet/base-cuda9 is a simple Docker Image with 'nvidia/cuda:9.0-cudnn7-devel' and 'apt-get install sudo'. +ubuntu_python_gpu_virtualenv() +{ +#$WORDTOREMOVE +echo +echo "### Testing Virtualenv ###" +echo "${virtualenv_commands}" +echo +#virtualenv_commands=${virtualenv_commands//$WORDTOREMOVE/} Review comment: Can you elaborate please?
[GitHub] piiswrong commented on a change in pull request #11027: Add standard ResNet data augmentation for ImageRecordIter
piiswrong commented on a change in pull request #11027: Add standard ResNet data augmentation for ImageRecordIter URL: https://github.com/apache/incubator-mxnet/pull/11027#discussion_r194591950 ## File path: src/io/image_aug_default.cc ## @@ -218,6 +257,80 @@ class DefaultImageAugmenter : public ImageAugmenter { res = src; } +if (param_.random_resized_crop) { Review comment: sanity checks
[GitHub] larroy commented on issue #11226: Fix build.py when CCACHE_DIR is set.
larroy commented on issue #11226: Fix build.py when CCACHE_DIR is set. URL: https://github.com/apache/incubator-mxnet/pull/11226#issuecomment-396412245 @marcoabreu it doesn't work well for me with this cache-from flag, could you check? You introduced this flag and it was not working properly for me.
[GitHub] larroy closed pull request #11207: Devel android
larroy closed pull request #11207: Devel android URL: https://github.com/apache/incubator-mxnet/pull/11207 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/.gitignore b/.gitignore index d585672ab7d..416741a5e70 100644 --- a/.gitignore +++ b/.gitignore @@ -166,3 +166,7 @@ python/.eggs *DartConfiguration.tcl tests/Makefile tests/mxnet_unit_tests + +# generated wrappers for ccache +cc +cxx diff --git a/CMakeLists.txt b/CMakeLists.txt index e57c00b69e9..92c59f33af4 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -15,7 +15,7 @@ mxnet_option(USE_NCCL "Use NVidia NCCL with CUDA" OFF) mxnet_option(USE_OPENCV "Build with OpenCV support" ON) mxnet_option(USE_OPENMP "Build with Openmp support" ON) mxnet_option(USE_CUDNN"Build with cudnn support" ON) # one could set CUDNN_ROOT for search path -mxnet_option(USE_SSE "Build with x86 SSE instruction support" ON) +mxnet_option(USE_SSE "Build with x86 SSE instruction support" ON IF NOT ARM) mxnet_option(USE_F16C "Build with x86 F16C instruction support" ON) # autodetects support if ON mxnet_option(USE_LAPACK "Build with lapack support" ON) mxnet_option(USE_MKL_IF_AVAILABLE "Use MKL if found" ON) @@ -321,14 +321,15 @@ endif() # ---[ OpenCV if(USE_OPENCV) - find_package(OpenCV QUIET COMPONENTS core highgui imgproc imgcodecs) + find_package(OpenCV COMPONENTS core highgui imgproc imgcodecs) if(NOT OpenCV_FOUND) # if not OpenCV 3.x, then imgcodecs are not found +message(STATUS "OpenCV imgcodecs missing") find_package(OpenCV REQUIRED COMPONENTS core highgui imgproc) endif() include_directories(SYSTEM ${OpenCV_INCLUDE_DIRS}) list(APPEND mxnet_LINKER_LIBS ${OpenCV_LIBS}) message(STATUS " OpenCV_LIBS=${OpenCV_LIBS}") - message(STATUS "OpenCV found (${OpenCV_CONFIG_PATH})") + message(STATUS "OpenCV 
${OpenCV_VERSION} found (${OpenCV_CONFIG_PATH})") add_definitions(-DMXNET_USE_OPENCV=1) else(USE_OPENCV) message(STATUS "OpenCV Disabled") @@ -340,7 +341,11 @@ if(USE_OPENMP) find_package(OpenMP REQUIRED) # This should build on Windows, but there's some problem and I don't have a Windows box, so # could a Windows user please fix? - if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt AND SYSTEM_ARCHITECTURE STREQUAL "x86_64" AND NOT MSVC) + if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt + AND SYSTEM_ARCHITECTURE STREQUAL "x86_64" + AND NOT MSVC + AND NOT CMAKE_CROSSCOMPILING) + # Intel/llvm OpenMP: https://github.com/llvm-mirror/openmp set(OPENMP_STANDALONE_BUILD TRUE) set(LIBOMP_ENABLE_SHARED TRUE) @@ -360,7 +365,7 @@ if(USE_OPENMP) set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}") endif() endif() -elseif(UNIX) +elseif(UNIX AND NOT ANDROID) list(APPEND mxnet_LINKER_LIBS pthread) endif() @@ -648,7 +653,7 @@ if(USE_PLUGINS_WARPCTC) endif() -if(USE_OPENCV) +if(USE_OPENCV AND OpenCV_VERSION_MAJOR GREATER 2) add_executable(im2rec "tools/im2rec.cc") if(MSVC) target_link_libraries(im2rec mxnet) @@ -662,6 +667,9 @@ if(USE_OPENCV) ${nnvm_LINKER_LIBS} ${pslite_LINKER_LIBS} ) +else() +message(WARNING "OpenCV_VERSION_MAJOR: ${OpenCV_VERSION_MAJOR}, version 3 with imgcodecs \ +is required for im2rec, im2rec will not be available") endif() target_link_libraries(mxnet PUBLIC dmlc) diff --git a/Jenkinsfile b/Jenkinsfile index 28edda00959..61b63807703 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -458,6 +458,16 @@ try { } } } +}, +'Android / ARM64':{ + node('mxnetlinux-cpu') { +ws('workspace/android64') { + timeout(time: max_time, unit: 'MINUTES') { +init_git() +docker_run('android_arm64', 'build_android_arm64', false) + } +} + } } } // End of stage('Build') diff --git a/Makefile b/Makefile index 03212841fa3..ff4446ab80c 100644 --- a/Makefile +++ b/Makefile @@ -477,7 +477,7 @@ endif $(PS_PATH)/build/libps.a: 
PSLITE PSLITE: - $(MAKE) CXX=$(CXX) DEPS_PATH=$(DEPS_PATH) -C $(PS_PATH) ps + $(MAKE) CXX="$(CXX)" DEPS_PATH="$(DEPS_PATH)" -C $(PS_PATH) ps $(DMLC_CORE)/libdmlc.a: DMLCCORE diff --git a/ci/README.md b/ci/README.md index 1c59a3af7c8..ca46434a30f 100644 --- a/ci/README.md +++ b/ci/README.md @@ -54,7 +54,7 @@ The artifacts are located in the build/ directory in the project root. In case ## Add a platform -To add a platform, you should add the appropiate dockerfile in +To add a platform, you should add the appropriate dockerfile in docker/Dockerfile.build. and add a shell function named build_ to the file
[GitHub] haojin2 commented on issue #11076: [MXNET-491] Use depthwise convolution by cuDNNv7 if available, updated version
haojin2 commented on issue #11076: [MXNET-491] Use depthwise convolution by cuDNNv7 if available, updated version URL: https://github.com/apache/incubator-mxnet/pull/11076#issuecomment-396412093 Did some extra benchmarks and verified the multi-precision training speed improvement on a single V100 GPU with mobilenet + ImageNet dataset:

before:
INFO:root:Epoch[0] Batch [20]  Speed: 95.60 samples/sec  accuracy=0.013765
INFO:root:Epoch[0] Batch [40]  Speed: 95.73 samples/sec  accuracy=0.148047
INFO:root:Epoch[0] Batch [60]  Speed: 95.73 samples/sec  accuracy=0.865234
INFO:root:Epoch[0] Batch [80]  Speed: 95.75 samples/sec  accuracy=1.00
INFO:root:Epoch[0] Batch [100] Speed: 95.72 samples/sec  accuracy=1.00

after:
INFO:root:Epoch[0] Batch [20]  Speed: 1011.35 samples/sec  accuracy=0.013765
INFO:root:Epoch[0] Batch [40]  Speed: 1032.15 samples/sec  accuracy=0.112109
INFO:root:Epoch[0] Batch [60]  Speed: 1038.41 samples/sec  accuracy=0.832812
INFO:root:Epoch[0] Batch [80]  Speed: 1034.26 samples/sec  accuracy=1.00
INFO:root:Epoch[0] Batch [100] Speed: 1032.14 samples/sec  accuracy=1.00

@anirudh2290
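As a quick sanity check on the reported numbers, the steady-state throughput ratio at Batch [100] works out to roughly a 10.8x speedup:

```python
before = 95.72    # samples/sec at Batch [100], before the change
after = 1032.14   # samples/sec at Batch [100], after the change
speedup = after / before
print(f"{speedup:.1f}x")  # → 10.8x
```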
[GitHub] rahul003 opened a new pull request #11234: [MXNET-535] Add Warmup Learning Rate Scheduler and fix inconsistencies in LR Schedulers
rahul003 opened a new pull request #11234: [MXNET-535] Add Warmup Learning Rate Scheduler and fix inconsistencies in LR Schedulers URL: https://github.com/apache/incubator-mxnet/pull/11234 ## Description ## Adds warmup LR scheduler. Also fixes issues where base_lr is not taken by MultiFactorScheduler and FactorScheduler. ## Checklist ## ### Essentials ### Please feel free to remove inapplicable items for your PR. - [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes) - [ ] Changes are complete (i.e. I finished coding on this PR) - [ ] All changes have test coverage: - Unit tests are added for small changes to verify correctness (e.g. adding a new operator) - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore) - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL) - [ ] Code is well-documented: - For user-facing API changes, API doc string has been updated. - For new C++ functions in header files, their functionalities and arguments are documented. - For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html - [ ] To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change ### Changes ### - [ ] Feature1, tests, (and when applicable, API doc) - [ ] Feature2, tests, (and when applicable, API doc) ## Comments ## - If this change is a backward incompatible change, why must this change be made. - Interesting edge cases to note here This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
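For context on the PR above, a linear warmup schedule of the kind it describes can be sketched as follows. This is an illustrative standalone class, not the PR's actual implementation — the class name, defaults, and `warmup_begin_lr` parameter are assumptions:

```python
class LinearWarmupScheduler:
    """Ramp the learning rate linearly from warmup_begin_lr up to base_lr
    over warmup_steps updates, then hold it at base_lr."""

    def __init__(self, base_lr=0.01, warmup_steps=0, warmup_begin_lr=0.0):
        if warmup_begin_lr > base_lr:
            raise ValueError("warmup_begin_lr must not exceed base_lr")
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        self.warmup_begin_lr = warmup_begin_lr
        # Per-update increment during the warmup phase.
        self.increase = (base_lr - warmup_begin_lr) / max(1, warmup_steps)

    def __call__(self, num_update):
        if num_update < self.warmup_steps:
            return self.warmup_begin_lr + self.increase * num_update
        return self.base_lr
```

An optimizer would call the scheduler with its running update count; once `warmup_steps` updates have passed, the schedule hands over to the base rate (or, in a combined scheduler, to a wrapped decay schedule).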
[GitHub] mbaijal commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests.
mbaijal commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests. URL: https://github.com/apache/incubator-mxnet/pull/10827#discussion_r194585755 ## File path: docs/install/index.md ## @@ -84,7 +84,7 @@ $ wget https://bootstrap.pypa.io/get-pip.py && sudo python get-pip.py **Step 2** Install MXNet with OpenBLAS acceleration. ```bash -$ pip install mxnet +$ sudo pip install mxnet Review comment: @marcoabreu Comments about using `sudo -H`
[GitHub] mbaijal commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests.
mbaijal commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests. URL: https://github.com/apache/incubator-mxnet/pull/10827#discussion_r194585654 ## File path: ci/docker/runtime_functions.sh ## @@ -591,6 +591,65 @@ build_docs() { popd } + +# Functions that run the nightly Tests: + +#Runs Apache RAT Check on MXNet Source for License Headers +nightly_test_rat_check() { +set -ex +#This Test fails without changing permissions Review comment: I get it for broken_link_checker too -> `./tests/nightly/broken_link_checker_test/broken_link_checker.sh: Permission denied`
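A "Permission denied" like the one reported above usually means the checked-out script is missing its execute bit. A minimal sketch of restoring the bit before invoking the script — the file here is a hypothetical stand-in created in a temp directory, not the actual CI path:

```python
import os
import stat
import subprocess
import tempfile

# Hypothetical stand-in for a checked-out test script that lacks
# its execute bit (the real CI file lives under tests/nightly/).
script = os.path.join(tempfile.mkdtemp(), "broken_link_checker.sh")
with open(script, "w") as f:
    f.write("#!/bin/sh\necho ok\n")

# Invoking it directly at this point would fail with "Permission
# denied", so grant the execute bit first.
os.chmod(script, os.stat(script).st_mode
         | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

result = subprocess.run([script], capture_output=True, text=True)
```

In a CI runner the equivalent one-liner would be a `chmod +x` on the script before the test stage calls it.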
[GitHub] anirudhacharya opened a new issue #11238: UX for ONNX Documentation is broken
anirudhacharya opened a new issue #11238: UX for ONNX Documentation is broken URL: https://github.com/apache/incubator-mxnet/issues/11238 ONNX documentation for [mxnet v1.2](http://mxnet.incubator.apache.org/versions/1.2.0/api/python/contrib/onnx.html#api-reference) has the API Reference section populated, but the [docs page from master](http://mxnet.incubator.apache.org/versions/master/api/python/contrib/onnx.html#api-reference) has an empty API Reference. Creating this issue to track it. @aaronmarkham @spidyDev @Roshrini
[GitHub] liuzx32 commented on issue #11221: How to store a large distributed training model with KVStore?
liuzx32 commented on issue #11221: How to store a large distributed training model with KVStore? URL: https://github.com/apache/incubator-mxnet/issues/11221#issuecomment-396438516 @kalyc Thank you very much
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository. zhasheng pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new bec3af9 Bump the publish timestamp. bec3af9 is described below commit bec3af9a9d8c5135391f2aef61802e4d5ce78f38 Author: mxnet-ci AuthorDate: Tue Jun 12 01:40:24 2018 + Bump the publish timestamp. --- date.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/date.txt b/date.txt new file mode 100644 index 000..cf9ea41 --- /dev/null +++ b/date.txt @@ -0,0 +1 @@ +Tue Jun 12 01:40:24 UTC 2018 -- To stop receiving notification emails like this one, please contact zhash...@apache.org.
[GitHub] anirudh2290 commented on issue #11127: add import_ for SymbolBlock
anirudh2290 commented on issue #11127: add import_ for SymbolBlock URL: https://github.com/apache/incubator-mxnet/pull/11127#issuecomment-396458073 Wasn't the final decision to provide a deprecation warning for `save_params` and `load_params` and provide a new API, `save_parameter` and `load_parameter`? Can you please explain?
[GitHub] idealboy commented on issue #8431: I can't figure out why ImageRecordIter use only 1 thread for decoding
idealboy commented on issue #8431: I can't figure out why ImageRecordIter use only 1 thread for decoding URL: https://github.com/apache/incubator-mxnet/issues/8431#issuecomment-396462694 I met the same problem; I also use OpenBLAS 0.2.20.
[GitHub] marcoabreu commented on issue #11226: Fix build.py when CCACHE_DIR is set.
marcoabreu commented on issue #11226: Fix build.py when CCACHE_DIR is set. URL: https://github.com/apache/incubator-mxnet/pull/11226#issuecomment-396421777 Please elaborate in more detail on what exactly does not work.
[GitHub] marcoabreu commented on issue #62: Seed files for broken link checker job
marcoabreu commented on issue #62: Seed files for broken link checker job URL: https://github.com/apache/incubator-mxnet-site/pull/62#issuecomment-396430260 Could you elaborate on what the output file is for? Please also add a README to the new folder.
[GitHub] HuichuanLiu commented on issue #11149: Unreasonable performance of resnext models provided in model_zoo, evaluated by score.py
HuichuanLiu commented on issue #11149: Unreasonable performance of resnext models provided in model_zoo, evaluated by score.py URL: https://github.com/apache/incubator-mxnet/issues/11149#issuecomment-396461022 Thanks @lanking520 And here are some updates: 1. My experiments show that resnet-152 restored from the gluon model_zoo and from the module symbol files require different preprocessing. I didn't find any clear description of this in the mxnet docs; it would be nice if you could add it, since it's quite confusing for newcomers like me. 2. I got a higher accuracy from the gluon model, compared to [these statistics](https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/README.md). Is it another inconsistency between the module and the gluon model? Or perhaps about the resnet version? Details: I replaced resnext-101 with resnet-152 in score.py and received acc=~0.765, exactly the same as the [doc shows](https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/README.md). Then I repeated the same procedure, i.e. the same data and the same mx.io.RecordIter settings, but loaded the resnet-152 model with the gluon API instead of the default module symbol files. ``` from mxnet.gluon.model_zoo.vision.resnet import get_resnet net = get_resnet(version=2, num_layers=152, pretrained=True, root='./', ctx=ctx[1]) ``` This led to broken predictions: it gave class 916 after argmax for all samples, because of the unnormalized input. Next I added the standard preprocessing according to the [gluon model docs](http://mxnet.incubator.apache.org/versions/1.2.0/api/python/gluon/model_zoo.html) > All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (N x 3 x H x W), where N is the batch size, and H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. The transformation should preferrably happen at preprocessing This takes the model to acc=0.773, about 0.012 higher than the [doc claims](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py)
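The normalization quoted from the gluon docs above can be sketched in plain NumPy (the batch shape here is illustrative, and this is not MXNet's actual transform API — just the arithmetic the docs describe):

```python
import numpy as np

# ImageNet mean/std used by the gluon model zoo, per the quoted docs.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)


def normalize_batch(images_uint8):
    """images_uint8: (N, 3, H, W) array of RGB pixels in [0, 255].
    Returns a float32 batch scaled to [0, 1] and channel-normalized."""
    x = images_uint8.astype(np.float32) / 255.0
    return (x - MEAN) / STD


# Illustrative batch of two all-white 224x224 images.
batch = np.full((2, 3, 224, 224), 255, dtype=np.uint8)
out = normalize_batch(batch)
```

Feeding module-style raw-pixel input to a gluon model skips exactly this step, which matches the "broken predictions" behavior described in the comment.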
[GitHub] lupesko commented on issue #8453: [CoreML Converter] converted InceptionV3 failure
lupesko commented on issue #8453: [CoreML Converter] converted InceptionV3 failure URL: https://github.com/apache/incubator-mxnet/issues/8453#issuecomment-396419998 @sandeep-krishnamurthy can you also remove the "Need Triage" since this has been triaged?
[GitHub] piiswrong closed pull request #11112: Support for data iterators returning lists of batches
piiswrong closed pull request #11112: Support for data iterators returning lists of batches URL: https://github.com/apache/incubator-mxnet/pull/11112 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/example/image-classification/common/fit.py b/example/image-classification/common/fit.py index f5427feae2f..b646ef6edaa 100755 --- a/example/image-classification/common/fit.py +++ b/example/image-classification/common/fit.py @@ -159,8 +159,13 @@ def fit(args, network, data_loader, **kwargs): if args.test_io: tic = time.time() for i, batch in enumerate(train): -for j in batch.data: -j.wait_to_read() +if isinstance(batch, list): +for b in batch: +for j in b.data: +j.wait_to_read() +else: +for j in batch.data: +j.wait_to_read() if (i + 1) % args.disp_batches == 0: logging.info('Batch [%d]\tSpeed: %.2f samples/sec', i, args.disp_batches * args.batch_size / (time.time() - tic)) diff --git a/python/mxnet/executor_manager.py b/python/mxnet/executor_manager.py index 33c6c976271..825aa76e43c 100644 --- a/python/mxnet/executor_manager.py +++ b/python/mxnet/executor_manager.py @@ -286,10 +286,13 @@ def backward(self): for texec in self.train_execs: texec.backward() -def update_metric(self, metric, labels): +def update_metric(self, metric, labels, pre_sliced=False): """Update evaluation metric with label and current outputs.""" -for texec, islice in zip(self.train_execs, self.slices): -labels_slice = [label[islice] for label in labels] +for current_exec, (texec, islice) in enumerate(zip(self.train_execs, self.slices)): +if not pre_sliced: +labels_slice = [label[islice] for label in labels] +else: +labels_slice = labels[current_exec] metric.update(labels_slice, texec.outputs) class DataParallelExecutorManager(object): @@ -436,6 +439,6 @@ def backward(self): """Run 
backward on the current executor.""" self.curr_execgrp.backward() -def update_metric(self, metric, labels): +def update_metric(self, metric, labels, pre_sliced=False): """Update metric with the current executor.""" -self.curr_execgrp.update_metric(metric, labels) +self.curr_execgrp.update_metric(metric, labels, pre_sliced) diff --git a/python/mxnet/module/base_module.py b/python/mxnet/module/base_module.py index 8f5fd4ab854..4b7355ffa92 100644 --- a/python/mxnet/module/base_module.py +++ b/python/mxnet/module/base_module.py @@ -146,7 +146,8 @@ class BaseModule(object): - `get_outputs()`: get outputs of the previous forward operation. - `get_input_grads()`: get the gradients with respect to the inputs computed in the previous backward operation. -- `update_metric(metric, labels)`: update performance metric for the previous forward +- `update_metric(metric, labels, pre_sliced=False)`: update performance metric + for the previous forward computed results. - other properties (mostly for backward compatibility) @@ -249,7 +250,10 @@ def score(self, eval_data, eval_metric, num_batch=None, batch_end_callback=None, break self.prepare(eval_batch, sparse_row_id_fn=sparse_row_id_fn) self.forward(eval_batch, is_train=False) -self.update_metric(eval_metric, eval_batch.label) +if isinstance(eval_batch, list): +self.update_metric(eval_metric, [eb.label for eb in eval_batch], pre_sliced=True) +else: +self.update_metric(eval_metric, eval_batch.label) if batch_end_callback is not None: batch_end_params = BatchEndParam(epoch=epoch, @@ -517,7 +521,12 @@ def fit(self, train_data, eval_data=None, eval_metric='acc', except StopIteration: end_of_batch = True -self.update_metric(eval_metric, data_batch.label) +if isinstance(data_batch, list): +self.update_metric(eval_metric, + [db.label for db in data_batch], + pre_sliced=True) +else: +self.update_metric(eval_metric, data_batch.label) if monitor is not None: monitor.toc_print() @@ -943,7 +952,7 @@ def update(self): """ raise 
NotImplementedError() -def update_metric(self, eval_metric, labels): +def
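The `pre_sliced` handling in the diff above can be sketched standalone — the classes below are simplified stand-ins for MXNet's executor and metric objects, not the real API:

```python
class SumMetric:
    """Toy metric that just counts the label elements it has seen."""
    def __init__(self):
        self.count = 0

    def update(self, labels, outputs):
        self.count += sum(len(l) for l in labels)


class FakeExec:
    """Stand-in for an executor; only `outputs` is inspected here."""
    outputs = []


def update_metric(metric, labels, slices, execs, pre_sliced=False):
    # Mirrors the change in the diff: slice the global labels per
    # executor unless the caller already sliced them (pre_sliced=True),
    # as happens when the data iterator yields a list of batches.
    for current_exec, (texec, islice) in enumerate(zip(execs, slices)):
        if not pre_sliced:
            labels_slice = [label[islice] for label in labels]
        else:
            labels_slice = labels[current_exec]
        metric.update(labels_slice, texec.outputs)


execs = [FakeExec(), FakeExec()]
slices = [slice(0, 4), slice(4, 8)]
metric = SumMetric()

# One global label array of 8 elements, sliced here per executor:
update_metric(metric, [list(range(8))], slices, execs)
# Labels already split per executor, passed through unchanged:
update_metric(metric, [[[0] * 4], [[0] * 4]], slices, execs, pre_sliced=True)
```

Both calls feed each executor four label elements, so the toy metric ends up having seen sixteen in total.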
[GitHub] leleamol opened a new pull request #62: Seed files for broken link checker job
leleamol opened a new pull request #62: Seed files for broken link checker job URL: https://github.com/apache/incubator-mxnet-site/pull/62 These are the seed files for the broken link checker nightly job. These files will be referenced by the nightly job to find regressions. @marcoabreu
[GitHub] yzhliu closed pull request #10462: [MXNET-62] add test against spark integration
yzhliu closed pull request #10462: [MXNET-62] add test against spark integration URL: https://github.com/apache/incubator-mxnet/pull/10462 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/3rdparty/ps-lite b/3rdparty/ps-lite index a6dda54604a..8a763892a97 16 --- a/3rdparty/ps-lite +++ b/3rdparty/ps-lite @@ -1 +1 @@ -Subproject commit a6dda54604a07d1fb21b016ed1e3f4246b08222a +Subproject commit 8a763892a973afc1acd3d4b469d05bb338a83a6e diff --git a/include/mxnet/kvstore.h b/include/mxnet/kvstore.h index 4e99a9c861f..9e92207fb8d 100644 --- a/include/mxnet/kvstore.h +++ b/include/mxnet/kvstore.h @@ -229,6 +229,7 @@ class KVStore { CHECK(updater) << "invalid updater"; updater_ = updater; } + /*! * \brief set an updater with string keys * diff --git a/scala-package/core/src/main/scala/org/apache/mxnet/optimizer/SGD.scala b/scala-package/core/src/main/scala/org/apache/mxnet/optimizer/SGD.scala index c1b72591952..e228e7273d8 100644 --- a/scala-package/core/src/main/scala/org/apache/mxnet/optimizer/SGD.scala +++ b/scala-package/core/src/main/scala/org/apache/mxnet/optimizer/SGD.scala @@ -41,14 +41,15 @@ class SGD(val learningRate: Float = 0.01f, momentum: Float = 0.0f, */ override def update(index: Int, weight: NDArray, grad: NDArray, state: AnyRef): Unit = { // TODO(bing) implement wd_bias, wd_gamma, wd_beta (copy from python package) -var lr = - (if (lrScheduler != null) { +var lr = { + if (lrScheduler != null) { val scheduledLr = lrScheduler(numUpdate) updateCount(index) scheduledLr } else { this.learningRate - }) + } +} lr = getLr(index, lr) val wd = getWd(index, this.wd) diff --git a/scala-package/pom.xml b/scala-package/pom.xml index 9dcfa7ca27e..cd5dba85dfd 100644 --- a/scala-package/pom.xml +++ b/scala-package/pom.xml @@ -242,7 +242,7 @@ 
org.apache.maven.plugins maven-surefire-plugin -2.7 +2.19 true diff --git a/scala-package/spark/bin/run-mnist-example.sh b/scala-package/spark/bin/run-mnist-example.sh index 962c3375a9d..392d6c6a7cf 100755 --- a/scala-package/spark/bin/run-mnist-example.sh +++ b/scala-package/spark/bin/run-mnist-example.sh @@ -17,6 +17,8 @@ # specific language governing permissions and limitations # under the License. +set -x + CURR_DIR=$(cd `dirname $0`; pwd) SPARK_MODULE_DIR=$(cd $CURR_DIR/../; pwd) SCALA_PKG_DIR=$(cd $CURR_DIR/../../; pwd) @@ -35,10 +37,7 @@ SPARK_JAR=`find ${SPARK_MODULE_DIR}/target -name "*.jar" -type f -exec ls "{}" + SCALA_JAR=`find ${SCALA_PKG_DIR}/assembly/$OS/target -maxdepth 1 -name "*.jar" -type f -exec ls "{}" + | grep -v -E '(javadoc|sources)'` SPARK_OPTS+=" --name mxnet-spark-mnist" -SPARK_OPTS+=" --driver-memory 1g" -SPARK_OPTS+=" --executor-memory 1g" -SPARK_OPTS+=" --num-executors 2" -SPARK_OPTS+=" --executor-cores 1" +SPARK_OPTS+=" --driver-memory 2g" SPARK_OPTS+=" --jars ${SCALA_JAR}" # Download training and test set @@ -72,7 +71,7 @@ fi HOST=`hostname` -$SPARK_HOME/bin/spark-submit --master spark://$HOST:7077 \ +$SPARK_HOME/bin/spark-submit --master local[*] \ --class org.apache.mxnet.spark.example.ClassificationExample \ ${SPARK_OPTS} \ ${SPARK_JAR} \ diff --git a/scala-package/spark/pom.xml b/scala-package/spark/pom.xml index 281fad4056f..43ff1f78fe1 100644 --- a/scala-package/spark/pom.xml +++ b/scala-package/spark/pom.xml @@ -16,7 +16,44 @@ 1.6.3 - + + + osx-x86_64-cpu + +osx-x86_64-cpu + + + + linux-x86_64-cpu + +linux-x86_64-cpu + + + + linux-x86_64-gpu + +linux-x86_64-gpu + + + + + + +org.scalatest +scalatest-maven-plugin + + + -Djava.library.path=${project.parent.basedir}/native/${platform}/target \ + -Dlog4j.configuration=file://${project.basedir}/src/test/resources/log4j.properties + + + + +org.scalastyle +scalastyle-maven-plugin + + + org.apache.mxnet diff --git a/scala-package/spark/src/main/scala/org/apache/mxnet/spark/MXNet.scala 
b/scala-package/spark/src/main/scala/org/apache/mxnet/spark/MXNet.scala index 9720038afac..4952ca2626d 100644 --- a/scala-package/spark/src/main/scala/org/apache/mxnet/spark/MXNet.scala +++ b/scala-package/spark/src/main/scala/org/apache/mxnet/spark/MXNet.scala @@ -127,7 +127,8 @@ class MXNet extends Serializable { logger.info("Starting server ...") val server = new ParameterServer(params.runtimeClasspath, role = "server", -rootUri =
[GitHub] larroy commented on a change in pull request #11220: [MXNET-244][MXNET-523][ARM] improvements to ARMv7 based builds.
larroy commented on a change in pull request #11220: [MXNET-244][MXNET-523][ARM] improvements to ARMv7 based builds. URL: https://github.com/apache/incubator-mxnet/pull/11220#discussion_r194592426 ## File path: ci/build.py ## @@ -76,7 +76,7 @@ def build_docker(platform: str, docker_binary: str, registry: str) -> None: cmd = [docker_binary, "build", "-f", get_dockerfile(platform), "--build-arg", "USER_ID={}".format(os.getuid()), - "--cache-from", tag, + #"--cache-from", tag, Review comment: ok
[GitHub] mbaijal commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests.
mbaijal commented on a change in pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests. URL: https://github.com/apache/incubator-mxnet/pull/10827#discussion_r194605568 ## File path: tests/nightly/JenkinsfileForBinaries ## @@ -0,0 +1,111 @@ +// -*- mode: groovy -*- +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//This is a Jenkinsfile for nightly tests. 
The format and some functions have been picked up from the top-level Jenkinsfile + +err = null +mx_lib = 'lib/libmxnet.so, lib/libmxnet.a, 3rdparty/dmlc-core/libdmlc.a, 3rdparty/nnvm/lib/libnnvm.a' + +// pack libraries for later use +def pack_lib(name, libs=mx_lib) { + sh """ +echo "Packing ${libs} into ${name}" +echo ${libs} | sed -e 's/,/ /g' | xargs md5sum +""" + stash includes: libs, name: name +} + +// unpack libraries saved before +def unpack_lib(name, libs=mx_lib) { + unstash name + sh """ +echo "Unpacked ${libs} from ${name}" +echo ${libs} | sed -e 's/,/ /g' | xargs md5sum +""" +} + +def init_git() { + deleteDir() + retry(5) { +try { + timeout(time: 15, unit: 'MINUTES') { +checkout scm +sh 'git submodule update --init --recursive' +sh 'git clean -d -f' + } +} catch (exc) { + deleteDir() + error "Failed to fetch source codes with ${exc}" + sleep 2 +} + } +} + + +try { + stage('Build') { +parallel 'GPU: CUDA9.1+cuDNN7': { + node('mxnetlinux-cpu') { +ws('workspace/build-gpu') { + init_git() + sh "ci/build.py --platform ubuntu_build_cuda /work/runtime_functions.sh build_ubuntu_gpu_cuda91_cudnn7" + pack_lib('gpu', mx_lib) +} + } +} + } + + stage('NightlyTests'){ +parallel 'ImageClassification: GPU': { + node('mxnetlinux-gpu') { +ws('workspace/nt-ImageClassificationTest') { + init_git() + unpack_lib('gpu', mx_lib) + sh "ci/build.py --nvidiadocker --platform ubuntu_gpu /work/runtime_functions.sh nightly_test_image_classification" +} + } +}, +'KVStore_SingleNode: GPU': { + node('mxnetlinux-gpu') { +ws('workspace/nt-KVStoreTest') { + init_git() + unpack_lib('gpu', mx_lib) + //Note: This test is commented for now since it needs a p2/p3 instance Review comment: What's the label?
[GitHub] xioryu opened a new issue #11240: rcnn example throws CUDNN_STATUS_BAD_PARAM when running under cudnn 6.0
xioryu opened a new issue #11240: rcnn example throws CUDNN_STATUS_BAD_PARAM when running under cudnn 6.0 URL: https://github.com/apache/incubator-mxnet/issues/11240 After I updated the mxnet version from 1.1.0 to 1.2.0 and built the repository with CUDA 8.0.61 and cudnn 6.0, the rcnn training throws the following error when evaluating the rpn accuracy: check failed: e == cuDNN: CUDNN_STATUS_SUCCESS(3 vs. 0) cuDNN: CUDNN_STATUS_BAD_PARAM The error occurs when executing the following code: [https://github.com/apache/incubator-mxnet/blob/ed80ff2c01ff54e82215bf03e8df942ea729a15e/example/rcnn/rcnn/core/metric.py#L51](url) Any ideas to address this without disabling cudnn or rolling back to a former version?
[incubator-mxnet] branch master updated: Fix a bug in sparse embedding operator (#11231)
This is an automated email from the ASF dual-hosted git repository. zhasheng pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new 4da51b2 Fix a bug in sparse embedding operator (#11231) 4da51b2 is described below commit 4da51b243e6f08bf53d27d96279998c8eaa5f039 Author: Haibin Lin AuthorDate: Mon Jun 11 20:41:24 2018 -0700 Fix a bug in sparse embedding operator (#11231) * add sparse block * add sparse embedding * add doc * lint * remove sparseblock * fix embedding --- src/operator/tensor/indexing_op.cu | 3 ++- tests/python/unittest/test_gluon.py | 14 ++ 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/src/operator/tensor/indexing_op.cu b/src/operator/tensor/indexing_op.cu index 593593a..b0ee05e 100644 --- a/src/operator/tensor/indexing_op.cu +++ b/src/operator/tensor/indexing_op.cu @@ -188,7 +188,8 @@ void SparseEmbeddingDeterministicKernelLaunch(const OpContext& ctx, // estimate unique temp space IType* data_ptr = data.dptr(); size_t *null_ptr = nullptr; - cub::DeviceSelect::Unique(NULL, unique_workspace_bytes, data_ptr, data_ptr, + // unique operations will be applied on sorted data + cub::DeviceSelect::Unique(NULL, unique_workspace_bytes, sorted_data, sorted_data, null_ptr, data_size, Stream::GetStream(s)); // One more space reserved for unique count size_t temp_workspace_bytes = std::max(unique_workspace_bytes, diff --git a/tests/python/unittest/test_gluon.py b/tests/python/unittest/test_gluon.py index bf1e0de..ced3063 100644 --- a/tests/python/unittest/test_gluon.py +++ b/tests/python/unittest/test_gluon.py @@ -753,8 +753,22 @@ def test_embedding(): y.backward() assert (layer.weight.grad().asnumpy()[:5] == 1).all() assert (layer.weight.grad().asnumpy()[5:] == 0).all() + +def check_embedding_large_input(sparse_grad): +embedding = mx.gluon.nn.Embedding(10, 1, sparse_grad=True) +embedding.initialize() +embedding.hybridize() +shape 
= (20481,) +with mx.autograd.record(): +emb_in = embedding(mx.nd.ones(shape)) +loss = emb_in.sum() +loss.backward() +assert embedding.weight.grad().data.sum().asscalar() == 20481 + check_embedding(True) check_embedding(False) +check_embedding_large_input(True) +check_embedding_large_input(False) @with_seed() def test_export():
[GitHub] szha closed pull request #11231: Fix a bug in sparse embedding operator
szha closed pull request #11231: Fix a bug in sparse embedding operator URL: https://github.com/apache/incubator-mxnet/pull/11231 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/src/operator/tensor/indexing_op.cu b/src/operator/tensor/indexing_op.cu index 593593ae4e3..b0ee05ea139 100644 --- a/src/operator/tensor/indexing_op.cu +++ b/src/operator/tensor/indexing_op.cu @@ -188,7 +188,8 @@ void SparseEmbeddingDeterministicKernelLaunch(const OpContext& ctx, // estimate unique temp space IType* data_ptr = data.dptr(); size_t *null_ptr = nullptr; - cub::DeviceSelect::Unique(NULL, unique_workspace_bytes, data_ptr, data_ptr, + // unique operations will be applied on sorted data + cub::DeviceSelect::Unique(NULL, unique_workspace_bytes, sorted_data, sorted_data, null_ptr, data_size, Stream::GetStream(s)); // One more space reserved for unique count size_t temp_workspace_bytes = std::max(unique_workspace_bytes, diff --git a/tests/python/unittest/test_gluon.py b/tests/python/unittest/test_gluon.py index bf1e0deb200..ced3063448b 100644 --- a/tests/python/unittest/test_gluon.py +++ b/tests/python/unittest/test_gluon.py @@ -753,8 +753,22 @@ def check_embedding(sparse_grad): y.backward() assert (layer.weight.grad().asnumpy()[:5] == 1).all() assert (layer.weight.grad().asnumpy()[5:] == 0).all() + +def check_embedding_large_input(sparse_grad): +embedding = mx.gluon.nn.Embedding(10, 1, sparse_grad=True) +embedding.initialize() +embedding.hybridize() +shape = (20481,) +with mx.autograd.record(): +emb_in = embedding(mx.nd.ones(shape)) +loss = emb_in.sum() +loss.backward() +assert embedding.weight.grad().data.sum().asscalar() == 20481 + check_embedding(True) check_embedding(False) +check_embedding_large_input(True) +check_embedding_large_input(False) 
@with_seed() def test_export(): This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
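The one-line fix above matters because cub's `DeviceSelect::Unique`, like Unix `uniq`, removes only *consecutive* duplicates, so it has to be sized against and run on the sorted copy of the indices. A small Python sketch (using `itertools.groupby` as a stand-in for the adjacent-unique pass; not the actual GPU code) shows why the unsorted input is wrong:

```python
from itertools import groupby

def adjacent_unique(seq):
    # Emulates cub::DeviceSelect::Unique: keep the first element of each
    # run of consecutive equal values (a 'uniq'-style pass).
    return [key for key, _ in groupby(seq)]

data = [3, 1, 3, 2, 1]              # raw embedding indices, duplicates not adjacent
unsorted_result = adjacent_unique(data)          # duplicates survive
sorted_result = adjacent_unique(sorted(data))    # true unique set
```

Only when the input is pre-sorted does the adjacent-unique pass coincide with a global unique, which is what the deterministic kernel needs.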
[GitHub] mbaijal closed pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests.
mbaijal closed pull request #10827: [MXNET-405][WIP] Add 2 new pipelines to the Official CI and run nightly tests. URL: https://github.com/apache/incubator-mxnet/pull/10827
[GitHub] marcoabreu closed issue #9844: Faulty bmod implementation
marcoabreu closed issue #9844: Faulty bmod implementation URL: https://github.com/apache/incubator-mxnet/issues/9844
[GitHub] kalyc commented on issue #8014: cross compile mxnet for android, without using Amalgamation?
kalyc commented on issue #8014: cross compile mxnet for android, without using Amalgamation? URL: https://github.com/apache/incubator-mxnet/issues/8014#issuecomment-396414564 Hello @fye881, thank you for submitting the issue. Were you able to resolve it? I am requesting the MXNet community to add the label "Pending Requester Info".
[GitHub] kalyc commented on issue #11221: How to store a large distributed training model with KVStore?
kalyc commented on issue #11221: How to store a large distributed training model with KVStore? URL: https://github.com/apache/incubator-mxnet/issues/11221#issuecomment-396418155 Hi @liuzx32 there is an open discussion forum about mxnet on discuss.mxnet.io - you could post your query there to learn more.
[GitHub] pengzhao-intel commented on issue #11212: cherry-pick bug fixes in MKLDNN for v1.2.0
pengzhao-intel commented on issue #11212: cherry-pick bug fixes in MKLDNN for v1.2.0 URL: https://github.com/apache/incubator-mxnet/pull/11212#issuecomment-396429297 @anirudh2290 The update of MKL-DNN in #10578 fixed the depthConv issue.
[GitHub] yzhliu commented on a change in pull request #11126: [MXNET-386] ongoing maintenance on NDArray
yzhliu commented on a change in pull request #11126: [MXNET-386] ongoing maintenance on NDArray URL: https://github.com/apache/incubator-mxnet/pull/11126#discussion_r194589659

## File path: scala-package/macros/src/main/scala/org/apache/mxnet/APIDocGenerator.scala ##

@@ -97,9 +97,11 @@ private[mxnet] object APIDocGenerator{
       argDef += "name : String = null"
       argDef += "attr : Map[String, String] = null"
     } else {
+      argDef += "out : Option[NDArray] = None"
       returnType = "org.apache.mxnet.NDArrayFuncReturn"
     }
-    s"def ${func.name} (${argDef.mkString(", ")}) : ${returnType}"
+    val experimentalTag = "@Experimental"

Review comment: yes, the IDE will pop it up. Deprecated APIs log at compile time, not runtime. Just write real code and try. @nswamy
[GitHub] ThomasDelteil commented on a change in pull request #11127: add import_ for SymbolBlock
ThomasDelteil commented on a change in pull request #11127: add import_ for SymbolBlock URL: https://github.com/apache/incubator-mxnet/pull/11127#discussion_r194591463

## File path: python/mxnet/gluon/block.py ##

@@ -317,8 +317,23 @@ def save_params(self, filename):
         arg_dict = {key : val._reduce() for key, val in params.items()}
         ndarray.save(filename, arg_dict)

-    def load_params(self, filename, ctx=None, allow_missing=False,
-                    ignore_extra=False):
+    def save_params(self, filename):
+        """[Deprecated] Please use save_parameters.
+
+        Save parameters to file.
+
+        filename : str
+            Path to file.
+        """
+        warnings.warn("save_params is deprecated. Please use save_parameters.")

Review comment: shall we add something about export? Something like "If you are using a hybridized model and want to serialize it to obtain the network structure and parameters, please refer to HybridBlock.export()"
[GitHub] hcho3 commented on issue #11209: [WIP] implement var operator
hcho3 commented on issue #11209: [WIP] implement var operator URL: https://github.com/apache/incubator-mxnet/pull/11209#issuecomment-396444124 @piiswrong I've added tests. The `var` operator works, but the accuracy is not that good. I suspect that squaring small floating-point values causes precision loss. Let me come up with an alternative implementation that does not involve squaring.
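For context on the precision issue described above: the one-pass formula E[x²] − E[x]² cancels catastrophically when the mean is large relative to the spread, while a two-pass version that squares only the (small) deviations avoids it. A minimal illustrative sketch, not the operator's actual code:

```python
def var_naive(xs):
    # One-pass E[x^2] - E[x]^2: subtracts two nearly equal large numbers,
    # so the significant digits cancel and precision is lost.
    n = len(xs)
    s = sum(xs)
    sq = sum(x * x for x in xs)
    return sq / n - (s / n) ** 2

def var_two_pass(xs):
    # Compute the mean first, then average the squared deviations;
    # the squared terms are already small, so little cancellation occurs.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

# A large mean with a tiny spread exposes the difference.
xs = [1e8 + i * 1e-4 for i in range(1000)]
```

For this input the true population variance is 1e-8 · (1000² − 1)/12 ≈ 8.333325e-4; the two-pass result stays close to it while the one-pass result is dominated by rounding error.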
[GitHub] liumusicforever commented on issue #6474: Fine-tune the mxnet ssd get mismatchfrom.shape() error
liumusicforever commented on issue #6474: Fine-tune the mxnet ssd get mismatchfrom.shape() error URL: https://github.com/apache/incubator-mxnet/issues/6474#issuecomment-396463447 yes!
[GitHub] juliusshufan commented on issue #10921: [MXNET-500]Test cases improvement for MKLDNN on Gluon
juliusshufan commented on issue #10921: [MXNET-500] Test cases improvement for MKLDNN on Gluon URL: https://github.com/apache/incubator-mxnet/pull/10921#issuecomment-396468490 @marcoabreu Thanks for retriggering; the result seems very similar... too many GPU cases failed.
[GitHub] ctcyang commented on a change in pull request #11237: Bring back MXNET_GPU_COPY_NTHREADS env variable
ctcyang commented on a change in pull request #11237: Bring back MXNET_GPU_COPY_NTHREADS env variable URL: https://github.com/apache/incubator-mxnet/pull/11237#discussion_r194586233

## File path: src/engine/threaded_engine_perdevice.cc ##

@@ -194,6 +196,8 @@ class ThreadedEnginePerDevice : public ThreadedEngine {
   size_t cpu_worker_nthreads_;
   /*! \brief number of concurrent thread each gpu worker uses */
   size_t gpu_worker_nthreads_;
+  /*! \brief number of concurrent thread each gpu copy worker uses */
+  int gpu_copy_nthreads_;

Review comment: Fixed
[GitHub] lanking520 opened a new pull request #11239: [MXNET-319] Javadoc fix
lanking520 opened a new pull request #11239: [MXNET-319] Javadoc fix URL: https://github.com/apache/incubator-mxnet/pull/11239

## Description ##

This is the second proposed solution to #11123: add Javadoc support. @nswamy @yzhliu @andrewfayres Be aware, I have imported a third-party library to generate the Scala doc. Please check the pom file carefully.

## Checklist ##

### Essentials ###

Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionality and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
[GitHub] yajiedesign commented on issue #10564: Simplified CUDA language detection in cmake
yajiedesign commented on issue #10564: Simplified CUDA language detection in cmake URL: https://github.com/apache/incubator-mxnet/pull/10564#issuecomment-396458326 The Windows build has some problems.
[GitHub] sandeep-krishnamurthy closed issue #6474: Fine-tune the mxnet ssd get mismatchfrom.shape() error
sandeep-krishnamurthy closed issue #6474: Fine-tune the mxnet ssd get mismatchfrom.shape() error URL: https://github.com/apache/incubator-mxnet/issues/6474
[GitHub] anirudh2290 commented on issue #11212: cherry-pick bug fixes in MKLDNN for v1.2.0
anirudh2290 commented on issue #11212: cherry-pick bug fixes in MKLDNN for v1.2.0 URL: https://github.com/apache/incubator-mxnet/pull/11212#issuecomment-396473032 @pengzhao-intel I understand depthwise convolution is widely used but it is too big a change to be added to the patch release. What do others think @zheng-da @piiswrong ?
[GitHub] liuzx32 closed issue #11221: How to store a large distributed training model with KVStore?
liuzx32 closed issue #11221: How to store a large distributed training model with KVStore? URL: https://github.com/apache/incubator-mxnet/issues/11221
[GitHub] liuzx32 commented on issue #11221: How to store a large distributed training model with KVStore?
liuzx32 commented on issue #11221: How to store a large distributed training model with KVStore? URL: https://github.com/apache/incubator-mxnet/issues/11221#issuecomment-396438516 @kalyc Thank you very much for https://discuss.mxnet.io/
[GitHub] absalama commented on issue #11200: Training Imagenet on AlexNet failed with corrupted JPEG format images
absalama commented on issue #11200: Training Imagenet on AlexNet failed with corrupted JPEG format images URL: https://github.com/apache/incubator-mxnet/issues/11200#issuecomment-396473924 I did not verify the hash of the dataset. Before I created the rec files, I deleted the corrupted images from the datasets. I used tools/im2rec to create the rec files. The question is: how do I verify the datasets or the record files before starting training? **I would like to mention here that the training was working for almost 2 hours (with very low accuracy and speed).** Please have a look at the following logs:

INFO:root:Epoch[0] Batch [1600] Speed: 35.24 samples/sec accuracy=0.001172
INFO:root:Epoch[0] Batch [1620] Speed: 34.86 samples/sec accuracy=0.002734
INFO:root:Epoch[0] Batch [1640] Speed: 34.61 samples/sec accuracy=0.003906
INFO:root:Epoch[0] Batch [1660] Speed: 35.20 samples/sec accuracy=0.001563
INFO:root:Epoch[0] Batch [1680] Speed: 34.63 samples/sec accuracy=0.001172
INFO:root:Epoch[0] Batch [1700] Speed: 33.94 samples/sec accuracy=0.001953
INFO:root:Epoch[0] Batch [1720] Speed: 34.72 samples/sec accuracy=0.002344
INFO:root:Epoch[0] Batch [1740] Speed: 32.98 samples/sec accuracy=0.003906
INFO:root:Epoch[0] Batch [1760] Speed: 29.50 samples/sec accuracy=0.001953
INFO:root:Epoch[0] Batch [1780] Speed: 34.83 samples/sec accuracy=0.003125
Corrupt JPEG data: premature end of data segment
terminate called after throwing an instance of 'dmlc::Error'
  what(): [17:20:52] src/recordio.cc:117: Check failed: p[0] == RecordIOWriter::kMagic

Stack trace returned 6 entries:
[bt] (0) /work/projects/Project00755/envs/mxnet_gpu/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x30cbe2) [0x2b9eb9b4abe2]
[bt] (1) /work/projects/Project00755/envs/mxnet_gpu/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2f9e845) [0x2b9ebc7dc845]
[bt] (2) /work/projects/Project00755/envs/mxnet_gpu/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2a34e98) [0x2b9ebc272e98]
[bt] (3) /shared/apps/gcc/4.9.4/lib64/libgomp.so.1(+0xed36) [0x2b9ef5042d36]
[bt] (4) /lib64/libpthread.so.0(+0x7e25) [0x2b9e8961fe25]
[bt] (5) /lib64/libc.so.6(clone+0x6d) [0x2b9e8a03534d]

/opt/slurm/current/var/spool/job7708145/slurm_script: line 28: 31099 Aborted (core dumped) python3 train_imagenet.py --gpus 0 --data-nthreads 8 --network alexnet --batch-size 128 --model mxnet_alexnet_single_gpu --lr 0.01 --optimizer sgd --mom 0.9 --wd 0.0005 --image-shape 3,227,227 --num-examples 1281144 --data-train "${IMAGENET_ROOT}/record_io/train/train.rec" --data-val "${IMAGENET_ROOT}/record_io/val/value.rec" --data-train-idx "${IMAGENET_ROOT}/record_io/train/train.idx" --data-val-idx "${IMAGENET_ROOT}/record_io/val/value.idx"
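On the question of verifying images before building the .rec files: "premature end of data segment" errors usually come from truncated JPEGs, and a cheap (incomplete) sanity check is to confirm each file carries the JPEG start-of-image and end-of-image markers. This is an illustrative sketch, not a full decode - a stricter approach would actually decode every image - and note that some valid JPEGs carry trailing bytes after the EOI marker:

```python
import os

def jpeg_looks_complete(path):
    """Heuristic check: a well-formed JPEG begins with the SOI marker
    (FF D8) and ends with the EOI marker (FF D9). Truncated files
    typically fail the tail check."""
    if os.path.getsize(path) < 4:
        return False
    with open(path, "rb") as f:
        head = f.read(2)        # should be the SOI marker
        f.seek(-2, os.SEEK_END)
        tail = f.read(2)        # should be the EOI marker
    return head == b"\xff\xd8" and tail == b"\xff\xd9"
```

Running something like this over the image directory before invoking tools/im2rec would flag truncated files without spending two hours of training first.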
[GitHub] eric-haibin-lin commented on a change in pull request #11223: Allow specifying AdaGrad initial accumulator value
eric-haibin-lin commented on a change in pull request #11223: Allow specifying AdaGrad initial accumulator value URL: https://github.com/apache/incubator-mxnet/pull/11223#discussion_r194483682

## File path: python/mxnet/optimizer.py ##

@@ -1091,14 +1091,20 @@ class AdaGrad(Optimizer):
     eps: float, optional
         Small value to avoid division by 0.
+    initial_accumulator_value: float, default 0
+        The Adagrad state is initially set to this value.
     """
-    def __init__(self, eps=1e-7, **kwargs):
+    def __init__(self, eps=1e-7, initial_accumulator_value=0, **kwargs):

Review comment: I think it comes from https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/training/adagrad.py#L46
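For readers following the review thread: the accumulator being parameterized here is AdaGrad's per-parameter running sum of squared gradients. A minimal sketch of the update it feeds into (an illustration, not MXNet's implementation; frameworks also differ on whether `eps` sits inside or outside the square root):

```python
import math

def adagrad_step(weights, grads, state, lr=0.01, eps=1e-7):
    # Accumulate squared gradients, then scale each step by 1/sqrt(history).
    new_state = [h + g * g for h, g in zip(state, grads)]
    new_weights = [w - lr * g / (math.sqrt(h) + eps)
                   for w, g, h in zip(weights, grads, new_state)]
    return new_weights, new_state

# A non-zero initial_accumulator_value (TensorFlow's default is 0.1) starts
# the history above zero, damping the very first updates.
initial_accumulator_value = 0.1
state = [initial_accumulator_value] * 3
w, state = adagrad_step([0.0, 0.0, 0.0], [1.0, -1.0, 0.5], state)
```

With a zero initial accumulator the first step for each parameter is close to `lr * sign(g)` regardless of gradient magnitude, which is why exposing this knob can matter.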
[GitHub] ThomasDelteil commented on issue #11219: Flaky test: test_gru_bidirectional
ThomasDelteil commented on issue #11219: Flaky test: test_gru_bidirectional URL: https://github.com/apache/incubator-mxnet/issues/11219#issuecomment-396323801 Duplicate of this one: https://github.com/apache/incubator-mxnet/issues/11202
[GitHub] lanking520 commented on a change in pull request #11204: Scala inference memory leak fix
lanking520 commented on a change in pull request #11204: Scala inference memory leak fix URL: https://github.com/apache/incubator-mxnet/pull/11204#discussion_r194485202

## File path: scala-package/core/src/main/scala/org/apache/mxnet/FeedForward.scala ##

@@ -230,8 +230,11 @@ class FeedForward private(
     val padded = batch.pad
     val realSize = batchSize - padded
     for ((list, nd) <- outputs zip predExec.outputs) {
-      list += nd.slice(0, realSize).copy()
+      val ndSliced = nd.slice(0, realSize)
+      list += ndSliced.copy()
+      ndSliced.dispose()

Review comment: @liuzx32 The way that Scala works here is more like playing around with shared pointers. When we create an NDArray, we request a pointer that points to a memory space on the C++ side. If we want to release this piece of memory, we need to call dispose somewhere in the Scala code in order to get it executed. However, if we forget to do that, Scala/JVM will discard the pointer rather than helping us release the memory, and that causes the memory leak.
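The dispose pattern described above is easier to see with a toy model. Below is a hypothetical Python sketch (the class and names are invented for illustration; the real type is the Scala `org.apache.mxnet.NDArray`) of a wrapper whose native allocation the garbage collector cannot reclaim, contrasting the leaky pattern from the old code with the fixed one from the diff:

```python
class FakeNDArray:
    """Toy stand-in for a JVM object wrapping a native allocation that
    the garbage collector does not know how to free."""
    live_native_handles = set()   # tracks un-freed native buffers
    _next = 0

    def __init__(self):
        FakeNDArray._next += 1
        self.handle = FakeNDArray._next
        FakeNDArray.live_native_handles.add(self.handle)

    def slice(self, start, stop):
        return FakeNDArray()      # every slice allocates a fresh native buffer

    def copy(self):
        return FakeNDArray()      # so does every copy

    def dispose(self):
        # the only way the native buffer is ever released
        FakeNDArray.live_native_handles.discard(self.handle)

nd = FakeNDArray()
# Leaky pattern from the old code: the temporary slice is never disposed.
leaked = nd.slice(0, 10).copy()
# Fixed pattern from the PR: keep a reference to the slice and dispose it.
sliced = nd.slice(0, 10)
kept = sliced.copy()
sliced.dispose()
```

Dropping the Python (or JVM) reference to `leaked`'s intermediate slice reclaims the proxy object but never removes its handle from the live set, which is exactly the leak the PR fixes.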
[incubator-mxnet] branch master updated: Replace the old adhoc method to iterate over gpu devices with new mx.context.num_gpus (#11227)
This is an automated email from the ASF dual-hosted git repository. marcoabreu pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

The following commit(s) were added to refs/heads/master by this push:
new 0dbba84 Replace the old adhoc method to iterate over gpu devices with new mx.context.num_gpus (#11227)

0dbba84 is described below

commit 0dbba84c72f85e5db90161bd6282763472ce5f4f
Author: Deokjae Lee <36436141+asitsta...@users.noreply.github.com>
AuthorDate: Tue Jun 12 01:27:31 2018 +0900

    Replace the old adhoc method to iterate over gpu devices with new mx.context.num_gpus (#11227)
---
 tests/python/unittest/test_random.py | 104 +++
 1 file changed, 43 insertions(+), 61 deletions(-)

diff --git a/tests/python/unittest/test_random.py b/tests/python/unittest/test_random.py
index 40723b2..7abbc99 100644
--- a/tests/python/unittest/test_random.py
+++ b/tests/python/unittest/test_random.py
@@ -293,31 +293,22 @@ def test_random_seed_setting_for_context():
     samples_imp = []
     samples_sym = []
     # Collect random number samples from the generators of all devices, each seeded with the same number.
-    for dev_id in range(0, 16 if dev_type == 'gpu' else 1):
-        # Currently python API does not provide a method to get the number of gpu devices.
-        # Waiting for PR #10354, which provides the method, to be merged.
-        # As a temporal workaround, try first and catch the exception caused by the absence of the device with `dev_id`.
-        try:
-            with mx.Context(dev_type, dev_id):
-                ctx = mx.context.current_context()
-                seed = set_seed_variously_for_context(ctx, 1, num_temp_seeds, seed_to_test)
-
-                # Check imperative. `multinomial` uses non-parallel rng.
-                rnds = mx.nd.random.multinomial(data=mx.nd.array(probs, dtype=dtype), shape=num_samples)
-                samples_imp.append(rnds.asnumpy())
-
-                # Check symbolic. `multinomial` uses non-parallel rng.
-                P = mx.sym.Variable("P")
-                X = mx.sym.random.multinomial(data=P, shape=num_samples, get_prob=False)
-                exe = X.bind(ctx, {"P": mx.nd.array(probs, dtype=dtype)})
-                set_seed_variously_for_context(ctx, seed, num_temp_seeds, seed_to_test)
-                exe.forward()
-                samples_sym.append(exe.outputs[0].asnumpy())
-        except mx.MXNetError as e:
-            if str(e).find("invalid device ordinal") != -1:
-                break
-            else:
-                raise e
+    for dev_id in range(0, mx.context.num_gpus() if dev_type == 'gpu' else 1):
+        with mx.Context(dev_type, dev_id):
+            ctx = mx.context.current_context()
+            seed = set_seed_variously_for_context(ctx, 1, num_temp_seeds, seed_to_test)
+
+            # Check imperative. `multinomial` uses non-parallel rng.
+            rnds = mx.nd.random.multinomial(data=mx.nd.array(probs, dtype=dtype), shape=num_samples)
+            samples_imp.append(rnds.asnumpy())
+
+            # Check symbolic. `multinomial` uses non-parallel rng.
+            P = mx.sym.Variable("P")
+            X = mx.sym.random.multinomial(data=P, shape=num_samples, get_prob=False)
+            exe = X.bind(ctx, {"P": mx.nd.array(probs, dtype=dtype)})
+            set_seed_variously_for_context(ctx, seed, num_temp_seeds, seed_to_test)
+            exe.forward()
+            samples_sym.append(exe.outputs[0].asnumpy())
     # The samples should be identical across different gpu devices.
     for i in range(1, len(samples_imp)):
         assert same(samples_imp[i - 1], samples_imp[i])
@@ -333,42 +324,33 @@ def test_parallel_random_seed_setting_for_context():
     samples_imp = []
     samples_sym = []
     # Collect random number samples from the generators of all devices, each seeded with the same number.
-    for dev_id in range(0, 16 if dev_type == 'gpu' else 1):
-        # Currently python API does not provide a method to get the number of gpu devices.
-        # Waiting for PR #10354, which provides the method, to be merged.
-        # As a temporal workaround, try first and catch the exception caused by the absence of the device with `dev_id`.
-        try:
-            with mx.Context(dev_type, dev_id):
-                ctx = mx.context.current_context()
-                # Avoid excessive test cpu runtimes.
-                num_temp_seeds = 25 if dev_type == 'gpu' else 1
-                # To flush out a possible race condition, run multiple times.
-                for _ in range(20):
-                    # Create enough samples such that we get a meaningful distribution.
-
[GitHub] szha commented on a change in pull request #11041: gpu mem pool strategy
szha commented on a change in pull request #11041: gpu mem pool strategy URL: https://github.com/apache/incubator-mxnet/pull/11041#discussion_r194471060

## File path: tests/python/unittest/test_sparse_operator.py ##

@@ -16,7 +16,7 @@
 # under the License.
 from mxnet.test_utils import *
-from common import setup_module, with_seed
+from common import setup_module, with_seed, teardown

Review comment: applying this change would allow all tests within a module to finish before moving on to the next, thus eliminating the case where side effects of tests in one module spill over into the next. In terms of testing practice, including a setup/teardown is common.
[GitHub] marcoabreu commented on a change in pull request #11041: gpu mem pool strategy
marcoabreu commented on a change in pull request #11041: gpu mem pool strategy URL: https://github.com/apache/incubator-mxnet/pull/11041#discussion_r194471886

## File path: tests/python/unittest/test_sparse_operator.py ##

@@ -16,7 +16,7 @@
 # under the License.
 from mxnet.test_utils import *
-from common import setup_module, with_seed
+from common import setup_module, with_seed, teardown

Review comment: Yeah, but we're not actually using it in most files, right?
[GitHub] szha commented on a change in pull request #11041: gpu mem pool strategy
szha commented on a change in pull request #11041: gpu mem pool strategy URL: https://github.com/apache/incubator-mxnet/pull/11041#discussion_r194472380

## File path: tests/python/unittest/test_sparse_operator.py ##

@@ -16,7 +16,7 @@
 # under the License.
 from mxnet.test_utils import *
-from common import setup_module, with_seed
+from common import setup_module, with_seed, teardown

Review comment: now we are
[GitHub] marcoabreu commented on a change in pull request #11041: gpu mem pool strategy
marcoabreu commented on a change in pull request #11041: gpu mem pool strategy URL: https://github.com/apache/incubator-mxnet/pull/11041#discussion_r194473467

## File path: tests/python/unittest/test_sparse_operator.py ##

@@ -16,7 +16,7 @@
 # under the License.
 from mxnet.test_utils import *
-from common import setup_module, with_seed
+from common import setup_module, with_seed, teardown

Review comment: Ah, in common.py :) But isn't it sufficient to import it there?
[GitHub] szha commented on a change in pull request #11041: gpu mem pool strategy
szha commented on a change in pull request #11041: gpu mem pool strategy URL: https://github.com/apache/incubator-mxnet/pull/11041#discussion_r194478459

## File path: tests/python/unittest/test_sparse_operator.py ##

@@ -16,7 +16,7 @@
 # under the License.
 from mxnet.test_utils import *
-from common import setup_module, with_seed
+from common import setup_module, with_seed, teardown

Review comment: unfortunately no. it is the same case as setup_module
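Background for this exchange: nose-style runners discover `setup_module`/`teardown` by looking the names up in each test module's own namespace, so defining them once in common.py is not enough - each test file must import them, which acts as a re-export. A small sketch of that discovery mechanism (a simplified stand-in for the runner, not its actual code):

```python
import types

def run_module_tests(module, tests):
    # A nose-style runner resolves fixtures by name on the module object;
    # a fixture defined elsewhere but never imported simply isn't found.
    events = []
    setup = getattr(module, "setup_module", None)
    teardown = getattr(module, "teardown", None)
    if setup:
        events.append(setup())
    for test in tests:
        events.append(test())
    if teardown:
        events.append(teardown())
    return events

# "common" defines the fixtures once.
common = types.ModuleType("common")
common.setup_module = lambda: "setup"
common.teardown = lambda: "teardown"

# A test module that re-exports them, as the diff's import line does.
test_mod = types.ModuleType("test_sparse_operator")
test_mod.setup_module = common.setup_module
test_mod.teardown = common.teardown

# A test module that forgot the import: its teardown never runs.
bare_mod = types.ModuleType("test_other")
```

This is why the same one-line import change has to be repeated across the test files rather than made once in common.py.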
[GitHub] ThomasDelteil commented on issue #11202: Flaky test Python 3: Win: test_operator.test_gru_bidirectional
ThomasDelteil commented on issue #11202: Flaky test Python 3: Win: test_operator.test_gru_bidirectional URL: https://github.com/apache/incubator-mxnet/issues/11202#issuecomment-396323871 again http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10637/6/pipeline
[GitHub] ThomasDelteil commented on issue #11219: Flaky test: test_gru_bidirectional
ThomasDelteil commented on issue #11219: Flaky test: test_gru_bidirectional URL: https://github.com/apache/incubator-mxnet/issues/11219#issuecomment-396324270 This one is happening way too often, had it 3 times in 6 builds. Should we upgrade to bug?
[GitHub] piiswrong closed pull request #11197: Gluon sparse block and sparse embedding
piiswrong closed pull request #11197: Gluon sparse block and sparse embedding
URL: https://github.com/apache/incubator-mxnet/pull/11197

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/docs/api/python/gluon/contrib.md b/docs/api/python/gluon/contrib.md
index bc3089fa878..877a294d9a1 100644
--- a/docs/api/python/gluon/contrib.md
+++ b/docs/api/python/gluon/contrib.md
@@ -35,6 +35,7 @@ In the rest of this document, we list routines provided by the `gluon.contrib` p
     Concurrent
     HybridConcurrent
     Identity
+    SparseEmbedding
 ```
 
 ### Recurrent neural network
@@ -55,6 +56,7 @@ In the rest of this document, we list routines provided by the `gluon.contrib` p
     Conv1DGRUCell
     Conv2DGRUCell
     Conv3DGRUCell
+    LSTMPCell
 ```
 
 ### Data
diff --git a/python/mxnet/gluon/contrib/nn/basic_layers.py b/python/mxnet/gluon/contrib/nn/basic_layers.py
index eccdf18c1bb..1edef1476ee 100644
--- a/python/mxnet/gluon/contrib/nn/basic_layers.py
+++ b/python/mxnet/gluon/contrib/nn/basic_layers.py
@@ -18,10 +18,10 @@
 # coding: utf-8
 # pylint: disable= arguments-differ
 """Custom neural network layers in model_zoo."""
-__all__ = ['Concurrent', 'HybridConcurrent', 'Identity']
+__all__ = ['Concurrent', 'HybridConcurrent', 'Identity', 'SparseEmbedding']
 
 from .... import nd
-from ...block import HybridBlock
+from ...block import HybridBlock, Block
 from ...nn import Sequential, HybridSequential
 
 class Concurrent(Sequential):
@@ -110,3 +110,44 @@ def __init__(self, prefix=None, params=None):
     def hybrid_forward(self, F, x):
         return x
+
+class SparseEmbedding(Block):
+    r"""Turns non-negative integers (indexes/tokens) into dense vectors
+    of fixed size. eg. [4, 20] -> [[0.25, 0.1], [0.6, -0.2]]
+
+    This SparseBlock is designed for distributed training with extremely large
+    input dimension. Both weight and gradient w.r.t. weight are `RowSparseNDArray`.
+
+    Parameters
+    ----------
+    input_dim : int
+        Size of the vocabulary, i.e. maximum integer index + 1.
+    output_dim : int
+        Dimension of the dense embedding.
+    dtype : str or np.dtype, default 'float32'
+        Data type of output embeddings.
+    weight_initializer : Initializer
+        Initializer for the `embeddings` matrix.
+
+    Inputs:
+        - **data**: (N-1)-D tensor with shape: `(x1, x2, ..., xN-1)`.
+    Output:
+        - **out**: N-D tensor with shape: `(x1, x2, ..., xN-1, output_dim)`.
+    """
+    def __init__(self, input_dim, output_dim, dtype='float32',
+                 weight_initializer=None, **kwargs):
+        super(SparseEmbedding, self).__init__(**kwargs)
+        self._kwargs = {'input_dim': input_dim, 'output_dim': output_dim,
+                        'dtype': dtype, 'sparse_grad': True}
+        self.weight = self.params.get('weight', shape=(input_dim, output_dim),
+                                      init=weight_initializer, dtype=dtype,
+                                      grad_stype='row_sparse', stype='row_sparse')
+
+    def forward(self, x):
+        weight = self.weight.row_sparse_data(x)
+        return nd.Embedding(x, weight, name='fwd', **self._kwargs)
+
+    def __repr__(self):
+        s = '{block_name}({input_dim} -> {output_dim}, {dtype})'
+        return s.format(block_name=self.__class__.__name__,
+                        **self._kwargs)
diff --git a/tests/python/unittest/test_gluon_contrib.py b/tests/python/unittest/test_gluon_contrib.py
index 729ec8407f2..264ff1f5e53 100644
--- a/tests/python/unittest/test_gluon_contrib.py
+++ b/tests/python/unittest/test_gluon_contrib.py
@@ -19,7 +19,7 @@
 import mxnet as mx
 from mxnet.gluon import contrib
 from mxnet.gluon import nn
-from mxnet.gluon.contrib.nn import Concurrent, HybridConcurrent, Identity
+from mxnet.gluon.contrib.nn import Concurrent, HybridConcurrent, Identity, SparseEmbedding
 from mxnet.test_utils import almost_equal
 from common import setup_module, with_seed
 import numpy as np
@@ -185,13 +185,25 @@ def test_concurrent():
     x.wait_to_read()
     x2.wait_to_read()
 
-
+@with_seed()
 def test_identity():
     model = Identity()
     x = mx.nd.random.uniform(shape=(128, 33, 64))
     mx.test_utils.assert_almost_equal(model(x).asnumpy(), x.asnumpy())
 
+@with_seed()
+def test_sparse_embedding():
+    layer = SparseEmbedding(10, 100)
+    layer.initialize()
+    trainer = mx.gluon.Trainer(layer.collect_params(), 'sgd')
+    x = mx.nd.array([3,4,2,0,1])
+    with mx.autograd.record():
+        y = layer(x)
+    y.backward()
+    assert (layer.weight.grad().asnumpy()[:5] == 1).all()
+    assert (layer.weight.grad().asnumpy()[5:] == 0).all()
+
 def test_datasets():
     wikitext2_train =
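The `test_sparse_embedding` assertions above (gradient rows touched by the batch equal 1, all other rows 0) follow from the forward pass being a plain row lookup. A minimal NumPy sketch of that row-sparse gradient structure, independent of MXNet; all names here are illustrative:

```python
import numpy as np

# An embedding forward pass is a row lookup, so the gradient w.r.t. the
# table is non-zero only at the rows the batch touched ("row-sparse").
vocab, dim = 10, 4
table = np.random.randn(vocab, dim)
batch = np.array([3, 4, 2, 0, 1])        # same indices as the PR's unit test

out = table[batch]                       # forward: shape (5, dim)
grad_out = np.ones_like(out)             # upstream gradient of ones
grad_table = np.zeros_like(table)
np.add.at(grad_table, batch, grad_out)   # scatter-add into touched rows only
```

Rows 0-4 are each looked up exactly once, so their gradient rows are all ones, while rows 5-9 stay zero; this is why MXNet can store the gradient as a `RowSparseNDArray` holding just the touched rows.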
[incubator-mxnet] branch master updated: [WIP] Gluon sparse block and sparse embedding (#11197)
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

The following commit(s) were added to refs/heads/master by this push:
     new 715457d  [WIP] Gluon sparse block and sparse embedding (#11197)
715457d is described below

commit 715457d94ebf8935e34dd6bd445b3ba3950fe9d4
Author: Haibin Lin
AuthorDate: Mon Jun 11 10:43:40 2018 -0700

    [WIP] Gluon sparse block and sparse embedding (#11197)

    * add sparse block
    * add sparse embedding
    * add doc
    * lint
    * remove sparseblock
---
 docs/api/python/gluon/contrib.md              |  2 ++
 python/mxnet/gluon/contrib/nn/basic_layers.py | 45 +--
 tests/python/unittest/test_gluon_contrib.py   | 16 --
 3 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/docs/api/python/gluon/contrib.md b/docs/api/python/gluon/contrib.md
index bc3089f..877a294 100644
--- a/docs/api/python/gluon/contrib.md
+++ b/docs/api/python/gluon/contrib.md
@@ -35,6 +35,7 @@ In the rest of this document, we list routines provided by the `gluon.contrib` p
     Concurrent
     HybridConcurrent
     Identity
+    SparseEmbedding
 ```
 
 ### Recurrent neural network
@@ -55,6 +56,7 @@ In the rest of this document, we list routines provided by the `gluon.contrib` p
     Conv1DGRUCell
     Conv2DGRUCell
     Conv3DGRUCell
+    LSTMPCell
 ```
 
 ### Data
diff --git a/python/mxnet/gluon/contrib/nn/basic_layers.py b/python/mxnet/gluon/contrib/nn/basic_layers.py
index eccdf18..1edef14 100644
--- a/python/mxnet/gluon/contrib/nn/basic_layers.py
+++ b/python/mxnet/gluon/contrib/nn/basic_layers.py
@@ -18,10 +18,10 @@
 # coding: utf-8
 # pylint: disable= arguments-differ
 """Custom neural network layers in model_zoo."""
-__all__ = ['Concurrent', 'HybridConcurrent', 'Identity']
+__all__ = ['Concurrent', 'HybridConcurrent', 'Identity', 'SparseEmbedding']
 
 from .... import nd
-from ...block import HybridBlock
+from ...block import HybridBlock, Block
 from ...nn import Sequential, HybridSequential
 
 class Concurrent(Sequential):
@@ -110,3 +110,44 @@ class Identity(HybridBlock):
     def hybrid_forward(self, F, x):
         return x
+
+class SparseEmbedding(Block):
+    r"""Turns non-negative integers (indexes/tokens) into dense vectors
+    of fixed size. eg. [4, 20] -> [[0.25, 0.1], [0.6, -0.2]]
+
+    This SparseBlock is designed for distributed training with extremely large
+    input dimension. Both weight and gradient w.r.t. weight are `RowSparseNDArray`.
+
+    Parameters
+    ----------
+    input_dim : int
+        Size of the vocabulary, i.e. maximum integer index + 1.
+    output_dim : int
+        Dimension of the dense embedding.
+    dtype : str or np.dtype, default 'float32'
+        Data type of output embeddings.
+    weight_initializer : Initializer
+        Initializer for the `embeddings` matrix.
+
+    Inputs:
+        - **data**: (N-1)-D tensor with shape: `(x1, x2, ..., xN-1)`.
+    Output:
+        - **out**: N-D tensor with shape: `(x1, x2, ..., xN-1, output_dim)`.
+    """
+    def __init__(self, input_dim, output_dim, dtype='float32',
+                 weight_initializer=None, **kwargs):
+        super(SparseEmbedding, self).__init__(**kwargs)
+        self._kwargs = {'input_dim': input_dim, 'output_dim': output_dim,
+                        'dtype': dtype, 'sparse_grad': True}
+        self.weight = self.params.get('weight', shape=(input_dim, output_dim),
+                                      init=weight_initializer, dtype=dtype,
+                                      grad_stype='row_sparse', stype='row_sparse')
+
+    def forward(self, x):
+        weight = self.weight.row_sparse_data(x)
+        return nd.Embedding(x, weight, name='fwd', **self._kwargs)
+
+    def __repr__(self):
+        s = '{block_name}({input_dim} -> {output_dim}, {dtype})'
+        return s.format(block_name=self.__class__.__name__,
+                        **self._kwargs)
diff --git a/tests/python/unittest/test_gluon_contrib.py b/tests/python/unittest/test_gluon_contrib.py
index 729ec84..264ff1f 100644
--- a/tests/python/unittest/test_gluon_contrib.py
+++ b/tests/python/unittest/test_gluon_contrib.py
@@ -19,7 +19,7 @@
 from __future__ import print_function
 import mxnet as mx
 from mxnet.gluon import contrib
 from mxnet.gluon import nn
-from mxnet.gluon.contrib.nn import Concurrent, HybridConcurrent, Identity
+from mxnet.gluon.contrib.nn import Concurrent, HybridConcurrent, Identity, SparseEmbedding
 from mxnet.test_utils import almost_equal
 from common import setup_module, with_seed
 import numpy as np
@@ -185,13 +185,25 @@ def test_concurrent():
     x.wait_to_read()
     x2.wait_to_read()
 
-
+@with_seed()
 def test_identity():
     model = Identity()
     x = mx.nd.random.uniform(shape=(128, 33, 64))
     mx.test_utils.assert_almost_equal(model(x).asnumpy(),
[GitHub] marcoabreu commented on a change in pull request #11041: gpu mem pool strategy
marcoabreu commented on a change in pull request #11041: gpu mem pool strategy
URL: https://github.com/apache/incubator-mxnet/pull/11041#discussion_r194490302

## File path: tests/python/unittest/test_sparse_operator.py ##

@@ -16,7 +16,7 @@
 # under the License.
 from mxnet.test_utils import *
-from common import setup_module, with_seed
+from common import setup_module, with_seed, teardown

Review comment: argh :/
[GitHub] piiswrong commented on a change in pull request #11223: Allow specifying AdaGrad initial accumulator value
piiswrong commented on a change in pull request #11223: Allow specifying AdaGrad initial accumulator value
URL: https://github.com/apache/incubator-mxnet/pull/11223#discussion_r194482464

## File path: python/mxnet/optimizer.py ##

@@ -1091,14 +1091,20 @@ class AdaGrad(Optimizer):
     ----------
     eps: float, optional
         Small value to avoid division by 0.
+    initial_accumulator_value: float, default 0
+        The Adagrad state is initially set to this value.
     """
-    def __init__(self, eps=1e-7, **kwargs):
+    def __init__(self, eps=1e-7, initial_accumulator_value=0, **kwargs):

Review comment: not sure about the name.
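For context on what the new parameter controls: AdaGrad accumulates squared gradients and divides each step by the square root of that accumulator, so seeding the accumulator with a non-zero value damps the otherwise large first updates. A rough NumPy sketch of the update rule (not MXNet's implementation; names and values are illustrative):

```python
import numpy as np

def adagrad_step(weight, grad, state, lr=0.01, eps=1e-7):
    """One AdaGrad update: accumulate squared gradients, scale the step."""
    state = state + grad * grad
    weight = weight - lr * grad / (np.sqrt(state) + eps)
    return weight, state

w = np.array([1.0, 2.0])
g = np.array([0.5, -0.5])

# initial_accumulator_value seeds the state instead of starting at zero,
# making early steps smaller than with a zero-initialized accumulator.
initial_accumulator_value = 0.1
state = np.full_like(w, initial_accumulator_value)
w, state = adagrad_step(w, g, state)
```

With a zero accumulator the first step would divide by `sqrt(g*g) ≈ |g|`, giving a step of roughly `lr` regardless of gradient scale; the non-zero seed smooths that out.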
[GitHub] andrewfayres commented on a change in pull request #11204: Scala inference memory leak fix
andrewfayres commented on a change in pull request #11204: Scala inference memory leak fix
URL: https://github.com/apache/incubator-mxnet/pull/11204#discussion_r194492750

## File path: scala-package/core/src/main/scala/org/apache/mxnet/FeedForward.scala ##

@@ -230,8 +230,11 @@ class FeedForward private(
     val padded = batch.pad
     val realSize = batchSize - padded
     for ((list, nd) <- outputs zip predExec.outputs) {
-      list += nd.slice(0, realSize).copy()
+      val ndSliced = nd.slice(0, realSize)
+      list += ndSliced.copy()
+      ndSliced.dispose()

Review comment: If we're getting an exception here, something has gone very wrong. Most likely the problem would be memory access/allocation, and I'm not confident dispose would work correctly under those conditions. I'll make the change anyway because, although I think it's unlikely to ever help, it definitely won't hurt and is good practice to follow.
[GitHub] aaronmarkham commented on issue #11155: [MXNET-521] Add Facebook open-graph tag integration
aaronmarkham commented on issue #11155: [MXNET-521] Add Facebook open-graph tag integration
URL: https://github.com/apache/incubator-mxnet/pull/11155#issuecomment-396344372

@eric-haibin-lin @szha Bueller? Can you please merge this?
[GitHub] aaronmarkham commented on issue #11191: [MXNET-530] Remove install page artifacts
aaronmarkham commented on issue #11191: [MXNET-530] Remove install page artifacts
URL: https://github.com/apache/incubator-mxnet/pull/11191#issuecomment-396344637

@eric-haibin-lin @szha Bueller? Can you please merge this?