[GitHub] ThomasDelteil commented on issue #10816: Update threaded_engine.cc
ThomasDelteil commented on issue #10816: Update threaded_engine.cc URL: https://github.com/apache/incubator-mxnet/pull/10816#issuecomment-386781720 I see, why not do this modification along with the related ones in your follow-up PR then? That would make it easier to follow. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] yajiedesign closed issue #10821: ci error MKLDNN_UTIL_FUNC.MemFormat
yajiedesign closed issue #10821: ci error MKLDNN_UTIL_FUNC.MemFormat URL: https://github.com/apache/incubator-mxnet/issues/10821
[GitHub] yajiedesign opened a new issue #10821: ci error MKLDNN_UTIL_FUNC.MemFormat
yajiedesign opened a new issue #10821: ci error MKLDNN_UTIL_FUNC.MemFormat URL: https://github.com/apache/incubator-mxnet/issues/10821

## Description
CI test failed: MKLDNN_UTIL_FUNC.MemFormat
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10629/37/pipeline/710

## Environment info (Required)
The environment is CI.
[GitHub] leezu commented on issue #10768: Use numpy in RandomSampler
leezu commented on issue #10768: Use numpy in RandomSampler URL: https://github.com/apache/incubator-mxnet/pull/10768#issuecomment-386780280 Sounds great, thanks!
[GitHub] asitstands commented on issue #10768: Use numpy in RandomSampler
asitstands commented on issue #10768: Use numpy in RandomSampler URL: https://github.com/apache/incubator-mxnet/pull/10768#issuecomment-386779966 I'll test on some other environments, including AWS, and make a PR if I'm sure that the performance hit is not the usual case.
[GitHub] leezu commented on issue #10768: Use numpy in RandomSampler
leezu commented on issue #10768: Use numpy in RandomSampler URL: https://github.com/apache/incubator-mxnet/pull/10768#issuecomment-386779158 On my personal computer I indeed experience the same speed-up of mxnet compared to numpy. On the other machines the results I quoted above still stand. I guess in the end this depends a lot on the particular system and the build options of the libraries, though it is strange given your explanation of the implementation. As this code is only run once per epoch to shuffle the dataset, I believe it is not that important whether it takes 200ms or 500ms for large datasets. It was just unbearable that it took 10s+ before. I don't have a strong feeling about changing it, though I won't propose such a change myself given that I had mixed results depending on the computer. If you open a PR and someone is willing to merge it, I won't mind.
[GitHub] ashokei commented on issue #10819: [MXNET-367] update mkldnn to v0.14 and disable building test examples
ashokei commented on issue #10819: [MXNET-367] update mkldnn to v0.14 and disable building test examples URL: https://github.com/apache/incubator-mxnet/pull/10819#issuecomment-386777118 @TaoLv can you please review? I updated the mkldnn formats based on the new release. I notice there is `mkldnn_wino_fmt`, which may not apply; all other formats seem to be in sync. Thanks.
[GitHub] TaoLv commented on issue #10104: [WIP][MXNET-107] Fused RNN implementation for CPU
TaoLv commented on issue #10104: [WIP][MXNET-107] Fused RNN implementation for CPU URL: https://github.com/apache/incubator-mxnet/pull/10104#issuecomment-386776831 Rebase code to master branch and retrigger CI.
[GitHub] asitstands commented on issue #10768: Use numpy in RandomSampler
asitstands commented on issue #10768: Use numpy in RandomSampler URL: https://github.com/apache/incubator-mxnet/pull/10768#issuecomment-386774321 I think that conda has no special optimization for numpy's shuffle. Numpy's shuffle uses `n` element swaps in serial, where `n` is the size of the array, while mxnet's shuffle uses essentially the same number of memory copies in a way similar to parallel radix sort (except the msvc build). Of course there are more details, but this is the essential difference that causes the performance gap. The performance of parallel shuffle varies with the environment, but it should be faster than numpy if the array size is not too small. `asnumpy` is a bulk copy of a large block of memory; its effect is not so prominent compared to the shuffle. The overhead of the serialization by mxnet's engine is also not important for large arrays. With 12525568 elements, mxnet's shuffle is 5~8 times faster than numpy in my test. For arrays larger than 2 elements, mxnet is always faster in my tests. I think that the only way the mxnet version could be slower is low memory bandwidth of the underlying system (OS/hardware), which increases the effect of `asnumpy`. But the systems I used for tests don't have especially high bandwidth, so I think my test result would be the general case. I don't have access to AWS p3. Could you run my test code there?
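The serial swap scheme described above can be sketched directly. This is an illustrative Fisher-Yates pass in plain Python, not MXNet's or numpy's actual kernel:

```python
import random

# Illustrative sketch, not MXNet's kernel: a numpy-style shuffle performs
# n element swaps in serial, i.e. one Fisher-Yates pass over the array.
def serial_shuffle(arr, rng):
    for i in range(len(arr) - 1, 0, -1):
        j = rng.randrange(i + 1)  # swap partner drawn uniformly from [0, i]
        arr[i], arr[j] = arr[j], arr[i]
    return arr

out = serial_shuffle(list(range(16)), random.Random(0))
```

Each swap depends on the array state left by the previous one, which is why this loop cannot be parallelized directly; the radix-sort-style approach mentioned above instead trades the swaps for memory copies that can run in parallel.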
[GitHub] wkcn commented on issue #10723: Adding custom C++ ops without modifying mxnet source
wkcn commented on issue #10723: Adding custom C++ ops without modifying mxnet source URL: https://github.com/apache/incubator-mxnet/issues/10723#issuecomment-386774110 I have tried it; it works. The key point is to get the data pointer of the NDArray using `_LIB.MXNDArrayGetData`. Example: https://github.com/wkcn/MobulaOP-mx
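For readers unfamiliar with the data-pointer pattern, here is a minimal ctypes sketch. A plain C float array stands in for the NDArray buffer so the snippet does not depend on MXNet being installed:

```python
import ctypes

# Hedged sketch: a ctypes float array stands in for the buffer that
# _LIB.MXNDArrayGetData would hand back for a real NDArray. A custom
# C++ op receives such a pointer and indexes the data directly.
buf = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))
vals = [ptr[i] for i in range(4)]  # read back through the raw pointer
```

With a real NDArray, the same raw pointer (plus shape and dtype information) is what lets an external C++ kernel operate on the array in place without any change to the mxnet source tree.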
[GitHub] anirudh2290 commented on issue #9118: argmax causes python VM to crash
anirudh2290 commented on issue #9118: argmax causes python VM to crash URL: https://github.com/apache/incubator-mxnet/issues/9118#issuecomment-386772793 @laszukdawid Please see: https://github.com/apache/incubator-mxnet/pull/9681
[GitHub] rahul003 commented on issue #8671: Discussion and troubleshooting on PyPI (pip) installation
rahul003 commented on issue #8671: Discussion and troubleshooting on PyPI (pip) installation URL: https://github.com/apache/incubator-mxnet/issues/8671#issuecomment-386770513 @szha Can we turn on the USE_LIBJPEG_TURBO flag? I find that it helps improve the speed of the IO pipeline significantly. Results for Resnet50 v1 Imagenet, 480px resized data, batch size 1920, float16, symbolic, p3.16x: Current package: 3600 samples/sec. With libjpeg-turbo: 4600 samples/sec. As an example, here's how I used LIBJPEG_TURBO on Ubuntu.

```
sudo apt-get install autoconf automake libtool nasm
JPEG_TURBO_VERSION=1.5.2 && \
wget -q -O - https://github.com/libjpeg-turbo/libjpeg-turbo/archive/${JPEG_TURBO_VERSION}.tar.gz | tar -xzf - && \
cd libjpeg-turbo-${JPEG_TURBO_VERSION} && \
autoreconf -fiv && \
./configure --enable-shared --prefix=/usr 2>&1 >/dev/null && \
sudo make -j"$(nproc)" install 2>&1 >/dev/null && \
cd .. && \
rm -rf libjpeg-turbo-${JPEG_TURBO_VERSION}
```

Flags: `USE_LIBJPEG_TURBO=1 USE_LIBJPEG_TURBO_PATH=/usr`
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository. zhasheng pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new 7453bf9 Bump the publish timestamp. 7453bf9 is described below commit 7453bf9a43cd7496cddbffdf2f6e858f7ddfd3f8 Author: mxnet-ci AuthorDate: Sat May 5 01:56:43 2018 + Bump the publish timestamp. --- date.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/date.txt b/date.txt new file mode 100644 index 000..36dfffd --- /dev/null +++ b/date.txt @@ -0,0 +1 @@ +Sat May 5 01:56:43 UTC 2018 -- To stop receiving notification emails like this one, please contact zhash...@apache.org.
[GitHub] indhub closed pull request #10621: [MXNET-340] Updated tutorials page.
indhub closed pull request #10621: [MXNET-340] Updated tutorials page. URL: https://github.com/apache/incubator-mxnet/pull/10621 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, a summary is kept here for the sake of provenance: the PR rewrites docs/tutorials/index.md (113 insertions, 233 deletions). The old Gluon-vs-Module introduction and the HTML tables of tutorial links are replaced by a shorter introduction that recommends newcomers start with Python and the Gluon APIs, points to the [examples section](https://github.com/apache/incubator-mxnet/tree/master/example), [The Straight Dope](http://gluon.mxnet.io/), and the [60-Minute Gluon Crash Course](http://gluon-crash-course.mxnet.io/), and adds a tutorial selector for filtering Gluon and Module tutorials, with Jupyter Notebook download links for re-running the code.
[incubator-mxnet] branch master updated: [MXNET-340] Updated tutorials page. (#10621)
This is an automated email from the ASF dual-hosted git repository. indhub pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new 97511ba [MXNET-340] Updated tutorials page. (#10621) 97511ba is described below commit 97511ba943c436492ae044ae0de2046cd89621bf Author: Thom Lane AuthorDate: Fri May 4 18:54:39 2018 -0700 [MXNET-340] Updated tutorials page. (#10621) * Updated tutorials page. * Combined tutorial links. Added "alternative" links. * Corrected typo * Force build. * Force build #2 * Force build #3 * Force #4 --- docs/tutorials/index.md | 346 1 file changed, 113 insertions(+), 233 deletions(-)
[GitHub] szha commented on issue #8671: Discussion and troubleshooting on PyPI (pip) installation
szha commented on issue #8671: Discussion and troubleshooting on PyPI (pip) installation URL: https://github.com/apache/incubator-mxnet/issues/8671#issuecomment-386770801 @rahul003 thanks for the suggestion. I will certainly take a look. For these dependencies, my approach has been to statically link them, so some steps will be different.
[GitHub] zhreshold commented on issue #10820: fix thread contention caused by openmp
zhreshold commented on issue #10820: fix thread contention caused by openmp URL: https://github.com/apache/incubator-mxnet/pull/10820#issuecomment-386769536 @piiswrong Can you have a look?
[GitHub] zhreshold opened a new pull request #10820: fix thread contention caused by openmp
zhreshold opened a new pull request #10820: fix thread contention caused by openmp URL: https://github.com/apache/incubator-mxnet/pull/10820

## Description ##
Fix thread contention caused by OpenMP. This helps improve `gluon.data.DataLoader` performance when `num_workers` is large.

## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionality and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)

## Comments ##
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
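The contention the PR targets comes from each DataLoader worker process spinning up its own OpenMP thread pool, so `num_workers` processes compete for the same cores. The actual fix lives in the C++ engine; as an illustration only, a common Python-side mitigation is to cap OpenMP threads before any workers fork (the `OMP_NUM_THREADS` variable is standard OpenMP; the DataLoader call shown in the comment is hypothetical usage):

```python
import os

# Cap OpenMP to one thread per process *before* any OpenMP runtime starts.
# With num_workers worker processes, this avoids num_workers * n_cores
# threads oversubscribing the machine.
os.environ["OMP_NUM_THREADS"] = "1"

# Workers forked after this point inherit the capped setting, e.g.:
# loader = gluon.data.DataLoader(dataset, batch_size=32, num_workers=8)
assert os.environ["OMP_NUM_THREADS"] == "1"
```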
[GitHub] ashokei commented on issue #10591: [MXNET-365] handle inplace in mkldnn FallBackCompute
ashokei commented on issue #10591: [MXNET-365] handle inplace in mkldnn FallBackCompute URL: https://github.com/apache/incubator-mxnet/pull/10591#issuecomment-386768975 @marcoabreu is this something you can merge? Thanks.
[GitHub] ashokei opened a new pull request #10819: [MXNET-367] update mkldnn to v0.14 and disable building test examples
ashokei opened a new pull request #10819: [MXNET-367] update mkldnn to v0.14 and disable building test examples URL: https://github.com/apache/incubator-mxnet/pull/10819

## Description ##
Resubmitting PR. Updated the mkldnn submodule to the latest release version, and disabled unnecessary build steps for the mkldnn examples/tests.
[GitHub] hetong007 opened a new issue #10818: Feature Request: Improve ndarray.pad to be an numpy.pad equivalent
hetong007 opened a new issue #10818: Feature Request: Improve ndarray.pad to be an numpy.pad equivalent URL: https://github.com/apache/incubator-mxnet/issues/10818

## Current Limitations
Compared to `numpy.pad`, the current (https://github.com/apache/incubator-mxnet/commit/3bba4c8f6362df8b3355404002eab6a6c88123d6) `mxnet.ndarray.pad` has the following limitations:
1. It only works with real data types (being fixed in https://github.com/apache/incubator-mxnet/pull/10815)
2. It only works with 4-D or 5-D ndarrays: https://github.com/apache/incubator-mxnet/blob/master/src/operator/pad-inl.h#L205
3. It doesn't allow padding on the first two dimensions: https://github.com/apache/incubator-mxnet/blob/master/src/operator/pad-inl.h#L208

## A Scenario with Difficulties
In resnet training on CIFAR10, the standard data augmentation includes:
1. Pad the input 32x32 image to 40x40 with zeros.
2. Randomly crop a 32x32 area from the padded array.

In the current `mxnet.image` implementation, the image at the crop stage has shape `Height x Width x Channel`. To make use of `nd.pad`, one needs to:
1. Cast the image from `int` to `float32`.
2. Expand the dimensions of the image to `Null x Null x Height x Width x Channel`.
3. Apply `nd.pad` on the 2nd and 3rd (Height, Width) dimensions.
4. Squeeze the array back to `Height x Width x Channel`.

This combination of operations introduces unacceptable overhead for CIFAR10 training. A slightly faster workaround is to cast the array to `numpy.array` and apply `np.pad`, but it still introduces unnecessary costs. Without these limitations, one could simply apply `nd.pad` on the 0th and 1st dimensions, i.e. Height and Width in the image array, at no extra cost.
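For reference, the `numpy.pad` workaround mentioned in the issue is a one-liner on the `Height x Width x Channel` layout, with no dimension expansion and no dtype cast (a sketch; the 4-pixel padding matches the 32 → 40 CIFAR10 augmentation described above, and the zero-filled image stands in for real data):

```python
import numpy as np

# Hypothetical HWC uint8 image, as produced at the crop stage of mxnet.image.
img = np.zeros((32, 32, 3), dtype=np.uint8)

# Pad Height and Width by 4 on each side (32 -> 40); leave Channel untouched.
padded = np.pad(img, ((4, 4), (4, 4), (0, 0)), mode="constant", constant_values=0)
assert padded.shape == (40, 40, 3)
assert padded.dtype == np.uint8  # dtype is preserved, no float32 cast needed
```

This is exactly the operation the issue asks `nd.pad` to support natively on the leading dimensions.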
[GitHub] sl1pkn07 commented on issue #10558: NNVM build failed in the newst mxnet version
sl1pkn07 commented on issue #10558: NNVM build failed in the newst mxnet version URL: https://github.com/apache/incubator-mxnet/issues/10558#issuecomment-386758859 It seems better to select what is not valid instead of what is. In my case, that's everything above 6.3.1 and 5.5.0. I need to try 6.4.0, because 6.4.1 fails. Greetings
[GitHub] piiswrong commented on issue #10816: Update threaded_engine.cc
piiswrong commented on issue #10816: Update threaded_engine.cc URL: https://github.com/apache/incubator-mxnet/pull/10816#issuecomment-386755763 It shouldn't have any effect for any use cases currently supported. I encountered this when working on something new. I'll add tests and explanation in later PRs.
[incubator-mxnet] branch master updated: fix (#10814)
This is an automated email from the ASF dual-hosted git repository. jxie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new 3bba4c8 fix (#10814) 3bba4c8 is described below

commit 3bba4c8f6362df8b3355404002eab6a6c88123d6
Author: Eric Junyuan Xie
AuthorDate: Fri May 4 15:59:20 2018 -0700

    fix (#10814)
---
 python/mxnet/gluon/parameter.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/mxnet/gluon/parameter.py b/python/mxnet/gluon/parameter.py
index 04694df..a3a1e32 100644
--- a/python/mxnet/gluon/parameter.py
+++ b/python/mxnet/gluon/parameter.py
@@ -366,7 +366,7 @@ class Parameter(object):
         self.shape = data.shape
 
         if self._data is None:
-            assert self._deferred_init is not None, \
+            assert self._deferred_init, \
                 "Parameter '%s' has not been initialized"%self.name
             self._deferred_init = self._deferred_init[:3] + (data,)
             return

To stop receiving notification emails like this one, please contact j...@apache.org.
[GitHub] piiswrong closed pull request #10814: fix a bug for deferred init
piiswrong closed pull request #10814: fix a bug for deferred init URL: https://github.com/apache/incubator-mxnet/pull/10814 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/python/mxnet/gluon/parameter.py b/python/mxnet/gluon/parameter.py
index 04694dfa545..a3a1e32c0a7 100644
--- a/python/mxnet/gluon/parameter.py
+++ b/python/mxnet/gluon/parameter.py
@@ -366,7 +366,7 @@ def set_data(self, data):
         self.shape = data.shape
 
         if self._data is None:
-            assert self._deferred_init is not None, \
+            assert self._deferred_init, \
                 "Parameter '%s' has not been initialized"%self.name
             self._deferred_init = self._deferred_init[:3] + (data,)
             return
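The change in the diff matters because `_deferred_init` starts out as an empty tuple rather than `None`, so the old `is not None` assertion never fired for an uninitialized parameter. A minimal sketch of the truthiness difference (the attribute name is taken from the diff; the empty-tuple default is an assumption about gluon's `Parameter` internals):

```python
deferred_init = ()  # gluon's Parameter starts with an empty tuple, not None

# Old check: passes even though the parameter was never initialized.
assert deferred_init is not None

# New check: an empty tuple is falsy, so `assert deferred_init` would
# correctly raise "Parameter ... has not been initialized" here.
assert not deferred_init
```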
[GitHub] piiswrong commented on issue #10814: fix a bug for deferred init
piiswrong commented on issue #10814: fix a bug for deferred init URL: https://github.com/apache/incubator-mxnet/pull/10814#issuecomment-386755521 No, it's just an unclear error message.
[GitHub] thomelane commented on issue #10621: [MXNET-340] Updated tutorials page.
thomelane commented on issue #10621: [MXNET-340] Updated tutorials page. URL: https://github.com/apache/incubator-mxnet/pull/10621#issuecomment-386753796 @piiswrong fixed as suggested, but kept an alternative link in brackets. @indhub are you able to merge now we have a successful build?
[GitHub] piiswrong opened a new pull request #10817: [WIP] Do Not Merge. Static memory allocation for cached_op
piiswrong opened a new pull request #10817: [WIP] Do Not Merge. Static memory allocation for cached_op URL: https://github.com/apache/incubator-mxnet/pull/10817
[GitHub] rahul003 commented on issue #10558: NNVM build failed in the newst mxnet version
rahul003 commented on issue #10558: NNVM build failed in the newst mxnet version URL: https://github.com/apache/incubator-mxnet/issues/10558#issuecomment-386752885 Good to know @sl1pkn07. Thanks for verifying that this works. That's a good idea; the only concern is that we might need to come up with an exhaustive list of which compilers work and which don't on different platforms, and I'm not sure how that would work.
[GitHub] laszukdawid commented on issue #9118: argmax causes python VM to crash
laszukdawid commented on issue #9118: argmax causes python VM to crash URL: https://github.com/apache/incubator-mxnet/issues/9118#issuecomment-386751353 @nswamy @anirudh2290 what's the commit sha? I'd like to follow this ticket.
[GitHub] ThomasDelteil commented on issue #10816: Update threaded_engine.cc
ThomasDelteil commented on issue #10816: Update threaded_engine.cc URL: https://github.com/apache/incubator-mxnet/pull/10816#issuecomment-386751129 As we are all trying to get up to speed, it would be nice to explain why this is necessary, and what bug it caused.
[GitHub] piiswrong opened a new pull request #10816: Update threaded_engine.cc
piiswrong opened a new pull request #10816: Update threaded_engine.cc URL: https://github.com/apache/incubator-mxnet/pull/10816
[GitHub] sl1pkn07 commented on issue #10558: NNVM build failed in the newst mxnet version
sl1pkn07 commented on issue #10558: NNVM build failed in the newst mxnet version URL: https://github.com/apache/incubator-mxnet/issues/10558#issuecomment-386743439 Yes, it seems GCC 5.4.0 and GCC 6.3.1 (the snapshot used in Arch Linux) build OK without removing `-O3` from all the makefile/CMakeLists scripts. In both cases I use GCC 7.3.1 as the main compiler (it failed with GCC 8.1.0) and GCC 5.4.0/6.3.1 as `CUDA_HOST_COMPILER` (ccbin). To avoid this, would it be possible to add a check in the makefile/CMake for a "bad" version of GCC used as the CUDA ccbin (or as the main compiler, if `CUDA_HOST_COMPILER` is not set)? Greetings
[GitHub] leezu commented on issue #10768: Use numpy in RandomSampler
leezu commented on issue #10768: Use numpy in RandomSampler URL: https://github.com/apache/incubator-mxnet/pull/10768#issuecomment-386742567 @asitstands I guess the difference between our experiments is that I used an optimized numpy from conda and the standard [mxnet pypi build](https://pypi.org/pypi/mxnet). Using both an optimized numpy and an optimized mxnet build on an AWS p3 instance, I do observe, like you, that mxnet is faster for small sizes (4): ~500μs vs ~800μs for numpy. For large sizes (12525568), however, the `asnumpy()` overhead is large and the numpy version takes just 180ms compared to 600ms with the mxnet code.
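A sketch of the numpy path being discussed (a RandomSampler essentially yields a shuffled range of indices; `np.random.permutation` is a plausible implementation, and the sizes here are illustrative, not the benchmark's — the large 12525568 case and the reported timings are from @leezu's p3 setup and not reproduced here):

```python
import numpy as np

# Shuffled index sequence for a dataset of length n, with no device copy
# and no asnumpy() round-trip.
small = np.random.permutation(4)       # the "small size (4)" case
large = np.random.permutation(100000)  # stand-in for the large case

# Both are true permutations of range(n).
assert sorted(small.tolist()) == [0, 1, 2, 3]
assert large.shape == (100000,)
```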
[GitHub] zhanghang1989 opened a new pull request #10815: [MXNET-402] add integer type for pad
zhanghang1989 opened a new pull request #10815: [MXNET-402] add integer type for pad URL: https://github.com/apache/incubator-mxnet/pull/10815

## Description ##
Add integer pad @hetong007
[GitHub] piiswrong closed pull request #10810: Fix Reorder2Default
piiswrong closed pull request #10810: Fix Reorder2Default URL: https://github.com/apache/incubator-mxnet/pull/10810 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/src/ndarray/ndarray.cc b/src/ndarray/ndarray.cc
index 82de0949ccc..aeb3c63ef34 100644
--- a/src/ndarray/ndarray.cc
+++ b/src/ndarray/ndarray.cc
@@ -348,7 +348,8 @@ void NDArray::Chunk::Reorder2Default() {
     return;
 
   mkldnn_memory_format_t format = mkl_mem_->GetDefaultFormat();
-  CHECK_NE(format, mkl_mem_->GetFormat());
+  if (format == mkl_mem_->GetFormat())
+    return;
   mkldnn::memory::primitive_desc def_pd = mkl_mem_->GetPrimitiveDesc(format);
   mkldnn_mem_ptr def_mem(new mkldnn::memory(def_pd));
[incubator-mxnet] branch master updated: Reorder2Default: return directly for default format (#10810)
This is an automated email from the ASF dual-hosted git repository. jxie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new ab6a25e Reorder2Default: return directly for default format (#10810) ab6a25e is described below

commit ab6a25ea2f184998cc472a98f1b0f4808c89211a
Author: Tao Lv
AuthorDate: Sat May 5 05:20:44 2018 +0800

    Reorder2Default: return directly for default format (#10810)
---
 src/ndarray/ndarray.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/ndarray/ndarray.cc b/src/ndarray/ndarray.cc
index a28a907..a643da1 100644
--- a/src/ndarray/ndarray.cc
+++ b/src/ndarray/ndarray.cc
@@ -348,7 +348,8 @@ void NDArray::Chunk::Reorder2Default() {
     return;
 
   mkldnn_memory_format_t format = mkl_mem_->GetDefaultFormat();
-  CHECK_NE(format, mkl_mem_->GetFormat());
+  if (format == mkl_mem_->GetFormat())
+    return;
   mkldnn::memory::primitive_desc def_pd = mkl_mem_->GetPrimitiveDesc(format);
   mkldnn_mem_ptr def_mem(new mkldnn::memory(def_pd));
[GitHub] piiswrong opened a new pull request #10814: fix a bug for deferred init
piiswrong opened a new pull request #10814: fix a bug for deferred init URL: https://github.com/apache/incubator-mxnet/pull/10814
[GitHub] piiswrong commented on issue #10766: Bug Cannot save/load params with Gluon model
piiswrong commented on issue #10766: Bug Cannot save/load params with Gluon model URL: https://github.com/apache/incubator-mxnet/issues/10766#issuecomment-386734298 You cannot set_data before you initialize the model. @ariwaranosai yes that's the issue. I'm making a fix
[incubator-mxnet] branch master updated: Fix a mem error. (#10812)
This is an automated email from the ASF dual-hosted git repository. jxie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new e49fdae Fix a mem error. (#10812) e49fdae is described below

commit e49fdaefd7017005aaed968f66413a0e2ef4a3b9
Author: Da Zheng
AuthorDate: Fri May 4 13:44:26 2018 -0700

    Fix a mem error. (#10812)
---
 include/mxnet/ndarray.h | 5 +----
 src/ndarray/ndarray.cc  | 6 ++++++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/mxnet/ndarray.h b/include/mxnet/ndarray.h
index 6fda8c3..e243eb7 100644
--- a/include/mxnet/ndarray.h
+++ b/include/mxnet/ndarray.h
@@ -678,10 +678,7 @@ class NDArray {
    */
   NDArray Reorder2Default() const;
 
-  void InvalidateMKLDNNData() {
-    // Removing mkl_mem_ means the NDArray will store data in the default format.
-    ptr_->mkl_mem_ = nullptr;
-  }
+  void InvalidateMKLDNNData();
 
   /*
    * This function is used inside operators to reshape an array.
diff --git a/src/ndarray/ndarray.cc b/src/ndarray/ndarray.cc
index 82de094..a28a907 100644
--- a/src/ndarray/ndarray.cc
+++ b/src/ndarray/ndarray.cc
@@ -620,6 +620,12 @@ const mkldnn::memory *NDArray::GetMKLDNNData() const {
   }
 }
 
+void NDArray::InvalidateMKLDNNData() {
+  // Removing mkl_mem_ means the NDArray will store data in the default format.
+  if (ptr_->mkl_mem_ && ptr_->mkl_mem_->IsMKLDNN())
+    ptr_->mkl_mem_ = nullptr;
+}
+
 void NDArray::CopyFrom(const mkldnn::memory &mem) {
   CHECK(ptr_ != nullptr) << "The NDArray hasn't been initialized";
   if (ptr_->mkl_mem_ && ptr_->mkl_mem_->GetRaw() == &mem)
[GitHub] piiswrong closed pull request #10812: Fix a mem error.
piiswrong closed pull request #10812: Fix a mem error. URL: https://github.com/apache/incubator-mxnet/pull/10812 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/include/mxnet/ndarray.h b/include/mxnet/ndarray.h
index 6fda8c37b41..e243eb71c47 100644
--- a/include/mxnet/ndarray.h
+++ b/include/mxnet/ndarray.h
@@ -678,10 +678,7 @@ class NDArray {
    */
   NDArray Reorder2Default() const;
 
-  void InvalidateMKLDNNData() {
-    // Removing mkl_mem_ means the NDArray will store data in the default format.
-    ptr_->mkl_mem_ = nullptr;
-  }
+  void InvalidateMKLDNNData();
 
   /*
    * This function is used inside operators to reshape an array.
diff --git a/src/ndarray/ndarray.cc b/src/ndarray/ndarray.cc
index 82de0949ccc..a28a907a941 100644
--- a/src/ndarray/ndarray.cc
+++ b/src/ndarray/ndarray.cc
@@ -620,6 +620,12 @@ const mkldnn::memory *NDArray::GetMKLDNNData() const {
   }
 }
 
+void NDArray::InvalidateMKLDNNData() {
+  // Removing mkl_mem_ means the NDArray will store data in the default format.
+  if (ptr_->mkl_mem_ && ptr_->mkl_mem_->IsMKLDNN())
+    ptr_->mkl_mem_ = nullptr;
+}
+
 void NDArray::CopyFrom(const mkldnn::memory &mem) {
   CHECK(ptr_ != nullptr) << "The NDArray hasn't been initialized";
   if (ptr_->mkl_mem_ && ptr_->mkl_mem_->GetRaw() == &mem)
[GitHub] piiswrong closed pull request #10800: Update index.md
piiswrong closed pull request #10800: Update index.md
URL: https://github.com/apache/incubator-mxnet/pull/10800
[incubator-mxnet] branch master updated: Update index.md (#10800)
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

The following commit(s) were added to refs/heads/master by this push:
     new aabaab4  Update index.md (#10800)
aabaab4 is described below

commit aabaab418eab303c64042cf068d2f47965f55086
Author: Eric Junyuan Xie
AuthorDate: Fri May 4 13:42:54 2018 -0700

    Update index.md (#10800)

    * Update index.md

    * Update index.md
---
 docs/api/python/index.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/docs/api/python/index.md b/docs/api/python/index.md
index 88e8031..54aaef1 100644
--- a/docs/api/python/index.md
+++ b/docs/api/python/index.md
@@ -140,6 +140,15 @@ Code examples are placed throughout the API documentation and these can be run a
    metric/metric.md
 ```
 
+## Profiler API
+
+```eval_rst
+.. toctree::
+   :maxdepth: 1
+
+   profiler/profiler.md
+```
+
 ## Run-Time Compilation API
 
 ```eval_rst
[GitHub] rahul003 commented on issue #10778: Make android error
rahul003 commented on issue #10778: Make android error
URL: https://github.com/apache/incubator-mxnet/issues/10778#issuecomment-386724201

Are you trying to cross-compile for Android? Does using the USE_F16C=0 build flag help?
[GitHub] leezu opened a new pull request #10813: Fix context handling when creating sparse arrays from definition
leezu opened a new pull request #10813: Fix context handling when creating sparse arrays from definition
URL: https://github.com/apache/incubator-mxnet/pull/10813

## Description ##
@eric-haibin-lin Currently, when creating a sparse array from definition (i.e. from a dense data array and indices) without specifying a context, mxnet assumes it should use the default context (e.g. CPU). This leads to the following crash when the dense data and indices arrays live on GPU:

`Default GPU stream was used when MSHADOW_FORCE_STREAM was on`

This PR fixes the issue by making sure the data and indices arrays live on the chosen context.
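The fix described above can be illustrated independently of mxnet: when no context is given, infer it from the input arrays instead of silently falling back to the default device, and move both data and indices onto the chosen context before constructing the array. The names below (`Array`, `make_sparse_from_definition`) are hypothetical stand-ins, not the actual mxnet API.

```python
class Array:
    """A stand-in for an ndarray that records which device it lives on."""
    def __init__(self, values, ctx):
        self.values = values
        self.ctx = ctx

    def as_in_context(self, ctx):
        # Return self if already on ctx, otherwise a copy living on ctx.
        return self if self.ctx == ctx else Array(list(self.values), ctx)

DEFAULT_CTX = "cpu(0)"

def make_sparse_from_definition(data, indices, ctx=None):
    # Before the fix: ctx would default to DEFAULT_CTX while data/indices
    # could still live on gpu(0), producing a mixed-device access at runtime.
    # After the fix: pick a context, then ensure *all* inputs live on it.
    if ctx is None:
        ctx = data.ctx          # infer from the inputs instead of defaulting
    data = data.as_in_context(ctx)
    indices = indices.as_in_context(ctx)
    return {"data": data, "indices": indices, "ctx": ctx}

arr = make_sparse_from_definition(Array([1.0, 2.0], "gpu(0)"), Array([0, 3], "gpu(0)"))
print(arr["ctx"])  # → gpu(0): everything agrees, no mixed-device access
```

With an explicit `ctx` argument, the inputs are copied onto that context rather than trusted to already be there.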
[GitHub] zhreshold closed pull request #10794: fix topk nms in multibox_detection operator
zhreshold closed pull request #10794: fix topk nms in multibox_detection operator
URL: https://github.com/apache/incubator-mxnet/pull/10794

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/src/operator/contrib/multibox_detection.cc b/src/operator/contrib/multibox_detection.cc
index a2e681a8e60..112c033552e 100644
--- a/src/operator/contrib/multibox_detection.cc
+++ b/src/operator/contrib/multibox_detection.cc
@@ -142,7 +142,11 @@ inline void MultiBoxDetectionForward(const Tensor<cpu, 3, DType> &out,
     DType *ptemp = temp_space.dptr_ + nbatch * num_anchors * 6;
     int nkeep = static_cast<int>(sorter.size());
     if (nms_topk > 0 && nms_topk < nkeep) {
+      // keep topk detections
       nkeep = nms_topk;
+      for (int i = nkeep; i < valid_count; ++i) {
+        p_out[i * 6] = -1;
+      }
     }
     for (int i = 0; i < nkeep; ++i) {
       for (int j = 0; j < 6; ++j) {
@@ -150,10 +154,10 @@ inline void MultiBoxDetectionForward(const Tensor<cpu, 3, DType> &out,
       }
     }
     // apply nms
-    for (int i = 0; i < valid_count; ++i) {
+    for (int i = 0; i < nkeep; ++i) {
       int offset_i = i * 6;
       if (p_out[offset_i] < 0) continue;  // skip eliminated
-      for (int j = i + 1; j < valid_count; ++j) {
+      for (int j = i + 1; j < nkeep; ++j) {
         int offset_j = j * 6;
         if (p_out[offset_j] < 0) continue;  // skip eliminated
         if (force_suppress || (p_out[offset_i] == p_out[offset_j])) {
[GitHub] zhreshold commented on issue #10794: fix topk nms in multibox_detection operator
zhreshold commented on issue #10794: fix topk nms in multibox_detection operator
URL: https://github.com/apache/incubator-mxnet/pull/10794#issuecomment-386713496

Merged, thanks for the contribution!
[incubator-mxnet] branch master updated: fix topk nms in multibox_detection operator (#10794)
This is an automated email from the ASF dual-hosted git repository.

zhreshold pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

The following commit(s) were added to refs/heads/master by this push:
     new 38ec93c  fix topk nms in multibox_detection operator (#10794)
38ec93c is described below

commit 38ec93cbf32ddea921319cf0fd8e446b7e3e9b6b
Author: Wang Jiajun
AuthorDate: Sat May 5 03:47:40 2018 +0800

    fix topk nms in multibox_detection operator (#10794)
---
 src/operator/contrib/multibox_detection.cc | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/operator/contrib/multibox_detection.cc b/src/operator/contrib/multibox_detection.cc
index a2e681a..112c033 100644
--- a/src/operator/contrib/multibox_detection.cc
+++ b/src/operator/contrib/multibox_detection.cc
@@ -142,7 +142,11 @@ inline void MultiBoxDetectionForward(const Tensor<cpu, 3, DType> &out,
     DType *ptemp = temp_space.dptr_ + nbatch * num_anchors * 6;
     int nkeep = static_cast<int>(sorter.size());
     if (nms_topk > 0 && nms_topk < nkeep) {
+      // keep topk detections
       nkeep = nms_topk;
+      for (int i = nkeep; i < valid_count; ++i) {
+        p_out[i * 6] = -1;
+      }
     }
     for (int i = 0; i < nkeep; ++i) {
       for (int j = 0; j < 6; ++j) {
@@ -150,10 +154,10 @@ inline void MultiBoxDetectionForward(const Tensor<cpu, 3, DType> &out,
       }
     }
     // apply nms
-    for (int i = 0; i < valid_count; ++i) {
+    for (int i = 0; i < nkeep; ++i) {
       int offset_i = i * 6;
       if (p_out[offset_i] < 0) continue;  // skip eliminated
-      for (int j = i + 1; j < valid_count; ++j) {
+      for (int j = i + 1; j < nkeep; ++j) {
         int offset_j = j * 6;
         if (p_out[offset_j] < 0) continue;  // skip eliminated
         if (force_suppress || (p_out[offset_i] == p_out[offset_j])) {
--
To stop receiving notification emails like this one, please contact zhresh...@apache.org.
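The corrected logic in the diff above, keep only the top-k detections, invalidate the rest, and run pairwise NMS over the kept set only, can be sketched in pure Python. The detection format `[class_id, score, x1, y1, x2, y2]` mirrors the 6-element rows in multibox_detection.cc, but the `iou` helper and threshold are simplified stand-ins, not the operator's exact code.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def topk_nms(dets, nms_topk, nms_threshold=0.5, force_suppress=False):
    """dets must already be sorted by descending score."""
    out = [list(d) for d in dets]
    nkeep = len(out)
    if 0 < nms_topk < nkeep:
        nkeep = nms_topk
        for i in range(nkeep, len(out)):   # the fix: invalidate beyond top-k
            out[i][0] = -1
    for i in range(nkeep):                 # the fix: loop over nkeep, not all
        if out[i][0] < 0:
            continue  # skip eliminated
        for j in range(i + 1, nkeep):
            if out[j][0] < 0:
                continue  # skip eliminated
            if force_suppress or out[i][0] == out[j][0]:
                if iou(out[i][2:], out[j][2:]) > nms_threshold:
                    out[j][0] = -1         # suppress overlapping detection
    return out

dets = [[0, 0.9, 0, 0, 10, 10],    # kept
        [0, 0.8, 1, 1, 10, 10],    # overlaps the first -> suppressed
        [1, 0.7, 50, 50, 60, 60],  # different class, kept
        [0, 0.6, 90, 90, 99, 99]]  # beyond top-k -> invalidated
result = topk_nms(dets, nms_topk=3)
print([d[0] for d in result])  # → [0, -1, 1, -1]
```

Before the fix, the suppression loops ran over `valid_count` even when only `nkeep` rows had been copied, so stale rows beyond top-k could suppress valid detections.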
[GitHub] snflake commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available
snflake commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available
URL: https://github.com/apache/incubator-mxnet/pull/10804#issuecomment-386712514

I got similar runtimes with MobileNet v2 on a laptop (Nvidia Quadro M1000M) using the custom kernel and grouped convolution from cuDNN 7. IMO, we should always use cuDNN if cuDNN 7 is available.
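For context on why the two code paths are interchangeable here: depthwise convolution is just grouped convolution with the group count equal to the number of input channels, so each filter sees exactly one channel. A minimal 1-D sketch of the channel-grouping arithmetic (plain lists, stride 1, no padding; not the cuDNN or MXNet implementation):

```python
def grouped_conv1d(x, w, groups):
    """x: [in_ch][width]; w: [out_ch][in_ch // groups][k]."""
    in_ch, out_ch = len(x), len(w)
    icpg, ocpg = in_ch // groups, out_ch // groups   # channels per group
    k = len(w[0][0])
    width_out = len(x[0]) - k + 1
    out = []
    for oc in range(out_ch):
        g = oc // ocpg                        # which group this filter is in
        chans = x[g * icpg:(g + 1) * icpg]    # only that group's input channels
        row = [sum(chans[c][p + t] * w[oc][c][t]
                   for c in range(icpg) for t in range(k))
               for p in range(width_out)]
        out.append(row)
    return out

x = [[1, 2, 3], [4, 5, 6]]             # 2 channels, width 3
w = [[[1, 1]], [[1, 1]]]               # 2 filters, 1 channel each, kernel 2
# groups == in_channels -> depthwise: filter i only sees channel i
print(grouped_conv1d(x, w, groups=2))  # → [[3, 5], [9, 11]]
```

With `groups=1` the same function degenerates to an ordinary dense convolution over all input channels.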
[GitHub] hetong007 commented on issue #10123: train_cifar10.py hangs on first epoch in debug mode (4 P100 GPUs)
hetong007 commented on issue #10123: train_cifar10.py hangs on first epoch in debug mode (4 P100 GPUs)
URL: https://github.com/apache/incubator-mxnet/issues/10123#issuecomment-386704607

Can you try the train_cifar10.py script at: http://gluon-cv.mxnet.io/model_zoo/index.html#image-classification
[GitHub] zheng-da opened a new pull request #10812: Fix a mem error.
zheng-da opened a new pull request #10812: Fix a mem error.
URL: https://github.com/apache/incubator-mxnet/pull/10812

## Description ##
This memory error has been discussed on the dev mailing list and can be reproduced with the following commands.

```
export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
export MXNET_TEST_SEED=11
export MXNET_MODULE_SEED=812478194
export MXNET_TEST_COUNT=1
nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape
```

This is a temporary fix. The error is caused by a race condition in which the MKLDNN memory in an output NDArray is removed while some MKLDNN operator is trying to read the MKLDNN memory from its input arrays. However, this race condition shouldn't be possible: the execution engine schedules computation based on data dependencies, so while an operator is scheduled to write to an output NDArray, no operator that reads from that NDArray should be scheduled for execution. But we actually observe that the input array of an operator is modified while the operator is running, which suggests the race condition can mess up data in the input NDArray even without MKLDNN. So we need a fundamental fix for this bug.

## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionality and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
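The scheduling invariant the PR description relies on, an operator that writes a variable must complete before any operator reading that variable runs, can be sketched in a few lines. This is a single-threaded illustration only; the names (`Var`, `Engine`) are not MXNet's actual engine API.

```python
class Var:
    def __init__(self, name):
        self.name = name
        self.pending_writes = 0   # writes scheduled but not yet finished

class Engine:
    def __init__(self):
        self.log = []

    def push(self, name, read_vars=(), write_vars=()):
        # The real engine defers the op until its dependencies clear; here we
        # simply refuse to run a reader while a write on an input is in flight.
        for v in read_vars:
            assert v.pending_writes == 0, (
                f"race: {name} reads {v.name} while a write is in flight")
        for v in write_vars:
            v.pending_writes += 1
        self.log.append(name)     # op "runs" here
        for v in write_vars:
            v.pending_writes -= 1  # op finished; readers may now proceed

engine = Engine()
a, b = Var("a"), Var("b")
engine.push("conv_fwd", read_vars=[a], write_vars=[b])
engine.push("relu_fwd", read_vars=[b])
print(engine.log)  # → ['conv_fwd', 'relu_fwd']
```

The bug report amounts to observing that, in practice, a reader saw `pending_writes != 0` on one of its inputs, which this dependency rule should rule out.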
[incubator-mxnet] branch piiswrong-patch-2 updated (0d1927c -> f6f67e8)
This is an automated email from the ASF dual-hosted git repository.

jxie pushed a change to branch piiswrong-patch-2
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.

    from 0d1927c  Update index.md
     add f6f67e8  Update index.md

No new revisions were added by this update.

Summary of changes:
 docs/api/python/index.md | 1 -
 1 file changed, 1 deletion(-)
[GitHub] sergeykolychev commented on issue #10791: Unable to install mxnet in R 3.5.0
sergeykolychev commented on issue #10791: Unable to install mxnet in R 3.5.0
URL: https://github.com/apache/incubator-mxnet/issues/10791#issuecomment-386665520

@marcoabreu Sorry, I am a Perl maintainer, not an R one.
[GitHub] piiswrong commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available
piiswrong commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available
URL: https://github.com/apache/incubator-mxnet/pull/10804#issuecomment-386662839

How does the cuDNN implementation compare to the custom kernels from TF? Should we always use cuDNN?
[GitHub] arcadiaphy commented on issue #10794: fix topk nms in multibox_detection operator
arcadiaphy commented on issue #10794: fix topk nms in multibox_detection operator
URL: https://github.com/apache/incubator-mxnet/pull/10794#issuecomment-386660719

Done. I found the last commit that passed CI and cherry-picked onto it.
[GitHub] larroy commented on a change in pull request #10797: Better cleaning of git repo, signal handlers
larroy commented on a change in pull request #10797: Better cleaning of git repo, signal handlers
URL: https://github.com/apache/incubator-mxnet/pull/10797#discussion_r186132116

## File path: ci/docker/runtime_functions.sh ##
@@ -353,7 +369,18 @@ sanity_check() {
 }
+
+before_unittest() {

Review comment: The machine name is the node name; this prints the instance id, so you can SSH in directly, for example when you have a core file.
[GitHub] marcoabreu commented on a change in pull request #10797: Better cleaning of git repo, signal handlers
marcoabreu commented on a change in pull request #10797: Better cleaning of git repo, signal handlers
URL: https://github.com/apache/incubator-mxnet/pull/10797#discussion_r186121711

## File path: ci/docker/runtime_functions.sh ##
@@ -353,7 +369,18 @@ sanity_check() {
 }
+
+before_unittest() {

Review comment: To what extent is it less comfortable to have it in build.py? It removes a lot of boilerplate code from the runtime functions, which are supposed to be as clean and small as possible.
[GitHub] marcoabreu closed pull request #10811: Update emails for build failures in Jenkins
marcoabreu closed pull request #10811: Update emails for build failures in Jenkins
URL: https://github.com/apache/incubator-mxnet/pull/10811

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/Jenkinsfile b/Jenkinsfile
index 7a08acc38a5..eb23d4096cb 100644
--- a/Jenkinsfile
+++ b/Jenkinsfile
@@ -788,7 +788,10 @@ try {
   node("mxnetlinux-cpu") {
     // Only send email if master failed
     if (currentBuild.result == "FAILURE" && env.BRANCH_NAME == "master") {
-      emailext body: 'Build for MXNet branch ${BRANCH_NAME} has broken. Please view the build at ${BUILD_URL}', replyTo: '${EMAIL}', subject: '[BUILD FAILED] Branch ${BRANCH_NAME} build ${BUILD_NUMBER}', to: '${EMAIL}'
+      emailext body: 'Build for MXNet branch ${env.BRANCH_NAME}:${env.GIT_COMMIT} has broken. Please view the build at ${BUILD_URL}',
+               replyTo: 'd...@mxnet.incubator.apache.org',
+               subject: '[MXNet CI] ${env.BRANCH_NAME} build ${env.BUILD_NUMBER} failed',
+               to: 'd...@mxnet.incubator.apache.org'
     }
     // Remember to rethrow so the build is marked as failing
     if (err) {
[GitHub] marcoabreu commented on issue #10811: Update emails for build failures in Jenkins
marcoabreu commented on issue #10811: Update emails for build failures in Jenkins
URL: https://github.com/apache/incubator-mxnet/pull/10811#issuecomment-386638736

This is disabled on purpose.
[GitHub] IFeelBloated commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available
IFeelBloated commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available
URL: https://github.com/apache/incubator-mxnet/pull/10804#issuecomment-386635183

I have been working on something recently with heavy use of ResNeXt building blocks; it would be nice to have grouped convolutions directly backed by cuDNN 7.
[GitHub] dwSun commented on issue #10810: Fix Reorder2Default
dwSun commented on issue #10810: Fix Reorder2Default
URL: https://github.com/apache/incubator-mxnet/pull/10810#issuecomment-386624098

That was a quick response.
[GitHub] larroy commented on a change in pull request #10797: Better cleaning of git repo, signal handlers
larroy commented on a change in pull request #10797: Better cleaning of git repo, signal handlers
URL: https://github.com/apache/incubator-mxnet/pull/10797#discussion_r186105158

## File path: ci/docker/runtime_functions.sh ##
@@ -353,7 +369,18 @@ sanity_check() {
 }
+
+before_unittest() {

Review comment: That is not very comfortable; we should make it easy to investigate problems.
[GitHub] larroy opened a new pull request #10811: Update emails for build failures in Jenkins
larroy opened a new pull request #10811: Update emails for build failures in Jenkins
URL: https://github.com/apache/incubator-mxnet/pull/10811

## Description ##
(Brief description on what this PR is about)

## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionality and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)

## Comments ##
- If this change is a backward incompatible change, why must this change be made?
- Interesting edge cases to note here
[GitHub] zheng-da commented on issue #10810: Fix Reorder2Default
zheng-da commented on issue #10810: Fix Reorder2Default
URL: https://github.com/apache/incubator-mxnet/pull/10810#issuecomment-386621576

looks good.
[GitHub] marcoabreu commented on issue #10791: Unable to install mxnet in R 3.5.0
marcoabreu commented on issue #10791: Unable to install mxnet in R 3.5.0
URL: https://github.com/apache/incubator-mxnet/issues/10791#issuecomment-386607901

@sergeykolychev
[GitHub] TaoLv opened a new pull request #10810: Fix Reorder2Default
TaoLv opened a new pull request #10810: Fix Reorder2Default
URL: https://github.com/apache/incubator-mxnet/pull/10810

## Description ##
Fix issue #10809. @zheng-da @pengzhao-intel @ashokei Please review. @dwSun Feel free to try it if you are interested in building from source.

## Checklist ##
### Essentials ###
- [ ] Passed code style checking (`make lint`)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage
- [ ] For user-facing API changes, the API doc string has been updated. For new C++ functions in header files, their functionality and arguments are well-documented.
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)

## Comments ##
- If this change is a backward incompatible change, why must this change be made?
- Interesting edge cases to note here
[GitHub] marcoabreu commented on a change in pull request #10797: Better cleaning of git repo, signal handlers
marcoabreu commented on a change in pull request #10797: Better cleaning of git repo, signal handlers
URL: https://github.com/apache/incubator-mxnet/pull/10797#discussion_r186086546

## File path: ci/docker/runtime_functions.sh ##
@@ -353,7 +369,18 @@ sanity_check() {
 }
+
+before_unittest() {

Review comment: I open the overall log and search for [ut-cpp-gpu], for example. The first occurrence contains the machine name. I would rather have this as part of build.py than in the runtime functions; they're supposed to only have logic and no boilerplate.
[GitHub] TaoLv commented on issue #10809: Check failed: format != mkl_mem_->GetFormat() (5 vs. 5)
TaoLv commented on issue #10809: Check failed: format != mkl_mem_->GetFormat() (5 vs. 5)
URL: https://github.com/apache/incubator-mxnet/issues/10809#issuecomment-386600967

@dwSun Thanks for reporting this. I will take a look and get back to you soon.
[GitHub] threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by MPI AllReduce
threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by MPI AllReduce
URL: https://github.com/apache/incubator-mxnet/pull/10696#issuecomment-386539894

@rahul003 For GPU, I agree with your comment. But the majority of the code in this PR is the infrastructure for adding allreduce into MXNet, which is shared by both CPU and GPU. Currently we leave a placeholder for GPU for future extension. We didn't run into any issue on GPU; we enabled CPU first simply because we currently have a lot of CPU multi-node environments. We can discuss further how to add the GPU extension. @pengzhao-intel Patric will shed more light on it.

For resnet50, the local batch size is 64 and the global batch size is 64 * 8 = 512 (8 machines). Yes, we trained all on CPU.

In general, allreduce performance should be similar for OpenMPI and MPICH. Intel MPI has better allreduce performance, but it's not free software, though its run-time part is free. I agree with you that we should select OpenMPI as the default MPI if no one objects. (We will download the OpenMPI zip in 3rdparty and compile it.)

For proto3, I tested the original kvstore type dist_sync and it works fine for PS-Lite. Moreover, we just use protobuf 3.5.1. PS-Lite still uses proto2 (we just need to specify its version explicitly).
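For readers unfamiliar with the primitive being discussed: allreduce leaves every worker holding the elementwise reduction (here, a sum) of all workers' buffers, which is how gradients get synchronized without a parameter server. A naive pure-Python sketch of the semantics (a real MPI implementation would use a ring or tree algorithm over the network):

```python
def allreduce(worker_grads):
    """worker_grads: a list of equal-length gradient lists, one per worker.

    Returns the post-allreduce state: every worker holds the same
    elementwise sum of all workers' gradients.
    """
    total = [sum(vals) for vals in zip(*worker_grads)]
    return [list(total) for _ in worker_grads]

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 workers, 2 parameters
reduced = allreduce(grads)
print(reduced[0])  # → [9.0, 12.0], identical on every worker
```

Dividing the reduced buffer by the number of workers then yields the averaged gradient used for the synchronous SGD update.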
[GitHub] diyang commented on issue #10805: SKIP RNN is incorrect in LSTnet
diyang commented on issue #10805: SKIP RNN is incorrect in LSTnet
URL: https://github.com/apache/incubator-mxnet/issues/10805#issuecomment-386563096

@QiXuanWang I have used MXNet R to implement the skip RNN. You may find it in this function: https://github.com/diyang/deeplearning.mxnet/blob/master/LSTnet/src/lstnet_model.R

I used a queue to contain the hidden states of 24 hours; I pop the queue head, and then push the newly yielded hidden state of the current hour onto the queue tail.

```R
rnn.skip.unroll <- function(data, num.rnn.layer=1, seq.len, num.hidden,
                            seasonal.period, dropout=0, config="gru") {
  param.cells <- list()
  last.states <- list()
  for (i in 1:num.rnn.layer) {
    if (config == "gru") {
      param.cells[[i]] <- list(gates.i2h.weight = mx.symbol.Variable(paste0("l", i, ".gates.i2h.weight")),
                               gates.i2h.bias = mx.symbol.Variable(paste0("l", i, ".gates.i2h.bias")),
                               gates.h2h.weight = mx.symbol.Variable(paste0("l", i, ".gates.h2h.weight")),
                               gates.h2h.bias = mx.symbol.Variable(paste0("l", i, ".gates.h2h.bias")),
                               trans.i2h.weight = mx.symbol.Variable(paste0("l", i, ".trans.i2h.weight")),
                               trans.i2h.bias = mx.symbol.Variable(paste0("l", i, ".trans.i2h.bias")),
                               trans.h2h.weight = mx.symbol.Variable(paste0("l", i, ".trans.h2h.weight")),
                               trans.h2h.bias = mx.symbol.Variable(paste0("l", i, ".trans.h2h.bias")))
      state <- list(h=mx.symbol.Variable(paste0("l", i, ".gru.init.h")))
    } else {
      param.cells[[i]] <- list(i2h.weight = mx.symbol.Variable(paste0("l", i, ".i2h.weight")),
                               i2h.bias = mx.symbol.Variable(paste0("l", i, ".i2h.bias")),
                               h2h.weight = mx.symbol.Variable(paste0("l", i, ".h2h.weight")),
                               h2h.bias = mx.symbol.Variable(paste0("l", i, ".h2h.bias")))
      state <- list(c=mx.symbol.Variable(paste0("l", i, ".lstm.init.c")),
                    h=mx.symbol.Variable(paste0("l", i, ".lstm.init.h")))
    }
    last.states[[i]] <- state
  }

  data_seq_slice = mx.symbol.SliceChannel(data=data, num_outputs=seq.len, axis=2, squeeze_axis=1)
  last.hidden <- list()
  # it's a queue
  seasonal.states <- list()
  for (seqidx in 1:seq.len) {
    hidden <- data_seq_slice[[seqidx]]
    # stack lstm
    if (seqidx <= seasonal.period) {
      for (i in 1:num.rnn.layer) {
        dropout <- ifelse(i==1, 0, dropout)
        prev.state <- last.states[[i]]
        if (config == "gru") {
          next.state <- gru.cell(num.hidden, indata = hidden, prev.state = prev.state,
                                 param = param.cells[[i]], seqidx = seqidx,
                                 layeridx = i, dropout = dropout)
        } else {
          next.state <- lstm.cell(num.hidden, indata = hidden, prev.state = prev.state,
                                  param = param.cells[[i]], seqidx = seqidx,
                                  layeridx = i, dropout = dropout)
        }
        hidden <- next.state$h
        last.states[[i]] <- next.state
      }
      seasonal.states <- c(seasonal.states, last.states)
    } else {
      for (i in 1:num.rnn.layer) {
        dropout <- ifelse(i==1, 0, dropout)
        prev.state <- seasonal.states[[1]]
        seasonal.states <- seasonal.states[-1]
        if (config == "gru") {
          next.state <- gru.cell(num.hidden, indata = hidden, prev.state = prev.state,
                                 param = param.cells[[i]], seqidx = seqidx,
                                 layeridx = i, dropout = dropout)
        } else {
          next.state <- lstm.cell(num.hidden, indata = hidden, prev.state = prev.state,
                                  param = param.cells[[i]], seqidx = seqidx,
                                  layeridx = i, dropout = dropout)
        }
        hidden <-
```
[GitHub] diyang commented on issue #10805: SKIP RNN is incorrect in LSTnet
diyang commented on issue #10805: SKIP RNN is incorrect in LSTnet URL: https://github.com/apache/incubator-mxnet/issues/10805#issuecomment-386581465 @QiXuanWang By the way, according to the paper, if your data is not periodic, or the period is dynamic, then you should choose the LSTNet variant LSTNet-Attn, i.e. use a Temporal Attention Layer in place of the Skip RNN. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
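For readers unfamiliar with the attention variant mentioned above, here is a hedged, minimal sketch of the temporal-attention idea (score past hidden states against the current one, softmax the scores, take the weighted sum). It illustrates the general mechanism only, not the LSTNet paper's exact formulation:

```python
import math

def temporal_attention(history, current):
    """history: list of past hidden-state vectors; current: current vector.
    Returns the attention-weighted context vector and the weights."""
    # dot-product score of each past state against the current state
    scores = [sum(h_j * c_j for h_j, c_j in zip(h, current)) for h in history]
    # numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # weighted combination of the past states
    context = [sum(w * h[j] for w, h in zip(weights, history))
               for j in range(len(current))]
    return context, weights
```

A past state that points in the same direction as the current state gets a higher weight, so the context vector leans toward the most relevant part of the history regardless of where the period falls.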
[GitHub] larroy commented on a change in pull request #10797: Better cleaning of git repo, signal handlers
larroy commented on a change in pull request #10797: Better cleaning of git repo, signal handlers URL: https://github.com/apache/incubator-mxnet/pull/10797#discussion_r186055013 ## File path: ci/docker/runtime_functions.sh ## @@ -353,7 +369,18 @@ sanity_check() { } + +before_unittest() { Review comment: I didn't see an easy way to see the instance id from the logs or to locate where it's executing. What's the way you propose? The logs are huge already; this is a good thing to have and doesn't create any problems. I would like to have it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[incubator-mxnet] branch master updated: Revert "[MXNET-367] update mkldnn to v0.14 and disable building test examples (#10736)" (#10808)
This is an automated email from the ASF dual-hosted git repository. marcoabreu pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new 66365ef Revert "[MXNET-367] update mkldnn to v0.14 and disable building test examples (#10736)" (#10808) 66365ef is described below

commit 66365efd4a37f9f4ce0dc46f32f3ebd6444a1efc
Author: Marco de Abreu
AuthorDate: Fri May 4 13:20:37 2018 +0200

    Revert "[MXNET-367] update mkldnn to v0.14 and disable building test examples (#10736)" (#10808)

    This reverts commit 3c7afccb2fca31cc9b555d86506e0c10f18c41a0.
---
 3rdparty/mkldnn                   | 2 +-
 ci/docker/install/ubuntu_mklml.sh | 4 ++--
 prepare_mkl.sh                    | 6 +++---
 prepare_mkldnn.sh                 | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/3rdparty/mkldnn b/3rdparty/mkldnn
index 0e7ca73..b4137df 160000
--- a/3rdparty/mkldnn
+++ b/3rdparty/mkldnn
@@ -1 +1 @@
-Subproject commit 0e7ca738866d22cc700aa33b8de120b938f910d0
+Subproject commit b4137dfc88e3bf5c6b62e833121802eb8c6696da
diff --git a/ci/docker/install/ubuntu_mklml.sh b/ci/docker/install/ubuntu_mklml.sh
index 3689aad..253cf95 100755
--- a/ci/docker/install/ubuntu_mklml.sh
+++ b/ci/docker/install/ubuntu_mklml.sh
@@ -21,5 +21,5 @@
 # the whole docker cache for the image
 set -ex
-wget --no-check-certificate -O /tmp/mklml.tgz https://github.com/intel/mkl-dnn/releases/download/v0.14/mklml_lnx_2018.0.3.20180406.tgz
-tar -zxvf /tmp/mklml.tgz && cp -rf mklml_*/* /usr/local/ && rm -rf mklml_*
+wget --no-check-certificate -O /tmp/mklml.tgz https://github.com/intel/mkl-dnn/releases/download/v0.12/mklml_lnx_2018.0.1.20171227.tgz
+tar -zxvf /tmp/mklml.tgz && cp -rf mklml_*/* /usr/local/ && rm -rf mklml_*
\ No newline at end of file
diff --git a/prepare_mkl.sh b/prepare_mkl.sh
index b702b06..12e5df7 100755
--- a/prepare_mkl.sh
+++ b/prepare_mkl.sh
@@ -58,16 +58,16 @@ MXNET_ROOT=`dirname $0`
 USE_MKLML=0
 # NOTE: if you update the following line, please also update the dockerfile at
 # tests/ci_build/Dockerfile.mkl
-VERSION_MATCH=20180406
+VERSION_MATCH=20171227
 PLATFORM=$(uname)
 if [ $PLATFORM == "Darwin" ]; then
 INFIX=mac
 elif [ $PLATFORM == "Linux" ]; then
 INFIX=lnx
 fi
-ARCHIVE_BASENAME=mklml_${INFIX}_2018.0.3.${VERSION_MATCH}.tgz
+ARCHIVE_BASENAME=mklml_${INFIX}_2018.0.1.${VERSION_MATCH}.tgz
 MKL_CONTENT_DIR=`echo $ARCHIVE_BASENAME | rev | cut -d "." -f 2- | rev`
-MKLURL="https://github.com/intel/mkl-dnn/releases/download/v0.14/$ARCHIVE_BASENAME"
+MKLURL="https://github.com/01org/mkl-dnn/releases/download/v0.12/$ARCHIVE_BASENAME"
 # there are diffrent MKL lib to be used for GCC and for ICC
 reg='^[0-9]+$'
 VERSION_LINE=`GetVersionName $MKLROOT`
diff --git a/prepare_mkldnn.sh b/prepare_mkldnn.sh
index d210d64..828cfe1 100755
--- a/prepare_mkldnn.sh
+++ b/prepare_mkldnn.sh
@@ -93,7 +93,7 @@ if [ ! -f $MKLDNN_LIBFILE ]; then
 echo "Building MKLDNN ..." >&2
 cd $MXNET_ROOTDIR
 g++ --version >&2
-cmake $MKLDNN_ROOTDIR -DCMAKE_INSTALL_PREFIX=$MKLDNN_INSTALLDIR -B$MKLDNN_BUILDDIR -DARCH_OPT_FLAGS="-mtune=generic" -DWITH_TEST=OFF -DWITH_EXAMPLE=OFF >&2
+cmake $MKLDNN_ROOTDIR -DCMAKE_INSTALL_PREFIX=$MKLDNN_INSTALLDIR -B$MKLDNN_BUILDDIR -DARCH_OPT_FLAGS="-mtune=generic" >&2
 NUM_PROC=1
 if [[ ! -z $(command -v nproc) ]]; then
 NUM_PROC=$(nproc)
--
To stop receiving notification emails like this one, please contact marcoab...@apache.org.
[GitHub] marcoabreu closed pull request #10808: Revert "[MXNET-367] update mkldnn to v0.14 and disable building test examples"
marcoabreu closed pull request #10808: Revert "[MXNET-367] update mkldnn to v0.14 and disable building test examples" URL: https://github.com/apache/incubator-mxnet/pull/10808 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] dwSun opened a new issue #10809: Check failed: format != mkl_mem_->GetFormat() (5 vs. 5)
dwSun opened a new issue #10809: Check failed: format != mkl_mem_->GetFormat() (5 vs. 5) URL: https://github.com/apache/incubator-mxnet/issues/10809

## Description
Crashed when training a model. With code from [this tutorial](http://mxnet.incubator.apache.org/tutorials/gluon/datasets.html), I tried to train my own model with MobileNetV2, but it crashed with mxnet-mkl-1.2.0b20180503 from pypi. On mxnet-mkl-1.1.0 from pypi, this code works. Batch sizes 32 and 16 can reproduce this error; others like 8 or 32 seemingly can't. A smaller network can't reproduce this error. Not sure whether this error is related to PR #10317 or not, and maybe this is the same error as issue #10807.

## Environment info (Required)
This is the code: [crash.zip](https://github.com/apache/incubator-mxnet/files/1973878/crash.zip) Run with

```
python3 fashion.py
```

Package used (Python/R/Scala/Julia):

```
% pip3 list
Package         Version
--------------- --------------
certifi         2018.4.16
chardet         3.0.4
graphviz        0.8.3
idna            2.6
mxnet-mkl       1.2.0b20180503
numpy           1.14.3
pandas          0.22.0
pip             10.0.1
pkg-resources   0.0.0
python-dateutil 2.7.2
pytz            2018.4
requests        2.18.4
setuptools      39.1.0
six             1.11.0
urllib3         1.22
wheel           0.31.0
```

## Error Message:

```
% python3 fashion.py
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
Epoch 0, training loss: 2.55, validation loss: 2.31
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
Epoch 1, training loss: 2.56, validation loss: 2.35
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:51] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:51] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:51] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
Traceback (most recent call last):
  File "fashion.py", line 71, in <module>
    valid_loss = cumulative_valid_loss.asscalar()/valid_samples
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1894, in asscalar
    return self.asnumpy()[0]
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1876, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:28:51] src/ndarray/ndarray.cc:351: Check failed: format != mkl_mem_->GetFormat() (5 vs. 5)

Stack trace returned 10 entries:
[bt] (0) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x17009d) [0x7fba25e2f09d]
[bt] (1) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x170468) [0x7fba25e2f468]
[bt] (2) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a4a1b8) [0x7fba287091b8]
[bt] (3)
```
[GitHub] dwSun commented on issue #10807: Ndarray.asnumpy() error with gluon dense under both GPU and CPU environment
dwSun commented on issue #10807: Ndarray.asnumpy() error with gluon dense under both GPU and CPU environment URL: https://github.com/apache/incubator-mxnet/issues/10807#issuecomment-386549085 modified your script as this:

```py
from mxnet.gluon import nn
import mxnet as mx

mx.Context.default_ctx = mx.Context('cpu', 0)

layer = nn.Dense(1000)
x = mx.nd.random.uniform(shape=(16, 128, 300, 300))
x.attach_grad()
layer.collect_params().initialize()
with mx.autograd.record():
    out = layer(x)
out.backward()
print(x.grad.shape)
print(x.grad)
print(x.grad.asnumpy().shape)
```

with mxnet-mkl 1.1.0 from pypi, I got this:

```
% python3 script.py
(16, 128, 300, 300)
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
[1]    17413 abort      python3 script.py
```

with mxnet-mkl 1.2.0b20180503 from pypi, I got this:

```
% python3 script.py
(16, 128, 300, 300)
Traceback (most recent call last):
  File "script.py", line 14, in <module>
    print(x.grad)
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 189, in __repr__
    return '\n%s\n<%s %s @%s>' % (str(self.asnumpy()),
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1876, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:24:12] src/storage/./cpu_device_storage.h:73: Failed to allocate CPU Memory

Stack trace returned 10 entries:
[bt] (0) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x17009d) [0x7f39de8c009d]
[bt] (1) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x170468) [0x7f39de8c0468]
[bt] (2) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2dc701d) [0x7f39e151701d]
[bt] (3) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2dc704d) [0x7f39e151704d]
[bt] (4) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2dcc77b) [0x7f39e151c77b]
[bt] (5) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x29140f4) [0x7f39e10640f4]
[bt] (6) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x291469f) [0x7f39e106469f]
[bt] (7) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2914ab0) [0x7f39e1064ab0]
[bt] (8) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2891843) [0x7f39e0fe1843]
[bt] (9) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2899644) [0x7f39e0fe9644]
```

So, I am totally confused... This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
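One observation worth adding here (an editorial back-of-envelope check, not part of the original report): the arrays in the script above are large, so allocation failures may be genuine memory exhaustion rather than an MKL-DNN bug. The input is shape (16, 128, 300, 300) in float32, and if Gluon's default `Dense(flatten=True)` behavior applies, the layer's weight matrix alone would be enormous. The helper name `f32_gib` below is mine, not from the report:

```python
def f32_gib(*shape):
    """Size in GiB of a float32 array with the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n * 4 / 2**30

grad_gib = f32_gib(16, 128, 300, 300)        # x and x.grad each need this much
weight_gib = f32_gib(1000, 128 * 300 * 300)  # Dense(1000) weight, if inputs are flattened
print(f"x.grad: {grad_gib:.2f} GiB, dense weight: {weight_gib:.2f} GiB")
# prints: x.grad: 0.69 GiB, dense weight: 42.92 GiB
```

A ~43 GiB weight would plausibly explain the `std::bad_alloc` on most machines, independent of the MKL build.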
[GitHub] asitstands commented on issue #10768: Use numpy in RandomSampler
asitstands commented on issue #10768: Use numpy in RandomSampler URL: https://github.com/apache/incubator-mxnet/pull/10768#issuecomment-386546813 Thanks @leezu. I hope this discussion does not bother you too much. Here is my test code.

```python
import time
import mxnet as mx
import numpy as np

n = 4
start = time.time()
for i in range(1):
    x = mx.nd.arange(n)
    mx.random.shuffle(x, out=x)
    y = iter(x.asnumpy())
end = time.time()
print("mx elapsed time: ", end - start)

start = time.time()
for i in range(1):
    x = np.arange(n)
    np.random.shuffle(x)
    y = iter(x)
end = time.time()
print("np elapsed time: ", end - start)
```

On an i7-3770K 3.50GHz, the result is

```
mx elapsed time:  3.1706936359405518
np elapsed time:  5.6994311809539795
```

On two Xeon(R) E5-2680 v4 2.40GHz, the result is

```
mx elapsed time:  2.679560661315918
np elapsed time:  6.299736976623535
```

As I increase `n`, the time ratio np / mx also increases. If `n` is smaller than 15000, np has a shorter running time on the i7. If `n` is smaller than 1, np outperforms mx also on the Xeon. I didn't test with gluon samplers, but I think this code should capture the difference between the shuffles of mx and np. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
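The crossover behavior reported above (np faster below roughly n = 15000 on the i7, mx faster beyond) is what you would expect if each call costs a fixed dispatch overhead plus a per-element cost. Here is an editorial, hedged sketch of that model; the coefficients are made up purely to illustrate the shape of the argument, not measured values:

```python
def crossover_n(a_mx, b_mx, a_np, b_np):
    """Solve a_mx + b_mx * n == a_np + b_np * n for n: the array size at
    which the higher-overhead, lower-slope implementation starts to win."""
    return (a_mx - a_np) / (b_np - b_mx)

# Illustrative (invented) coefficients: "mx" with a larger fixed cost a
# but a smaller per-element cost b than "np".
n_star = crossover_n(a_mx=3.0e-5, b_mx=1.0e-9, a_np=1.0e-6, b_np=3.0e-9)
print(n_star)  # ~ 14500
```

Under this model the crossover point shifts with hardware, which is consistent with the i7 and Xeon numbers differing.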
[GitHub] xinyu-intel commented on issue #10629: [MXNET-343]fix Mkldnn with msvc
xinyu-intel commented on issue #10629: [MXNET-343]fix Mkldnn with msvc URL: https://github.com/apache/incubator-mxnet/pull/10629#issuecomment-386544479 @yajiedesign update your submodule using `git submodule update --init --recursive`, and then commit the new mkldnn to your repo. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by MPI AllReduce
threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by MPI AllReduce URL: https://github.com/apache/incubator-mxnet/pull/10696#issuecomment-386539894 @rahul003 For GPU, I agree with your comment. Currently we leave a placeholder for GPU for future extension; @pengzhao-intel Patric will shed more light on it. For resnet50, the local batch size is 64, so the global batch size is 64 * 8 = 512 (8 machines). Yes, we trained everything on CPU. In general, allreduce performance should be similar for OpenMPI and MPICH. Intel MPI has better allreduce performance, but it is not free software, though its run-time part is free. I agree with you that we should select OpenMPI as the default MPI if no one objects (we will download the OpenMPI zip into 3rdparty and compile it). For proto3, I tested the original kvstore type dist_sync and it works fine with PS-Lite. Moreover, we just use protobuf 3.5.1; PS-Lite still uses proto2 (we just need to specify its version explicitly). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
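For context on what the proposed kvstore computes each iteration, here is a hedged, pure-Python sketch of allreduce-sum semantics (an editorial illustration, no MPI, and not the PR's code): every worker ends up with the elementwise sum of all workers' gradients, which can then be normalized, e.g. by the global batch size (64 * 8 = 512 in the setup above).

```python
def allreduce_sum(worker_grads):
    """Simulate MPI_Allreduce with SUM: each worker receives the
    elementwise sum of every worker's gradient vector."""
    summed = [sum(vals) for vals in zip(*worker_grads)]
    return [list(summed) for _ in worker_grads]  # every worker gets a copy

def averaged(worker_grads, global_batch):
    """Sum across workers, then divide by the global batch size."""
    reduced = allreduce_sum(worker_grads)
    return [[g / global_batch for g in grads] for grads in reduced]
```

Unlike the parameter-server path (dist_sync), there is no central server: the reduction is symmetric across workers, which is why MPI library choice (OpenMPI vs MPICH vs Intel MPI) matters mainly for allreduce throughput.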
[GitHub] snflake commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available
snflake commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available URL: https://github.com/apache/incubator-mxnet/pull/10804#issuecomment-386537855 About the speed: I used TensorRT with cuDNN 7 for inference, and depthwise convolution is very fast regardless of the dilation rate. There is no need for a custom depthwise convolution implementation if the cuDNN 7 group feature is used. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] grainw commented on issue #2754: import mxnetReason: image not found
grainw commented on issue #2754: import mxnetReason: image not found URL: https://github.com/apache/incubator-mxnet/issues/2754#issuecomment-386536477 Hey, did you figure out how to solve this? @pjpan This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] snflake commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available
snflake commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available URL: https://github.com/apache/incubator-mxnet/pull/10804#issuecomment-386535684 The CI failure seems not related to this PR: unknown file: Failure C++ exception with description "[04:51:17] /work/mxnet/tests/cpp/operator/mkldnn.cc:85: Check failed: mkldnn_format_last == 56 (67 vs. 56) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] snflake commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available
snflake commented on issue #10804: Use depthwise convolution(group convolution) by cuDNNv7 if available URL: https://github.com/apache/incubator-mxnet/pull/10804#issuecomment-386534903 Great work! This seems to explain the current low performance of MXNet compared to TensorFlow when a dilation rate > 1 is used together with depthwise convolution. PR #7393 only addresses dilation rate = 1, and TensorFlow's custom CUDA implementation also only handles dilation rate 1 (they use cuDNN otherwise). The reason is that MXNet did not use the group feature of cuDNN v7, which is implemented in this PR. Would you fix the merge failure? I would like to test this. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
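To make the relationship between grouped and depthwise convolution concrete (an editorial illustration, not part of the PR): a grouped convolution with `groups == in_channels` is exactly a depthwise convolution, which is why cuDNN v7's group feature covers the depthwise case. A parameter-count sketch, assuming k x k kernels and no bias; the helper `conv_params` is mine:

```python
def conv_params(in_ch, out_ch, k, groups=1):
    """Weight-parameter count of a 2-D convolution with the given grouping.
    Each group sees only in_ch // groups input channels."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return out_ch * (in_ch // groups) * k * k

dense_conv = conv_params(128, 128, 3)             # ordinary conv
depthwise = conv_params(128, 128, 3, groups=128)  # depthwise: one filter per channel
print(dense_conv, depthwise)  # prints: 147456 1152
```

The 128x reduction in parameters (and FLOPs) is what makes depthwise convolution attractive in MobileNet-style architectures, provided the backend kernel (here, cuDNN v7 groups) is actually fast.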