[GitHub] [incubator-mxnet] ys2843 commented on pull request #18288: Website global search feature
ys2843 commented on pull request #18288: URL: https://github.com/apache/incubator-mxnet/pull/18288#issuecomment-629589289 @mxnet-bot run ci [unix-cpu] This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18288: Website global search feature
mxnet-bot commented on pull request #18288: URL: https://github.com/apache/incubator-mxnet/pull/18288#issuecomment-629589299 Jenkins CI successfully triggered : [unix-cpu]
[GitHub] [incubator-mxnet] connorgoggins commented on pull request #18340: [Website 2.0] Artifact URL Adjustment
connorgoggins commented on pull request #18340: URL: https://github.com/apache/incubator-mxnet/pull/18340#issuecomment-629583487 @mxnet-label-bot add [Website]
[GitHub] [incubator-mxnet] connorgoggins commented on pull request #18340: [Website 2.0] Artifact URL Adjustment
connorgoggins commented on pull request #18340: URL: https://github.com/apache/incubator-mxnet/pull/18340#issuecomment-629583448 @mxnet-label-bot add [pr-awaiting-review]
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18340: [Website 2.0] Artifact URL Adjustment
mxnet-bot commented on pull request #18340: URL: https://github.com/apache/incubator-mxnet/pull/18340#issuecomment-629583305
Hey @connorgoggins, thanks for submitting the PR. All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]

**CI supported jobs**: [windows-cpu, unix-gpu, sanity, centos-cpu, centos-gpu, unix-cpu, website, windows-gpu, clang, miscellaneous, edge]

_Note_: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
[GitHub] [incubator-mxnet] connorgoggins opened a new pull request #18340: [Website 2.0] Artifact URL Adjustment
connorgoggins opened a new pull request #18340: URL: https://github.com/apache/incubator-mxnet/pull/18340

## Description ##
This PR officially transfers ownership of code version control and compressed artifact storage for the static artifacts of the MXNet website from Connor Goggins to the MXNet Website team. The new S3 bucket (owned by the AWS account for ai-mxnet-engineer...@amazon.com) gives the team the ability to update the artifacts as needed.

The specific purpose of this PR is to change the artifact download link in the Jenkinsfile to point to the compressed artifacts in the new S3 bucket instead of the previous bucket. These changes will increase the accessibility of the MXNet Website 2.0 static artifacts to the team in case of future modifications.

For additional details and supporting documentation, please read [this doc](https://quip-amazon.com/4i2GAb8huyKC/MXNet-Website-20-Static-Artifact-Mangement).

## Checklist ##
### Essentials ###
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] Code is well-documented
- [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- M docs/static_site/Makefile

## Comments ##
These changes have been tested with a complete website build on Jenkins dev and local hosting. All functionality remains the same. @sandeep-krishnamurthy @aaronmarkham @ys2843
[GitHub] [incubator-mxnet] leezu commented on pull request #18315: Fix coredumps
leezu commented on pull request #18315: URL: https://github.com/apache/incubator-mxnet/pull/18315#issuecomment-629573080 @mxnet-bot run ci [unix-cpu]
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18315: Fix coredumps
mxnet-bot commented on pull request #18315: URL: https://github.com/apache/incubator-mxnet/pull/18315#issuecomment-629573096 Jenkins CI successfully triggered : [unix-cpu]
[GitHub] [incubator-mxnet] szha commented on issue #18244: unix-cpu MKL/MKL-DNN Test Time
szha commented on issue #18244: URL: https://github.com/apache/incubator-mxnet/issues/18244#issuecomment-629566327 Any update on this?
[GitHub] [incubator-mxnet] rondogency commented on pull request #18331: Remove test metric perf
rondogency commented on pull request #18331: URL: https://github.com/apache/incubator-mxnet/pull/18331#issuecomment-629566211 @leezu the performance benchmark we are running focuses on model-level performance for end-to-end training/inference; this one is more related to opperf, which tests individual components
[GitHub] [incubator-mxnet] ChaiBapchya commented on pull request #18326: [R] Fix incorrect copyto usage & incorrect website title for Symbol API in R
ChaiBapchya commented on pull request #18326: URL: https://github.com/apache/incubator-mxnet/pull/18326#issuecomment-629564086 @leezu @access2rohit please review/merge
[GitHub] [incubator-mxnet] waytrue17 commented on pull request #18339: update dockerfile for jetson
waytrue17 commented on pull request #18339: URL: https://github.com/apache/incubator-mxnet/pull/18339#issuecomment-629563544 @ciyongch
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository. aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 4bdada6  Bump the publish timestamp.

4bdada6 is described below

commit 4bdada6e752b8d2aaeb528748b3ebaf74035dc61
Author: mxnet-ci
AuthorDate: Sat May 16 00:48:10 2020 +0000

    Bump the publish timestamp.
---
 date.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..b98f4b3
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Sat May 16 00:48:10 UTC 2020
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18339: update dockerfile for jetson
mxnet-bot commented on pull request #18339: URL: https://github.com/apache/incubator-mxnet/pull/18339#issuecomment-629562168
Hey @waytrue17, thanks for submitting the PR. All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]

**CI supported jobs**: [centos-gpu, miscellaneous, edge, unix-gpu, sanity, website, unix-cpu, centos-cpu, windows-gpu, clang, windows-cpu]

_Note_: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
[GitHub] [incubator-mxnet] waytrue17 opened a new pull request #18339: update dockerfile for jetson
waytrue17 opened a new pull request #18339: URL: https://github.com/apache/incubator-mxnet/pull/18339

## Description ##
Trying to fix #18311 on v1.7.x by copying Dockerfile.build.jetson from master to v1.7.x

### Essentials ###
- [ ] Changes are complete (i.e. I finished coding on this PR)
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18288: Website global search feature
mxnet-bot commented on pull request #18288: URL: https://github.com/apache/incubator-mxnet/pull/18288#issuecomment-629557848 Jenkins CI successfully triggered : [unix-cpu]
[GitHub] [incubator-mxnet] ys2843 commented on pull request #18288: Website global search feature
ys2843 commented on pull request #18288: URL: https://github.com/apache/incubator-mxnet/pull/18288#issuecomment-629557829 @mxnet-bot run ci [unix-cpu]
[GitHub] [incubator-mxnet] ChaiBapchya commented on pull request #18338: [DO NOT MERGE] Use mkl cmake flag to force DNNL to delegate FC op to MKL
ChaiBapchya commented on pull request #18338: URL: https://github.com/apache/incubator-mxnet/pull/18338#issuecomment-629557360 Related issue : https://github.com/apache/incubator-mxnet/issues/17980
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18338: [DO NOT MERGE] Use mkl cmake flag to force DNNL to delegate FC op to MKL
mxnet-bot commented on pull request #18338: URL: https://github.com/apache/incubator-mxnet/pull/18338#issuecomment-629557180
Hey @ChaiBapchya, thanks for submitting the PR. All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]

**CI supported jobs**: [sanity, centos-gpu, windows-gpu, unix-cpu, unix-gpu, windows-cpu, centos-cpu, website, edge, miscellaneous, clang]

_Note_: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
[GitHub] [incubator-mxnet] ChaiBapchya opened a new pull request #18338: [DO NOT MERGE] Use mkl cmake flag to force DNNL to delegate FC op to MKL
ChaiBapchya opened a new pull request #18338: URL: https://github.com/apache/incubator-mxnet/pull/18338

## Description ##
Currently, for GEMM ops, DNNL delegates the op to MKL. However, for FC, DNNL doesn't.
Workaround: export USE_MKL and pass the location of the MKL installation. This forces DNNL to delegate ops to MKL.
This PR tries to verify the correctness of the workaround (i.e. whether it breaks any unit tests). @leezu @PatricZhao @TaoLv @kpuatamazon @kpu
This PR is not meant to be merged, but to serve as a proxy for internal releases. If it passes CI without issues, the PR will be closed.
Pipelines to watch: windows-cpu, unix-cpu & centos-cpu, specifically the mkldnn/mkl builds.

## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
[GitHub] [incubator-mxnet] szha commented on pull request #18335: Skip test_metric_performance on MKL builds
szha commented on pull request #18335: URL: https://github.com/apache/incubator-mxnet/pull/18335#issuecomment-629552229 I'm moving this to benchmark scripts in #18252
[GitHub] [incubator-mxnet] szha commented on pull request #18331: Remove test metric perf
szha commented on pull request #18331: URL: https://github.com/apache/incubator-mxnet/pull/18331#issuecomment-629552076 I'm moving it to benchmark scripts in #18252
[GitHub] [incubator-mxnet] szha commented on pull request #18252: [CI] run operator tests with naive engine
szha commented on pull request #18252: URL: https://github.com/apache/incubator-mxnet/pull/18252#issuecomment-629545644 @marcoabreu all but the operator unit tests still run on threaded engine.
[GitHub] [incubator-mxnet] aaraujom commented on issue #17980: When compiled with MKL, fully_connected calls DNNL while dot and batch_dot call MKL
aaraujom commented on issue #17980: URL: https://github.com/apache/incubator-mxnet/issues/17980#issuecomment-629542494 Hi, thanks for checking. Let me try to reproduce this on my end and fix it.
[incubator-mxnet] branch master updated (3e676fc -> 37280e4)
This is an automated email from the ASF dual-hosted git repository. lausen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.

    from 3e676fc  Fix memory leaks in Gluon (#18328)
     add 37280e4  Fix deferred compute mode for operators using new FFI (#18284)

No new revisions were added by this update.

Summary of changes:
 src/api/operator/utils.cc                      |  6 +-
 src/api/operator/utils.h                       |  3 ++-
 tests/python/unittest/test_deferred_compute.py | 12 
 3 files changed, 19 insertions(+), 2 deletions(-)
[incubator-mxnet] branch master updated: Fix deferred compute mode for operators using new FFI (#18284)
This is an automated email from the ASF dual-hosted git repository. lausen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

The following commit(s) were added to refs/heads/master by this push:
     new 37280e4  Fix deferred compute mode for operators using new FFI (#18284)

37280e4 is described below

commit 37280e4ddf00cacdac50c1e798fd2a14da38ae8d
Author: Leonard Lausen
AuthorDate: Fri May 15 15:36:55 2020 -0700

    Fix deferred compute mode for operators using new FFI (#18284)
---
 src/api/operator/utils.cc                      |  6 +-
 src/api/operator/utils.h                       |  3 ++-
 tests/python/unittest/test_deferred_compute.py | 12 
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/api/operator/utils.cc b/src/api/operator/utils.cc
index 307bb29..6cfbd27 100644
--- a/src/api/operator/utils.cc
+++ b/src/api/operator/utils.cc
@@ -30,6 +30,10 @@ bool is_recording() {
   return Imperative::Get()->is_recording();
 }
 
+bool is_deferred_compute() {
+  return Imperative::Get()->is_deferred_compute();
+}
+
 void SetInOut(std::vector* ndinputs,
               std::vector* ndoutputs,
               int num_inputs,
@@ -94,7 +98,7 @@ std::vector Invoke(const nnvm::Op* op,
     Imperative::DCInfo::Compute(*input);
   }
   auto state = Imperative::Get()->Invoke(Context::CPU(), *attrs, ndinputs, ndoutputs);
-  if (Imperative::Get()->is_recording()) {
+  if (is_recording()) {
     Imperative::Get()->RecordOp(std::move(*attrs), ndinputs, ndoutputs, state);
   }
 }

diff --git a/src/api/operator/utils.h b/src/api/operator/utils.h
index 8943e80..014ff15 100644
--- a/src/api/operator/utils.h
+++ b/src/api/operator/utils.h
@@ -48,10 +48,11 @@ std::vector Invoke(const nnvm::Op* op,
                    NDArray** outputs);
 
 bool is_recording();
+bool is_deferred_compute();
 
 template
 void SetAttrDict(nnvm::NodeAttrs* attrs) {
-  if (is_recording()) {
+  if (is_recording() || is_deferred_compute()) {
     ::dmlc::get(attrs->parsed).SetAttrDict(&(attrs->dict));
   }
 }

diff --git a/tests/python/unittest/test_deferred_compute.py b/tests/python/unittest/test_deferred_compute.py
index e68373f..d93e67f 100644
--- a/tests/python/unittest/test_deferred_compute.py
+++ b/tests/python/unittest/test_deferred_compute.py
@@ -162,6 +162,18 @@ def test_dc_no_inputs_subset_of_output():
     _all_assert_dc(_dc_empty_setup, f)
 
 
+def test_dc_numpy_tril():
+    def f(a, *, nd):
+        assert nd is mx.np
+        a = nd.ones((2, 2))
+        b = nd.tril(a, 1)
+        c = nd.tril(a, -1)
+        return [b, c]
+
+    for mode in ('all', 'symbolic', 'imperative', 'imperativewithnondccompute'):
+        _assert_dc(_dc_simple_setup, f, mode=mode)
+
+
 ###
 # Test cases with inputs
 ###
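The commit above gates attribute serialization on `is_deferred_compute()` in addition to `is_recording()`. Reduced to a minimal pure-Python sketch of that gating pattern (the class and function names here are illustrative, not MXNet's actual API):

```python
# Minimal sketch of the gating pattern in the commit above: operator
# attributes are serialized only when recording OR deferred-compute mode
# is active. Names are illustrative, not MXNet's real API.
class State:
    recording = False
    deferred_compute = False

def set_attr_dict(state, attrs, parsed):
    # Before the fix, only `state.recording` was checked, so deferred
    # compute mode silently skipped attribute serialization.
    if state.recording or state.deferred_compute:
        attrs["dict"] = dict(parsed)

state = State()
attrs = {}
set_attr_dict(state, attrs, {"k": 1})  # no mode active: nothing stored
state.deferred_compute = True
set_attr_dict(state, attrs, {"k": 1})  # deferred compute: attrs populated
print(attrs)  # -> {'dict': {'k': 1}}
```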
[GitHub] [incubator-mxnet] leezu merged pull request #18284: Fix parse operator attributes in new FFI
leezu merged pull request #18284: URL: https://github.com/apache/incubator-mxnet/pull/18284
[GitHub] [incubator-mxnet] leezu closed issue #18004: Wrong result when using new numpy ffi in deferred compute
leezu closed issue #18004: URL: https://github.com/apache/incubator-mxnet/issues/18004
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18315: Fix coredumps
mxnet-bot commented on pull request #18315: URL: https://github.com/apache/incubator-mxnet/pull/18315#issuecomment-629534560 Jenkins CI successfully triggered : [centos-cpu, unix-cpu]
[GitHub] [incubator-mxnet] leezu commented on pull request #18315: Fix coredumps
leezu commented on pull request #18315: URL: https://github.com/apache/incubator-mxnet/pull/18315#issuecomment-629534528 @mxnet-bot run ci [unix-cpu, centos-cpu]
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18337: fix native cd builds
mxnet-bot commented on pull request #18337: URL: https://github.com/apache/incubator-mxnet/pull/18337#issuecomment-629530814
Hey @mseth10, thanks for submitting the PR. All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]

**CI supported jobs**: [windows-cpu, centos-cpu, edge, clang, unix-gpu, centos-gpu, miscellaneous, windows-gpu, unix-cpu, website, sanity]

_Note_: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
[GitHub] [incubator-mxnet] mseth10 opened a new pull request #18337: fix native cd builds
mseth10 opened a new pull request #18337: URL: https://github.com/apache/incubator-mxnet/pull/18337 ## Description ## Fix CD builds for 'native' flavor
[GitHub] [incubator-mxnet] ChaiBapchya commented on issue #17980: When compiled with MKL, fully_connected calls DNNL while dot and batch_dot call MKL
ChaiBapchya commented on issue #17980: URL: https://github.com/apache/incubator-mxnet/issues/17980#issuecomment-629526785
Can confirm that this issue is specific to AVX512 kernels. Tried this on c5.xl.

$ lscpu
```
Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
```

## Results

### Default [slower]
```
dnnl_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.133789
dnnl_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.132812
```
```
[{'FullyConnected': [
  {'inputs': {'data': (4, 512), 'weight': (512, 512), 'no_bias': True, 'num_hidden': 512},
   'avg_time_FullyConnected': 0.10202302001744101, 'p50_time_FullyConnected': 0.10086749989568489,
   'p90_time_FullyConnected': 0.10658760029400582, 'p99_time_FullyConnected': 0.13521948004836298},
  {'inputs': {'data': (5, 512), 'weight': (512, 512), 'no_bias': True, 'num_hidden': 512},
   'avg_time_FullyConnected': 0.10642346004715364, 'p50_time_FullyConnected': 0.09991750016524747,
   'p90_time_FullyConnected': 0.10565369971118344, 'p99_time_FullyConnected': 0.2586996700802042},
  {'inputs': {'data': (5, 512), 'weight': (1536, 512), 'no_bias': True, 'num_hidden': 1536},
   'avg_time_FullyConnected': 0.16890607999812346, 'p50_time_FullyConnected': 0.16431500012004108,
   'p90_time_FullyConnected': 0.1781331999154645, 'p99_time_FullyConnected': 0.2831235897247094},
  {'inputs': {'data': (5, 512), 'weight': (2048, 512), 'no_bias': True, 'num_hidden': 2048},
   'avg_time_FullyConnected': 0.20140223995440465, 'p50_time_FullyConnected': 0.19778950013460417,
   'p90_time_FullyConnected': 0.20401089991537447, 'p99_time_FullyConnected': 0.3063294199228036},
  {'inputs': {'data': (5, 2048), 'weight': (512, 2048), 'no_bias': True, 'num_hidden': 512},
   'avg_time_FullyConnected': 0.21596427998701984, 'p50_time_FullyConnected': 0.209670038254,
   'p90_time_FullyConnected': 0.21819640001012885, 'p99_time_FullyConnected': 0.3412436299549877}]}]
```

### MKL Workaround [Faster]
```
MKL_VERBOSE SGEMM(T,N,512,5,2048,0x7f9bcf6fac28,0x7f9bc22f4040,2048,0x7f9b1400ce80,2048,0x7f9bcf6fac30,0x7f9b1405e840,512) 21.25us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:18
dnnl_verbose,exec,cpu,inner_product,gemm:blas,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.0378418
MKL_VERBOSE SGEMM(T,N,512,5,2048,0x7f9bcf6fac28,0x7f9bc22f4040,2048,0x7f9b1400ce80,2048,0x7f9bcf6fac30,0x7f9b14061c00,512) 20.94us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:18
dnnl_verbose,exec,cpu,inner_product,gemm:blas,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.0371094
```
```
[{'FullyConnected': [
  {'inputs': {'data': (4, 512), 'weight': (512, 512), 'no_bias': True, 'num_hidden': 512},
   'avg_time_FullyConnected': 0.11772135999308375, 'p50_time_FullyConnected': 0.1149684999290912,
   'p90_time_FullyConnected': 0.1244978000613628, 'p99_time_FullyConnected': 0.14825501980340045},
  {'inputs': {'data': (5, 512), 'weight': (512, 512), 'no_bias': True, 'num_hidden': 512},
   'avg_time_FullyConnected': 0.120828840035756, 'p50_time_FullyConnected': 0.11370450010872446,
   'p90_time_FullyConnected': 0.12752780021401122, 'p99_time_FullyConnected': 0.2412066401620902},
  {'inputs': {'data': (5, 512), 'weight': (1536, 512), 'no_bias': True, 'num_hidden': 1536},
   'avg_time_FullyConnected': 0.13385597998421872, 'p50_time_FullyConnected': 0.12600750005731243,
   'p90_time_FullyConnected': 0.14806160011175962, 'p99_time_FullyConnected': 0.2509373301927551},
  {'inputs': {'data': (5, 512), 'weight': (2048, 512), 'no_bias': True, 'num_hidden': 2048},
   'avg_time_FullyConnected': 0.14175208003507578, 'p50_time_FullyConnected': 0.1372545000322134,
   'p90_time_FullyConnected': 0.14401020002878798, 'p99_time_FullyConnected': 0.2423993399725075},
  {'inputs': {'data': (5, 2048), 'weight': (512, 2048), 'no_bias': True, 'num_hidden': 512},
   'avg_time_FullyConnected': 0.143890859962994, 'p50_time_FullyConnected': 0.139797637755,
   'p90_time_FullyConnected':
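The p50/p90/p99 columns in benchmark results like the ones above are just percentiles over the raw per-iteration timings. A minimal sketch of how such a summary can be computed (nearest-rank method; the exact method opperf uses may differ):

```python
# Sketch: derive avg/p50/p90/p99 summaries from raw timing samples.
# Uses a simple nearest-rank percentile; opperf's exact method may differ.
def percentile(samples, q):
    s = sorted(samples)
    # nearest-rank index for quantile q in [0, 100]
    idx = min(len(s) - 1, max(0, int(round(q / 100 * (len(s) - 1)))))
    return s[idx]

times = [0.10, 0.11, 0.10, 0.13, 0.25, 0.10, 0.11, 0.10, 0.12, 0.10]
summary = {
    "avg": sum(times) / len(times),
    "p50": percentile(times, 50),
    "p90": percentile(times, 90),
    "p99": percentile(times, 99),
}
```

The large gap between p50 and p99 in the reported numbers (e.g. 0.0999 vs 0.2587 ms) is why the tail percentiles are reported alongside the average.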
[GitHub] [incubator-mxnet] bill10 opened a new issue #18336: topk operation is very memory intensive
bill10 opened a new issue #18336: URL: https://github.com/apache/incubator-mxnet/issues/18336

## Description
I am not sure if this is a bug, but the topk operation has a surprisingly high memory cost compared to, e.g., pytorch.

### Error Message
This results in an out-of-memory error for large tensors.

## To Reproduce
For mxnet 1.6.0, python 3.6:
```
%memit X = mx.nd.random.uniform(shape=(32, 200)) # peak memory: 419.61 MiB, increment: 249.80 MiB
```
```
%memit y = X.topk(k=100) # peak memory: 1396.98 MiB, increment: 977.97 MiB
```
For pytorch 1.4:
```
%memit X = th.rand(size=(32, 200)) # peak memory: 417.15 MiB, increment: 244.70 MiB
```
```
%memit y = X.topk(k=100) # peak memory: 906.36 MiB, increment: 489.05 MiB
```
If I read correctly, mxnet uses twice as much memory as pytorch (977 MiB vs 489 MiB).

## Environment
mxnet 1.6.0, python 3.6
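For reference, what a row-wise topk computes (independent of either framework's memory behavior) can be sketched in pure Python with `heapq`. This is only a semantic reference for the operation being benchmarked above, not MXNet's or PyTorch's implementation:

```python
import heapq

def topk(rows, k):
    """Return the k largest values of each row, in descending order.
    Pure-Python reference for row-wise top-k semantics; not related to
    MXNet's actual kernel or its memory layout."""
    return [heapq.nlargest(k, row) for row in rows]

X = [[3, 1, 4, 1, 5], [9, 2, 6, 5, 3]]
print(topk(X, 2))  # -> [[5, 4], [9, 6]]
```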
[GitHub] [incubator-mxnet] zhreshold commented on pull request #18325: Change SGD with momentum to include momentum correction by default
zhreshold commented on pull request #18325: URL: https://github.com/apache/incubator-mxnet/pull/18325#issuecomment-629504587 @szhengac Can you provide feedback on this?
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18335: Skip test_metric_performance on MKL builds
mxnet-bot commented on pull request #18335: URL: https://github.com/apache/incubator-mxnet/pull/18335#issuecomment-629504113
Hey @leezu, thanks for submitting the PR. All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]

**CI supported jobs**: [website, windows-gpu, miscellaneous, windows-cpu, clang, centos-cpu, centos-gpu, sanity, edge, unix-cpu, unix-gpu]

_Note_: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
[GitHub] [incubator-mxnet] leezu opened a new pull request #18335: Skip test_metric_performance on MKL builds
leezu opened a new pull request #18335: URL: https://github.com/apache/incubator-mxnet/pull/18335
Per https://github.com/apache/incubator-mxnet/issues/18244, metrics now use mxnet operators.
Fixes https://github.com/apache/incubator-mxnet/issues/18330
[GitHub] [incubator-mxnet] leezu commented on pull request #18331: Remove test metric perf
leezu commented on pull request #18331: URL: https://github.com/apache/incubator-mxnet/pull/18331#issuecomment-629477586 I think we should integrate it into the performance benchmark. Currently there is no mechanism to monitor the results.
[GitHub] [incubator-mxnet] ChaiBapchya commented on issue #17980: When compiled with MKL, fully_connected calls DNNL while dot and batch_dot call MKL
ChaiBapchya commented on issue #17980: URL: https://github.com/apache/incubator-mxnet/issues/17980#issuecomment-629477511

Yes, the logs agree with these statements, but the perf difference isn't visible via opperf. My bad, I wrongly interpreted lhs and rhs; but that's for dot and batch_dot. The FC values for data and weight are the same as yours.

CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, instance p3.8xl. This instance doesn't have avx512:

```
$ lscpu | grep Flags | grep avx
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
```

The OMP threading I used was the default. However, trying FC with OMP_NUM_THREADS=4 still shows that the workaround makes FC slower [despite it using MKL] than the default [which doesn't use MKL].

### Workaround [slow]
```
MKL_VERBOSE SGEMM(T,N,512,5,2048,0x7f0b79ff8c28,0x7f0b62cfb040,2048,0x7f0b6800b740,2048,0x7f0b79ff8c30,0x7f0b68074e80,512) 119.93us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:4
dnnl_verbose,exec,cpu,inner_product,gemm:blas,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.142822
```
```
{'inputs': {'data': (4, 512), 'weight': (512, 512), 'no_bias': True, 'num_hidden': 512}, 'avg_time_forward_FullyConnected': 0.0925},
{'inputs': {'data': (5, 512), 'weight': (512, 512), 'no_bias': True, 'num_hidden': 512}, 'avg_time_forward_FullyConnected': 0.0921},
{'inputs': {'data': (5, 512), 'weight': (1536, 512), 'no_bias': True, 'num_hidden': 1536}, 'avg_time_forward_FullyConnected': 0.1441},
{'inputs': {'data': (5, 512), 'weight': (2048, 512), 'no_bias': True, 'num_hidden': 2048}, 'avg_time_forward_FullyConnected': 0.1688},
{'inputs': {'data': (5, 2048), 'weight': (512, 2048), 'no_bias': True, 'num_hidden': 512}, 'avg_time_forward_FullyConnected': 0.1773}]}]
```

### Default [faster]
```
dnnl_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.11499
dnnl_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.115967
```
```
{'inputs': {'data': (4, 512), 'weight': (512, 512), 'no_bias': True, 'num_hidden': 512}, 'avg_time_forward_FullyConnected': 0.0691},
{'inputs': {'data': (5, 512), 'weight': (512, 512), 'no_bias': True, 'num_hidden': 512}, 'avg_time_forward_FullyConnected': 0.0669},
{'inputs': {'data': (5, 512), 'weight': (1536, 512), 'no_bias': True, 'num_hidden': 1536}, 'avg_time_forward_FullyConnected': 0.1166},
{'inputs': {'data': (5, 512), 'weight': (2048, 512), 'no_bias': True, 'num_hidden': 2048}, 'avg_time_forward_FullyConnected': 0.1438},
{'inputs': {'data': (5, 2048), 'weight': (512, 2048), 'no_bias': True, 'num_hidden': 512}, 'avg_time_forward_FullyConnected': 0.1509}]}]
```

## Python's time-it function in OpPerf

Also, instead of using MXNet's native [built-in] profiler, if I use Python's time-it function, the results still show the same pattern.

### Workaround [slow]
```
MKL_VERBOSE SGEMM(T,N,512,5,2048,0x7f86067f9c28,0x7f85e34fd040,2048,0x7f85f000b740,2048,0x7f86067f9c30,0x7f85f005fc80,512) 120.56us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:4
dnnl_verbose,exec,cpu,inner_product,gemm:blas,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.143066
```
```
[{'FullyConnected': [
{'inputs': {'data': (4, 512), 'weight': (512, 512), 'no_bias': True, 'num_hidden': 512}, 'avg_time_FullyConnected': 0.18504947423934937, 'p50_time_FullyConnected': 0.1817569718696177, 'p90_time_FullyConnected': 0.19327133195474744, 'p99_time_FullyConnected': 0.21545797004364425},
{'inputs': {'data': (5, 512), 'weight': (512, 512), 'no_bias': True, 'num_hidden': 512}, 'avg_time_FullyConnected': 0.1922117848880589, 'p50_time_FullyConnected': 0.18283800454810262, 'p90_time_FullyConnected': 0.19954713061451912, 'p99_time_FullyConnected': 0.36291924072429527},
{'inputs': {'data': (5, 512), 'weight': (1536, 512), 'no_bias': True, 'num_hidden': 1536}, 'avg_time_FullyConnected': 0.24743830785155296, 'p50_time_FullyConnected': 0.23927190341055393, 'p90_time_FullyConnected': 0.25965459644794464, 'p99_time_FullyConnected': 0.3613424429204313},
{'inputs': {'data': (5, 512), 'weight': (2048, 512), 'no_bias': True, 'num_hidden': 2048}, 'avg_time_FullyConnected': 0.2728361077606678,
```
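For reference, the avg/p50/p90/p99 numbers in the time-it output above can be reproduced outside MXNet with a small standalone harness. This is a hedged sketch, not opperf's actual implementation: it times individual runs with `timeit` and uses a NumPy matmul as a stand-in for the FullyConnected workload.

```python
import timeit
import numpy as np

def profile_op(op, runs=100, warmup=10):
    """Time `op` repeatedly and report avg/p50/p90/p99 in milliseconds,
    mirroring the statistics opperf prints in its time-it mode."""
    for _ in range(warmup):  # discard warmup iterations
        op()
    times = [timeit.timeit(op, number=1) * 1e3 for _ in range(runs)]
    return {
        'avg_time': float(np.mean(times)),
        'p50_time': float(np.percentile(times, 50)),
        'p90_time': float(np.percentile(times, 90)),
        'p99_time': float(np.percentile(times, 99)),
    }

# stand-in for FullyConnected with data (5, 2048) and weight (512, 2048):
# output = data . weight^T, the same GEMM shape shown in the MKL_VERBOSE line
data = np.random.rand(5, 2048).astype(np.float32)
weight = np.random.rand(512, 2048).astype(np.float32)
stats = profile_op(lambda: data.dot(weight.T))
```

Comparing p50 against p99 here is what surfaces tail effects like the 0.36 ms p99 outlier in the results above.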
[GitHub] [incubator-mxnet] ys2843 edited a comment on pull request #18288: Website global search feature
ys2843 edited a comment on pull request #18288: URL: https://github.com/apache/incubator-mxnet/pull/18288#issuecomment-628990327

> Looks good.
>
> * Should we use offsite js files or host locally?
> * Should the API key be in a js file or can it be in a config file so we keep this stuff in someplace central?
> * Wonder if we could have the list of versions also in a config file and not in .js, so it is more obvious and easier to maintain... then other parts of the site like the install selector or other things that need the list of versions can share?
>
> Just some thoughts... otherwise this looks good to me. Good work!

Should we use offsite js files or host locally?
+ Based on the research, the CDN works pretty well in China. I will keep an eye on it; if there is a problem loading this file in any country, I will switch to hosting the file locally.

Should the API key be in a js file or can it be in a config file so we keep this stuff in someplace central?
+ I didn't move the API key to Jekyll in the end, because Jekyll does not work very well with files such as `assets/*.js`; the way to get a variable from Jekyll into JS is not straightforward (using `_include`), please correct me if I am wrong. And after this version is archived, it is also easier to maintain if the key is kept in one JS file rather than spread to every page by a Jekyll include.
[GitHub] [incubator-mxnet-ci] ChaiBapchya commented on a change in pull request #25: Serverless implementation for jenkins pipeline monitor lambda
ChaiBapchya commented on a change in pull request #25: URL: https://github.com/apache/incubator-mxnet-ci/pull/25#discussion_r426024021 ## File path: services/jenkins-pipeline-monitor/handler.py ## @@ -0,0 +1,134 @@

```python
import os
import boto3
import json
import logging
import secret_manager

from jenkinsapi.jenkins import Jenkins

logging.getLogger().setLevel(logging.INFO)
logging.getLogger('boto3').setLevel(logging.CRITICAL)
logging.getLogger('botocore').setLevel(logging.CRITICAL)


def get_jenkins_obj(secret):
    """
    This method returns an object of Jenkins instantiated using username, password
    """
    jenkins_url, jenkins_username, jenkins_password = os.environ["JENKINS_URL"], secret["jenkins_username"], secret["jenkins_password"]
    return Jenkins(jenkins_url, username=jenkins_username, password=jenkins_password)


def get_secret():
    """
    This method is to get secret value from Secrets Manager
    """
    secret = json.loads(secret_manager.get_secret())
    return secret


def get_pipeline_job(jenkinsObj):
    job = jenkinsObj["restricted-mxnet-cd/mxnet-cd-release-job"]
    return job


def get_latest_build_number(job):
    return job.get_last_build().get_number()


def get_build_from_build_number(job, build_number):
    return job.get_build(build_number)


def get_build_timestamp(build):
    return build.get_timestamp()


def get_build_date(timestamp):
    return timestamp.date()


def is_latest_day_build(current_build, latest_build):
    current_build_timestamp = get_build_timestamp(current_build)
    latest_build_timestamp = get_build_timestamp(latest_build)
    # if 2 builds are within 24 hours and on the same day
```

Review comment: Addressed. https://github.com/apache/incubator-mxnet-ci/pull/25/commits/2da87ac0424dd68b799cd24ded81e27ebc3a0969
[GitHub] [incubator-mxnet-ci] mseth10 commented on a change in pull request #25: Serverless implementation for jenkins pipeline monitor lambda
mseth10 commented on a change in pull request #25: URL: https://github.com/apache/incubator-mxnet-ci/pull/25#discussion_r425988017 ## File path: services/jenkins-pipeline-monitor/handler.py
> # if 2 builds are within 24 hours and on the same day
Review comment: As per offline discussion, we'll trigger the lambda at 2 am UTC and check for CD pipelines triggered in the last 8 hours. That will cover all builds for the day.
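The look-back window discussed above can be sketched as a plain timestamp filter. This is a hypothetical illustration, not the lambda's actual code: `builds_in_window`, the build list, and the 8-hour constant are assumptions based on the discussion.

```python
from datetime import datetime, timedelta, timezone

BUILD_WINDOW_HOURS = 8  # assumption: lambda fires at 2 am UTC, looks back 8 hours

def builds_in_window(build_timestamps, now=None, window_hours=BUILD_WINDOW_HOURS):
    """Keep only builds whose timestamp falls within the look-back window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=window_hours)
    return [ts for ts in build_timestamps if ts >= cutoff]

# example: lambda triggered at 2 am UTC; one build at 9:54 pm the previous
# evening (inside the window) and one at 3 pm (outside it)
now = datetime(2020, 5, 8, 2, 0, tzinfo=timezone.utc)
builds = [datetime(2020, 5, 7, 21, 54, tzinfo=timezone.utc),
          datetime(2020, 5, 7, 15, 0, tzinfo=timezone.utc)]
recent = builds_in_window(builds, now=now)
```

With an evening trigger time for the CD job, a fixed look-back window like this avoids the date-boundary edge cases discussed further down.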
[GitHub] [incubator-mxnet] ChaiBapchya edited a comment on issue #18330: test_metric_performance
ChaiBapchya edited a comment on issue #18330: URL: https://github.com/apache/incubator-mxnet/issues/18330#issuecomment-629044532
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-18326/1/pipeline
http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-18327/runs/1/nodes/365/log/?start=0
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-18324/3/pipeline
#18324 #18326 #18327 are unrelated PRs.
[GitHub] [incubator-mxnet] ChaiBapchya commented on pull request #18324: [OpPerf] Add example of using opperf with internal op locally
ChaiBapchya commented on pull request #18324: URL: https://github.com/apache/incubator-mxnet/pull/18324#issuecomment-629445905 @mxnet-bot run ci [unix-cpu]
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18324: [OpPerf] Add example of using opperf with internal op locally
mxnet-bot commented on pull request #18324: URL: https://github.com/apache/incubator-mxnet/pull/18324#issuecomment-629445934 Jenkins CI successfully triggered : [unix-cpu]
[GitHub] [incubator-mxnet] ChaiBapchya edited a comment on pull request #18331: Remove test metric perf
ChaiBapchya edited a comment on pull request #18331: URL: https://github.com/apache/incubator-mxnet/pull/18331#issuecomment-629444500 A quick history check reveals this file was added by @safrooze in #9705. Can we move this to nightly, since that was planned, or do we get rid of it altogether? The test errors out.
[GitHub] [incubator-mxnet] ys2843 edited a comment on pull request #18288: Website global search feature
ys2843 edited a comment on pull request #18288: URL: https://github.com/apache/incubator-mxnet/pull/18288#issuecomment-629441115

> Very nice. Thank you @ys2843 This will be very helpful for the community.
> Please update the release manager checklist for changing the defaults.
>
> Should we change anything with algolia (like default version) whenever there is a new release?
> Also, for updating the search results, what is the process? What do you mean by improving the search result.
>
> Overall, this is really nice improvement for easily finding the content. Thank you.

Thank you for reviewing. I will update the release manager checklist.

> Should we change anything with algolia (like default version) whenever there is a new release?

No, we don't need any change from algolia. We only need to update the website code to set the default version. I will document this part.

> Also, for updating the search results, what is the process? What do you mean by improving the search result.

To make the search results more relevant and accurate, the process is to update the [configuration of DocSearch's web crawler](https://github.com/algolia/docsearch-configs/blob/master/configs/apache_mxnet.json). Currently the config is not ideal, and the best way to update and test the config is to run the web crawler locally. I have set up the crawler on an EC2 instance and run it with the new config; the data is sent to my test account for review, and then I can adjust the config accordingly.
[GitHub] [incubator-mxnet] ChaiBapchya commented on pull request #18326: [R] Fix incorrect copyto usage & incorrect website title for Symbol API in R
ChaiBapchya commented on pull request #18326: URL: https://github.com/apache/incubator-mxnet/pull/18326#issuecomment-629442039 @mxnet-label-bot add [pr-awaiting-review]
[GitHub] [incubator-mxnet] ChaiBapchya commented on pull request #18130: Set cache_intermediate to True
ChaiBapchya commented on pull request #18130: URL: https://github.com/apache/incubator-mxnet/pull/18130#issuecomment-629442396 Closing since this one is useful for local development & not on CI.
[GitHub] [incubator-mxnet] ChaiBapchya closed pull request #18130: Set cache_intermediate to True
ChaiBapchya closed pull request #18130: URL: https://github.com/apache/incubator-mxnet/pull/18130
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18284: Fix parse operator attributes in new FFI
mxnet-bot commented on pull request #18284: URL: https://github.com/apache/incubator-mxnet/pull/18284#issuecomment-629428729 Jenkins CI successfully triggered : [unix-cpu]
[GitHub] [incubator-mxnet] leezu commented on pull request #18284: Fix parse operator attributes in new FFI
leezu commented on pull request #18284: URL: https://github.com/apache/incubator-mxnet/pull/18284#issuecomment-629428687 @mxnet-bot run ci [unix-cpu]
[GitHub] [incubator-mxnet] sandeep-krishnamurthy commented on pull request #18288: Website global search feature
sandeep-krishnamurthy commented on pull request #18288: URL: https://github.com/apache/incubator-mxnet/pull/18288#issuecomment-629426892 @szha / @leezu - FYI.
[GitHub] [incubator-mxnet] sandeep-krishnamurthy commented on pull request #18288: Website global search feature
sandeep-krishnamurthy commented on pull request #18288: URL: https://github.com/apache/incubator-mxnet/pull/18288#issuecomment-629425112 @mxnet-bot run ci [unix-cpu]
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18288: Website global search feature
mxnet-bot commented on pull request #18288: URL: https://github.com/apache/incubator-mxnet/pull/18288#issuecomment-629425161 Jenkins CI successfully triggered : [unix-cpu]
[GitHub] [incubator-mxnet] sandeep-krishnamurthy commented on a change in pull request #18288: Website global search feature
sandeep-krishnamurthy commented on a change in pull request #18288: URL: https://github.com/apache/incubator-mxnet/pull/18288#discussion_r425990851 ## File path: docs/static_site/src/assets/js/globalSearch.js ## @@ -0,0 +1,152 @@

```
/*!
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

/* Installation page display functions for install selector.
   This utility allows direct links to specific install instructions.
*/

$(document).ready(function () {
  const DEFAULT_CURRENT_VERSION = "1.6";
  const VERSIONS = [
    "master",
```

Review comment: +1
[GitHub] [incubator-mxnet-ci] mseth10 commented on a change in pull request #25: Serverless implementation for jenkins pipeline monitor lambda
mseth10 commented on a change in pull request #25: URL: https://github.com/apache/incubator-mxnet-ci/pull/25#discussion_r425988017 ## File path: services/jenkins-pipeline-monitor/handler.py
> # if 2 builds are within 24 hours and on the same day
Review comment: As per offline discussion, checking for the last 6 hours would suffice.
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository. aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new e3cceb2 Bump the publish timestamp. e3cceb2 is described below commit e3cceb2ba28e168473b10ae3427fa87a50e47ddb Author: mxnet-ci AuthorDate: Fri May 15 18:48:13 2020 + Bump the publish timestamp. --- date.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/date.txt b/date.txt new file mode 100644 index 000..dc047eb --- /dev/null +++ b/date.txt @@ -0,0 +1 @@ +Fri May 15 18:48:13 UTC 2020
[GitHub] [incubator-mxnet-ci] ChaiBapchya commented on a change in pull request #25: Serverless implementation for jenkins pipeline monitor lambda
ChaiBapchya commented on a change in pull request #25: URL: https://github.com/apache/incubator-mxnet-ci/pull/25#discussion_r425978610

## File path: services/jenkins-pipeline-monitor/handler.py

```python
import os
import boto3
import json
import logging
import secret_manager

from jenkinsapi.jenkins import Jenkins

logging.getLogger().setLevel(logging.INFO)
logging.getLogger('boto3').setLevel(logging.CRITICAL)
logging.getLogger('botocore').setLevel(logging.CRITICAL)


def get_jenkins_obj(secret):
    """
    This method returns an object of Jenkins instantiated using username, password
    """
    jenkins_url, jenkins_username, jenkins_password = os.environ["JENKINS_URL"], secret["jenkins_username"], secret["jenkins_password"]
    return Jenkins(jenkins_url, username=jenkins_username, password=jenkins_password)


def get_secret():
    """
    This method is to get secret value from Secrets Manager
    """
    secret = json.loads(secret_manager.get_secret())
    return secret


def get_pipeline_job(jenkinsObj):
    job = jenkinsObj["restricted-mxnet-cd/mxnet-cd-release-job"]
    return job


def get_latest_build_number(job):
    return job.get_last_build().get_number()


def get_build_from_build_number(job, build_number):
    return job.get_build(build_number)


def get_build_timestamp(build):
    return build.get_timestamp()


def get_build_date(timestamp):
    return timestamp.date()


def is_latest_day_build(current_build, latest_build):
    current_build_timestamp = get_build_timestamp(current_build)
    latest_build_timestamp = get_build_timestamp(latest_build)
    # if 2 builds are within 24 hours and on the same day
```

Review comment: Say we are at May 7 11:59pm UTC [lambda is triggered]. The latest build as of May 7 11:59pm UTC is _Build 1088_ [May 7 9:54:39pm] http://jenkins.mxnet-ci.amazon-ml.com/job/restricted-mxnet-cd/job/mxnet-cd-release-job/1088/. If we relax the same-day check and keep only the 24-hour check, then all builds after May 6 9:55:39pm will be accepted, i.e. builds 1082 to 1088. Here 1082 isn't supposed to be selected, as it was part of the previous day's trigger: _Build 1082_, May 6, 2020, 9:57:08 PM [which is not supposed to be picked] http://jenkins.mxnet-ci.amazon-ml.com/job/restricted-mxnet-cd/job/mxnet-cd-release-job/1082/. Hence we need both checks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
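The double check discussed in the review comment can be sketched as standalone Python. This is a hypothetical illustration with hard-coded timestamps; the real handler reads them from jenkinsapi build objects:

```python
from datetime import datetime, timedelta

def is_latest_day_build(current_ts, latest_ts):
    """Accept a build only if it is within 24 hours of the latest build
    AND on the same calendar day -- both checks from the discussion above."""
    within_24h = (latest_ts - current_ts) <= timedelta(hours=24)
    same_day = current_ts.date() == latest_ts.date()
    return within_24h and same_day

# Build 1088 ran May 7 9:54:39pm; Build 1082 ran May 6 9:57:08pm.
# The 24-hour check alone would accept 1082 (~23h57m apart), but the
# same-day check rejects it, which is why both checks are needed.
latest = datetime(2020, 5, 7, 21, 54, 39)
prev_day = datetime(2020, 5, 6, 21, 57, 8)
print(is_latest_day_build(prev_day, latest))  # rejected: previous day
print(is_latest_day_build(latest, latest))    # accepted
```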
[GitHub] [incubator-mxnet] marcoabreu commented on pull request #18252: [CI] run operator tests with naive engine
marcoabreu commented on pull request #18252: URL: https://github.com/apache/incubator-mxnet/pull/18252#issuecomment-629411739 What's our strategy to test the multi-threaded engine? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #18325: Change SGD with momentum to include momentum correction by default
ptrendx commented on a change in pull request #18325: URL: https://github.com/apache/incubator-mxnet/pull/18325#discussion_r425969856 ## File path: python/mxnet/optimizer/lars.py ## @@ -252,6 +253,13 @@ def fused_step(self, indices, weights, grads, states): wd = wds[i] lr = lrs[i] lr *= self._get_lars(index, weight, grad, wd) +# normal SGD picks up momentum correction by default, +# so need to modify the momentum to undo that. +# The correction term is previous_lr / current_lr. +kwargs['momentum'] = (self.momentum * (self.last_lr / lr)) \ Review comment: Thanks, done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
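The arithmetic in the diff above (undoing the fused kernel's built-in momentum correction) is small enough to show standalone. This is a hedged sketch of just that term, not the optimizer itself; `uncorrected_momentum` is a hypothetical helper name:

```python
def uncorrected_momentum(momentum, last_lr, current_lr):
    """The fused SGD kernel applies momentum correction (scaling by
    current_lr / last_lr) by default. LARS rescales lr per layer, so the
    momentum handed to the kernel is pre-multiplied by last_lr / current_lr
    to cancel that correction, as in the diff above."""
    return momentum * (last_lr / current_lr)

# If the lr doubled between steps, the effective momentum is halved so the
# kernel's built-in correction restores the intended value.
print(uncorrected_momentum(0.9, 0.01, 0.02))
```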
[GitHub] [incubator-mxnet] JonTanS closed issue #16822: Trouble building Mxnet 1.6.x
JonTanS closed issue #16822: URL: https://github.com/apache/incubator-mxnet/issues/16822 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-mxnet] kpu commented on issue #17980: When compiled with MKL, fully_connected calls DNNL while dot and batch_dot call MKL
kpu commented on issue #17980: URL: https://github.com/apache/incubator-mxnet/issues/17980#issuecomment-629400657 `dot` hasn't been taken over by MKLDNN = DNNL = oneAPI yet, so you shouldn't expect to see a difference there because it never calls MKLDNN's GEMM anyway. FullyConnected has been taken over by MKLDNN = DNNL = oneAPI. This inconsistency is part of the bug report. Your logs agree with these statements. Do you really mean LHS (4, 512, 512)? I'm talking about LHS (4,512) RHS (512, 512). What CPU is this on? Note that mine were on a Skylake Xeon `c9.9xlarge`. I'd expect the AVX512 kernels to be pretty different from the AVX2 kernels. What OMP threading were you using? I'd expect for matrices this small that thread scaling is terrible for anything but a tiny number of threads. (I mentioned using 4). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-mxnet] ChaiBapchya edited a comment on issue #17980: When compiled with MKL, fully_connected calls DNNL while dot and batch_dot call MKL
ChaiBapchya edited a comment on issue #17980: URL: https://github.com/apache/incubator-mxnet/issues/17980#issuecomment-629385677

> Tested with MXNet [cfb474b](https://github.com/apache/incubator-mxnet/commit/cfb474ba743d5ea85161bf19875488f4cb409d3c). Compiled with mostly-default cmake settings:
>
> ```shell
> cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release ..
> ```
>
> Then when I run
>
> ```
> export MKL_VERBOSE=1
> export MKLDNN_VERBOSE=1
> python3
> Python 3.6.9 (default, Nov 7 2019, 10:44:02)
> [GCC 8.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import mxnet as mx
> Numpy + Intel(R) MKL: THREADING LAYER: (null)
> Numpy + Intel(R) MKL: setting Intel(R) MKL to use INTEL OpenMP runtime
> Numpy + Intel(R) MKL: preloading libiomp5.so runtime
> ```

@kpuatamazon @kpu Running on Ubuntu 18.04 [which doesn't have MKL installed by default] with the default cmake config doesn't use MKL as BLAS, so we can't reproduce the output above. For an Ubuntu 18.04 base AMI, one has to install MKL in /opt/intel and update the cmake command to

```
cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release -DUSE_BLAS=mkl ..
```

I found this uses MKL as BLAS, and `export MKL_VERBOSE=1` confirms it. With this addition to both [default & workaround] I reran OpPerf and didn't see much perf difference.

## Commands

Default

```
cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release -DUSE_BLAS=mkl ..
```

Workaround

```
export CXXFLAGS="${CXXFLAGS} -DUSE_MKL -I/opt/intel/mkl/include"
cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release -DUSE_BLAS=mkl ..
```

## Logs

Default Batch_dot

```
MKL_VERBOSE SGEMM_BATCH(T,N,0x7fdafe19ecec,0x7fdafe19ecf0,0x7fdafe19ecf4,0x7fdafe19ed04,0x7fda6001dfd0,0x7fdafe19ecf8,0x7fda6001e490,0x7fdafe19ecfc,0x7fdafe19ed08,0x3fd7ec0,0x7fdafe19ed00,0x7fdafe19ec28,0x7fdafe 28.71ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:16
MKL_VERBOSE SGEMM_BATCH(T,N,0x7fdafe19ecec,0x7fdafe19ecf0,0x7fdafe19ecf4,0x7fdafe19ed04,0x7fda6001dfd0,0x7fdafe19ecf8,0x7fda6001e490,0x7fdafe19ecfc,0x7fdafe19ed08,0x3fd7ec0,0x7fdafe19ed00,0x7fdafe19ec28,0x7fdafe 28.53ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:16
```

Default FC

```
dnnl_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.0551758
dnnl_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.0559082
```

Workaround Batch_dot [same as default]

```
MKL_VERBOSE SGEMM_BATCH(T,N,0x7f985b78acec,0x7f985b78acf0,0x7f985b78acf4,0x7f985b78ad04,0x7f97b4016cd0,0x7f985b78acf8,0x7f97b401e550,0x7f985b78acfc,0x7f985b78ad08,0x26f2890,0x7f985b78ad00,0x7f985b78ac28,0x7f985b 28.72ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:16
MKL_VERBOSE SGEMM_BATCH(T,N,0x7f985b78acec,0x7f985b78acf0,0x7f985b78acf4,0x7f985b78ad04,0x7f97b4016cd0,0x7f985b78acf8,0x7f97b401e550,0x7f985b78acfc,0x7f985b78ad08,0x26f2890,0x7f985b78ad00,0x7f985b78ac28,0x7f985b 28.77ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:16
```

Workaround FC [additional MKL_VERBOSE before DNNL_VERBOSE]

```
MKL_VERBOSE SGEMM(T,N,512,4,512,0x7f985b789c28,0x7f97b5e52e80,512,0x7f97b401e600,512,0x7f985b789c30,0x7f976e89dd80,512) 39.68us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:16
dnnl_verbose,exec,cpu,inner_product,gemm:blas,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb4ic512oc512,0.0769043
MKL_VERBOSE SGEMM(T,N,512,5,2048,0x7f985b789c28,0x7f976c400100,2048,0x7f976e887100,2048,0x7f985b789c30,0x7f976e8da000,512) 79.41us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:16
dnnl_verbose,exec,cpu,inner_product,gemm:blas,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb5ic2048oc512,0.11377
```

## Results

| Operator | LHS | RHS | MKL Default | MKL Workaround |
|---|---|---|---|---|
| Dot | (4, 512, 512) | (4, 512, 512) | 4.1112 | 4.8241 |
| | (5, 512, 512) | (5, 512, 512) | 6.4421 | 7.607 |
| | (5, 512, 1536) | (5, 512, 1536) | 20.3648 | 19.2217 |
| | (5, 512, 2048) | (5, 512, 2048) | 23.3236 | 23.2849 |
| | (5, 2048, 512) | (5, 2048, 512) | 123.1235 | 123.9806 |
| Batch_dot | (4, 512, 512) | (4, 512, 512) | 1.4105 | 1.407 |
| | (5, 512, 512) | (5, 512, 512) | 1.7558 | 1.7511 |
| | (5, 512, 1536) | (5, 512, 1536) | 6.5931 | 6.5585 |
| | (5, 512, 2048) | (5, 512, 2048) | 9.1452 | 9.1031 |
| | (5, 2048, 512) | (5, 2048, 512) | 29.0192 | 28.9236 |

| Operator | Data | Weight | MKL Default | MKL Workaround |
|---|---|---|---|---|
| FC | (4, 512) | (512, 512) | 0.057 | 0.0685 |
| | (5, 512) | (512, 512) | 0.0591 | 0.0698 |
| | (5, 512) | (1536, 512) | 0.0823 | 0.0939 |
| | (5, 512) | (2048, 512) | 0.0916 | 0.1026 |
| | (5, 2048) | (512, 2048) | 0.1146 | 0.1267 |

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-mxnet] szha closed issue #18334: test_numpy_ndarray.py::test_np_ndarray_boolean_indexing
szha closed issue #18334: URL: https://github.com/apache/incubator-mxnet/issues/18334 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-mxnet] ChaiBapchya edited a comment on issue #17980: When compiled with MKL, fully_connected calls DNNL while dot and batch_dot call MKL
ChaiBapchya edited a comment on issue #17980: URL: https://github.com/apache/incubator-mxnet/issues/17980#issuecomment-623803913

> In case somebody finds this issue and wants their optimized build, here is a different workaround that removes the need for `LD_PRELOAD`. Just do this before running cmake the first time:
>
> ```shell
> export CXXFLAGS="${CXXFLAGS} -DUSE_MKL -I/opt/intel/mkl/include"
> ```
>
> Then `cmake` can be run normally:
>
> ```shell
> cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release ..
> ```
>
> and the compiled MXNet can be run normally without any special environment variables.

@kpuatamazon Hi, I was trying to benchmark MKL [default] vs. the workaround using OpPerf. Despite ensuring MKL is installed and using `export CXXFLAGS` followed by the usual cmake command, the build failed with

```
gemm.cpp:(.text+0xb45): undefined reference to `cblas_gemm_s8u8s32'
```

I tried the undocumented abominable-kludge option you mentioned and that worked smoothly:

```
export LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_rt.so
rm -rf build/
mkdir -p build && cd build
cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release -D_DNNL_USE_MKL=FULL -DMKLINC=/opt/intel/mkl/include ..
cmake --build . --parallel 1024
```

Script for OpPerf: https://gist.github.com/ChaiBapchya/5f2342f75ddeb1e21f14acac665c76ad

Results

| Operator | LHS | RHS | MKL Default | MKL Workaround |
|---|---|---|---|---|
| Dot | (4, 512, 512) | (4, 512, 512) | 15.1122 | 4.1254 |
| | (5, 512, 512) | (5, 512, 512) | 38.1678 | 7.5323 |
| | (5, 512, 1536) | (5, 512, 1536) | 21.6601 | 19.2503 |
| | (5, 512, 2048) | (5, 512, 2048) | 29.0369 | 23.7432 |
| | (5, 2048, 512) | (5, 2048, 512) | 167.5528 | 129.9957 |
| Batch_dot | (4, 512, 512) | (4, 512, 512) | 1.7898 | 1.5445 |
| | (5, 512, 512) | (5, 512, 512) | 2.2457 | 1.9361 |
| | (5, 512, 1536) | (5, 512, 1536) | 6.1453 | 5.4034 |
| | (5, 512, 2048) | (5, 512, 2048) | 8.246 | 8.0442 |
| | (5, 2048, 512) | (5, 2048, 512) | 160.6243 | 29.0772 |

| Operator | Data | Weight | MKL Default | MKL Workaround |
|---|---|---|---|---|
| FC | (4, 512) | (512, 512) | 0.0609 | 0.068 |
| | (5, 512) | (512, 512) | 0.0633 | 0.0731 |
| | (5, 512) | (1536, 512) | 0.0916 | 0.0996 |
| | (5, 512) | (2048, 512) | 0.1081 | 0.1084 |

However @kpuatamazon when I try to test the default again [i.e. default -> workaround -> default] by unsetting the `LD_PRELOAD` environment variable, the default build fails with

```
gemm.cpp:(.text+0xe6b): undefined reference to `cblas_gemm_s8u8s32'
```

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[incubator-mxnet] branch master updated (fec534a -> 3e676fc)
This is an automated email from the ASF dual-hosted git repository. lausen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git. from fec534a Fix missing MKLDNN headers (#18310) add 3e676fc Fix memory leaks in Gluon (#18328) No new revisions were added by this update. Summary of changes: python/mxnet/gluon/block.py| 25 tests/python/unittest/test_gluon.py| 38 ++ tests/python/unittest/test_thread_local.py | 5 ++-- 3 files changed, 56 insertions(+), 12 deletions(-)
[GitHub] [incubator-mxnet] leezu commented on pull request #18328: Fix memory leaks in Gluon
leezu commented on pull request #18328: URL: https://github.com/apache/incubator-mxnet/pull/18328#issuecomment-629372543 @ciyongch should we backport this to 1.7? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-mxnet] leezu merged pull request #18328: Fix memory leaks in Gluon
leezu merged pull request #18328: URL: https://github.com/apache/incubator-mxnet/pull/18328 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-mxnet] szha opened a new issue #18334: test_numpy_ndarray.py::test_np_ndarray_boolean_indexing
szha opened a new issue #18334: URL: https://github.com/apache/incubator-mxnet/issues/18334 ## Description test_numpy_ndarray.py::test_np_ndarray_boolean_indexing ## Occurrences http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/centos-cpu/branches/PR-18252/runs/9/nodes/236/steps/284/log/?start=0 ## What have you tried to solve it? 1. rewrite and break down the test. 2. mark as flaky This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-mxnet] John1231983 opened a new issue #18333: How to implement cross entropy for binary segmentation using symbol only?
John1231983 opened a new issue #18333: URL: https://github.com/apache/incubator-mxnet/issues/18333 Given a softmax-normalized feature `f` (`Bx2xHxW`) and a target label of size `Bx1xHxW`, I want to implement cross-entropy loss using symbols only. This is my implementation:

```python
# target size of Bx1xHxW
target_squeeze = mx.symbol.squeeze(target, axis=1)  # size of BxHxW
target_squeeze = mx.sym.one_hot(target_squeeze, depth=2, on_value=-1.0, off_value=0.0)
# Transpose from BxHxWx2 to Bx2xHxW
target_squeeze = mx.symbol.transpose(target_squeeze, axes=(0, 3, 1, 2))
# Get log of feature f
f_log = mx.sym.log(f)
batch_size = 32
f_sum = mx.symbol.sum(target_squeeze * f_log) / batch_size
f_sum = mx.symbol.MakeLoss(f_sum, name='loss_ce')
```

Is my implementation correct? If not, please correct it for me. Thanks in advance. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
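For readers checking the symbol graph above numerically, here is a hedged NumPy sketch of the quantity it computes: the `on_value=-1.0` in `one_hot` folds the minus sign of cross entropy into the one-hot tensor, so a plain sum gives the negative log-likelihood. Shapes and seed are arbitrary illustrations, not from the issue:

```python
import numpy as np

B, H, W = 2, 3, 3
rng = np.random.default_rng(0)
logits = rng.normal(size=(B, 2, H, W))
f = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax over channel axis
target = rng.integers(0, 2, size=(B, 1, H, W))

# One-hot over the channel axis, shape Bx2xHxW, matching the transposed symbol
one_hot = np.stack([target[:, 0] == 0, target[:, 0] == 1], axis=1).astype(f.dtype)
# Negative log-likelihood, summed over all pixels and divided by batch size
loss = -(one_hot * np.log(f)).sum() / B
print(loss)
```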
[GitHub] [incubator-mxnet] D-Roberts commented on a change in pull request #18197: [Numpy] Add qr backward part 2 for wide matrices with m < n
D-Roberts commented on a change in pull request #18197: URL: https://github.com/apache/incubator-mxnet/pull/18197#discussion_r425903506 ## File path: src/operator/numpy/linalg/np_qr-inl.h ## @@ -542,36 +591,119 @@ void QrBackwardImpl(const TBlob& grad_a, const nnvm::NodeAttrs& attrs) { Stream *s = ctx.get_stream(); const mxnet::TShape& a_shape = a.shape_; + const mxnet::TShape& q_shape = q.shape_; const mxnet::TShape& r_shape = r.shape_; const int a_ndim = a_shape.ndim(); + const int m = a.size(a_ndim - 2); const int n = a.size(a_ndim - 1); if (kNullOp == req[0]) { return; } if (0U == a_shape.Size()) { return; } MSHADOW_SGL_DBL_TYPE_SWITCH(grad_a.type_flag_, DType, { -// case m >= n; Q of same shape with A and R is (n, n) -DType *m_ptr = reinterpret_cast(workspace.dptr_); -DType *grad_a_ptr = m_ptr + r_shape.Size(); -TBlob temp_m(m_ptr, r_shape, xpu::kDevMask); +// common for all shapes (m, n) +DType *grad_a_ptr = reinterpret_cast(workspace.dptr_); TBlob grad_a_data(grad_a_ptr, a_shape, xpu::kDevMask); -// dR_T -mxnet_op::Kernel::Launch( - s, r_shape.Size(), grad_r.dptr(), m_ptr, n, n, n * n); - -qr_backward::op(grad_a_data.FlatToKD(s), -grad_q.FlatToKD(s), -grad_r.FlatToKD(s), -a.FlatToKD(s), -q.FlatToKD(s), -r.FlatToKD(s), -temp_m.FlatToKD(s), -ctx, attrs); - +if (m >= n) { + // Q of same shape with A (m, n) and R is (n, n) + DType *m_ptr = grad_a_ptr + a_shape.Size(); + TBlob temp_m(m_ptr, r_shape, xpu::kDevMask); + // dR_T + mxnet_op::Kernel::Launch( +s, r_shape.Size(), grad_r.dptr(), m_ptr, n, n, n * n); + qr_backward::op(grad_a_data.FlatToKD(s), + grad_q.FlatToKD(s), + grad_r.FlatToKD(s), + q.FlatToKD(s), + r.FlatToKD(s), + temp_m.FlatToKD(s), + ctx, attrs); +} else { + // R is same shape with A (m, n) and Q is (m, m) + // Partition A = (X | Y); R = (U | V) + // X and U are (m, m); Y and V are (m, n - m) + mxnet::TShape v_shape(q_shape); + v_shape[a_ndim - 1] = n - m; + + DType *m_ptr = grad_a_ptr + a_shape.Size(); + DType *u_ptr = m_ptr + q_shape.Size(); + 
DType *dq_prime_ptr = u_ptr + q_shape.Size(); + DType *dv_ptr = dq_prime_ptr + q_shape.Size(); + DType *y_ptr = dv_ptr + v_shape.Size(); + DType *du_ptr = y_ptr + v_shape.Size(); + DType *dx_ptr = du_ptr + q_shape.Size(); + DType *dy_ptr = dx_ptr + q_shape.Size(); + + TBlob temp_m(m_ptr, q_shape, xpu::kDevMask); + TBlob u_data(u_ptr, q_shape, xpu::kDevMask); + TBlob dq_prime_data(dq_prime_ptr, q_shape, xpu::kDevMask); + TBlob dv_data(dv_ptr, v_shape, xpu::kDevMask); + TBlob y_data(y_ptr, v_shape, xpu::kDevMask); + TBlob du_data(du_ptr, q_shape, xpu::kDevMask); + TBlob dx_data(dx_ptr, q_shape, xpu::kDevMask); + TBlob dy_data(dy_ptr, v_shape, xpu::kDevMask); + + Tensor R = r.FlatToKD(s); + Tensor dR = grad_r.FlatToKD(s); + Tensor Q = q.FlatToKD(s); + Tensor dQ = grad_q.FlatToKD(s); + Tensor dQ_prime = dq_prime_data.FlatToKD(s); + Tensor A = a.FlatToKD(s); + Tensor dA = grad_a_data.FlatToKD(s); + Tensor U = u_data.FlatToKD(s); + Tensor dU = du_data.FlatToKD(s); + Tensor dV = dv_data.FlatToKD(s); + Tensor Y = y_data.FlatToKD(s); + Tensor dX = dx_data.FlatToKD(s); + Tensor dY = dy_data.FlatToKD(s); + Tensor M = temp_m.FlatToKD(s); + + // U + for (index_t i = 0; i < R.size(0); ++i) { +const Tensor& Ri = R[i]; +const Tensor& Ui = U[i]; +Tensor Um(Ri.dptr_, Shape2(m, m), Ri.stride_, s); +Copy(Ui, Um, s); + } + // dU + for (index_t i = 0; i < dR.size(0); ++i) { +const Tensor& dRi = dR[i]; +const Tensor& dUi = dU[i]; +Tensor dUm(dRi.dptr_, Shape2(m, m), dRi.stride_, s); +Copy(dUi, dUm, s); + } + // Y + mxnet_op::Kernel::Launch( +s, A.size(0), m, n, A.dptr_, A.stride_, Y.dptr_, Y.stride_); + // dV + mxnet_op::Kernel::Launch( +s, dR.size(0), m, n, dR.dptr_, dR.stride_, dV.dptr_, dV.stride_); + // store dU_T in M + mxnet_op::Kernel::Launch( +s, q_shape.Size(), dU.dptr_, m_ptr, m, m, m * m); + // dq_prime = dQ + Copy(dQ_prime, dQ, s); + // dq_prime = dQ+Y@dV.T + gemm::op(Y, dV, dQ_prime, DType(1.0), DType(1.0), false, true, s); + // dX = op call + qr_backward::op(dX, + dQ_prime, 
+ dU, + Q, + U, + M, + ctx, attrs); + // dY = Q@dV + gemm::op(Q, dV, dY,
[GitHub] [incubator-mxnet] hzfan commented on a change in pull request #18197: [Numpy] Add qr backward part 2 for wide matrices with m < n
hzfan commented on a change in pull request #18197: URL: https://github.com/apache/incubator-mxnet/pull/18197#discussion_r425852127 ## File path: src/operator/numpy/linalg/np_qr-inl.h ## @@ -542,36 +591,119 @@ void QrBackwardImpl(const TBlob& grad_a, const nnvm::NodeAttrs& attrs) { Stream *s = ctx.get_stream(); const mxnet::TShape& a_shape = a.shape_; + const mxnet::TShape& q_shape = q.shape_; const mxnet::TShape& r_shape = r.shape_; const int a_ndim = a_shape.ndim(); + const int m = a.size(a_ndim - 2); const int n = a.size(a_ndim - 1); if (kNullOp == req[0]) { return; } if (0U == a_shape.Size()) { return; } MSHADOW_SGL_DBL_TYPE_SWITCH(grad_a.type_flag_, DType, { -// case m >= n; Q of same shape with A and R is (n, n) -DType *m_ptr = reinterpret_cast(workspace.dptr_); -DType *grad_a_ptr = m_ptr + r_shape.Size(); -TBlob temp_m(m_ptr, r_shape, xpu::kDevMask); +// common for all shapes (m, n) +DType *grad_a_ptr = reinterpret_cast(workspace.dptr_); TBlob grad_a_data(grad_a_ptr, a_shape, xpu::kDevMask); -// dR_T -mxnet_op::Kernel::Launch( - s, r_shape.Size(), grad_r.dptr(), m_ptr, n, n, n * n); - -qr_backward::op(grad_a_data.FlatToKD(s), -grad_q.FlatToKD(s), -grad_r.FlatToKD(s), -a.FlatToKD(s), -q.FlatToKD(s), -r.FlatToKD(s), -temp_m.FlatToKD(s), -ctx, attrs); - +if (m >= n) { + // Q of same shape with A (m, n) and R is (n, n) + DType *m_ptr = grad_a_ptr + a_shape.Size(); + TBlob temp_m(m_ptr, r_shape, xpu::kDevMask); + // dR_T + mxnet_op::Kernel::Launch( +s, r_shape.Size(), grad_r.dptr(), m_ptr, n, n, n * n); + qr_backward::op(grad_a_data.FlatToKD(s), + grad_q.FlatToKD(s), + grad_r.FlatToKD(s), + q.FlatToKD(s), + r.FlatToKD(s), + temp_m.FlatToKD(s), + ctx, attrs); +} else { + // R is same shape with A (m, n) and Q is (m, m) + // Partition A = (X | Y); R = (U | V) + // X and U are (m, m); Y and V are (m, n - m) + mxnet::TShape v_shape(q_shape); + v_shape[a_ndim - 1] = n - m; + + DType *m_ptr = grad_a_ptr + a_shape.Size(); + DType *u_ptr = m_ptr + q_shape.Size(); + DType 
*dq_prime_ptr = u_ptr + q_shape.Size();
+    DType *dv_ptr = dq_prime_ptr + q_shape.Size();
+    DType *y_ptr = dv_ptr + v_shape.Size();
+    DType *du_ptr = y_ptr + v_shape.Size();
+    DType *dx_ptr = du_ptr + q_shape.Size();
+    DType *dy_ptr = dx_ptr + q_shape.Size();
+
+    TBlob temp_m(m_ptr, q_shape, xpu::kDevMask);
+    TBlob u_data(u_ptr, q_shape, xpu::kDevMask);
+    TBlob dq_prime_data(dq_prime_ptr, q_shape, xpu::kDevMask);
+    TBlob dv_data(dv_ptr, v_shape, xpu::kDevMask);
+    TBlob y_data(y_ptr, v_shape, xpu::kDevMask);
+    TBlob du_data(du_ptr, q_shape, xpu::kDevMask);
+    TBlob dx_data(dx_ptr, q_shape, xpu::kDevMask);
+    TBlob dy_data(dy_ptr, v_shape, xpu::kDevMask);
+
+    Tensor<xpu, 3, DType> R = r.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> dR = grad_r.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> Q = q.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> dQ = grad_q.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> dQ_prime = dq_prime_data.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> A = a.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> dA = grad_a_data.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> U = u_data.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> dU = du_data.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> dV = dv_data.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> Y = y_data.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> dX = dx_data.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> dY = dy_data.FlatToKD<xpu, 3, DType>(s);
+    Tensor<xpu, 3, DType> M = temp_m.FlatToKD<xpu, 3, DType>(s);
+
+    // U
+    for (index_t i = 0; i < R.size(0); ++i) {
+      const Tensor<xpu, 2, DType>& Ri = R[i];
+      const Tensor<xpu, 2, DType>& Ui = U[i];
+      Tensor<xpu, 2, DType> Um(Ri.dptr_, Shape2(m, m), Ri.stride_, s);
+      Copy(Ui, Um, s);
+    }
+    // dU
+    for (index_t i = 0; i < dR.size(0); ++i) {
+      const Tensor<xpu, 2, DType>& dRi = dR[i];
+      const Tensor<xpu, 2, DType>& dUi = dU[i];
+      Tensor<xpu, 2, DType> dUm(dRi.dptr_, Shape2(m, m), dRi.stride_, s);
+      Copy(dUi, dUm, s);
+    }
+    // Y
+    mxnet_op::Kernel::Launch(
+      s, A.size(0), m, n, A.dptr_, A.stride_, Y.dptr_, Y.stride_);
+    // dV
+    mxnet_op::Kernel::Launch(
+      s, dR.size(0), m, n, dR.dptr_, dR.stride_, dV.dptr_, dV.stride_);
+    // store dU_T in M
+    mxnet_op::Kernel::Launch(
+      s, q_shape.Size(), dU.dptr_, m_ptr, m, m, m * m);
+    // dq_prime = dQ
+    Copy(dQ_prime, dQ, s);
+    // dq_prime = dQ + Y@dV.T
+    gemm::op(Y, dV, dQ_prime, DType(1.0), DType(1.0), false, true, s);
+    // dX = op call
+    qr_backward::op(dX,
+                    dQ_prime,
+                    dU,
+                    Q,
+                    U,
+                    M,
+                    ctx, attrs);
+    // dY = Q@dV
+    gemm::op(Q, dV, dY,
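The diff above implements the wide-matrix (m < n) QR backward by splitting A = [X | Y] and R = [U | V], so that X = Q·U reduces to the square case while dY = Q·dV handles the remaining columns. A hedged NumPy sketch of the same recipe; the function names `copyltu`, `qr_backward_square`, and `qr_backward_wide` are illustrative, not MXNet's API:

```python
import numpy as np

def copyltu(M):
    # copy the lower triangle of M onto its upper triangle
    return np.tril(M) + np.tril(M, -1).T

def qr_backward_square(dq, dr, q, r):
    # standard gradient of A = QR for a square, full-rank A
    M = r @ dr.T - dq.T @ q
    return (dq + q @ copyltu(M)) @ np.linalg.inv(r).T

def qr_backward_wide(dq, dr, a, q, r):
    # m < n: split A = [X | Y] and R = [U | V], so X = Q @ U, Y = Q @ V
    m = a.shape[0]
    y, u, v = a[:, m:], r[:, :m], r[:, m:]
    du, dv = dr[:, :m], dr[:, m:]
    dy = q @ dv                       # dY = Q @ dV
    dq_prime = dq + y @ dv.T          # dq_prime = dQ + Y @ dV.T
    dx = qr_backward_square(dq_prime, du, q, u)
    return np.concatenate([dx, dy], axis=1)
```

A quick sanity test is to compare the result against finite differences of a scalar probe f(A) = ⟨Q, Cq⟩ + ⟨R, Cr⟩ for a random wide matrix.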
[GitHub] [incubator-mxnet] hzfan commented on a change in pull request #17749: Fix races in block scope
hzfan commented on a change in pull request #17749: URL: https://github.com/apache/incubator-mxnet/pull/17749#discussion_r425799124
## File path: python/mxnet/gluon/block.py ##
@@ -91,22 +92,22 @@ def create(prefix, params, hint):
     def __enter__(self):
         if self._block._empty_prefix:
             return self
-        self._old_scope = getattr(_BlockScope._current, "value", None)
+        self._local._old_scope = getattr(_BlockScope._current, "value", None)
         _BlockScope._current.value = self
Review comment: It seems that this is already thread-local? https://github.com/apache/incubator-mxnet/blob/f3e8dc14de881a5b8ccc3ff5985167f8d6fa2fb8/python/mxnet/gluon/block.py#L47 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
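The race under discussion: saving the previous scope on the shared instance (`self._old_scope`) breaks when two threads enter the same block concurrently, so the fix moves the saved value into a per-thread `threading.local`. A minimal sketch with a stand-in `BlockScope` class (not Gluon's actual implementation):

```python
import threading

class BlockScope:
    _current = threading.local()         # per-thread "current scope"

    def __init__(self, name):
        self.name = name
        self._local = threading.local()  # per-thread saved previous scope

    def __enter__(self):
        # each thread saves its own previous scope, so concurrent
        # __enter__ calls on the same instance cannot clobber each other
        self._local.old_scope = getattr(BlockScope._current, "value", None)
        BlockScope._current.value = self
        return self

    def __exit__(self, *exc):
        # restore this thread's previous scope, not some other thread's
        BlockScope._current.value = self._local.old_scope
```

With the old instance attribute, a second thread's `__enter__` could overwrite the first thread's saved scope before it was restored; the per-thread storage removes that window.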
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18197: [Numpy] Add qr backward part 2 for wide matrices with m < n
mxnet-bot commented on pull request #18197: URL: https://github.com/apache/incubator-mxnet/pull/18197#issuecomment-629227942 Jenkins CI successfully triggered : [centos-cpu, unix-cpu]
[GitHub] [incubator-mxnet] D-Roberts commented on pull request #18197: [Numpy] Add qr backward part 2 for wide matrices with m < n
D-Roberts commented on pull request #18197: URL: https://github.com/apache/incubator-mxnet/pull/18197#issuecomment-629227877 @mxnet-bot run ci [centos-cpu, unix-cpu]
[GitHub] [incubator-mxnet] wkcn commented on a change in pull request #18325: Change SGD with momentum to include momentum correction by default
wkcn commented on a change in pull request #18325: URL: https://github.com/apache/incubator-mxnet/pull/18325#discussion_r425780192
## File path: python/mxnet/optimizer/lars.py ##
@@ -252,6 +253,13 @@ def fused_step(self, indices, weights, grads, states):
             wd = wds[i]
             lr = lrs[i]
             lr *= self._get_lars(index, weight, grad, wd)
+            # normal SGD picks up momentum correction by default,
+            # so need to modify the momentum to undo that.
+            # The correction term is previous_lr / current_lr.
+            kwargs['momentum'] = (self.momentum * (self.last_lr / lr)) \
Review comment: Thank you for the fix! This will reduce the number of item lookups on `self.last_lr`:
```python
kwargs['momentum'] = (self.momentum * (self.last_lr.get(index, lr) / lr)) \
                     if lr != 0 else \
                     self.momentum
```
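For reference, the correction being undone: fused SGD applies a momentum correction when the learning rate changes, and LARS rescales the learning rate per layer on every step, so it pre-multiplies the momentum by previous_lr / current_lr to cancel that built-in scaling. A hedged one-liner (the helper name is illustrative, not MXNet's API):

```python
def corrected_momentum(momentum, last_lr, lr):
    # cancel SGD's built-in momentum correction (current_lr / previous_lr)
    # by pre-scaling with the inverse ratio; guard against lr == 0
    return momentum * (last_lr / lr) if lr != 0 else momentum
```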
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository. aaronmarkham pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 4327a7f  Bump the publish timestamp.
4327a7f is described below

commit 4327a7fc85f8b43c91c8a1782e83820e5b633187
Author: mxnet-ci
AuthorDate: Fri May 15 12:48:04 2020 +

    Bump the publish timestamp.
---
 date.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..1fd79c6
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Fri May 15 12:48:04 UTC 2020
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18320: Improve log_softmax op performance by using DNNL support
mxnet-bot commented on pull request #18320: URL: https://github.com/apache/incubator-mxnet/pull/18320#issuecomment-629132145 Jenkins CI successfully triggered : [unix-cpu, centos-cpu]
[GitHub] [incubator-mxnet] bgawrych commented on pull request #18320: Improve log_softmax op performance by using DNNL support
bgawrych commented on pull request #18320: URL: https://github.com/apache/incubator-mxnet/pull/18320#issuecomment-629132092 @mxnet-bot run ci [centos-cpu, unix-cpu]
[GitHub] [incubator-mxnet] Kh4L opened a new pull request #18332: Fix FInferType of ops to support partial type inference
Kh4L opened a new pull request #18332: URL: https://github.com/apache/incubator-mxnet/pull/18332
## Description ##
As described in #16757, a number of operators' `FInferType` functions threw an exception when an input type was undefined (dtype=`-1`). This prevented partial type inference, where some not-yet-defined input `dtype`s would trigger an exception. The expected behavior is that `FInferType` returns false when the type cannot be inferred, a boolean value that the executor's inference pass uses to set the `Graph` attribute `dtype_num_unknown_nodes`. This attribute is then used by the executor or any graph modules (e.g. the Partition API) to determine correctness, or to know whether a new inference pass is required. This PR fixes the issue by making the `FInferType` functions return `false` when an input dtype is `-1`.
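The contract described in the PR can be illustrated in Python terms. The real `FInferType` is a C++ function mutating dtype vectors; this sketch is an analogy (names `UNKNOWN` and `infer_elemwise_type` are illustrative), showing that an unknown input yields "not fully inferred" rather than an exception:

```python
UNKNOWN = -1  # mirrors MXNet's "dtype not yet inferred" marker

def infer_elemwise_type(in_types, num_outputs=1):
    # Return (fully_inferred, in_types, out_types). An unknown input
    # must NOT raise: it just means inference is still partial and
    # another inference pass may resolve it later.
    known = [t for t in in_types if t != UNKNOWN]
    if not known:
        return False, in_types, [UNKNOWN] * num_outputs
    t = known[0]
    if any(k != t for k in known):
        raise TypeError("inconsistent input dtypes")
    fully = all(x != UNKNOWN for x in in_types)
    return fully, in_types, [t] * num_outputs
```

The `False` return is what lets the graph pass count the node in `dtype_num_unknown_nodes` instead of aborting.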
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18332: Fix FInferType of ops to support partial type inference
mxnet-bot commented on pull request #18332: URL: https://github.com/apache/incubator-mxnet/pull/18332#issuecomment-629131254 Hey @Kh4L , Thanks for submitting the PR All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: - To trigger all jobs: @mxnet-bot run ci [all] - To trigger specific jobs: @mxnet-bot run ci [job1, job2] *** **CI supported jobs**: [clang, unix-gpu, edge, website, centos-gpu, unix-cpu, centos-cpu, miscellaneous, windows-cpu, sanity, windows-gpu] *** _Note_: Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
[GitHub] [incubator-mxnet] BenjaminCHEN2016 commented on pull request #18313: [Fix][Numpy][WIP] fix mix numpy scalar and MXNet numpy ndarray
BenjaminCHEN2016 commented on pull request #18313: URL: https://github.com/apache/incubator-mxnet/pull/18313#issuecomment-629091482
Conversion rules:

| Expression  | a type | b type | out type |
| ----------- | ------ | ------ | -------- |
| `a += b`    | onp    | mx_np  | onp      |
| `a += b`    | mx_np  | onp    | mx_np    |
| `c = a + b` | onp    | mx_np  | mx_np    |
| `c = a + b` | mx_np  | onp    | mx_np    |
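The table can be reproduced in plain NumPy by modeling `mx_np` as an `ndarray` subclass: out-of-place ufuncs return the subclass regardless of operand order, while in-place ops keep the left operand's type. `MXND` here is a stand-in for illustration, not MXNet's actual ndarray class:

```python
import numpy as np

class MXND(np.ndarray):
    """Stand-in for mxnet.numpy.ndarray: subclasses win binary ops."""

def mx(values):
    # view a plain array as the "mx_np" subclass
    return np.asarray(values, dtype=float).view(MXND)

a = np.array([1.0])
a += mx([2.0])                    # in-place: `a` stays official numpy (onp)
assert type(a) is np.ndarray

b = mx([1.0])
b += np.array([2.0])              # in-place: `b` stays mx_np
assert type(b) is MXND

c = np.array([1.0]) + mx([2.0])   # out-of-place: mx_np wins either way
d = mx([1.0]) + np.array([2.0])
assert type(c) is MXND and type(d) is MXND
```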
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18160: [1.x] Add BatchNormWithReLU bf16 into AMP list
mxnet-bot commented on pull request #18160: URL: https://github.com/apache/incubator-mxnet/pull/18160#issuecomment-629089260 Jenkins CI successfully triggered : [sanity]
[GitHub] [incubator-mxnet] xinyu-intel commented on pull request #18160: [1.x] Add BatchNormWithReLU bf16 into AMP list
xinyu-intel commented on pull request #18160: URL: https://github.com/apache/incubator-mxnet/pull/18160#issuecomment-629089225 @mxnet-bot run ci [sanity]
[GitHub] [incubator-mxnet] xidulu commented on issue #18068: [Numpy] Failed for Boolean type
xidulu commented on issue #18068: URL: https://github.com/apache/incubator-mxnet/issues/18068#issuecomment-629089392
Thanks for your issue. This is indeed a very interesting use case. I noticed that this problem also occurs in PyTorch:
```python
>>> inputs = torch.tensor([1,1,1])
>>> fake_data = torch.tensor([1,1,0])
>>> 1 - (inputs == fake_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 394, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.
>>>
```
If we are capable of providing a solution to this case, that would be a huge advantage over PyTorch.
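For comparison, official NumPy already handles the pattern through bool-to-int promotion, and the mask-inversion spelling that PyTorch's error message recommends works there too; a quick check:

```python
import numpy as np

inputs = np.array([1, 1, 1])
fake_data = np.array([1, 1, 0])
mask = inputs == fake_data          # boolean array [True, True, False]

# NumPy promotes bool to int, so the original expression just works:
assert (1 - mask).tolist() == [0, 0, 1]

# The explicit inversion PyTorch recommends is equivalent:
assert (~mask).astype(int).tolist() == [0, 0, 1]
assert np.logical_not(mask).astype(int).tolist() == [0, 0, 1]
```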
[GitHub] [incubator-mxnet] acphile opened a new pull request #18331: No test metric perf
acphile opened a new pull request #18331: URL: https://github.com/apache/incubator-mxnet/pull/18331
Related to https://github.com/apache/incubator-mxnet/issues/18330. We think that "test_metric_perf" does not enforce anything, and there are currently speed issues related to using mxnet.numpy. We should simply delete the test because it only prints the elapsed time.
## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)
## Comments ##
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18331: No test metric perf
mxnet-bot commented on pull request #18331: URL: https://github.com/apache/incubator-mxnet/pull/18331#issuecomment-629086999 Hey @acphile , Thanks for submitting the PR All tests are already queued to run once.
[GitHub] [incubator-mxnet] sxjscience commented on pull request #18328: Fix memory leaks in Gluon
sxjscience commented on pull request #18328: URL: https://github.com/apache/incubator-mxnet/pull/18328#issuecomment-629079777 @Jerryzcn I think this is also related to your previous benchmark.
[GitHub] [incubator-mxnet] leezu commented on issue #18330: test_metric_performance
leezu commented on issue #18330: URL: https://github.com/apache/incubator-mxnet/issues/18330#issuecomment-629077891 The test doesn't actually enforce anything. Its output is not monitored (AFAIK), so it may be best to simply delete this test.
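If a performance test is kept at all, it only earns its CI time by asserting a budget instead of printing numbers. A generic sketch of what that could look like; the helper names, repeat count, and budget are placeholders, not anything from the MXNet test suite:

```python
import time

def best_runtime(fn, repeats=5):
    # take the best of several runs to damp scheduler noise
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def assert_under_budget(fn, budget_seconds, repeats=5):
    # fail loudly when the workload regresses past the budget,
    # rather than printing a number nobody monitors
    elapsed = best_runtime(fn, repeats)
    assert elapsed < budget_seconds, (
        f"{elapsed:.4f}s exceeds budget of {budget_seconds}s")
```

Absolute wall-clock budgets are flaky on shared CI hosts, which is an argument for deleting the print-only test rather than converting it.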
[GitHub] [incubator-mxnet] leezu commented on pull request #9705: [MXNET-91] Added unittest for benchmarking metric performance
leezu commented on pull request #9705: URL: https://github.com/apache/incubator-mxnet/pull/9705#issuecomment-629076060 Why does this test only print numbers but doesn't actually enforce anything?