[GitHub] [incubator-mxnet] ciyongch commented on a change in pull request #15950: [MKLDNN]Support fullyconnected and element-wise ops fusion
ciyongch commented on a change in pull request #15950: [MKLDNN]Support fullyconnected and element-wise ops fusion
URL: https://github.com/apache/incubator-mxnet/pull/15950#discussion_r316002034

## File path: src/operator/subgraph/mkldnn/mkldnn_fc_property.h
## @@ -68,44 +73,70 @@ class SgMKLDNNFCSelector : public SubgraphSelector {
   }

   bool SelectOutput(const nnvm::Node &n, const nnvm::Node &new_node) override {
-    if (status == kFail || status == kSuccess || new_node.is_variable())
+    if (status_ == kFail || status_ == kSuccess || new_node.is_variable())
       return false;
     // If n isn't the last matched node, then we encountered an internal
     // branch, we should pop out the node behind n and stop fusion.
-    if (matched_list.back() != &n) {
-      if (std::find(matched_list.begin(), matched_list.end(), &n) !=
-          matched_list.end()) {
-        while (matched_list.back() != &n) {
-          matched_list.pop_back();
+    if (matched_list_.back() != &n) {
+      if (std::find(matched_list_.begin(), matched_list_.end(), &n) !=
+          matched_list_.end()) {
+        while (matched_list_.back() != &n) {
+          matched_list_.pop_back();
         }
       }
-      status = kSuccess;
+      status_ = kSuccess;
       return false;
     }
-    switch (status) {
+    switch (status_) {
       case kStart:
-        if (new_node.op() == Op::Get("Activation") &&
-            new_node.attrs.dict.at("act_type") == "relu") {
-          matched_list.push_back(&new_node);
-          status = kSuccess;
+        // Currently, for INT8 FC fusion, only relu/bounded_relu(clip)/abs are supported.
+        if (new_node.op() == Op::Get("Activation")) {
+          const ActivationParam &param = nnvm::get<ActivationParam>(new_node.attrs.parsed);
+          if ((quantized_ && SupportQuantizedMKLDNNAct(param)) ||
+              (!quantized_ && SupportMKLDNNAct(param))) {
+            matched_list_.push_back(&new_node);
+            status_ = kSuccess;
+            return true;
+          }
+        }
+        if (!quantized_ && (new_node.op() == Op::Get("square") ||
+            new_node.op() == Op::Get("sqrt") ||
+            new_node.op() == Op::Get("exp"))) {
+          matched_list_.push_back(&new_node);
+          status_ = kSuccess;
+          return true;
+        }
+        if (new_node.op() == Op::Get("abs")) {
+          matched_list_.push_back(&new_node);
+          status_ = kSuccess;
           return true;
         }
+        if (new_node.op() == Op::Get("clip")) {
+          const ClipParam &param = nnvm::get<ClipParam>(new_node.attrs.parsed);
+          if (param.a_min == 0.f && param.a_max == 1.0f) {

Review comment: Good catch, will remove this check for a_max for `bounded_relu`.
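For reference, a minimal sketch of the revised branch the reply implies — dropping the `a_max` constraint so any `clip` with `a_min == 0` maps to MKL-DNN's bounded_relu (illustrative only; the merged code may differ):

```cpp
// Hypothetical revision: a_max no longer has to be 1.0f; it simply becomes
// the upper threshold of the fused bounded_relu.
if (new_node.op() == Op::Get("clip")) {
  const ClipParam &param = nnvm::get<ClipParam>(new_node.attrs.parsed);
  if (param.a_min == 0.f) {
    matched_list_.push_back(&new_node);
    status_ = kSuccess;
    return true;
  }
}
```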
[GitHub] [incubator-mxnet] hzfan commented on a change in pull request #15938: Tvm broadcast backward
hzfan commented on a change in pull request #15938: Tvm broadcast backward
URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r316000342

## File path: src/operator/contrib/tvmop/ufunc.cc
## @@ -37,29 +38,88 @@ namespace op {

 static constexpr char func_vadd_cpu[] = "vadd";
 static constexpr char func_vadd_gpu[] = "cuda_vadd";
+static constexpr char func_bakcward_vadd_cpu[] = "backward_vadd";
+static constexpr char func_bakcward_vadd_gpu[] = "cuda_backward_vadd";

 template<const char* func>
-void TVMBroadcastCompute(const nnvm::NodeAttrs& attrs,
-                         const mxnet::OpContext& ctx,
-                         const std::vector<TBlob>& inputs,
-                         const std::vector<OpReqType>& req,
-                         const std::vector<TBlob>& outputs) {
+void TVMBinaryCompute(const nnvm::NodeAttrs& attrs,
+                      const mxnet::OpContext& ctx,
+                      const std::vector<TBlob>& inputs,
+                      const std::vector<OpReqType>& req,
+                      const std::vector<TBlob>& outputs) {
   CHECK_EQ(inputs.size(), 2U);
   CHECK_EQ(outputs.size(), 1U);
   tvm::runtime::TVMOpModule::Get()->Call(func, ctx, {inputs[0], inputs[1], outputs[0]});
 }

+template<const char* func>
+void TVMBinaryBackwardComputeUseNone(const nnvm::NodeAttrs& attrs,
+                                     const mxnet::OpContext& ctx,
+                                     const std::vector<TBlob>& inputs,
+                                     const std::vector<OpReqType>& req,
+                                     const std::vector<TBlob>& outputs) {
+  CHECK_EQ(inputs.size(), 1U);
+  CHECK_EQ(outputs.size(), 2U);
+  int ndim = inputs[0].shape_.ndim();
+  for (int k = 0; k < 2; ++k) {
+    // dispatch by backward
+    std::vector<int> ov, iv;
+    const TBlob& ograd = inputs[0], igrad = outputs[k];
+    bool flag = ograd.size(0) != igrad.size(0);
+    for (int i = 0; i < ndim; ++i) {

Review comment: I think we can pad the shape after dimension collapse. In this case, the `tblob` will be reshaped into `(2, 12, 30, 1, 1)` and then reduced on `axis=[1, 3]`.
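A small self-contained sketch of the pad-after-collapse idea (the fixed rank of 5 and all names here are assumptions for illustration, not the PR's code):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Right-pad a collapsed shape with size-1 axes up to a fixed rank, so one
// pre-generated kernel can serve it; reducing over a size-1 axis is a no-op.
std::vector<int64_t> pad_to_rank(std::vector<int64_t> shape, size_t rank = 5) {
  if (shape.size() < rank) shape.resize(rank, 1);
  return shape;
}

int main() {
  // (2, 3, 4, 5, 6) reduced over axes (1, 2) collapses to (2, 12, 30);
  // padding yields (2, 12, 30, 1, 1), on which the kernel reduces axis=[1, 3].
  for (int64_t d : pad_to_rank({2, 12, 30})) std::cout << d << ' ';
  std::cout << '\n';  // prints: 2 12 30 1 1
}
```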
[GitHub] [incubator-mxnet] apeforest commented on issue #15545: Softmax optimization for GPU
apeforest commented on issue #15545: Softmax optimization for GPU
URL: https://github.com/apache/incubator-mxnet/pull/15545#issuecomment-523292097

Thanks for refactoring the Softmax functions to make it into one.
[GitHub] [incubator-mxnet] reminisce commented on a change in pull request #15938: Tvm broadcast backward
reminisce commented on a change in pull request #15938: Tvm broadcast backward
URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r315995425

## File path: src/operator/contrib/tvmop/ufunc.cc
## @@ -37,29 +38,88 @@ namespace op {

 static constexpr char func_vadd_cpu[] = "vadd";
 static constexpr char func_vadd_gpu[] = "cuda_vadd";
+static constexpr char func_bakcward_vadd_cpu[] = "backward_vadd";
+static constexpr char func_bakcward_vadd_gpu[] = "cuda_backward_vadd";

 template<const char* func>
-void TVMBroadcastCompute(const nnvm::NodeAttrs& attrs,
-                         const mxnet::OpContext& ctx,
-                         const std::vector<TBlob>& inputs,
-                         const std::vector<OpReqType>& req,
-                         const std::vector<TBlob>& outputs) {
+void TVMBinaryCompute(const nnvm::NodeAttrs& attrs,
+                      const mxnet::OpContext& ctx,
+                      const std::vector<TBlob>& inputs,
+                      const std::vector<OpReqType>& req,
+                      const std::vector<TBlob>& outputs) {
   CHECK_EQ(inputs.size(), 2U);
   CHECK_EQ(outputs.size(), 1U);
   tvm::runtime::TVMOpModule::Get()->Call(func, ctx, {inputs[0], inputs[1], outputs[0]});
 }

+template<const char* func>
+void TVMBinaryBackwardComputeUseNone(const nnvm::NodeAttrs& attrs,
+                                     const mxnet::OpContext& ctx,
+                                     const std::vector<TBlob>& inputs,
+                                     const std::vector<OpReqType>& req,
+                                     const std::vector<TBlob>& outputs) {
+  CHECK_EQ(inputs.size(), 1U);
+  CHECK_EQ(outputs.size(), 2U);
+  int ndim = inputs[0].shape_.ndim();
+  for (int k = 0; k < 2; ++k) {
+    // dispatch by backward
+    std::vector<int> ov, iv;
+    const TBlob& ograd = inputs[0], igrad = outputs[k];
+    bool flag = ograd.size(0) != igrad.size(0);
+    for (int i = 0; i < ndim; ++i) {

Review comment: Please correct me if my understanding is wrong, but don't you still need kernels generated for `ndims < 5`, since you will collapse consecutive dimensions where reduction is performed? For example, given a 5d shape `(2, 3, 4, 5, 6)` with reduction on `axis=(1, 2)`, the `tblob` will first be reshaped into `(2, 12, 30)` and then reduced on `axis=1`. In this case, do you need a kernel generated for 3D shapes?
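A standalone sketch of the dimension collapse described in this comment (illustrative names, not MXNet's actual implementation):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Merge consecutive dimensions that share the same "is reduced?" flag, e.g.
// shape (2, 3, 4, 5, 6) with reduction axes (1, 2) becomes (2, 12, 30) with
// the reduction now on axis 1.
std::vector<int64_t> collapse(const std::vector<int64_t>& shape,
                              const std::vector<bool>& reduced) {
  std::vector<int64_t> out;
  for (size_t i = 0; i < shape.size(); ++i) {
    if (!out.empty() && reduced[i] == reduced[i - 1]) {
      out.back() *= shape[i];   // extend the current run
    } else {
      out.push_back(shape[i]);  // start a new run
    }
  }
  return out;
}

int main() {
  for (int64_t d : collapse({2, 3, 4, 5, 6}, {false, true, true, false, false}))
    std::cout << d << ' ';  // prints: 2 12 30
}
```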
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15545: Softmax optimization for GPU
apeforest commented on a change in pull request #15545: Softmax optimization for GPU
URL: https://github.com/apache/incubator-mxnet/pull/15545#discussion_r315994975

## File path: src/operator/nn/softmax-inl.h
## @@ -188,89 +180,77 @@ struct log_softmax_bwd {
   }
 };

 template<typename OP1, typename OP2, int Req, bool negate,
-         typename AType, typename DType, typename OType, int ndim>
+         typename AType, typename DType, typename OType, typename IType, int ndim>
 inline void SoftmaxGrad(Stream<cpu> *s, OType *out, OType *ograd,
-                        DType *igrad, Shape<ndim> shape, int axis,
-                        const DType temperature) {
+                        DType *igrad, IType *length, Shape<ndim> shape,
+                        int axis, const DType temperature) {
   index_t M = shape[axis];
   index_t N = shape.Size()/M;
   Shape<ndim> stride = calc_stride(shape);
   Shape<ndim> sshape = shape;
   sshape[axis] = 1;
   index_t sa = stride[axis];

-  #pragma omp parallel for
-  for (index_t i = 0; i < N; ++i) {
-    index_t base = unravel_dot(i, sshape, stride);
+  if (length != nullptr) {
+    #pragma omp parallel for
+    for (index_t i = 0; i < N; ++i) {
+      index_t base = unravel_dot(i, sshape, stride);
+      index_t len = static_cast<index_t>(length[i]);

-    AType sum = AType(0);
-    for (index_t j = 0; j < M; ++j) {
-      sum += OP1::Map(ograd[base + j*sa], out[base + j*sa]);
-    }
-
-    // By default temperature is 1.0.
-    // Adding a branch here to save the CPU 'divide-by-1' computation at runtime
-    DType final_result;
-    if (temperature == 1.0) {
-      for (index_t j = 0; j < M; ++j) {
-        final_result = negate ?
-          -OP2::Map(ograd[base + j*sa], out[base + j*sa], sum) :
-          OP2::Map(ograd[base + j*sa], out[base + j*sa], sum);
-        KERNEL_ASSIGN(igrad[base + j*sa], Req, final_result);
-      }
-    } else {
-      for (index_t j = 0; j < M; ++j) {
-        final_result = negate ?
-          -OP2::Map(ograd[base + j*sa], out[base + j*sa], sum) / temperature :
-          OP2::Map(ograd[base + j*sa], out[base + j*sa], sum) / temperature;
-        KERNEL_ASSIGN(igrad[base + j*sa], Req, final_result);
+      AType sum = AType(0);
+      for (index_t j = 0; j < len; ++j) {
+        sum += OP1::Map(ograd[base + j*sa], out[base + j*sa]);
       }
-    }
-  }
-}
-template<typename OP1, typename OP2, int Req, bool negate,
-         typename AType, typename DType, typename OType, typename IType, int ndim>
-inline void SoftmaxWithLengthGrad(Stream<cpu> *s, OType *out, OType *ograd,
-                                  DType *igrad, IType *length, Shape<ndim> shape,
-                                  int axis, const DType temperature) {
-  index_t M = shape[axis];
-  index_t N = shape.Size()/M;
-  Shape<ndim> stride = calc_stride(shape);
-  Shape<ndim> sshape = shape;
-  sshape[axis] = 1;
-  index_t sa = stride[axis];
-
-  #pragma omp parallel for
-  for (index_t i = 0; i < N; ++i) {
-    index_t base = unravel_dot(i, sshape, stride);
-    index_t len = static_cast<index_t>(length[i]);
-
-    AType sum = AType(0);
-    for (index_t j = 0; j < len; ++j) {
-      sum += OP1::Map(ograd[base + j*sa], out[base + j*sa]);
+      // By default temperature is 1.0.
+      // Adding a branch here to save the CPU 'divide-by-1' computation at runtime
+      DType final_result;
+      if (temperature == 1.0) {

Review comment: Yes, I have done a performance comparison earlier. This check speeds up the operator by 30%.
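For readers following along, a stripped-down illustration of the branch under discussion (not the actual MXNet kernel): hoisting the `temperature == 1.0` test out of the inner loop removes a floating-point divide per element on the default path, consistent with the 30% figure reported above.

```cpp
// Minimal sketch of the 'divide-by-1' optimization: branch once per row
// instead of dividing every element by temperature on the common path.
void apply_temperature(float* igrad, const float* grad, int m, float temperature) {
  if (temperature == 1.0f) {
    for (int j = 0; j < m; ++j) igrad[j] = grad[j];                // no divide
  } else {
    for (int j = 0; j < m; ++j) igrad[j] = grad[j] / temperature;  // general path
  }
}
```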
[GitHub] [incubator-mxnet] mahmoodn opened a new issue #15957: error: call of overloaded is ambiguous
mahmoodn opened a new issue #15957: error: call of overloaded is ambiguous
URL: https://github.com/apache/incubator-mxnet/issues/15957

With gcc 7.4, CUDA 10, and mxnet-1.4.1, I get this compilation error:
```
include/mxnet/././ndarray.h:169:78: error: call of overloaded ‘NodeEntry(<brace-enclosed initializer list>)’ is ambiguous
     dtype_(data.type_flag_), storage_type_(stype), entry_({nullptr, 0, 0}) {
```
Details:
```
g++ -std=c++11 -c -DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -DNDEBUG=1 -I/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/mshadow/ -I/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/dmlc-core/include -fPIC -I/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/tvm/nnvm/include -I/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/dlpack/include -I/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/tvm/include -Iinclude -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -DMSHADOW_USE_F16C=0 -I/usr/local/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_OPENCV=1 -I/usr/include/opencv -fopenmp -DMXNET_USE_OPERATOR_TUNING=1 -DMSHADOW_USE_CUDNN=1 -I/home/mh.naderan/mx/mxnet-1.4.1/warp-ctc/include -I/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/cub -DMXNET_ENABLE_CUDA_RTC=1 -DMXNET_USE_NCCL=0 -DMXNET_USE_LIBJPEG_TURBO=0 -MMD -c src/operator/nn/mkldnn/mkldnn_act.cc -o build/src/operator/nn/mkldnn/mkldnn_act.o
In file included from include/mxnet/./base.h:432:0,
                 from include/mxnet/operator.h:38,
                 from src/operator/nn/mkldnn/mkldnn_act.cc:28:
include/mxnet/././tensor_blob.h: In member function ‘virtual void dmlc::parameter::FieldEntry<mxnet::TShape>::Check(void*) const’:
include/mxnet/././tensor_blob.h:438:39: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (expect_ndim_ != 0 && v.ndim() != expect_ndim_) {
include/mxnet/././tensor_blob.h:445:36: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (mxnet::index_t i = 0; i < v.ndim(); ++i) {
In file included from include/mxnet/./op_attr_types.h:36:0,
                 from include/mxnet/operator.h:40,
                 from src/operator/nn/mkldnn/mkldnn_act.cc:28:
include/mxnet/././ndarray.h: In constructor ‘mxnet::NDArray::NDArray(const TShape&, mxnet::Context, bool, int)’:
include/mxnet/././ndarray.h:98:31: error: call of overloaded ‘NodeEntry(<brace-enclosed initializer list>)’ is ambiguous
   entry_({nullptr, 0, 0}) {
In file included from include/mxnet/operator.h:33:0,
                 from src/operator/nn/mkldnn/mkldnn_act.cc:28:
/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/tvm/nnvm/include/nnvm/node.h:59:12: note: candidate: nnvm::NodeEntry::NodeEntry(nnvm::NodePtr)
   explicit NodeEntry(NodePtr node):
/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/tvm/nnvm/include/nnvm/node.h:52:8: note: candidate: nnvm::NodeEntry::NodeEntry(const nnvm::NodeEntry&)
 struct NodeEntry {
/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/tvm/nnvm/include/nnvm/node.h:52:8: note: candidate: nnvm::NodeEntry::NodeEntry(nnvm::NodeEntry&&)
In file included from include/mxnet/./op_attr_types.h:36:0,
                 from include/mxnet/operator.h:40,
                 from src/operator/nn/mkldnn/mkldnn_act.cc:28:
include/mxnet/././ndarray.h: In constructor ‘mxnet::NDArray::NDArray(const mxnet::TBlob&, int)’:
include/mxnet/././ndarray.h:128:31: error: call of overloaded ‘NodeEntry(<brace-enclosed initializer list>)’ is ambiguous
   entry_({nullptr, 0, 0}) {
In file included from include/mxnet/operator.h:33:0,
                 from src/operator/nn/mkldnn/mkldnn_act.cc:28:
/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/tvm/nnvm/include/nnvm/node.h:59:12: note: candidate: nnvm::NodeEntry::NodeEntry(nnvm::NodePtr)
   explicit NodeEntry(NodePtr node):
/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/tvm/nnvm/include/nnvm/node.h:52:8: note: candidate: nnvm::NodeEntry::NodeEntry(const nnvm::NodeEntry&)
 struct NodeEntry {
/home/mh.naderan/mx/mxnet-1.4.1/3rdparty/tvm/nnvm/include/nnvm/node.h:52:8: note: candidate: nnvm::NodeEntry::NodeEntry(nnvm::NodeEntry&&)
In file included from include/mxnet/./op_attr_types.h:36:0,
                 from include/mxnet/operator.h:40,
                 from src/operator/nn/mkldnn/mkldnn_act.cc:28:
include/mxnet/././ndarray.h: In constructor ‘mxnet::NDArray::NDArray(const mxnet::TBlob&, int, const std::function<void()>&)’:
include/mxnet/././ndarray.h:147:31: error: call of overloaded ‘NodeEntry(<brace-enclosed initializer list>)’ is ambiguous
   entry_({nullptr, 0, 0}) {
In file included from
```
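This error generally indicates that the `nnvm::NodeEntry` definition under 3rdparty/tvm is newer than what the MXNet 1.4.1 headers expect: once `NodeEntry` gained user-declared constructors, the brace initializer `{nullptr, 0, 0}` no longer has a unique match. A tiny analogue of the failure mode (deliberately non-compiling; these are not the real nnvm definitions):

```cpp
#include <cstdint>

struct Entry {
  Entry(std::int64_t v) {}  // candidate 1
  Entry(double v) {}        // candidate 2
};

int main() {
  // int 0 converts to int64_t and to double with equal rank, so gcc reports
  // "call of overloaded 'Entry(<brace-enclosed initializer list>)' is
  // ambiguous" -- the same class of error as above.
  Entry e{0};
}
```

If the cause is a submodule mismatch, checking out the v1.4.1 tag and running `git submodule update --init --recursive` so that 3rdparty/tvm/nnvm matches the release will likely resolve it.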
[GitHub] [incubator-mxnet] xidulu opened a new pull request #15956: [Numpy] random.randint() implemented
xidulu opened a new pull request #15956: [Numpy] random.randint() implemented
URL: https://github.com/apache/incubator-mxnet/pull/15956

## Description ##
np.random.randint()
https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html

## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)

## Comments ##
- If this change is a backward incompatible change, why must this change be made?
- Interesting edge cases to note here
[GitHub] [incubator-mxnet] hzfan commented on a change in pull request #15938: Tvm broadcast backward
hzfan commented on a change in pull request #15938: Tvm broadcast backward
URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r315983914

## File path: src/operator/contrib/tvmop/ufunc.cc
## @@ -37,29 +38,88 @@ namespace op {

 static constexpr char func_vadd_cpu[] = "vadd";
 static constexpr char func_vadd_gpu[] = "cuda_vadd";
+static constexpr char func_bakcward_vadd_cpu[] = "backward_vadd";
+static constexpr char func_bakcward_vadd_gpu[] = "cuda_backward_vadd";

 template<const char* func>
-void TVMBroadcastCompute(const nnvm::NodeAttrs& attrs,
-                         const mxnet::OpContext& ctx,
-                         const std::vector<TBlob>& inputs,
-                         const std::vector<OpReqType>& req,
-                         const std::vector<TBlob>& outputs) {
+void TVMBinaryCompute(const nnvm::NodeAttrs& attrs,
+                      const mxnet::OpContext& ctx,
+                      const std::vector<TBlob>& inputs,
+                      const std::vector<OpReqType>& req,
+                      const std::vector<TBlob>& outputs) {
   CHECK_EQ(inputs.size(), 2U);
   CHECK_EQ(outputs.size(), 1U);
   tvm::runtime::TVMOpModule::Get()->Call(func, ctx, {inputs[0], inputs[1], outputs[0]});
 }

+template<const char* func>
+void TVMBinaryBackwardComputeUseNone(const nnvm::NodeAttrs& attrs,
+                                     const mxnet::OpContext& ctx,
+                                     const std::vector<TBlob>& inputs,
+                                     const std::vector<OpReqType>& req,
+                                     const std::vector<TBlob>& outputs) {
+  CHECK_EQ(inputs.size(), 1U);
+  CHECK_EQ(outputs.size(), 2U);
+  int ndim = inputs[0].shape_.ndim();
+  for (int k = 0; k < 2; ++k) {
+    // dispatch by backward
+    std::vector<int> ov, iv;
+    const TBlob& ograd = inputs[0], igrad = outputs[k];
+    bool flag = ograd.size(0) != igrad.size(0);

Review comment: What about expanding it into an if-else?
[GitHub] [incubator-mxnet] hzfan commented on a change in pull request #15938: Tvm broadcast backward
hzfan commented on a change in pull request #15938: Tvm broadcast backward
URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r315983703

## File path: contrib/tvmop/basic/ufunc.py
## @@ -48,3 +50,71 @@ def vadd_gpu(dtype, ndim):

     s[C].bind(bx, tvm.thread_axis("blockIdx.x"))
     s[C].bind(tx, tvm.thread_axis("threadIdx.x"))
     return s, [A, B, C]
+
+
+def assign_by_req(a, req):

Review comment: Shall we use the existing contrib/tvmop/utils.py or create a contrib/tvmop/basic/common.py?
[GitHub] [incubator-mxnet] hzfan commented on a change in pull request #15938: Tvm broadcast backward
hzfan commented on a change in pull request #15938: Tvm broadcast backward
URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r315980043

## File path: src/operator/contrib/tvmop/ufunc.cc
## @@ -37,29 +38,88 @@ namespace op {

 static constexpr char func_vadd_cpu[] = "vadd";
 static constexpr char func_vadd_gpu[] = "cuda_vadd";
+static constexpr char func_bakcward_vadd_cpu[] = "backward_vadd";
+static constexpr char func_bakcward_vadd_gpu[] = "cuda_backward_vadd";

 template<const char* func>
-void TVMBroadcastCompute(const nnvm::NodeAttrs& attrs,
-                         const mxnet::OpContext& ctx,
-                         const std::vector<TBlob>& inputs,
-                         const std::vector<OpReqType>& req,
-                         const std::vector<TBlob>& outputs) {
+void TVMBinaryCompute(const nnvm::NodeAttrs& attrs,
+                      const mxnet::OpContext& ctx,
+                      const std::vector<TBlob>& inputs,
+                      const std::vector<OpReqType>& req,
+                      const std::vector<TBlob>& outputs) {
   CHECK_EQ(inputs.size(), 2U);
   CHECK_EQ(outputs.size(), 1U);
   tvm::runtime::TVMOpModule::Get()->Call(func, ctx, {inputs[0], inputs[1], outputs[0]});
 }

+template<const char* func>
+void TVMBinaryBackwardComputeUseNone(const nnvm::NodeAttrs& attrs,
+                                     const mxnet::OpContext& ctx,
+                                     const std::vector<TBlob>& inputs,
+                                     const std::vector<OpReqType>& req,
+                                     const std::vector<TBlob>& outputs) {
+  CHECK_EQ(inputs.size(), 1U);
+  CHECK_EQ(outputs.size(), 2U);
+  int ndim = inputs[0].shape_.ndim();
+  for (int k = 0; k < 2; ++k) {
+    // dispatch by backward
+    std::vector<int> ov, iv;
+    const TBlob& ograd = inputs[0], igrad = outputs[k];
+    bool flag = ograd.size(0) != igrad.size(0);
+    for (int i = 0; i < ndim; ++i) {

Review comment: Yes, `igrad.ndim = ograd.ndim` is assumed. @yzhliu suggests padding the input to 5-dim, which is the largest possible dim supported by this op. The padding will 1) reduce the number of kernels (by a factor of 5) and 2) handle the `igrad.ndim < ograd.ndim` issue, but there may be a loss in performance. I think prepending axes before `igrad` to make it `ograd.ndim` requires more kernels, but the performance is better. It is a tradeoff; see the sketch below.
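A sketch of the alternative weighed here — prepending size-1 axes so `igrad` reaches `ograd`'s rank, instead of padding everything to rank 5 (illustrative names only):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Prepend size-1 axes until the gradient shape reaches target_ndim. Kernels
// stay specialized per ndim (better performance) but more of them are needed.
std::vector<int64_t> prepend_axes(std::vector<int64_t> shape, size_t target_ndim) {
  if (shape.size() < target_ndim)
    shape.insert(shape.begin(), target_ndim - shape.size(), 1);
  return shape;
}

int main() {
  for (int64_t d : prepend_axes({12, 30}, 3)) std::cout << d << ' ';
  std::cout << '\n';  // prints: 1 12 30
}
```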
[GitHub] [incubator-mxnet] DickJC123 opened a new pull request #15955: Debug laop 6
DickJC123 opened a new pull request #15955: Debug laop 6
URL: https://github.com/apache/incubator-mxnet/pull/15955

## Description ##
Do not merge this PR. This PR is experimenting with seeds of the test_laop_6 unittest. The PR is seeking to learn if errors seen with the 64-bit Windows build toolchain are observable on the 32-bit Windows build.

## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)

## Comments ##
- If this change is a backward incompatible change, why must this change be made?
- Interesting edge cases to note here
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository.

marcoabreu pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 7fb8860  Bump the publish timestamp.
7fb8860 is described below

commit 7fb886057d57b4dd6f0e17a73906a0468ed95868
Author: mxnet-ci
AuthorDate: Wed Aug 21 01:38:35 2019 +0000

    Bump the publish timestamp.
---
 date.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..29f87f1
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Wed Aug 21 01:38:35 UTC 2019
[GitHub] [incubator-mxnet] marcoabreu commented on issue #15167: Pointwise fusion for GPU
marcoabreu commented on issue #15167: Pointwise fusion for GPU
URL: https://github.com/apache/incubator-mxnet/pull/15167#issuecomment-523250049

Okay, so let's have a discussion on dev@ and then agree on a style.
[GitHub] [incubator-mxnet] larroy commented on a change in pull request #15063: Rename np_compat to np_shape
larroy commented on a change in pull request #15063: Rename np_compat to np_shape
URL: https://github.com/apache/incubator-mxnet/pull/15063#discussion_r315958800

## File path: include/mxnet/c_api.h
## @@ -1067,14 +1067,14 @@ MXNET_DLL int MXAutogradIsTraining(bool* curr);

  * \param curr returns the current status
  * \return 0 when success, -1 when failure happens
  */
-MXNET_DLL int MXIsNumpyCompatible(bool* curr);
+MXNET_DLL int MXIsNumpyShape(bool* curr);

 /*!
  * \brief set numpy compatibility switch
- * \param is_np_comp 1 when numpy compatibility is on, 0 when off
+ * \param is_np_shape 1 when numpy shape semantics is on, 0 when off
  * \param prev returns the previous status before this set
  * \return 0 when success, -1 when failure happens
  */
-MXNET_DLL int MXSetIsNumpyCompatible(int is_np_comp, int* prev);
+MXNET_DLL int MXSetIsNumpyShape(int is_np_shape, int* prev);

Review comment: This changes the expected array format when loading from disk, so it would be good to document it on the function call. If numpy shape semantics is on, then loading an ndarray will fail — it just happened to me now. I think this is a side effect which should be clearly documented, or can we add additional arguments to Load with the different semantics?
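One hedged sketch of what that documentation could look like — the wording is illustrative, not the merged doc comment:

```cpp
/*!
 * \brief set numpy shape semantics switch
 * \param is_np_shape 1 when numpy shape semantics is on, 0 when off
 * \param prev returns the previous status before this set
 * \note This flag also affects NDArray serialization: an ndarray saved while
 *       the flag was off may fail to load after it is turned on.
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXSetIsNumpyShape(int is_np_shape, int* prev);
```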
[GitHub] [incubator-mxnet] anirudh2290 merged pull request #15574: fix naive engine for multi-threaded inference
anirudh2290 merged pull request #15574: fix naive engine for multi-threaded inference
URL: https://github.com/apache/incubator-mxnet/pull/15574
[incubator-mxnet] branch master updated (308e4ac -> d225074)
This is an automated email from the ASF dual-hosted git repository.

anirudh2290 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.

    from 308e4ac  Adding tests to verify support for Large Tensors in additional Ops along with new C_Apis supporting 64bit indexing (#15895)
     add d225074  fix naive engine for multi-threaded inference (#15574)

No new revisions were added by this update.

Summary of changes:
 src/engine/naive_engine.cc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)
[GitHub] [incubator-mxnet] haojin2 commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
haojin2 commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523243342

@larroy I think the author of #13818 is @larroy, isn't it? I was quoting your comment: https://github.com/apache/incubator-mxnet/pull/13818#issuecomment-452919438
[GitHub] [incubator-mxnet] larroy commented on a change in pull request #15545: Softmax optimization for GPU
larroy commented on a change in pull request #15545: Softmax optimization for GPU
URL: https://github.com/apache/incubator-mxnet/pull/15545#discussion_r315954114

## File path: src/operator/nn/softmax-inl.h
## @@ -313,71 +294,134 @@ __global__ void softmax_compute_kernel(DType *in, OType *out, index_t M, int axi

   for (index_t i = x; i < M; i += x_size) {
     val = negate ? -in[base + i*sa] : in[base + i*sa];
-    out[base + i*sa] = OP::Map((val - smax)/static_cast<DType>(temperature), ssum);
+    out[base + i*sa] =
+      (i < len) ? OType(OP::Map((val - smax)/static_cast<DType>(temperature), ssum)) : OType(0.0f);
   }
 }

-template<typename OP, bool negate, typename AType, typename DType, typename OType, int ndim>
-inline void Softmax(Stream<gpu> *s, DType *in, OType *out,
-                    Shape<ndim> shape, int axis, const double temperature) {
-  const int x_bits = 7;
-  const int x_size = 1 << x_bits;
-  index_t M = shape[axis];
-  index_t N = shape.Size()/M;
-  Shape<ndim> stride = calc_stride(shape);
-  Shape<ndim> sshape = shape;
-  sshape[axis] = 1;
+const int softmax_threads_per_block = 512;
+
+template <typename OP, bool negate, typename AType, typename LType,
+          typename DType, typename OType, typename IType>
+__global__ void softmax_stride1_compute_kernel(const DType *in, OType *out, IType *length,
+                                               const index_t M, const double temperature,
+                                               const int rows_per_block, const index_t total_rows) {
+  __shared__ AType scratch[softmax_threads_per_block];
+  __shared__ LType persistent_storage[20 * 1024 / sizeof(LType)];
+  const int warp_size = 32;
+  const int threads_per_row = softmax_threads_per_block / rows_per_block;
+  const int my_local_row = threadIdx.x / threads_per_row;
+  const int my_row = blockIdx.x * rows_per_block + my_local_row;
+  if (my_row >= total_rows) return;
+  const int my_id = threadIdx.x % threads_per_row;
+  const int entries_per_load = sizeof(LType)/sizeof(DType);
+  const index_t len = length == nullptr ? M : static_cast<index_t>(length[my_row]);
+  // Due to usage of MSHADOW_TYPE_SWITCH macro we are generating
+  // kernels where sizeof(LType) may be less than sizeof(DType),
+  // resulting in entries_per_load being 0.
+  // This is not a valid combination and is being checked against
+  // in the launcher code. This switch here is just to silence
+  // the division by zero warning generated for such invalid cases.
+  const int row_length = entries_per_load > 0 ? M / entries_per_load : 0;
+
+  const LType* in_aligned = reinterpret_cast<const LType*>(in);
+  size_t base = my_row * row_length;
+
+  for (index_t i = my_id; i < row_length; i += threads_per_row) {
+    persistent_storage[my_local_row * row_length + i] = in_aligned[base + i];
+  }
+  DType * row = reinterpret_cast<DType *>(persistent_storage + my_local_row * row_length);
+  __syncthreads();

-  softmax_compute_kernel<OP, negate, AType, ndim>
-    <<<N, x_size, 0, Stream<gpu>::GetStream(s)>>>(
-      in, out, M, axis, sshape, stride, temperature);
-  MSHADOW_CUDA_POST_KERNEL_CHECK(softmax_compute_kernel);
-}
+  DType my_max_value;
+  red::maximum::SetInitValue(my_max_value);

-template<typename OP, bool negate, typename AType, int x_bits,
-         typename DType, typename OType, typename IType, int ndim>
-__global__ void softmax_with_length_kernel(DType *in, OType *out, IType *length,
-                                           index_t M, int axis, Shape<ndim> sshape,
-                                           Shape<ndim> stride, const double temperature) {
-  const unsigned x_size = 1 << x_bits;
-  __shared__ AType smem[x_size];
-  index_t sa = stride[axis];
-  index_t base = unravel_dot(blockIdx.x, sshape, stride);
-  index_t x = threadIdx.x;
-  index_t len = static_cast<index_t>(length[blockIdx.x]);
-
-  red::maximum::SetInitValue(smem[x]);
-  for (index_t i = x; i < len; i += x_size) {
-    smem[x] = ::max(smem[x], negate ? -in[base + i*sa] : in[base + i*sa]);
+  for (index_t i = my_id; i < len; i += threads_per_row) {
+    my_max_value = ::max(my_max_value, negate ? -row[i] : row[i]);
   }
+  scratch[threadIdx.x] = my_max_value;
   __syncthreads();
-  cuda::Reduce1D<red::maximum, x_bits>(smem);
+  for (int size = threads_per_row / 2; size >= warp_size; size /= 2) {
+    if (my_id < size) {
+      scratch[threadIdx.x] = ::max(scratch[threadIdx.x], scratch[threadIdx.x + size]);
+    }
+    __syncthreads();
+  }
+  if (my_id < warp_size) {
+    AType my_value = warp_reduce(scratch[threadIdx.x],
+                                 [](AType x, AType y) { return ::max(x, y); });
+    scratch[threadIdx.x] = my_value;
+  }
   __syncthreads();
-  DType smax = smem[0];
+  DType smax = scratch[threadIdx.x - threadIdx.x % threads_per_row];
   __syncthreads();

-  red::sum::SetInitValue(smem[x]);
-  DType val;
-  for (index_t i = x; i < len; i += x_size) {
-    val = negate ? -in[base + i*sa]:in[base + i*sa];
-    smem[x] += static_cast<AType>(expf((val - smax) / static_cast<DType>(temperature)));
+  AType my_sum;
+  red::sum::SetInitValue(my_sum);
+
+  for (index_t i = my_id; i < len; i += threads_per_row) {
+    const DType val = negate ? -row[i] : row[i];
+    my_sum += static_cast<AType>(expf((val - smax) / static_cast<DType>(temperature)));
   }
+  scratch[threadIdx.x] = my_sum;
   __syncthreads();
-  cuda::Reduce1D<red::sum, x_bits>(smem);
+  for (int size = threads_per_row / 2;
[GitHub] [incubator-mxnet] larroy commented on a change in pull request #15545: Softmax optimization for GPU
larroy commented on a change in pull request #15545: Softmax optimization for GPU
URL: https://github.com/apache/incubator-mxnet/pull/15545#discussion_r315953722

## File path: src/operator/nn/softmax-inl.h
## @@ -313,71 +294,134 @@ __global__ void softmax_compute_kernel(DType *in, OType *out, index_t M, int axi

   for (index_t i = x; i < M; i += x_size) {
     val = negate ? -in[base + i*sa] : in[base + i*sa];
-    out[base + i*sa] = OP::Map((val - smax)/static_cast<DType>(temperature), ssum);
+    out[base + i*sa] =
+      (i < len) ? OType(OP::Map((val - smax)/static_cast<DType>(temperature), ssum)) : OType(0.0f);
   }
 }

-template<typename OP, bool negate, typename AType, typename DType, typename OType, int ndim>
-inline void Softmax(Stream<gpu> *s, DType *in, OType *out,
-                    Shape<ndim> shape, int axis, const double temperature) {
-  const int x_bits = 7;
-  const int x_size = 1 << x_bits;
-  index_t M = shape[axis];
-  index_t N = shape.Size()/M;
-  Shape<ndim> stride = calc_stride(shape);
-  Shape<ndim> sshape = shape;
-  sshape[axis] = 1;
+const int softmax_threads_per_block = 512;
+
+template <typename OP, bool negate, typename AType, typename LType,
+          typename DType, typename OType, typename IType>
+__global__ void softmax_stride1_compute_kernel(const DType *in, OType *out, IType *length,
+                                               const index_t M, const double temperature,
+                                               const int rows_per_block, const index_t total_rows) {
+  __shared__ AType scratch[softmax_threads_per_block];
+  __shared__ LType persistent_storage[20 * 1024 / sizeof(LType)];
+  const int warp_size = 32;
+  const int threads_per_row = softmax_threads_per_block / rows_per_block;
+  const int my_local_row = threadIdx.x / threads_per_row;
+  const int my_row = blockIdx.x * rows_per_block + my_local_row;
+  if (my_row >= total_rows) return;
+  const int my_id = threadIdx.x % threads_per_row;
+  const int entries_per_load = sizeof(LType)/sizeof(DType);
+  const index_t len = length == nullptr ? M : static_cast<index_t>(length[my_row]);
+  // Due to usage of MSHADOW_TYPE_SWITCH macro we are generating
+  // kernels where sizeof(LType) may be less than sizeof(DType),
+  // resulting in entries_per_load being 0.
+  // This is not a valid combination and is being checked against
+  // in the launcher code. This switch here is just to silence
+  // the division by zero warning generated for such invalid cases.
+  const int row_length = entries_per_load > 0 ? M / entries_per_load : 0;
+
+  const LType* in_aligned = reinterpret_cast<const LType*>(in);
+  size_t base = my_row * row_length;
+
+  for (index_t i = my_id; i < row_length; i += threads_per_row) {
+    persistent_storage[my_local_row * row_length + i] = in_aligned[base + i];
+  }
+  DType * row = reinterpret_cast<DType *>(persistent_storage + my_local_row * row_length);
+  __syncthreads();

-  softmax_compute_kernel<OP, negate, AType, ndim>
-    <<<N, x_size, 0, Stream<gpu>::GetStream(s)>>>(
-      in, out, M, axis, sshape, stride, temperature);
-  MSHADOW_CUDA_POST_KERNEL_CHECK(softmax_compute_kernel);
-}
+  DType my_max_value;

Review comment: Can we add a comment or maybe a more descriptive name? Is this the max of the stride?
[GitHub] [incubator-mxnet] larroy commented on a change in pull request #15545: Softmax optimization for GPU
larroy commented on a change in pull request #15545: Softmax optimization for GPU
URL: https://github.com/apache/incubator-mxnet/pull/15545#discussion_r315953443

## File path: src/operator/nn/softmax-inl.h
## @@ -301,7 +282,7 @@ __global__ void softmax_compute_kernel(DType *in, OType *out, index_t M, int axi

   red::sum::SetInitValue(smem[x]);

Review comment: Does having multiple max values affect numerical accuracy? Or are they reduced at some other point to a final max?
[GitHub] [incubator-mxnet] apeforest commented on issue #15613: [Discussion] 1.5.1 Patch Release
apeforest commented on issue #15613: [Discussion] 1.5.1 Patch Release
URL: https://github.com/apache/incubator-mxnet/issues/15613#issuecomment-523242083

This PR fixes a memory misalignment bug in the topk operator that was introduced recently. Please add it to the 1.5.1 patch release: https://github.com/apache/incubator-mxnet/pull/15948

Thanks
[GitHub] [incubator-mxnet] samskalicky commented on issue #15613: [Discussion] 1.5.1 Patch Release
samskalicky commented on issue #15613: [Discussion] 1.5.1 Patch Release
URL: https://github.com/apache/incubator-mxnet/issues/15613#issuecomment-523240695

@TaoLv I'd like to pull in the following PRs, they are necessary fixes for some of my use-cases:
https://github.com/apache/incubator-mxnet/pull/15245
https://github.com/apache/incubator-mxnet/pull/15917
[GitHub] [incubator-mxnet] larroy commented on a change in pull request #15545: Softmax optimization for GPU
larroy commented on a change in pull request #15545: Softmax optimization for GPU
URL: https://github.com/apache/incubator-mxnet/pull/15545#discussion_r315952016

## File path: src/operator/nn/softmax-inl.h
## @@ -188,89 +180,77 @@ struct log_softmax_bwd {
   }
 };

 template<typename OP1, typename OP2, int Req, bool negate,
-         typename AType, typename DType, typename OType, int ndim>
+         typename AType, typename DType, typename OType, typename IType, int ndim>
 inline void SoftmaxGrad(Stream<cpu> *s, OType *out, OType *ograd,
-                        DType *igrad, Shape<ndim> shape, int axis,
-                        const DType temperature) {
+                        DType *igrad, IType *length, Shape<ndim> shape,
+                        int axis, const DType temperature) {
   index_t M = shape[axis];
   index_t N = shape.Size()/M;
   Shape<ndim> stride = calc_stride(shape);
   Shape<ndim> sshape = shape;
   sshape[axis] = 1;
   index_t sa = stride[axis];

-  #pragma omp parallel for
-  for (index_t i = 0; i < N; ++i) {
-    index_t base = unravel_dot(i, sshape, stride);
+  if (length != nullptr) {
+    #pragma omp parallel for
+    for (index_t i = 0; i < N; ++i) {
+      index_t base = unravel_dot(i, sshape, stride);
+      index_t len = static_cast<index_t>(length[i]);

-    AType sum = AType(0);
-    for (index_t j = 0; j < M; ++j) {
-      sum += OP1::Map(ograd[base + j*sa], out[base + j*sa]);
-    }
-
-    // By default temperature is 1.0.
-    // Adding a branch here to save the CPU 'divide-by-1' computation at runtime
-    DType final_result;
-    if (temperature == 1.0) {
-      for (index_t j = 0; j < M; ++j) {
-        final_result = negate ?
-          -OP2::Map(ograd[base + j*sa], out[base + j*sa], sum) :
-          OP2::Map(ograd[base + j*sa], out[base + j*sa], sum);
-        KERNEL_ASSIGN(igrad[base + j*sa], Req, final_result);
-      }
-    } else {
-      for (index_t j = 0; j < M; ++j) {
-        final_result = negate ?
-          -OP2::Map(ograd[base + j*sa], out[base + j*sa], sum) / temperature :
-          OP2::Map(ograd[base + j*sa], out[base + j*sa], sum) / temperature;
-        KERNEL_ASSIGN(igrad[base + j*sa], Req, final_result);
+      AType sum = AType(0);
+      for (index_t j = 0; j < len; ++j) {
+        sum += OP1::Map(ograd[base + j*sa], out[base + j*sa]);
       }
-    }
-  }
-}
-template<typename OP1, typename OP2, int Req, bool negate,
-         typename AType, typename DType, typename OType, typename IType, int ndim>
-inline void SoftmaxWithLengthGrad(Stream<cpu> *s, OType *out, OType *ograd,
-                                  DType *igrad, IType *length, Shape<ndim> shape,
-                                  int axis, const DType temperature) {
-  index_t M = shape[axis];
-  index_t N = shape.Size()/M;
-  Shape<ndim> stride = calc_stride(shape);
-  Shape<ndim> sshape = shape;
-  sshape[axis] = 1;
-  index_t sa = stride[axis];
-
-  #pragma omp parallel for
-  for (index_t i = 0; i < N; ++i) {
-    index_t base = unravel_dot(i, sshape, stride);
-    index_t len = static_cast<index_t>(length[i]);
-
-    AType sum = AType(0);
-    for (index_t j = 0; j < len; ++j) {
-      sum += OP1::Map(ograd[base + j*sa], out[base + j*sa]);
+      // By default temperature is 1.0.
+      // Adding a branch here to save the CPU 'divide-by-1' computation at runtime
+      DType final_result;
+      if (temperature == 1.0) {

Review comment: Is this micro-opt really making things better?
[GitHub] [incubator-mxnet] larroy commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
larroy commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523239737

@marcoabreu I think you can close this empty PR, since the conversation here is not constructive and there's no code change.
[GitHub] [incubator-mxnet] larroy edited a comment on issue #15951: Revert "Numpy-compatible concatenate upstream"
larroy edited a comment on issue #15951: Revert "Numpy-compatible concatenate upstream"
URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523238480

@haojin2 I think you are confusing me with @marcoabreu. I didn't make the change you mention. My handle is @larroy. Please follow up with @marcoabreu on this topic or through the dev@ list. Thanks.
[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #15545: Softmax optimization for GPU
ptrendx commented on a change in pull request #15545: Softmax optimization for GPU
URL: https://github.com/apache/incubator-mxnet/pull/15545#discussion_r315948857

## File path: src/operator/nn/softmax-inl.h
## @@ -218,6 +219,157 @@ __global__ void softmax_compute_kernel(DType *in, OType *out, index_t M, int axi

   }
 }

+const int softmax_threads_per_block = 512;
+
+template <typename OP, typename T>
+__device__ inline T warp_reduce(T value, OP redfun) {
+  value = redfun(value, __shfl_down_sync(0xffffffff, value, 16));
+  value = redfun(value, __shfl_down_sync(0xffffffff, value, 8));
+  value = redfun(value, __shfl_down_sync(0xffffffff, value, 4));
+  value = redfun(value, __shfl_down_sync(0xffffffff, value, 2));
+  value = redfun(value, __shfl_down_sync(0xffffffff, value, 1));
+  return value;
+}
+
+template <typename OP>
+__device__ inline mshadow::half::half_t warp_reduce(mshadow::half::half_t value, OP redfun) {
+  float v = static_cast<float>(value);
+  v = redfun(v, __shfl_down_sync(0xffffffff, v, 16));
+  v = redfun(v, __shfl_down_sync(0xffffffff, v, 8));
+  v = redfun(v, __shfl_down_sync(0xffffffff, v, 4));
+  v = redfun(v, __shfl_down_sync(0xffffffff, v, 2));
+  v = redfun(v, __shfl_down_sync(0xffffffff, v, 1));
+  return mshadow::half::half_t(v);
+}
+
+template <typename OP, bool negate, typename AType, typename LType,
+          typename DType, typename OType>
+__global__ void softmax_compute_kernel2(const DType *in, OType *out, const index_t M,
+                                        const double temperature, int rows_per_block,
+                                        const index_t total_rows) {
+  __shared__ AType scratch[softmax_threads_per_block];
+  __shared__ LType persistent_storage[20*1024 / sizeof(LType)];
+  const int warp_size = 32;
+  const int threads_per_row = softmax_threads_per_block / rows_per_block;
+  const int my_local_row = threadIdx.x / threads_per_row;
+  const int my_row = blockIdx.x * rows_per_block + my_local_row;
+  if (my_row >= total_rows) return;
+  const int my_id = threadIdx.x % threads_per_row;
+  const int entries_per_load = sizeof(LType)/sizeof(DType);
+  // Due to usage of MSHADOW_TYPE_SWITCH macro we are generating
+  // kernels where sizeof(LType) may be less than sizeof(DType),
+  // resulting in entries_per_load being 0.
+  // This is not a valid combination and is being checked against
+  // in the launcher code. This switch here is just to silence
+  // the division by zero warning generated for such invalid cases.
+  const int row_length = entries_per_load > 0 ? M / entries_per_load : 0;
+
+  const LType * in_aligned = reinterpret_cast<const LType *>(in);
+  size_t base = my_row * row_length;
+
+  for (index_t i = my_id; i < row_length; i += threads_per_row) {
+    persistent_storage[my_local_row * row_length + i] = in_aligned[base + i];
+  }
+  DType * row = reinterpret_cast<DType *>(persistent_storage + my_local_row * row_length);
+  __syncthreads();
+
+  DType my_max_value;
+  red::maximum::SetInitValue(my_max_value);
+
+  for (index_t i = my_id; i < M; i += threads_per_row) {
+    my_max_value = ::max(my_max_value, negate ? -row[i] : row[i]);
+  }
+  scratch[threadIdx.x] = my_max_value;
+  __syncthreads();
+  for (int size = threads_per_row / 2; size >= warp_size; size /= 2) {
+    if (my_id < size) {
+      scratch[threadIdx.x] = ::max(scratch[threadIdx.x], scratch[threadIdx.x + size]);
+    }
+    __syncthreads();
+  }
+  if (my_id < warp_size) {
+    AType my_value = warp_reduce(scratch[threadIdx.x],
+                                 [](AType x, AType y) { return ::max(x, y); });
+    scratch[threadIdx.x] = my_value;
+  }
+  __syncthreads();
+  DType smax = scratch[threadIdx.x - threadIdx.x % threads_per_row];
+  __syncthreads();
+
+  AType my_sum;
+  red::sum::SetInitValue(my_sum);
+
+  for (index_t i = my_id; i < M; i += threads_per_row) {
+    const DType val = negate ? -row[i] : row[i];
+    my_sum += static_cast<AType>(expf((val - smax) / static_cast<DType>(temperature)));
+  }
+  scratch[threadIdx.x] = my_sum;
+  __syncthreads();
+  for (int size = threads_per_row / 2; size >= warp_size; size /= 2) {
+    if (my_id < size) {
+      scratch[threadIdx.x] += scratch[threadIdx.x + size];
+    }
+    __syncthreads();
+  }
+  if (my_id < warp_size) {
+    AType my_value = warp_reduce(scratch[threadIdx.x],
+                                 [](AType x, AType y) { return x + y; });
+    scratch[threadIdx.x] = my_value;
+  }
+  __syncthreads();
+
+  AType ssum = scratch[threadIdx.x - threadIdx.x % threads_per_row];
+  __syncthreads();
+
+  for (index_t i = my_id; i < M; i += threads_per_row) {
+    const DType val = negate ? -row[i] : row[i];
+    row[i] = OP::Map((val - smax)/static_cast<DType>(temperature), ssum);
+  }
+  __syncthreads();
+
+  LType * out_aligned = reinterpret_cast<LType *>(out);
+
+  for (index_t i = my_id; i < row_length; i += threads_per_row) {
+    out_aligned[base + i] = persistent_storage[my_local_row * row_length + i];
+  }
+}
+
+namespace {
+
+int get_load_type(size_t N) {
+  if (N % 8 == 0) {
+    return kFloat64;
+  } else if (N % 4 == 0) {
+    return kFloat32;
+  } else if (N % 2 ==
[GitHub] [incubator-mxnet] eric-haibin-lin edited a comment on issue #15930: Fix dtype inference in arange_like operator
eric-haibin-lin edited a comment on issue #15930: Fix dtype inference in arange_like operator
URL: https://github.com/apache/incubator-mxnet/pull/15930#issuecomment-523225836

Yes. Could you also check the forward output with [0, 1, 2, ...] etc.?
[GitHub] [incubator-mxnet] eric-haibin-lin commented on a change in pull request #15545: Softmax optimization for GPU
eric-haibin-lin commented on a change in pull request #15545: Softmax optimization for GPU
URL: https://github.com/apache/incubator-mxnet/pull/15545#discussion_r315936560

## File path: src/operator/nn/softmax-inl.h
## @@ -218,6 +219,157 @@ __global__ void softmax_compute_kernel(DType *in, OType *out, index_t M, int axi

   }
 }

+const int softmax_threads_per_block = 512;
+
+template <typename OP, typename T>
+__device__ inline T warp_reduce(T value, OP redfun) {
+  value = redfun(value, __shfl_down_sync(0xffffffff, value, 16));
+  value = redfun(value, __shfl_down_sync(0xffffffff, value, 8));
+  value = redfun(value, __shfl_down_sync(0xffffffff, value, 4));
+  value = redfun(value, __shfl_down_sync(0xffffffff, value, 2));
+  value = redfun(value, __shfl_down_sync(0xffffffff, value, 1));
+  return value;
+}
+
+template <typename OP>
+__device__ inline mshadow::half::half_t warp_reduce(mshadow::half::half_t value, OP redfun) {
+  float v = static_cast<float>(value);
+  v = redfun(v, __shfl_down_sync(0xffffffff, v, 16));
+  v = redfun(v, __shfl_down_sync(0xffffffff, v, 8));
+  v = redfun(v, __shfl_down_sync(0xffffffff, v, 4));
+  v = redfun(v, __shfl_down_sync(0xffffffff, v, 2));
+  v = redfun(v, __shfl_down_sync(0xffffffff, v, 1));
+  return mshadow::half::half_t(v);
+}
+
+template <typename OP, bool negate, typename AType, typename LType,
+          typename DType, typename OType>
+__global__ void softmax_compute_kernel2(const DType *in, OType *out, const index_t M,
+                                        const double temperature, int rows_per_block,
+                                        const index_t total_rows) {
+  __shared__ AType scratch[softmax_threads_per_block];
+  __shared__ LType persistent_storage[20*1024 / sizeof(LType)];
+  const int warp_size = 32;
+  const int threads_per_row = softmax_threads_per_block / rows_per_block;
+  const int my_local_row = threadIdx.x / threads_per_row;
+  const int my_row = blockIdx.x * rows_per_block + my_local_row;
+  if (my_row >= total_rows) return;
+  const int my_id = threadIdx.x % threads_per_row;
+  const int entries_per_load = sizeof(LType)/sizeof(DType);
+  // Due to usage of MSHADOW_TYPE_SWITCH macro we are generating
+  // kernels where sizeof(LType) may be less than sizeof(DType),
+  // resulting in entries_per_load being 0.
+  // This is not a valid combination and is being checked against
+  // in the launcher code. This switch here is just to silence
+  // the division by zero warning generated for such invalid cases.
+  const int row_length = entries_per_load > 0 ? M / entries_per_load : 0;
+
+  const LType * in_aligned = reinterpret_cast<const LType *>(in);
+  size_t base = my_row * row_length;
+
+  for (index_t i = my_id; i < row_length; i += threads_per_row) {
+    persistent_storage[my_local_row * row_length + i] = in_aligned[base + i];
+  }
+  DType * row = reinterpret_cast<DType *>(persistent_storage + my_local_row * row_length);
+  __syncthreads();
+
+  DType my_max_value;
+  red::maximum::SetInitValue(my_max_value);
+
+  for (index_t i = my_id; i < M; i += threads_per_row) {
+    my_max_value = ::max(my_max_value, negate ? -row[i] : row[i]);
+  }
+  scratch[threadIdx.x] = my_max_value;
+  __syncthreads();
+  for (int size = threads_per_row / 2; size >= warp_size; size /= 2) {
+    if (my_id < size) {
+      scratch[threadIdx.x] = ::max(scratch[threadIdx.x], scratch[threadIdx.x + size]);
+    }
+    __syncthreads();
+  }
+  if (my_id < warp_size) {
+    AType my_value = warp_reduce(scratch[threadIdx.x],
+                                 [](AType x, AType y) { return ::max(x, y); });
+    scratch[threadIdx.x] = my_value;
+  }
+  __syncthreads();
+  DType smax = scratch[threadIdx.x - threadIdx.x % threads_per_row];
+  __syncthreads();
+
+  AType my_sum;
+  red::sum::SetInitValue(my_sum);
+
+  for (index_t i = my_id; i < M; i += threads_per_row) {
+    const DType val = negate ? -row[i] : row[i];
+    my_sum += static_cast<AType>(expf((val - smax) / static_cast<DType>(temperature)));
+  }
+  scratch[threadIdx.x] = my_sum;
+  __syncthreads();
+  for (int size = threads_per_row / 2; size >= warp_size; size /= 2) {
+    if (my_id < size) {
+      scratch[threadIdx.x] += scratch[threadIdx.x + size];
+    }
+    __syncthreads();
+  }
+  if (my_id < warp_size) {
+    AType my_value = warp_reduce(scratch[threadIdx.x],
+                                 [](AType x, AType y) { return x + y; });
+    scratch[threadIdx.x] = my_value;
+  }
+  __syncthreads();
+
+  AType ssum = scratch[threadIdx.x - threadIdx.x % threads_per_row];
+  __syncthreads();
+
+  for (index_t i = my_id; i < M; i += threads_per_row) {
+    const DType val = negate ? -row[i] : row[i];
+    row[i] = OP::Map((val - smax)/static_cast<DType>(temperature), ssum);
+  }
+  __syncthreads();
+
+  LType * out_aligned = reinterpret_cast<LType *>(out);
+
+  for (index_t i = my_id; i < row_length; i += threads_per_row) {
+    out_aligned[base + i] = persistent_storage[my_local_row * row_length + i];
+  }
+}
+
+namespace {
+
+int get_load_type(size_t N) {
+  if (N % 8 == 0) {
+    return kFloat64;
+  } else if (N % 4 == 0) {
+    return kFloat32;
+  } else if
[GitHub] [incubator-mxnet] larroy closed pull request #14601: [WIP] Add test for gemm overflow.
larroy closed pull request #14601: [WIP] Add test for gemm overflow. URL: https://github.com/apache/incubator-mxnet/pull/14601 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] haojin2 commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
haojin2 commented on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523207697 @larroy Please don't avoid my question: in #15952 you claimed that I should tag you on any changes to files under the CI folder, so are you actively working on the CI? Would you help with root-causing the CI issues? You increased the CI timeout once before [from 2:00 to 3:00](https://github.com/apache/incubator-mxnet/pull/13818/files) because you "restarted 4 PRs on the issue"; did you investigate the root cause back then? TBH I've restarted more than 4 PRs on this issue, so does that justify my change here? Please provide answers to those questions, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] ZhennanQin commented on a change in pull request #15950: [MKLDNN]Support fullyconnected and element-wise ops fusion
ZhennanQin commented on a change in pull request #15950: [MKLDNN]Support fullyconnected and element-wise ops fusion URL: https://github.com/apache/incubator-mxnet/pull/15950#discussion_r315916256

## File path: src/operator/subgraph/mkldnn/mkldnn_fc_property.h ##

@@ -68,44 +73,70 @@ class SgMKLDNNFCSelector : public SubgraphSelector {
   }
 
   bool SelectOutput(const nnvm::Node &n, const nnvm::Node &new_node) override {
-    if (status == kFail || status == kSuccess || new_node.is_variable())
+    if (status_ == kFail || status_ == kSuccess || new_node.is_variable())
       return false;
 
     // If n isn't the last matched node, then we encountered an internal
     // branch, we should pop out the node behind n and stop fusion.
-    if (matched_list.back() != &n) {
-      if (std::find(matched_list.begin(), matched_list.end(), &n) !=
-          matched_list.end()) {
-        while (matched_list.back() != &n) {
-          matched_list.pop_back();
+    if (matched_list_.back() != &n) {
+      if (std::find(matched_list_.begin(), matched_list_.end(), &n) !=
+          matched_list_.end()) {
+        while (matched_list_.back() != &n) {
+          matched_list_.pop_back();
         }
       }
-      status = kSuccess;
+      status_ = kSuccess;
       return false;
     }
-    switch (status) {
+    switch (status_) {
       case kStart:
-        if (new_node.op() == Op::Get("Activation") &&
-            new_node.attrs.dict.at("act_type") == "relu") {
-          matched_list.push_back(&new_node);
-          status = kSuccess;
+        // Currently, for INT8 FC fusion, only relu/bounded_relu(clip)/abs are supported.
+        if (new_node.op() == Op::Get("Activation")) {
+          const ActivationParam &param = nnvm::get<ActivationParam>(new_node.attrs.parsed);
+          if ((quantized_ && SupportQuantizedMKLDNNAct(param)) ||
+              (!quantized_ && SupportMKLDNNAct(param))) {
+            matched_list_.push_back(&new_node);
+            status_ = kSuccess;
+            return true;
+          }
+        }
+        if (!quantized_ && (new_node.op() == Op::Get("square") ||
+                            new_node.op() == Op::Get("sqrt") ||
+                            new_node.op() == Op::Get("exp"))) {
+          matched_list_.push_back(&new_node);
+          status_ = kSuccess;
+          return true;
+        }
+        if (new_node.op() == Op::Get("abs")) {
+          matched_list_.push_back(&new_node);
+          status_ = kSuccess;
           return true;
         }
+        if (new_node.op() == Op::Get("clip")) {
+          const ClipParam &param = nnvm::get<ClipParam>(new_node.attrs.parsed);
+          if (param.a_min == 0.f && param.a_max == 1.0f) {

Review comment: Why does a_max have to be 1.0f? I think it's not necessary. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
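A quick way to see the point of the question: bounded_relu with upper bound u is exactly clip(x, 0, u) for any u, so requiring a_max == 1.0f is not semantically necessary for the fusion. A minimal NumPy check (illustrative only, not PR code):

```python
import numpy as np

def bounded_relu(x, upper):
    # bounded_relu(x, upper) = min(max(x, 0), upper) = clip(x, 0, upper)
    return np.minimum(np.maximum(x, 0.0), upper)

x = np.linspace(-2.0, 8.0, 11)
for upper in (1.0, 6.0):  # any positive upper bound works, not just 1.0
    assert np.allclose(np.clip(x, 0.0, upper), bounded_relu(x, upper))
```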
[GitHub] [incubator-mxnet] haojin2 commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
haojin2 commented on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523204837 @marcoabreu Maybe let's take a look at one unix-cpu [build](http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-15887/5/pipeline/279) triggered 15 hours ago based on the latest master (because I performed a rebase). BlueOcean shows that the whole test finished in about 2hr55min (< 180 mins, obviously). So you can see that the CI infra itself can also have some ups and downs, because 6 days ago we sometimes even had difficulty getting CI machines up (which was why this increase was made). With that said, I think of your timeout more as a safety net, which means you should also account for flaky CI infra performance when you set it. Again, as I said, the total time for CI is only going to increase, because we are only adding more to it. Putting a hard limit on the total time does not work or make sense; I think you should pay more attention to the time consumed by a single test, so that the "40-minute long unit test" you mentioned somewhere else can be detected and rejected by CI, not a 3-second one. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[incubator-mxnet] branch master updated (cd397a3 -> 308e4ac)
This is an automated email from the ASF dual-hosted git repository.

apeforest pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.

    from cd397a3  Benchmark doc fix (#15769)
     add 308e4ac  Adding tests to verify support for Large Tensors in additional Ops along with new C_Apis supporting 64bit indexing (#15895)

No new revisions were added by this update.

Summary of changes:
 include/mxnet/c_api.h             |  10 ++
 python/mxnet/ndarray/ndarray.py   |  32 ++--
 src/c_api/c_api.cc                |  33 +++-
 src/operator/tensor/dot-inl.h     |   2 +-
 tests/nightly/test_large_array.py | 316 +-
 5 files changed, 378 insertions(+), 15 deletions(-)
[GitHub] [incubator-mxnet] apeforest merged pull request #15895: Adding tests and C APIs for Large Tensors
apeforest merged pull request #15895: Adding tests and C APIs for Large Tensors URL: https://github.com/apache/incubator-mxnet/pull/15895 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] larroy commented on issue #15929: ./build.py -p ubuntu_tpu_tensorrt fails with error
larroy commented on issue #15929: ./build.py -p ubuntu_tpu_tensorrt fails with error URL: https://github.com/apache/incubator-mxnet/issues/15929#issuecomment-523203768 Hi @arsdragonfly, you have a dirty working directory; you should remove the build folder or fully clean the repository. See dev_menu.py option 10, but be careful, because it will discard your local modifications. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] apeforest commented on issue #15953: Add Median, p50, p99 to python profiler
apeforest commented on issue #15953: Add Median, p50, p99 to python profiler URL: https://github.com/apache/incubator-mxnet/pull/15953#issuecomment-523200903 Thanks for the contribution. How do users enable the percentile output? Or does it come by default? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15953: Add Median, p50, p99 to python profiler
apeforest commented on a change in pull request #15953: Add Median, p50, p99 to python profiler URL: https://github.com/apache/incubator-mxnet/pull/15953#discussion_r315909903

## File path: benchmark/opperf/utils/profiler_utils.py ##

@@ -228,10 +229,16 @@ def python_profile(func):
 
     @functools.wraps(func)
     def python_profile_it(*args, **kwargs):
-        start_time = time.perf_counter()     # 1
-        res = func(*args, **kwargs)
-        end_time = time.perf_counter()       # 2
-        run_time = end_time - start_time     # 3
+        runs = args[1]
+        modified_args = (args[0], 1)

Review comment: What is this for? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
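One plausible reading of the hunk above, sketched in Python (the helper name and exact statistics below are assumptions, not the PR's code): overriding the caller's `runs` argument with 1 lets the wrapper invoke the op once per loop iteration, so each run is timed individually and percentiles can be derived from the samples rather than from a single aggregate timing:

```python
import time
import numpy as np

def profile_with_percentiles(func, runs, *args, **kwargs):
    # Time each run separately instead of timing `runs` iterations at once.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        res = func(*args, **kwargs)
        samples.append((time.perf_counter() - start) * 1000.0)  # per-run ms
    stats = {"avg": float(np.mean(samples)),
             "median": float(np.median(samples)),
             "p50": float(np.percentile(samples, 50)),
             "p99": float(np.percentile(samples, 99))}
    return res, stats

_, stats = profile_with_percentiles(sorted, 10, list(range(100000)))
print(stats)
```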
[GitHub] [incubator-mxnet] larroy edited a comment on issue #15951: Revert "Numpy-compatible concatenate upstream"
larroy edited a comment on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523200224 @haojin2 I just helped Marco revert the PR, since due to the way it was merged the revert was not trivial; please don't kill the messenger. Increasing the CI timeout should at least be done in a separate PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] larroy commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
larroy commented on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523200224 @haojin2 I just helped Marco revert the PR, since due to the way it was merged the revert was not trivial; please don't kill the messenger. Increasing the CI timeout should be done in a separate PR and discussed beforehand. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] apeforest commented on issue #15895: Adding tests and C APIs for Large Tensors
apeforest commented on issue #15895: Adding tests and C APIs for Large Tensors URL: https://github.com/apache/incubator-mxnet/pull/15895#issuecomment-523197438 @ChaiBapchya could you please review? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] marcoabreu commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
marcoabreu commented on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523195164 Hi Hao, this is not a personal affront towards you or your PR and I understand your frustration. The maximum time acts as a forcing function since within the past 9 months, the test duration more than doubled. Thus, increasing the timeout is not an option since it worsens the contributor experience further and further - of course I understand that running into timeouts also does. But that's exactly the forcing function we'd like to achieve until the community improves the situation. So it's quite unfortunate that it did hit your PR and that I only caught it now after the PR was merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository.

marcoabreu pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 9724c7a  Bump the publish timestamp.
9724c7a is described below

commit 9724c7af8b5c489b832baf6a484276071a83487f
Author: mxnet-ci
AuthorDate: Tue Aug 20 21:00:41 2019 +

    Bump the publish timestamp.
---
 date.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..c133180
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Tue Aug 20 21:00:41 UTC 2019
[GitHub] [incubator-mxnet] reminisce commented on a change in pull request #15938: Tvm broadcast backward
reminisce commented on a change in pull request #15938: Tvm broadcast backward URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r315894807

## File path: src/operator/contrib/tvmop/ufunc.cc ##

@@ -37,29 +38,88 @@ namespace op {
 
 static constexpr char func_vadd_cpu[] = "vadd";
 static constexpr char func_vadd_gpu[] = "cuda_vadd";
+static constexpr char func_bakcward_vadd_cpu[] = "backward_vadd";
+static constexpr char func_bakcward_vadd_gpu[] = "cuda_backward_vadd";
 
 template <const char* func>
-void TVMBroadcastCompute(const nnvm::NodeAttrs& attrs,
-                         const mxnet::OpContext& ctx,
-                         const std::vector<TBlob>& inputs,
-                         const std::vector<OpReqType>& req,
-                         const std::vector<TBlob>& outputs) {
+void TVMBinaryCompute(const nnvm::NodeAttrs& attrs,
+                      const mxnet::OpContext& ctx,
+                      const std::vector<TBlob>& inputs,
+                      const std::vector<OpReqType>& req,
+                      const std::vector<TBlob>& outputs) {
   CHECK_EQ(inputs.size(), 2U);
   CHECK_EQ(outputs.size(), 1U);
   tvm::runtime::TVMOpModule::Get()->Call(func, ctx, {inputs[0], inputs[1], outputs[0]});
 }
 
+template <const char* func>
+void TVMBinaryBackwardComputeUseNone(const nnvm::NodeAttrs& attrs,
+                                     const mxnet::OpContext& ctx,
+                                     const std::vector<TBlob>& inputs,
+                                     const std::vector<OpReqType>& req,
+                                     const std::vector<TBlob>& outputs) {
+  CHECK_EQ(inputs.size(), 1U);
+  CHECK_EQ(outputs.size(), 2U);
+  int ndim = inputs[0].shape_.ndim();
+  for (int k = 0; k < 2; ++k) {
+    // dispatch by backward
+    std::vector<index_t> ov, iv;
+    const TBlob& ograd = inputs[0], igrad = outputs[k];
+    bool flag = ograd.size(0) != igrad.size(0);
+    for (int i = 0; i < ndim; ++i) {

Review comment: If my understanding is correct, there seems to be an assumption that `ograd.ndim == igrad.ndim`, which is not necessarily true. I think you need to prepend axes to `igrad` if `igrad.ndim < ograd.ndim` and then use the logic here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
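The reduction under discussion can be prototyped in NumPy. This sketch (function name illustrative, not the PR's code) left-pads the input-gradient shape with 1s up to the output gradient's ndim and then sums over every axis that was broadcast, which is exactly the prepend-axes handling the comment asks for:

```python
import numpy as np

def reduce_to_shape(ograd, igrad_shape):
    # Left-pad igrad's shape with 1s to ograd's ndim, then sum ograd over
    # every axis that was broadcast (size 1 in the padded input shape).
    padded = (1,) * (ograd.ndim - len(igrad_shape)) + tuple(igrad_shape)
    axes = tuple(i for i, (o, g) in enumerate(zip(ograd.shape, padded))
                 if g == 1 and o != 1)
    return ograd.sum(axis=axes).reshape(igrad_shape)

a = np.ones((2, 3, 4))
b = np.ones((4,))            # broadcast against a: igrad.ndim < ograd.ndim
ograd = np.ones_like(a + b)  # gradient flowing back from the broadcast add
assert reduce_to_shape(ograd, b.shape).shape == (4,)
```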
[GitHub] [incubator-mxnet] haojin2 opened a new pull request #15954: Revert unix-cpu timeout increase
haojin2 opened a new pull request #15954: Revert unix-cpu timeout increase URL: https://github.com/apache/incubator-mxnet/pull/15954

## Description ##

As title.

## Checklist ##

### Essentials ###

Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###

- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)

## Comments ##

@larroy @marcoabreu

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] PatriciaXiao commented on issue #15629: DataLoader Error using Multi processing
PatriciaXiao commented on issue #15629: DataLoader Error using Multi processing URL: https://github.com/apache/incubator-mxnet/issues/15629#issuecomment-523183266 It isn't always like this: normally it doesn't happen; it only started after I upgraded Python to 3.7 on the server. Also, my laptop has a local Python 3.7 installed and nothing is wrong there, it's simply slow (without GPU). For now I can't reproduce the error anymore, since I terminated that buggy environment to make my life easier. But it is certain that the above-mentioned code doesn't cause any error in my local Mac environment, nor on the other instances I've launched. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] haojin2 edited a comment on issue #15951: Revert "Numpy-compatible concatenate upstream"
haojin2 edited a comment on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523179089 @larroy So if you're really that actively in charge of CI, then why don't you root-cause it, given that this very PR you're trying to revert only adds ~3 extra seconds to the whole unit test? Previous CI time limit increases were not caused by this commit, so simply reverting this PR does not help us find the root cause and only blocks more people who depend on it (not to mention most of them are summer interns who are about to leave and desperately want to submit their PRs). On the other hand, why were the previous increases of the CI timeout reasonable while mine is not? I actually expect this kind of increase to happen as we add more things to MXNet, each of which requires one or more unit tests (which is the right thing to do). If you have already found a unit test that costs more than 40 minutes, please deal with it instead of targeting my PR. Actually, I'm already aware of the time costs of unit tests, and every single one of my unit tests tries to reuse the generated mock test data while covering the minimal necessary test cases. Most of the tests in my PRs, including the one you're trying to revert here, take less than 5 seconds to finish. I don't think I should be blamed for the status quo here, and neither should my commit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] haojin2 commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
haojin2 commented on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523179089 @larroy So if you're really that actively in charge of CI, then why don't you root-cause it, given that this very PR you're trying to revert only adds ~3 extra seconds to the whole unit test? Previous CI time limit increases were not caused by this commit, so simply reverting this PR does not help us find the root cause and only blocks more people who depend on it (not to mention most of them are summer interns who are about to leave and desperately want to submit their PRs). On the other hand, why were the previous increases of the CI timeout reasonable while mine is not? I actually expect this kind of increase to happen as we add more things to MXNet, each of which requires one or more unit tests (which is the right thing to do). If you have already found a unit test that costs more than 40 minutes, please deal with it instead of targeting my PR. Actually, I'm already aware of the time costs of unit tests, and every single one of my unit tests tries to reuse the generated mock test data while covering the minimal necessary test cases. Most of the tests in my PRs, including the one you're trying to revert here, take less than 5 seconds to finish. I don't think I should be blamed for the status quo here, and neither should my commit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] haojin2 closed pull request #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts.
haojin2 closed pull request #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts. URL: https://github.com/apache/incubator-mxnet/pull/15952 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] haojin2 commented on issue #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts.
haojin2 commented on issue #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts. URL: https://github.com/apache/incubator-mxnet/pull/15952#issuecomment-523172593 I'll revert the CI time limit in a separate PR myself, closing this one now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] ChaiBapchya opened a new pull request #15953: Add Median, p50, p99 to python profiler
ChaiBapchya opened a new pull request #15953: Add Median, p50, p99 to python profiler URL: https://github.com/apache/incubator-mxnet/pull/15953

## Description ##

Profile more metrics (not just the average); also fix an incorrect average calculation.

## Checklist ##

### Essentials ###

Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] larroy commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
larroy commented on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523171026 @haojin2 An increase of the CI limit should be discussed first and done in a separate PR. We have gone from 1:20 to 3h; we should find the root cause, not just raise the limit without bound. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] larroy commented on issue #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts.
larroy commented on issue #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts. URL: https://github.com/apache/incubator-mxnet/pull/15952#issuecomment-523170610 It's not really a duplicate, as the PR you linked has an empty patch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] larroy commented on issue #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts.
larroy commented on issue #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts. URL: https://github.com/apache/incubator-mxnet/pull/15952#issuecomment-523170205 I couldn't revert just one, as the other builds on top and creates a conflict while reverting. Feel free to revert yourself and close this PR if you so prefer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] larroy commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
larroy commented on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523170377 This PR has an empty patch, as there were several empty commits. @marcoabreu I think you should close this one. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] haojin2 commented on issue #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts.
haojin2 commented on issue #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts. URL: https://github.com/apache/incubator-mxnet/pull/15952#issuecomment-523164907 Please let me know why #15842 also has to be reverted; it seems to have nothing to do with CI. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] larroy opened a new pull request #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts.
larroy opened a new pull request #15952: Revert #15842 and #15894. These PRs should not modify CI timeouts. URL: https://github.com/apache/incubator-mxnet/pull/15952

## Description ##

These PRs introduced an unrelated increase of the CI time limit. Such a change should go in a separate PR and be discussed with the community and the CI maintainers beforehand. @haojin2 Please reintroduce your PRs without unrelated changes to the CI infrastructure, and tag @marcoabreu and me on changes to CI. Thanks.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository.

marcoabreu pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new f73a540  Bump the publish timestamp.
f73a540 is described below

commit f73a540628faff449b4e93dfd558f6ad210c4f73
Author: mxnet-ci
AuthorDate: Tue Aug 20 19:26:51 2019 +

    Bump the publish timestamp.
---
 date.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..43a7340
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Tue Aug 20 19:26:51 UTC 2019
[GitHub] [incubator-mxnet] haojin2 commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
haojin2 commented on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523142933 I don't think that's a reasonable assumption. Firstly, this PR passed CI several times already (since I rebase with master daily), until one day it started timing out every time. I remember that for some reason there was a unit test in test_random (probably test_shuffle, if I remember correctly) that was taking a very long time, so CI was stuck on it. Obviously that was not related to my PR, which is why I increased the CI time limit: I want to unblock several summer interns on their tasks. On the other hand, the only unit test introduced in my PR takes less than 3 seconds, so I don't think my PR is the cause of the CI time regression:

```
nosetests tests/python/unittest/test_numpy_op.py:test_np_concat
.
----------------------------------------------------------------------
Ran 1 test in 2.998s

OK
```

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] reminisce commented on issue #15951: Revert "Numpy-compatible concatenate upstream"
reminisce commented on issue #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951#issuecomment-523141806 @marcoabreu Could you elaborate on what makes reverting the PR urgent? I expect the test duration to grow, since we have been adding a lot of operators and unit tests. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[incubator-mxnet] branch revert-15894-np_concatenate_master created (now 3afc7c0)
This is an automated email from the ASF dual-hosted git repository. marcoabreu pushed a change to branch revert-15894-np_concatenate_master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git. at 3afc7c0 Revert "Numpy-compatible concatenate upstream (#15894)" No new revisions were added by this update.
[GitHub] [incubator-mxnet] larroy commented on a change in pull request #15894: Numpy-compatible concatenate upstream
larroy commented on a change in pull request #15894: Numpy-compatible concatenate upstream URL: https://github.com/apache/incubator-mxnet/pull/15894#discussion_r315836185

## File path: ci/jenkins/Jenkinsfile_unix_cpu ##

@@ -21,7 +21,7 @@
 // See documents at https://jenkins.io/doc/book/pipeline/jenkinsfile/
 
 // timeout in minutes
-max_time = 180
+max_time = 240

Review comment: Why wasn't this discussed before? Sneaking this change in here is not OK; it should go in a separate PR and be clearly marked as such. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] marcoabreu opened a new pull request #15951: Revert "Numpy-compatible concatenate upstream"
marcoabreu opened a new pull request #15951: Revert "Numpy-compatible concatenate upstream" URL: https://github.com/apache/incubator-mxnet/pull/15951 Reverts apache/incubator-mxnet#15894 This PR increased the CI timeout. I assume that it did not pass without increasing the limit, thus I'm reverting the PR with the request to improve the test duration. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15948: Fix a memory misalignment in topk operator
apeforest commented on a change in pull request #15948: Fix a memory misalignment in topk operator URL: https://github.com/apache/incubator-mxnet/pull/15948#discussion_r315824638

## File path: 3rdparty/mshadow/mshadow/tensor.h ##

@@ -69,15 +69,15 @@ struct Shape {
    * \param idx dimension index
    * \return the corresponding dimension size
    */
-  MSHADOW_XINLINE index_t &operator[](index_t idx) {
+  MSHADOW_XINLINE index_t &operator[](int idx) {

Review comment: This is not directly fixing the bug. However, while checking the tensor struct, I noticed this data type should be int, because it indexes the dimension (which is declared as int). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15948: Fix a memory misalignment in topk operator
apeforest commented on a change in pull request #15948: Fix a memory misalignment in topk operator URL: https://github.com/apache/incubator-mxnet/pull/15948#discussion_r315824135

## File path: src/operator/tensor/ordering_op-inl.h ##

@@ -414,30 +414,23 @@ void TopKImpl(const RunContext &ctx,
         << element_num << ", but the selected index_t can only represent "
         << mxnet::common::MaxIntegerValue<index_t>() << " elements";
   Tensor<xpu, 3, DType> dat = src.FlatTo3D<xpu, DType>(axis, axis, s);
-  size_t temp_size = 0;
-  // Temp space needed by the gpu-based full sorts.
-  temp_size = std::max(temp_size,
-      mxnet::op::SortByKeyWorkspaceSize<index_t, DType, xpu>(src.Size()));
-  temp_size = std::max(temp_size,
-      mxnet::op::SortByKeyWorkspaceSize<DType, index_t, xpu>(src.Size()));
-  temp_size = std::max(temp_size,
-      mxnet::op::SortByKeyWorkspaceSize<index_t, index_t, xpu>(src.Size()));
-  // Additional temp space for gpu full sorts for batch ids.
-  temp_size += PadBytes(sizeof(index_t) * src.Size(), alignment);
-  // Temp space for cpu sorts.
-  temp_size = std::max(temp_size, static_cast<size_t>(sizeof(DType) * src.Size()));
+  // Temp space needed by the full sorts.
+  size_t temp_size = std::max(
+      mxnet::op::SortByKeyWorkspaceSize<DType, index_t, xpu>(src.Size()),
+      mxnet::op::SortByKeyWorkspaceSize<index_t, DType, xpu>(src.Size()));
+  size_t workspace_size = temp_size + PadBytes(sizeof(DType) * src.Size(), alignment)
+                          + PadBytes(sizeof(index_t) * src.Size(), alignment);
   if (param.ret_typ == topk_enum::kReturnMask) {
-    workspace_size += PadBytes(sizeof(int) * batch_size * k, alignment);
+    workspace_size += PadBytes(sizeof(index_t) * batch_size * k, alignment);
   }
   workspace = resource.get_space_typed<xpu, 1, char>(Shape1(workspace_size), s);
   char* workspace_curr_ptr = workspace.dptr_;
   sorted_dat = Tensor<xpu, 1, DType>(reinterpret_cast<DType*>(workspace_curr_ptr),
-                Shape1(src.Size()), s);  // contain sorted dat
+                                     Shape1(src.Size()), s);  // contain sorted dat

Review comment: It's the 4-space continuation indent set by the Google Style Guide. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
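For readers unfamiliar with the workspace layout above: PadBytes rounds a byte count up to the next multiple of the alignment, so each buffer carved out of the single workspace allocation starts on an aligned address. A small Python sketch of that arithmetic (assuming the usual round-up semantics):

```python
def pad_bytes(nbytes, alignment):
    # Round nbytes up to the next multiple of alignment.
    return (nbytes + alignment - 1) // alignment * alignment

assert pad_bytes(10, 8) == 16   # 10 bytes padded to the next 8-byte boundary
assert pad_bytes(16, 8) == 16   # already aligned: unchanged
```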
[GitHub] [incubator-mxnet] apeforest edited a comment on issue #15703: Storage manager / memory usage regression in v1.5
apeforest edited a comment on issue #15703: Storage manager / memory usage regression in v1.5 URL: https://github.com/apache/incubator-mxnet/issues/15703#issuecomment-523119231 @TaoLv This is not an issue (a bug per se) but a limitation of the int32_t data types we use in MXNet. As I pointed out at the line https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/ordering_op-inl.h#L434, the workspace is created using a 1D mshadow::Shape object, whose length is bounded by `index_t`, which is int32_t by default. When the required workspace size is larger than 2^31, there is an overflow, causing OOM. @leezu #15948 is a partial fix because it only fixes the memory misalignment, not the OOM caused by the int overflow. To really fix this issue, we need to support int64_t in MXNet by default. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] apeforest edited a comment on issue #15703: Storage manager / memory usage regression in v1.5
apeforest edited a comment on issue #15703: Storage manager / memory usage regression in v1.5 URL: https://github.com/apache/incubator-mxnet/issues/15703#issuecomment-523119231 @TaoLv This is not an issue (a bug per se) but a limitation of the int32_t data types we use in MXNet. As I pointed out at the line https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/ordering_op-inl.h#L434, the workspace is created using a 1D mshadow::Shape object, whose length is bounded by `index_t`, which is int32_t by default. When the required workspace size is larger than 2^31, there is an overflow, causing OOM. #15948 is a partial fix because it only fixes the memory misalignment, not the OOM caused by the int overflow. To really fix this issue, we need to support int64_t in MXNet by default. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] apeforest commented on issue #15703: Storage manager / memory usage regression in v1.5
apeforest commented on issue #15703: Storage manager / memory usage regression in v1.5 URL: https://github.com/apache/incubator-mxnet/issues/15703#issuecomment-523119231 @TaoLv This is not an issue (a bug per se) but a limitation of the int32_t data types we use in MXNet. As I pointed out at the line https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/ordering_op-inl.h#L434, the workspace is created using a 1D mshadow::Shape object, whose length is bounded by `index_t`, which is int32_t by default. When the required workspace size is larger than 2^31, there is an overflow, causing OOM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
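The overflow described above is easy to demonstrate. A small NumPy sketch (illustrative only, not MXNet code) of what happens when a length past 2^31 - 1 is squeezed into a 32-bit index:

```python
import numpy as np

size = np.array([2**31], dtype=np.int64)  # workspace length > int32 max
wrapped = size.astype(np.int32)[0]        # C-style cast, silently wraps
print(wrapped)                            # -2147483648: a bogus length
```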
[GitHub] [incubator-mxnet] ChaiBapchya commented on a change in pull request #15772: Add symbol api for randn and fix shape issue for randn ndarray and symbol api
ChaiBapchya commented on a change in pull request #15772: Add symbol api for randn and fix shape issue for randn ndarray and symbol api URL: https://github.com/apache/incubator-mxnet/pull/15772#discussion_r315816843 ## File path: mxnet_py3/include/python3.6m ## @@ -0,0 +1 @@ +/Users/bapac/anaconda3/include/python3.6m Review comment: bummer! removing it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] access2rohit commented on a change in pull request #15899: Typedef cleanup
access2rohit commented on a change in pull request #15899: Typedef cleanup URL: https://github.com/apache/incubator-mxnet/pull/15899#discussion_r315794983

## File path: include/mxnet/c_predict_api.h ##

@@ -43,8 +43,6 @@ extern "C" {
 /*! \brief manually define unsigned int */
 typedef uint32_t mx_uint;

Review comment: Java, C++, R, Scala. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] access2rohit commented on a change in pull request #15899: Typedef cleanup
access2rohit commented on a change in pull request #15899: Typedef cleanup URL: https://github.com/apache/incubator-mxnet/pull/15899#discussion_r315795007

## File path: include/mxnet/c_predict_api.h ##

@@ -43,8 +43,6 @@ extern "C" {
 /*! \brief manually define unsigned int */
 typedef uint32_t mx_uint;
-/*! \brief manually define 64-bit int */
-typedef int64_t mx_int64;
 /*! \brief manually define float */
 typedef float mx_float;

Review comment: Java, C++, R, Scala. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] kshitij12345 commented on a change in pull request #15772: Add symbol api for randn and fix shape issue for randn ndarray and symbol api
kshitij12345 commented on a change in pull request #15772: Add symbol api for randn and fix shape issue for randn ndarray and symbol api URL: https://github.com/apache/incubator-mxnet/pull/15772#discussion_r315790759 ## File path: mxnet_py3/include/python3.6m ## @@ -0,0 +1 @@ +/Users/bapac/anaconda3/include/python3.6m Review comment: Stray file? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] kshitij12345 commented on issue #15474: [MXNET-978] Higher Order Gradient Support `sqrt`, `cbrt`.
kshitij12345 commented on issue #15474: [MXNET-978] Higher Order Gradient Support `sqrt`, `cbrt`. URL: https://github.com/apache/incubator-mxnet/pull/15474#issuecomment-523094624 @roywei Have retriggered, and it ran successfully. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] kshitij12345 commented on issue #15746: [MXNET-978] Higher Order Gradient Support `clip`, `dropout`.
kshitij12345 commented on issue #15746: [MXNET-978] Higher Order Gradient Support `clip`, `dropout`. URL: https://github.com/apache/incubator-mxnet/pull/15746#issuecomment-523094418 @roywei @apeforest Have retriggered, and the tests have run successfully. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] leezu commented on issue #12795: Deserialization problem with gluon `ValueError: There are multiple outputs with name ...`
leezu commented on issue #12795: Deserialization problem with gluon `ValueError: There are multiple outputs with name ...` URL: https://github.com/apache/incubator-mxnet/issues/12795#issuecomment-523090915 The "Minimum reproducible example" works for me on 1.5 and current master. This can probably be closed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] access2rohit commented on a change in pull request #15948: Fix a memory misalignment in topk operator
access2rohit commented on a change in pull request #15948: Fix a memory misalignment in topk operator URL: https://github.com/apache/incubator-mxnet/pull/15948#discussion_r315777314

## File path: src/operator/tensor/ordering_op-inl.h ##

@@ -414,30 +414,23 @@ void TopKImpl(const RunContext &ctx,
         << element_num << ", but the selected index_t can only represent "
         << mxnet::common::MaxIntegerValue<index_t>() << " elements";
   Tensor<xpu, 3, DType> dat = src.FlatTo3D<xpu, DType>(axis, axis, s);
-  size_t temp_size = 0;
-  // Temp space needed by the gpu-based full sorts.
-  temp_size = std::max(temp_size,
-      mxnet::op::SortByKeyWorkspaceSize<index_t, DType, xpu>(src.Size()));
-  temp_size = std::max(temp_size,
-      mxnet::op::SortByKeyWorkspaceSize<DType, index_t, xpu>(src.Size()));
-  temp_size = std::max(temp_size,
-      mxnet::op::SortByKeyWorkspaceSize<index_t, index_t, xpu>(src.Size()));
-  // Additional temp space for gpu full sorts for batch ids.
-  temp_size += PadBytes(sizeof(index_t) * src.Size(), alignment);
-  // Temp space for cpu sorts.
-  temp_size = std::max(temp_size, static_cast<size_t>(sizeof(DType) * src.Size()));
+  // Temp space needed by the full sorts.
+  size_t temp_size = std::max(
+      mxnet::op::SortByKeyWorkspaceSize<DType, index_t, xpu>(src.Size()),
+      mxnet::op::SortByKeyWorkspaceSize<index_t, DType, xpu>(src.Size()));
+  size_t workspace_size = temp_size + PadBytes(sizeof(DType) * src.Size(), alignment)
+                          + PadBytes(sizeof(index_t) * src.Size(), alignment);
   if (param.ret_typ == topk_enum::kReturnMask) {
-    workspace_size += PadBytes(sizeof(int) * batch_size * k, alignment);
+    workspace_size += PadBytes(sizeof(index_t) * batch_size * k, alignment);
   }
   workspace = resource.get_space_typed<xpu, 1, char>(Shape1(workspace_size), s);
   char* workspace_curr_ptr = workspace.dptr_;
   sorted_dat = Tensor<xpu, 1, DType>(reinterpret_cast<DType*>(workspace_curr_ptr),
-                Shape1(src.Size()), s);  // contain sorted dat
+                                     Shape1(src.Size()), s);  // contain sorted dat

Review comment: nit: indentation? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] access2rohit commented on a change in pull request #15948: Fix a memory misalignment in topk operator
access2rohit commented on a change in pull request #15948: Fix a memory misalignment in topk operator URL: https://github.com/apache/incubator-mxnet/pull/15948#discussion_r315777529

## File path: src/operator/tensor/ordering_op-inl.h ##

@@ -414,30 +414,23 @@ void TopKImpl(const RunContext &ctx,
         << element_num << ", but the selected index_t can only represent "
         << mxnet::common::MaxIntegerValue<index_t>() << " elements";
   Tensor<xpu, 3, DType> dat = src.FlatTo3D<xpu, DType>(axis, axis, s);
-  size_t temp_size = 0;
-  // Temp space needed by the gpu-based full sorts.
-  temp_size = std::max(temp_size,
-      mxnet::op::SortByKeyWorkspaceSize<index_t, DType, xpu>(src.Size()));
-  temp_size = std::max(temp_size,
-      mxnet::op::SortByKeyWorkspaceSize<DType, index_t, xpu>(src.Size()));
-  temp_size = std::max(temp_size,
-      mxnet::op::SortByKeyWorkspaceSize<index_t, index_t, xpu>(src.Size()));
-  // Additional temp space for gpu full sorts for batch ids.
-  temp_size += PadBytes(sizeof(index_t) * src.Size(), alignment);
-  // Temp space for cpu sorts.
-  temp_size = std::max(temp_size, static_cast<size_t>(sizeof(DType) * src.Size()));
+  // Temp space needed by the full sorts.
+  size_t temp_size = std::max(
+      mxnet::op::SortByKeyWorkspaceSize<DType, index_t, xpu>(src.Size()),
+      mxnet::op::SortByKeyWorkspaceSize<index_t, DType, xpu>(src.Size()));
+  size_t workspace_size = temp_size + PadBytes(sizeof(DType) * src.Size(), alignment)
+                          + PadBytes(sizeof(index_t) * src.Size(), alignment);
   if (param.ret_typ == topk_enum::kReturnMask) {
-    workspace_size += PadBytes(sizeof(int) * batch_size * k, alignment);
+    workspace_size += PadBytes(sizeof(index_t) * batch_size * k, alignment);
   }
   workspace = resource.get_space_typed<xpu, 1, char>(Shape1(workspace_size), s);
   char* workspace_curr_ptr = workspace.dptr_;
   sorted_dat = Tensor<xpu, 1, DType>(reinterpret_cast<DType*>(workspace_curr_ptr),
                                      Shape1(src.Size()), s);  // contain sorted dat
   workspace_curr_ptr += PadBytes(sizeof(DType) * src.Size(), alignment);
   indices = Tensor<xpu, 1, index_t>(reinterpret_cast<index_t*>(workspace_curr_ptr),
-                Shape1(src.Size()), s);  // indices in the original matrix
+                                    Shape1(src.Size()), s);  // indices in the original matrix

Review comment: nit: indentation? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] access2rohit commented on a change in pull request #15948: Fix a memory misalignment in topk operator
access2rohit commented on a change in pull request #15948: Fix a memory misalignment in topk operator URL: https://github.com/apache/incubator-mxnet/pull/15948#discussion_r315777158

## File path: 3rdparty/mshadow/mshadow/tensor.h ##

@@ -69,15 +69,15 @@ struct Shape {
    * \param idx dimension index
    * \return the corresponding dimension size
    */
-  MSHADOW_XINLINE index_t &operator[](index_t idx) {
+  MSHADOW_XINLINE index_t &operator[](int idx) {

Review comment: Can you describe this change in a bit more detail? Why is it required? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] ptrendx commented on issue #15545: Softmax optimization for GPU
ptrendx commented on issue #15545: Softmax optimization for GPU URL: https://github.com/apache/incubator-mxnet/pull/15545#issuecomment-523070340 Ok, it seems that splitting softmax.cc into 3 files, 1 for each operator (softmax, softmin and log_softmax) did the trick fortunately. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] TaoLv commented on issue #15930: Fix dtype inference in arange_like operator
TaoLv commented on issue #15930: Fix dtype inference in arange_like operator URL: https://github.com/apache/incubator-mxnet/pull/15930#issuecomment-523068244 @eric-haibin-lin Do you think the code snippet below can be used as a test case?

```python
import mxnet as mx
import numpy as np

dtypes = [np.float16, np.float32, np.float64]
for t in dtypes:
    x = mx.sym.Variable('x', dtype=t)
    y = mx.sym.reshape(x, shape=(0, 0, -1))
    z = mx.sym.contrib.arange_like(y, axis=-1)
    mod = z.simple_bind(ctx=mx.gpu(0), x=(3, 4, 5, 6), grad_req='null')
    mod.arg_arrays[0][:] = np.random.normal(size=mod.arg_arrays[0].shape).astype(t)
    out = mod.forward(is_train=False)
    assert out[0].dtype == np.float32
```

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] TaoLv commented on issue #15703: Storage manager / memory usage regression in v1.5
TaoLv commented on issue #15703: Storage manager / memory usage regression in v1.5 URL: https://github.com/apache/incubator-mxnet/issues/15703#issuecomment-523037374 @apeforest Thank you for the analysis. What's the blocker to get this issue fixed on the v1.5.x branch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] ciyongch opened a new pull request #15950: [MKLDNN]Support fullyconnected and element-wise ops fusion
ciyongch opened a new pull request #15950: [MKLDNN]Support fullyconnected and element-wise ops fusion URL: https://github.com/apache/incubator-mxnet/pull/15950

## Description ##
This PR adds support for fusing fullyconnected with some element-wise ops (including activation/square/sqrt/exp/abs/clip).

## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionalities and arguments are documented.
  - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)

## Comments ##
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here

@pengzhao-intel @TaoLv @ZhennanQin

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
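For orientation, a sketch of the kind of graph this fusion pass targets: FullyConnected followed by one of the listed element-wise ops. Whether fusion actually fires depends on the build and on the subgraph backend in use (e.g. the MXNET_SUBGRAPH_BACKEND environment variable), so treat this as illustrative rather than a test of the PR:

```python
import mxnet as mx

data = mx.sym.Variable('data')
# FullyConnected producing a 16-wide output; weights are auto-created.
fc = mx.sym.FullyConnected(data=data, num_hidden=16, no_bias=True, name='fc')
# An element-wise op directly consuming the FC output is a fusion candidate.
out = mx.sym.Activation(fc, act_type='relu', name='relu')
# Per the description, square/sqrt/exp/abs/clip following the FC qualify too,
# e.g. mx.sym.clip(fc, a_min=0.0, a_max=6.0).
```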
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository.

marcoabreu pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new c798770  Bump the publish timestamp.

c798770 is described below

commit c798770e49f96b40b14f06bf328f06a8c82c5c08
Author: mxnet-ci
AuthorDate: Tue Aug 20 13:34:35 2019 +

    Bump the publish timestamp.
---
 date.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..ef652a0
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Tue Aug 20 13:34:35 UTC 2019
[GitHub] [incubator-mxnet] zoeygxy commented on issue #15942: Refines NDArray indexing and adds numpy ndarray indexing [DO NOT MERGE YET]
zoeygxy commented on issue #15942: Refines NDArray indexing and adds numpy ndarray indexing [DO NOT MERGE YET] URL: https://github.com/apache/incubator-mxnet/pull/15942#issuecomment-523009739 Waiting for CI result. Still fixing style. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] zixuanweeei commented on issue #15741: MKL-DNN LBR-GRU Inference Integration (FP32 LBR-GRU)
zixuanweeei commented on issue #15741: MKL-DNN LBR-GRU Inference Integration (FP32 LBR-GRU) URL: https://github.com/apache/incubator-mxnet/pull/15741#issuecomment-522955864 Cherry picked from commit 1cf63e1 according to https://github.com/apache/incubator-mxnet/pull/15847#issuecomment-522840815 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] yzhliu commented on a change in pull request #15938: Tvm broadcast backward
yzhliu commented on a change in pull request #15938: Tvm broadcast backward URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r315582881

## File path: contrib/tvmop/basic/ufunc.py ##

```
@@ -48,3 +50,71 @@ def vadd_gpu(dtype, ndim):
     s[C].bind(bx, tvm.thread_axis("blockIdx.x"))
     s[C].bind(tx, tvm.thread_axis("threadIdx.x"))
     return s, [A, B, C]
+
+
+def assign_by_req(a, req):
```

Review comment: move to something like common.py?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] yzhliu commented on a change in pull request #15938: Tvm broadcast backward
yzhliu commented on a change in pull request #15938: Tvm broadcast backward URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r315584826

## File path: src/operator/contrib/tvmop/ufunc.cc ##

```
@@ -37,29 +38,88 @@ namespace op {
 static constexpr char func_vadd_cpu[] = "vadd";
 static constexpr char func_vadd_gpu[] = "cuda_vadd";
+static constexpr char func_bakcward_vadd_cpu[] = "backward_vadd";
+static constexpr char func_bakcward_vadd_gpu[] = "cuda_backward_vadd";

 template
-void TVMBroadcastCompute(const nnvm::NodeAttrs& attrs,
-                         const mxnet::OpContext& ctx,
-                         const std::vector& inputs,
-                         const std::vector& req,
-                         const std::vector& outputs) {
+void TVMBinaryCompute(const nnvm::NodeAttrs& attrs,
+                      const mxnet::OpContext& ctx,
+                      const std::vector& inputs,
+                      const std::vector& req,
+                      const std::vector& outputs) {
   CHECK_EQ(inputs.size(), 2U);
   CHECK_EQ(outputs.size(), 1U);
   tvm::runtime::TVMOpModule::Get()->Call(func, ctx, {inputs[0], inputs[1], outputs[0]});
 }

+template
+void TVMBinaryBackwardComputeUseNone(const nnvm::NodeAttrs& attrs,
+                                     const mxnet::OpContext& ctx,
+                                     const std::vector& inputs,
+                                     const std::vector& req,
+                                     const std::vector& outputs) {
+  CHECK_EQ(inputs.size(), 1U);
+  CHECK_EQ(outputs.size(), 2U);
+  int ndim = inputs[0].shape_.ndim();
+  for (int k = 0; k < 2; ++k) {
+    // dispatch by backward
+    std::vector ov, iv;
+    const TBlob& ograd = inputs[0], igrad = outputs[k];
+    bool flag = ograd.size(0) != igrad.size(0);
```

Review comment: better to use int and explicitly assign the value.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] yzhliu commented on a change in pull request #15938: Tvm broadcast backward
yzhliu commented on a change in pull request #15938: Tvm broadcast backward URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r315583132

## File path: src/operator/contrib/tvmop/ufunc.cc ##

```
@@ -37,29 +38,88 @@ namespace op {
 static constexpr char func_vadd_cpu[] = "vadd";
 static constexpr char func_vadd_gpu[] = "cuda_vadd";
+static constexpr char func_bakcward_vadd_cpu[] = "backward_vadd";
+static constexpr char func_bakcward_vadd_gpu[] = "cuda_backward_vadd";

 template
-void TVMBroadcastCompute(const nnvm::NodeAttrs& attrs,
-                         const mxnet::OpContext& ctx,
-                         const std::vector& inputs,
-                         const std::vector& req,
-                         const std::vector& outputs) {
+void TVMBinaryCompute(const nnvm::NodeAttrs& attrs,
+                      const mxnet::OpContext& ctx,
+                      const std::vector& inputs,
+                      const std::vector& req,
+                      const std::vector& outputs) {
   CHECK_EQ(inputs.size(), 2U);
   CHECK_EQ(outputs.size(), 1U);
   tvm::runtime::TVMOpModule::Get()->Call(func, ctx, {inputs[0], inputs[1], outputs[0]});
 }

+template
+void TVMBinaryBackwardComputeUseNone(const nnvm::NodeAttrs& attrs,
+                                     const mxnet::OpContext& ctx,
+                                     const std::vector& inputs,
+                                     const std::vector& req,
+                                     const std::vector& outputs) {
+  CHECK_EQ(inputs.size(), 1U);
+  CHECK_EQ(outputs.size(), 2U);
+  int ndim = inputs[0].shape_.ndim();
+  for (int k = 0; k < 2; ++k) {
+    // dispatch by backward
+    std::vector ov, iv;
+    const TBlob& ograd = inputs[0], igrad = outputs[k];
+    bool flag = ograd.size(0) != igrad.size(0);
+    for (int i = 0; i < ndim; ++i) {
+      if (i == 0 || (ograd.size(i) != igrad.size(i)) != (ograd.size(i - 1) != igrad.size(i - 1))) {
+        ov.push_back(ograd.size(i));
+      } else {
+        ov.back() *= ograd.size(i);
+      }
+    }
+    for (int i = flag; i < ov.size(); i += 2) {
+      iv.push_back(ov[i]);
+    }
+    TShape oshape(ov.begin(), ov.end()), ishape(iv.begin(), iv.end());
+    TBlob ograd_tvm(ograd.reshape(oshape).dltensor());
+    TBlob igrad_tvm(igrad.reshape(ishape).dltensor());
+    std::string funcname = std::string(func) + "reduce1st_" + std::to_string(flag);
+    // dispatch by req
+    funcname += "req_";
+    MXNET_ASSIGN_REQ_SWITCH(req[k], req_type, {
+      if (req_type == kWriteTo) {
+        funcname += "kWriteTo";
```

Review comment: alignment

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] yzhliu commented on a change in pull request #15938: Tvm broadcast backward
yzhliu commented on a change in pull request #15938: Tvm broadcast backward URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r315585709

## File path: src/operator/contrib/tvmop/ufunc.cc ##

```
@@ -37,29 +38,88 @@ namespace op {
 static constexpr char func_vadd_cpu[] = "vadd";
 static constexpr char func_vadd_gpu[] = "cuda_vadd";
+static constexpr char func_bakcward_vadd_cpu[] = "backward_vadd";
+static constexpr char func_bakcward_vadd_gpu[] = "cuda_backward_vadd";

 template
-void TVMBroadcastCompute(const nnvm::NodeAttrs& attrs,
-                         const mxnet::OpContext& ctx,
-                         const std::vector& inputs,
-                         const std::vector& req,
-                         const std::vector& outputs) {
+void TVMBinaryCompute(const nnvm::NodeAttrs& attrs,
+                      const mxnet::OpContext& ctx,
+                      const std::vector& inputs,
+                      const std::vector& req,
+                      const std::vector& outputs) {
   CHECK_EQ(inputs.size(), 2U);
   CHECK_EQ(outputs.size(), 1U);
   tvm::runtime::TVMOpModule::Get()->Call(func, ctx, {inputs[0], inputs[1], outputs[0]});
 }

+template
+void TVMBinaryBackwardComputeUseNone(const nnvm::NodeAttrs& attrs,
+                                     const mxnet::OpContext& ctx,
+                                     const std::vector& inputs,
+                                     const std::vector& req,
+                                     const std::vector& outputs) {
+  CHECK_EQ(inputs.size(), 1U);
+  CHECK_EQ(outputs.size(), 2U);
+  int ndim = inputs[0].shape_.ndim();
+  for (int k = 0; k < 2; ++k) {
+    // dispatch by backward
+    std::vector ov, iv;
+    const TBlob& ograd = inputs[0], igrad = outputs[k];
+    bool flag = ograd.size(0) != igrad.size(0);
+    for (int i = 0; i < ndim; ++i) {
+      if (i == 0 || (ograd.size(i) != igrad.size(i)) != (ograd.size(i - 1) != igrad.size(i - 1))) {
+        ov.push_back(ograd.size(i));
+      } else {
+        ov.back() *= ograd.size(i);
+      }
+    }
+    for (int i = flag; i < ov.size(); i += 2) {
+      iv.push_back(ov[i]);
+    }
+    TShape oshape(ov.begin(), ov.end()), ishape(iv.begin(), iv.end());
+    TBlob ograd_tvm(ograd.reshape(oshape).dltensor());
+    TBlob igrad_tvm(igrad.reshape(ishape).dltensor());
```

Review comment: please add some comments to elaborate the ideas.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
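To make the reshape-then-reduce idea above concrete, a numpy sketch (not the PR's TVM code): the backward of a broadcast add reduces the output gradient over the broadcast axes, and consecutive axes with the same broadcast/non-broadcast status can be collapsed first, so the kernel only ever sees an alternating pattern of reduced and kept axes:

```python
import numpy as np

og = np.random.rand(2, 3, 4, 5, 6)  # output gradient of a broadcast add
# Input gradient for an operand of shape (2, 1, 1, 5, 6): reduce broadcast axes 1, 2.
direct = og.sum(axis=(1, 2), keepdims=True)
# Collapse consecutive same-status axes first: (2, 3, 4, 5, 6) -> (2, 12, 30),
# then a single reduction over the merged broadcast axis gives the same result.
collapsed = og.reshape(2, 12, 30).sum(axis=1).reshape(2, 1, 1, 5, 6)
assert np.allclose(direct, collapsed)
```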
[GitHub] [incubator-mxnet] yzhliu commented on a change in pull request #15938: Tvm broadcast backward
yzhliu commented on a change in pull request #15938: Tvm broadcast backward URL: https://github.com/apache/incubator-mxnet/pull/15938#discussion_r315579522

## File path: contrib/tvmop/basic/ufunc.py ##

```
@@ -48,3 +50,71 @@ def vadd_gpu(dtype, ndim):
     s[C].bind(bx, tvm.thread_axis("blockIdx.x"))
     s[C].bind(tx, tvm.thread_axis("threadIdx.x"))
     return s, [A, B, C]
+
+
+def assign_by_req(a, req):
+    b = tvm.placeholder(a.shape, name='assign_by_req_b', dtype=a.dtype)
+    if (req == "kAddTo"):
+        c = tvm.compute(a.shape, lambda *idx: a[idx] + b[idx])
+    else:
+        c = tvm.compute(a.shape, lambda *idx: a[idx])
+    return b, c
+
+
+def reduce_axes(X, axes, reducer):
```

Review comment: Can we add some comments to elaborate the idea, e.g., the meaning of axes? Also, can we move it somewhere else so that other operators can reuse it?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
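For readers skimming the review: `assign_by_req` encodes MXNet's gradient write-request semantics. In plain numpy terms (a hypothetical helper for illustration, not the PR's TVM implementation):

```python
import numpy as np

def apply_req(out_buf, grad, req):
    """kAddTo accumulates into the output buffer; kWriteTo overwrites it."""
    if req == "kAddTo":
        out_buf += grad
    else:  # kWriteTo
        out_buf[...] = grad

buf = np.zeros(3)
apply_req(buf, np.ones(3), "kAddTo")      # buf is now all ones
apply_req(buf, np.full(3, 5.0), "kWriteTo")
assert (buf == 5.0).all()
```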
[GitHub] [incubator-mxnet] leezu commented on issue #15703: Storage manager / memory usage regression in v1.5
leezu commented on issue #15703: Storage manager / memory usage regression in v1.5 URL: https://github.com/apache/incubator-mxnet/issues/15703#issuecomment-522917231 Thank you for diving deep to find the root cause! I'm not blocked by this fix having to wait for MXNet 1.6, but we may want to ask @TaoLv as release manager. If the fix has to wait for 1.6, shouldn't the regression be documented, for example in the "known issues" section of the release notes (https://github.com/apache/incubator-mxnet/releases)? What do you mean by "**partially** fixes" in #15948? Could that fix in principle be cherry-picked for 1.5.x? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] pengzhao-intel commented on issue #15853: Float64 fallback for mkldnn subgraph and rnn op
pengzhao-intel commented on issue #15853: Float64 fallback for mkldnn subgraph and rnn op URL: https://github.com/apache/incubator-mxnet/pull/15853#issuecomment-522908818 @ZhennanQin could you try CI again? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] ElaineBao commented on issue #15884: [WIP] New Website: New Docs [1/3]
ElaineBao commented on issue #15884: [WIP] New Website: New Docs [1/3] URL: https://github.com/apache/incubator-mxnet/pull/15884#issuecomment-522898751 Hi, is it possible to make the API docs easier to find? Currently I have to click many times to get to a function page: Main page -> Docs & Tutorials -> Python API Reference -> Python API -> mxnet.ndarray -> NDArray -> mxnet.ndarray.NDArray.dtype. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository.

marcoabreu pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 9d8c1d9  Bump the publish timestamp.

9d8c1d9 is described below

commit 9d8c1d950a602b1463e429c2ff1ddac51509dfa2
Author: mxnet-ci
AuthorDate: Tue Aug 20 07:35:09 2019 +

    Bump the publish timestamp.
---
 date.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..6ea3551
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Tue Aug 20 07:35:09 UTC 2019
[GitHub] [incubator-mxnet] samskalicky commented on issue #15921: [WIP] dynamic custom operator support
samskalicky commented on issue #15921: [WIP] dynamic custom operator support URL: https://github.com/apache/incubator-mxnet/pull/15921#issuecomment-522889129 @wkcn while this PR is not quite done yet, it would be great to get some early feedback since the design/implementation has changed since our initial discussion. Let me know what you think, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] gyshi opened a new pull request #15949: Numpy op exp2
gyshi opened a new pull request #15949: Numpy op exp2 URL: https://github.com/apache/incubator-mxnet/pull/15949 This PR adds the numpy op exp2; it is tested on the numpy branch (reference: https://www.numpy.org/doc/1.17/reference/generated/numpy.exp2.html). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
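Reference semantics for the operator being added, checked against plain numpy (the mxnet counterpart lives on the numpy branch this PR targets):

```python
import numpy as np

# exp2 is element-wise 2**x.
x = np.array([-1.0, 0.0, 1.0, 3.0])
assert np.allclose(np.exp2(x), 2.0 ** x)
```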
[GitHub] [incubator-mxnet] apeforest commented on issue #15948: Fix a memory misalignment in topk operator
apeforest commented on issue #15948: Fix a memory misalignment in topk operator URL: https://github.com/apache/incubator-mxnet/pull/15948#issuecomment-522877070 @access2rohit @ChaiBapchya Please also help review. Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services