[GitHub] [incubator-mxnet] apeforest edited a comment on issue #15703: Storage manager / memory usage regression in v1.5
apeforest edited a comment on issue #15703: Storage manager / memory usage regression in v1.5
URL: https://github.com/apache/incubator-mxnet/issues/15703#issuecomment-523119231

@TaoLv This is not a bug per se but a limitation of the int32_t data type used in MXNet. As I pointed out at the line https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/ordering_op-inl.h#L434, the workspace is created using a 1D mshadow::Shape object, whose length is bounded by `index_t`, which is int32_t by default. When the required workspace size exceeds the int32_t maximum (2^31 - 1), the size overflows, which causes the OOM.

@leezu #15948 is a partial fix because it only fixed the memory misalignment, not the OOM caused by the integer overflow. To really fix this issue, we need to support int64_t in MXNet by default.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [incubator-mxnet] apeforest edited a comment on issue #15703: Storage manager / memory usage regression in v1.5
apeforest edited a comment on issue #15703: Storage manager / memory usage regression in v1.5
URL: https://github.com/apache/incubator-mxnet/issues/15703#issuecomment-523119231

@TaoLv This is not a bug per se but a limitation of the int32_t data type used in MXNet. As I pointed out at the line https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/ordering_op-inl.h#L434, the workspace is created using a 1D mshadow::Shape object, whose length is bounded by `index_t`, which is int32_t by default. When the required workspace size exceeds the int32_t maximum (2^31 - 1), the size overflows, which causes the OOM.

#15948 is a partial fix because it only fixed the memory misalignment, not the OOM caused by the integer overflow. To really fix this issue, we need to support int64_t in MXNet by default.
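
To illustrate the overflow described in the comment, here is a minimal sketch of what happens when a size larger than 2^31 - 1 is stored in a signed 32-bit integer. It is not MXNet code: it uses Python's `ctypes.c_int32` to mimic an int32_t `index_t`, and the helper `as_int32` and the 3 GiB request size are hypothetical, chosen only for illustration.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(n):
    """Mimic storing n in an int32_t: ctypes wraps silently, without an error."""
    return ctypes.c_int32(n).value


# Hypothetical workspace request of 3 * 2^30, which exceeds INT32_MAX.
requested = 3 * 2**30
stored = as_int32(requested)

print("requested:", requested)
print("stored in 32 bits:", stored)
print("fits in int32_t:", requested <= INT32_MAX)
```

The stored value comes out negative, so any allocation sized from it no longer matches the size that was actually requested. A 64-bit index type holds the same request without wrapping.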