Hi,

While looking at the failures reported in this issue: https://github.com/apache/incubator-mxnet/issues/14522
I noticed that when mshadow calls the BLAS engine, we make narrowing integer conversions from index_t (int64_t) to int, and then do arithmetic on the dimensions that can overflow int, such as i * m * k (see the second link below). When the overflowed result is added to the pointer holding the matrix data, we get a) undefined behaviour, and b) on x86_64, in practice, a subtraction instead of an addition, because the overflowed product can wrap around to a negative value.

I think we should address this in two ways: first, check the shapes for possible overflows in the implementation (which will have some performance impact), and second, widen the types of BLASEngine to index_t.

The code in question:

https://github.com/dmlc/mshadow/blob/master/mshadow/tensor_cpu-inl.h#L613

which on CPU ends up calling batched_gemm:

https://github.com/dmlc/mshadow/blob/master/mshadow/dot_engine-inl.h#L339

Let me know if you have additional thoughts on this solution, or see any blockers or better ideas; otherwise I will proceed to work on PRs fixing this in mshadow. Since this kind of issue seems to happen often, I think we should be really careful with integer overflow and undefined behaviour bugs, and watch for these traps in code reviews.

Thanks!

Pedro.
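
P.S. To make the two proposed fixes concrete, here is a minimal sketch of what I have in mind. The helper names (ProductFitsInInt, BatchOffset, MatrixAt) are hypothetical and do not exist in mshadow; the sketch assumes non-negative dimensions and a 32-bit int, and it is not the actual patch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not in mshadow): returns true if a * b * c can be
// represented as an int, without ever overflowing during the check itself.
// Assumes int is 32 bits wide; negative dimensions are rejected.
inline bool ProductFitsInInt(std::int64_t a, std::int64_t b, std::int64_t c) {
  const std::int64_t kMax = std::numeric_limits<int>::max();
  if (a < 0 || b < 0 || c < 0) return false;
  if (a == 0 || b == 0 || c == 0) return true;
  if (a > kMax / b) return false;  // a * b alone would exceed INT_MAX
  return a * b <= kMax / c;        // a * b is now known to be safe to compute
}

// Hypothetical helper: offset of the i-th (m x k) matrix in a batch,
// computed in 64-bit arithmetic instead of int.
inline std::int64_t BatchOffset(std::int64_t i, std::int64_t m, std::int64_t k) {
  return i * m * k;
}

// Hypothetical helper: advance the data pointer using the widened offset.
inline const float* MatrixAt(const float* base, std::int64_t i,
                             std::int64_t m, std::int64_t k) {
  return base + BatchOffset(i, m, k);
}
```

The check would guard any path that still has to hand int dimensions to BLAS, while the widened offset avoids the overflow in the pointer arithmetic itself. For example, with i = 1000 and m = k = 2000 the offset is 4000000000, which does not fit in a 32-bit int.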