UB, type narrowing and integer overflows (in this case in mshadow)

Pedro Larroy Thu, 28 Mar 2019 13:38:27 -0700

Hi

While looking at the failures reporting in this issue:
https://github.com/apache/incubator-mxnet/issues/14522


I have noticed that in mshadow when calling the BLAS Engine we are
doing narrowing integer conversions from index_t (int64_t) to int, and
then operations on dimensions that can overflow integer arithmetic
such as i * m *k as seen in the second link below. Which when added to
the pointer holding the matrix data results in a) undefined behaviour,
and b) in x86_64 a subtraction instead of an addition due to platform
dependent integer overflow semantics in x86 platforms.

I think we should address this in a twofold manner: checking the
shapes for possible overflows in the implementation (which will have
some performance impact), and second we should widen the types of
BLASEngine to index_t.

https://github.com/dmlc/mshadow/blob/master/mshadow/tensor_cpu-inl.h#L613

Which in CPU ends up calling batched_gemm:

https://github.com/dmlc/mshadow/blob/master/mshadow/dot_engine-inl.h#L339

Let me know if you have additional thoughts in this solution or see
any blockers or better ideas, otherwise I will proceed to work on PRs
fixing this in mshadow. Since it's an issue that seem to happen often
I think we should be really careful with integer overflows and
undefined behaviour related bugs and pay attention in CRs to this kind
of traps.

Thanks!

Pedro.

UB, type narrowing and integer overflows (in this case in mshadow)

Reply via email to