BiranLi commented on issue #8373: distribute training in fp16
URL: https://github.com/apache/incubator-mxnet/pull/8373#issuecomment-368188462
@rahul003
Because of FP16's limited representable range, gradients can vanish (underflow to zero) during back-propagation. The easiest way to handle this is to scale the gradients.
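A minimal NumPy sketch of the underflow problem and the scaling remedy being discussed (not MXNet's actual implementation; the scale factor 1024 is an arbitrary illustrative choice):

```python
import numpy as np

# A gradient below FP16's smallest subnormal (~6e-8) underflows to zero.
tiny_grad = np.float16(1e-8)
print(tiny_grad)                 # 0.0 -- the gradient is lost

# Remedy: scale the loss (and hence the gradients) up before the FP16
# backward pass, then unscale in FP32 when applying the weight update.
scale = 1024.0
scaled_grad = np.float16(1e-8 * scale)       # now representable in FP16
recovered = np.float32(scaled_grad) / scale  # unscale in FP32 for the update
print(scaled_grad > 0)           # True -- the information survives
```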
BiranLi commented on issue #8373: distribute training in fp16
URL: https://github.com/apache/incubator-mxnet/pull/8373#issuecomment-368186682
@solin319
Would it be possible to account for gradient vanishing in the computation,
for example by adding a grad_scale processing interface?