ptrendx commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-367150719
Just in case, try synthetic data with `--benchmark 1` - with 24 threads I bet you are still […]
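For readers who don't want to dig through the example scripts, here is a minimal sketch of what a synthetic-data benchmark looks like: random in-memory batches stand in for `mx.io.ImageRecordIter`, taking the decode/augment pipeline out of the measurement. The shapes, the deliberately tiny network, and the use of `mx.io.NDArrayIter` are illustrative assumptions, not the exact code behind `--benchmark 1`:

```python
import time
import mxnet as mx
import numpy as np

# Synthetic batches in place of mx.io.ImageRecordIter, so the data
# pipeline cannot be the bottleneck in what we measure.
batch_size = 256
num_img = batch_size * 4
data = np.random.uniform(size=(num_img, 3, 224, 224)).astype('float16')
label = np.random.randint(0, 1000, (num_img,)).astype('float32')
train_iter = mx.io.NDArrayIter(data, label, batch_size)

# A tiny stand-in network so the sketch is self-contained; a real
# benchmark would build e.g. resnet-50 here instead.
sym = mx.sym.Variable('data')
sym = mx.sym.Convolution(sym, num_filter=64, kernel=(7, 7), stride=(2, 2))
sym = mx.sym.Pooling(sym, global_pool=True, pool_type='avg', kernel=(1, 1))
sym = mx.sym.FullyConnected(mx.sym.Flatten(sym), num_hidden=1000)
sym = mx.sym.SoftmaxOutput(sym, name='softmax')

mod = mx.mod.Module(sym, context=mx.gpu(0))  # assumes a GPU is available
mod.bind(data_shapes=train_iter.provide_data,
         label_shapes=train_iter.provide_label)
mod.init_params()
mod.init_optimizer()

tic = time.time()
nbatch = 0
for batch in train_iter:
    mod.forward_backward(batch)
    mod.update()
    nbatch += 1
mx.nd.waitall()  # drain the async engine before reading the clock
print('%.1f img/s' % (nbatch * batch_size / (time.time() - tic)))
```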
ptrendx commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-367138608
I was asking about the imagenet script. If you use a smaller batch size, like 256 for 8 GPUs, […]
ptrendx commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-367057821
@rahul003 Could you paste here how you invoked the benchmark script? Did you set the […]
ptrendx commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-365780010
There are a few possible explanations. The most probable reason is the workspace size for convolutions. I tried pitching […]
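For context, a short sketch of where that workspace knob lives, assuming the symbolic API; the value passed is illustrative, not a recommendation:

```python
import mxnet as mx

data = mx.sym.Variable('data')
# `workspace` caps the temporary scratch space (in MB) that cuDNN may use
# when selecting a convolution algorithm; a cap that is too small can force
# cuDNN onto slower algorithms, which hurts fp16 throughput on Volta in
# particular. The value 2048 here is an illustrative assumption.
conv = mx.sym.Convolution(data, num_filter=64, kernel=(3, 3), pad=(1, 1),
                          workspace=2048)
```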
ptrendx commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-365484238
The engine does not seem to differentiate between the first layer and subsequent layers, in that it considers data going into […]
ptrendx commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-365483069
I don't think it will perform better than producing fp32 and then casting to fp16 at the beginning of training.
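A minimal sketch of the pattern described there, with illustrative names: the iterator keeps producing fp32 batches, and a single `Cast` at the network input moves the rest of the model to fp16:

```python
import mxnet as mx

data = mx.sym.Variable('data')             # fp32 batches from the iterator
data = mx.sym.Cast(data, dtype='float16')  # one cast at the input; the
                                           # layers below then run in fp16
net = mx.sym.Convolution(data, num_filter=64, kernel=(7, 7),
                         stride=(2, 2), pad=(3, 3))
```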