[GitHub] rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-371654013

Unfortunately, neither suggestion improved the speed. Using MXNET_CUDNN_AUTOTUNE_DEFAULT=2 helped in some cases, but we can't say the setting helps consistently. If it picks the fastest algorithm, why would it not help in all cases? I understand cases where it should be the same speed as the other algorithms, but sometimes it is slower than setting it to 1. Everything else should remain the same, right?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
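For readers following along, a minimal sketch (my own, not from the issue) of how the variable under discussion is typically set: MXNET_CUDNN_AUTOTUNE_DEFAULT is read from the environment when MXNet first creates cuDNN convolutions, so it must be set before any convolution runs, e.g. before importing mxnet. Later MXNet docs describe the values as 0 = off, 1 = limited-workspace autotune, 2 = fastest; whether 2 is accepted depends on the MXNet version, which is exactly what this thread is debating.

```python
import os

# Must be set before MXNet initializes its cuDNN convolution operators,
# so do it in the shell (export ...) or before the import.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"  # 0 disables autotuning

# import mxnet as mx  # would now skip the "Running performance tests" step
print(os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"])
```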
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-372480640

Sorry, I was digressing from the topic of the issue. Regarding the iterator issue, we need to document that it returns fp32 data regardless of the dtype argument. Keeping this open until we fix it or document it.
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-371654013

I'm writing a tutorial on fp16 usage in MXNet, and while doing so I am trying to understand some of the changes you made. Here: https://github.com/apache/incubator-mxnet/blob/649b08665bad016a71fa8b7a29a184d25217e335/example/image-classification/symbols/resnet.py#L141 Why does the softmax input need to be cast to fp32? Is it for precision reasons? Is the double buffering you mention with the identity operator general enough to go into an official guide? Thanks for your help :)
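As background on the fp32 cast asked about above, a small illustration of my own (not from the linked resnet.py): fp16 tops out around 65504, so exp() of even a moderately large logit overflows in half precision, while the same computation stays finite in fp32. This range limitation is one common reason to cast a softmax input back to fp32 in mixed-precision training.

```python
import numpy as np

# fp16 has a maximum representable value of ~65504 (np.finfo(np.float16).max),
# so exp(12) ~= 162755 overflows to inf in half precision but not in fp32.
print(np.exp(np.float16(12.0)))  # overflows to inf in fp16
print(np.exp(np.float32(12.0)))  # finite in fp32
```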
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-367142407

I have cuDNN v7005 and CUDA 9.0.
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-367141388

Okay cool, I'll try to document that. I was using the maximum batch size that is a multiple of 8 that I could fit on a p3.16xlarge, i.e. 960 for the imagenet script, and set data-nthreads to 24.

```
python train_imagenet.py --data-train ... --data-val ... --batch-size 960 --gpus 0,1,2,3,4,5,6,7 --dtype float16 --data-nthreads 24
```
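The batch-size reasoning above can be sketched with a hypothetical helper of my own (the 965 capacity figure is illustrative): fp16 Tensor Core kernels prefer dimensions divisible by 8, so round the largest batch that fits in memory down to a multiple of 8.

```python
# Hypothetical helper, not part of the imagenet script: round a
# memory-limited batch size down to the nearest multiple of 8.
def tensor_core_batch(max_that_fits: int) -> int:
    return max_that_fits - max_that_fits % 8

# e.g. if at most 965 images fit across 8 GPUs, train with 960 (120 per GPU)
print(tensor_core_batch(965))  # 960
```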
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-367124681

I'm running this command:

`python train_cifar10.py --batch-size 256 --network resnet --num-layers 50 --gpus 0,1,2,3,4,5,6,7`

Are you sure you have the right variable? MXNET_CUDNN_AUTOTUNE_DEFAULT=2 doesn't seem to be a valid value; it still tries to autotune if I set it to 2. The relevant code is:

```
if (param.cudnn_tune.value() && reg_.size() % 50 == 0) {
  LOG(INFO) << "Running performance tests to find the best convolution "
               "algorithm, "
               "this can take a while... (setting env variable "
               "MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)";
}
```