[GitHub] rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta

2018-03-12 Thread GitBox
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype 
argument / FP16 performance on Volta
URL: 
https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-372480640
 
 
   Sorry, I was digressing from the topic of the issue. Regarding the iterator 
issue: we need to document that it returns fp32 data regardless of the dtype 
argument. Keeping this open until we either fix it or document it.
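   
   To make the current behavior concrete, here is a minimal sketch (the .rec 
path is a hypothetical placeholder) showing the fp32 output and an explicit 
cast as a stopgap:
   
   ```
   import mxnet as mx
   
   # ImageRecordIter currently yields fp32 image data even when dtype is set.
   # 'data/train.rec' is a placeholder; substitute a real record file.
   train_iter = mx.io.ImageRecordIter(
       path_imgrec='data/train.rec',
       data_shape=(3, 224, 224),
       batch_size=32,
       dtype='float16',  # requested, but not honored for the data blob
   )
   
   batch = train_iter.next()
   print(batch.data[0].dtype)  # float32, not the requested float16
   
   # Stopgap until this is fixed or documented: cast each batch explicitly.
   data_fp16 = batch.data[0].astype('float16')
   print(data_fp16.dtype)  # float16
   ```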




[GitHub] rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta

2018-03-08 Thread GitBox
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype 
argument / FP16 performance on Volta
URL: 
https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-371654013
 
 
   Unfortunately, neither suggestion improved the speed. Using 
MXNET_CUDNN_AUTOTUNE_DEFAULT=2 helped in some cases, but we can't say this 
setting helps consistently. If it picks the fastest algorithm, why would it 
not help in all cases? I understand cases where it should match the speed of 
other algorithms, but sometimes it is slower than setting it to 1. All else 
should remain the same, right?
   
   I'm writing a tutorial on fp16 usage in MXNet, and while doing so I am 
trying to understand some of the changes you made. Here:
   
https://github.com/apache/incubator-mxnet/blob/649b08665bad016a71fa8b7a29a184d25217e335/example/image-classification/symbols/resnet.py#L140-143
   Why does the softmax input need to be cast to fp32? Is it for precision 
reasons?
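   
   For context, this is the pattern I mean, reduced to a minimal sketch (layer 
names and shapes here are illustrative, not the exact code from resnet.py):
   
   ```
   import mxnet as mx
   
   def output_head(body, num_classes):
       """Illustrative fp16 output head: features stay in float16, but the
       logits are cast back to float32 before the softmax."""
       pool = mx.sym.Pooling(data=body, global_pool=True, pool_type='avg',
                             kernel=(7, 7), name='pool1')
       flat = mx.sym.Flatten(data=pool)
       fc1 = mx.sym.FullyConnected(data=flat, num_hidden=num_classes,
                                   name='fc1')
       # Softmax exponentiates its inputs; doing that in fp16 risks overflow
       # and precision loss, which is presumably why the cast is there.
       fc1 = mx.sym.Cast(data=fc1, dtype='float32')
       return mx.sym.SoftmaxOutput(data=fc1, name='softmax')
   ```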
   
   Is the double buffering you mention with the identity operator general 
enough to go into an official guide?
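   
   In case it helps clarify what I'm asking: my guess (possibly not what you 
meant) is that this is about keeping an fp32 master copy of the weights while 
computing in fp16, which the SGD optimizer already exposes as 
multi_precision. A sketch of that option:
   
   ```
   import mxnet as mx
   
   # Compute in float16 but keep and update a float32 master copy of each
   # weight, which avoids fp16 precision loss in small gradient updates.
   opt = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9,
                          multi_precision=True)
   ```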
   
   Thanks for your help :)




[GitHub] rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta

2018-02-20 Thread GitBox
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype 
argument / FP16 performance on Volta
URL: 
https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-367142407
 
 
   I have cuDNN v7005 (i.e., 7.0.5) and CUDA 9.0.




[GitHub] rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta

2018-02-20 Thread GitBox
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype 
argument / FP16 performance on Volta
URL: 
https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-367141388
 
 
   Okay, cool. I'll try to document that.
   
   I was using the largest batch size that is both a multiple of 8 and fits 
on a p3.16xlarge, i.e. 960 for the ImageNet script (120 per GPU across 8 
GPUs), and set data-nthreads to 24:
   
   ```
   python train_imagenet.py --data-train ... --data-val ... --batch-size 960 --gpus 0,1,2,3,4,5,6,7 --dtype float16 --data-nthreads 24
   ```




[GitHub] rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta

2018-02-20 Thread GitBox
rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype 
argument / FP16 performance on Volta
URL: 
https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-367124681
 
 
   I'm running this command:
   
   ```
   python train_cifar10.py --batch-size 256 --network resnet --num-layers 50 --gpus 0,1,2,3,4,5,6,7
   ```
   
   Are you sure you have the right variable? MXNET_CUDNN_AUTOTUNE_DEFAULT=2 
doesn't seem to be a valid value; it still tries to autotune if I set it to 2.
   
   Relevant code is:
   ```
   if (param.cudnn_tune.value() && reg_.size() % 50 == 0) {
     LOG(INFO) << "Running performance tests to find the best convolution "
                  "algorithm, "
   ```
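   
   (For reproducibility: I set the variable before importing mxnet, since the 
backend reads it from the environment when convolution operators are created. 
A minimal sketch; 1 is shown because 0 and 1 are the values I know are 
documented:)
   
   ```
   import os
   
   # Set before importing mxnet; the backend reads the variable when
   # convolution operators are created.
   os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '1'  # 0 disables autotuning
   
   import mxnet as mx
   ```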



