Unfortunately, fastmath can do that. One possible reason is that
denormalized (subnormal) numbers are flushed to zero, so small non-zero
values in your computation can silently become zero. It may also be
something else in the fast-math approximations that makes your learning
diverge. I would expect you could reproduce the same issue on CPU by
compiling with -ffast-math in g++.
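
If you want to narrow down where the NaNs first appear, Theano's
NanGuardMode will raise an error at the first op that produces a NaN/Inf.
Here is a minimal sketch (the tiny graph below is only a stand-in for your
model, and the 1e-7 epsilon is just one common guard against values that
flush to zero under fastmath):

import numpy as np
import theano
import theano.tensor as T
from theano.compile.nanguardmode import NanGuardMode

x = T.vector('x')
# Epsilon guard: keeps a flushed-to-zero value from reaching log(0).
y = T.log(T.abs_(x) + 1e-7)
f = theano.function(
    [x], y,
    mode=NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True),
)
print(f(np.asarray([1e-30, 0.5, 2.0], dtype='float32')))

Compiling your training function with that mode (fastmath still on) should
point at the computation that diverges first.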

Out of curiosity, how much of a speed-up do you get using nvcc.fastmath?
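
To compare, you can flip the flag per run with THEANO_FLAGS (which takes
precedence over .theanorc) instead of editing the config file each time.
A rough timing sketch, reusing the element-wise exp graph from the GPU
tutorial test ("bench.py" is just a placeholder name; such a small graph
may show little difference, so timing your actual training function would
be more informative):

# Run twice and compare, e.g.:
#   THEANO_FLAGS='device=gpu,floatX=float32,nvcc.fastmath=True'  python bench.py
#   THEANO_FLAGS='device=gpu,floatX=float32,nvcc.fastmath=False' python bench.py
import time
import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(42)
x = theano.shared(rng.rand(10 * 30 * 768).astype('float32'))
f = theano.function([], T.exp(x))   # same element-wise graph as the tutorial test

f()                                 # warm-up (triggers compilation)
t0 = time.time()
for _ in range(1000):
    f()
print('nvcc.fastmath = %s: 1000 calls took %.3f s'
      % (theano.config.nvcc.fastmath, time.time() - t0))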

On Tue, Dec 06, 2016, Alexander McDowell wrote:
> When I try to run this code 
> <https://github.com/nlml/adversarial-neural-crypt/blob/master/adversarial_neural_cryptography.py> 
> on the GPU with nvcc.fastmath = True, it runs at first but eventually 
> starts producing NaNs as the loss. It runs fine on the CPU, just not on 
> the GPU. If I run it on the GPU with nvcc.fastmath = False, it works 
> perfectly well, but then the CPU version is considerably faster than the 
> GPU version. Does anyone know why this is?
> 
> GPU result message (with fastmath = True):
> 
> Building Models
> Training Model!
> Training with device = gpu
> Training on iteration #0
> Receiver Training Error: nan. Interceptor Training Error: 1.004785
> Training on iteration #100
> Receiver Training Error: nan. Interceptor Training Error: nan
> ... (keeps going)
> 
> GPU result message (with fastmath = False):
> 
> Using gpu device 0: GeForce GT 650M (CNMeM is disabled, cuDNN not available)
> Building Models
> Training Model!
> Training with device = gpu
> Training on iteration #0
> Receiver Training Error: 0.995444. Interceptor Training Error: 1.002399
> Training on iteration #100
> Receiver Training Error: 0.990433. Interceptor Training Error: 1.002779
> Training on iteration #200
> Receiver Training Error: 0.991761. Interceptor Training Error: 1.000185
> ... (keeps going)
> 
> CPU result message:
> 
> Building Models
> Training Model!
> Training with device = cpu
> Training on iteration #0
> Receiver Training Error: 0.994140. Interceptor Training Error: 1.002878
> Training on iteration #100
> Receiver Training Error: 1.004477. Interceptor Training Error: 0.997820
> Training on iteration #200
> Receiver Training Error: 0.998176. Interceptor Training Error: 1.001941
> ... (keeps going)
> 
> I also have my .theanorc file:
> 
> [global]
> device = gpu
> floatX = float32
> cxx = /Library/Developer/CommandLineTools/usr/bin/clang++
> optimizer=fast_compile
> 
> [blas]
> blas.ldflags=
> 
> [nvcc]
> fastmath = True
> nvcc.flags = -D_FORCE_INLINES
> 
> [cuda]
> root = /usr/local/cuda/
> 
> I also ran the CPU and GPU on the GPU Test program from here 
> <http://deeplearning.net/software/theano/tutorial/using_gpu.html> and got 
> the following results:
> 
> GPU (with fastmath = True):
> Using gpu device 0: GeForce GT 650M (CNMeM is disabled, cuDNN not available)
> [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
> Looping 1000 times took 0.856593 seconds
> Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761  1.62323296]
> Used the gpu
> 
> GPU (with fastmath = False):
> Using gpu device 0: GeForce GT 650M (CNMeM is disabled, cuDNN not available)
> [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
> Looping 1000 times took 0.872737 seconds
> Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761  1.62323296]
> Used the gpu
> 
> CPU (using .theanorc):
> [Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
> Looping 1000 times took 2.067907 seconds
> Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761  1.62323284]
> Used the cpu
> 
> CPU (without .theanorc):
> [Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
> Looping 1000 times took 16.824746 seconds
> Result is [ 1.23178032  1.61879341  1.52278065 ...,  2.20771815  2.29967753  1.62323285]
> Used the cpu
> 
> I also have my computer specs if needed:
> 
> Mac OS Sierra, Version 10.12.1
> Processor: 2.9 GHz Intel Core i5
> Memory: 8 GB 1600 MHz DDR3
> Graphics Card: NVIDIA GeForce GT 650M 512 MB
> 
> Thanks in advance!
> - Alexander McDowell


-- 
Pascal
