Unfortunately, fastmath can do that. One possible reason is that denormalized (subnormal) numbers are not supported in that mode and get flushed to zero, so small non-zero values silently become zero. It may also be something else in fastmath that makes your learning diverge. I would suspect you could get the same issue on the CPU if you compiled with -ffast-math in g++.
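To make the flush-to-zero point concrete, here is a minimal NumPy sketch. It only imitates the effect (the real flush happens inside the CUDA math instructions when fastmath is on), and the log term is just an illustrative stand-in for whatever log/division shows up in the loss of the linked script:

    import numpy as np

    # A subnormal float32: smaller than the smallest *normal* float32 (~1.18e-38)
    # but still non-zero under standard IEEE-754 arithmetic.
    p = np.float32(1e-40)
    print(np.log(p))                    # about -92.1: large-negative but finite

    # Fastmath-style flush-to-zero treats subnormals as exactly 0.0.
    # Here we imitate that with an explicit check against the smallest normal value.
    p_ftz = np.float32(0.0) if abs(p) < np.finfo(np.float32).tiny else p

    loss_term = np.log(p_ftz)           # -inf (log of an exact zero)
    print(loss_term * np.float32(0.0))  # nan: -inf combined with 0 gives NaN,
                                        # which then spreads through loss and gradients

Once a single NaN reaches the loss or a gradient, every later update stays NaN, which would match your fastmath output where the receiver error is already nan at iteration #0. If you want to keep fastmath, a common workaround is to clip the arguments of log/division away from zero with a small epsilon, but whether that is enough in your case is only a guess.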
Out of curiosity, how much of a speed-up do you get using nvcc.fastmath?

On Tue, Dec 06, 2016, Alexander McDowell wrote:
> For some reason when I try to run this
> <https://github.com/nlml/adversarial-neural-crypt/blob/master/adversarial_neural_cryptography.py>
> code with the gpu with nvcc.fastmath = True, it runs fine, but eventually
> starts producing NaNs as a loss. It works fine when I run it on cpu but not
> on the gpu. If I try to run it with nvcc.fastmath = False, it runs
> perfectly well but the cpu version is considerably faster than the gpu
> version. Does anyone know why this is?
>
> GPU result message (with fastmath = True):
>
> Building Models
> Training Model!
> Training with device = gpu
> Training on iteration #0
> Receiver Training Error: nan. Interceptor Training Error: 1.004785
> Training on iteration #100
> Receiver Training Error: nan. Interceptor Training Error: nan
> ... (keeps going)
>
> GPU result message (with fastmath = False):
>
> Using gpu device 0: GeForce GT 650M (CNMeM is disabled, cuDNN not available)
> Building Models
> Training Model!
> Training with device = gpu
> Training on iteration #0
> Receiver Training Error: 0.995444. Interceptor Training Error: 1.002399
> Training on iteration #100
> Receiver Training Error: 0.990433. Interceptor Training Error: 1.002779
> Training on iteration #200
> Receiver Training Error: 0.991761. Interceptor Training Error: 1.000185
> ... (keeps going)
>
> CPU result message:
>
> Building Models
> Training Model!
> Training with device = cpu
> Training on iteration #0
> Receiver Training Error: 0.994140. Interceptor Training Error: 1.002878
> Training on iteration #100
> Receiver Training Error: 1.004477. Interceptor Training Error: 0.997820
> Training on iteration #200
> Receiver Training Error: 0.998176. Interceptor Training Error: 1.001941
> ... (keeps going)
>
> I also have my .theanorc file:
>
> [global]
> device = gpu
> floatX = float32
> cxx = /Library/Developer/CommandLineTools/usr/bin/clang++
> optimizer=fast_compile
>
> [blas]
> blas.ldflags=
>
> [nvcc]
> fastmath = True
> nvcc.flags = -D_FORCE_INLINES
>
> [cuda]
> root = /usr/local/cuda/
>
> I also ran the CPU and GPU on the GPU Test program from here
> <http://deeplearning.net/software/theano/tutorial/using_gpu.html> and got
> the following results:
>
> GPU (with fastmath = True):
>
> Using gpu device 0: GeForce GT 650M (CNMeM is disabled, cuDNN not available)
> [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>),
> HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
> Looping 1000 times took 0.856593 seconds
> Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
>   1.62323296]
> Used the gpu
>
> GPU (with fastmath = False):
>
> Using gpu device 0: GeForce GT 650M (CNMeM is disabled, cuDNN not available)
> [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>),
> HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
> Looping 1000 times took 0.872737 seconds
> Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
>   1.62323296]
> Used the gpu
>
> CPU (using .theanorc):
>
> [Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
> Looping 1000 times took 2.067907 seconds
> Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
>   1.62323284]
> Used the cpu
>
> CPU (without .theanorc):
>
> [Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
> Looping 1000 times took 16.824746 seconds
> Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753
>   1.62323285]
> Used the cpu
>
> I also have my computer specs if needed:
>
> Mac OS Sierra, Version 10.12.1
> Processor: 2.9 GHz Intel Core i5
> Memory: 8 GB 1600 MHz DDR3
> Graphics Card: NVIDIA GeForce GT 650M 512 MB
>
> Thanks in advance!
> - Alexander McDowell

--
Pascal
