Raw test image: <https://lh3.googleusercontent.com/-p9SIQ7yAqHE/V7bi-4Lbz7I/AAAAAAAAAAU/GJE2btf6haYXVQZ-ZUflqnJMFV6uyt-BgCLcB/s1600/307_G1%25281%2529_059_01_testraw.tif>

Result image: <https://lh3.googleusercontent.com/-IQcqf8n4C-c/V7bjjMM3TbI/AAAAAAAAAAg/NDva85e_OyMZMBPFdgw4g8TpBZPmsUdfwCLcB/s1600/result_200_32.tif>

Predicted image: <https://lh3.googleusercontent.com/-zvuUaFz7wTg/V7biiKHGcFI/AAAAAAAAAAM/7YEbQCybl9AXLs-pBx8o7NO9zNd9MHp4wCLcB/s1600/predicted_4.tif>
Hi folks, I could really use some help with this one. This is my first neural 
network (Keras on a Theano backend), and my first time posting in a help forum. 

The network tries to match sections of images to a ground-truth outlined 
version, in order to predict the outcome for new data (a super-resolution 
approach to clean the images up). 
Everything works fine on my *CPU* and on the *CPU* of the remote (Ubuntu) 
machine I am SSH'ing into. I need to run for more iterations, with more 
data, and play about with my model, etc., to get better results. 

However, when I run on a *GPU* (same code, same datasets, just in different 
folders, with the *GPU* enabled), I get the 'squiggly' results linked above. 

It is not an issue with how the images are saved, because if I save an 
image before I run the model on it, it looks normal. The image files are 
the recommended float32 for use with Theano.
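
A quick way to confirm this (X_test here is the same array fed to 
model.predict in the code further down):

-----
import numpy as np

# sanity check on the arrays going into the model
print(X_test.dtype)                # expect float32
print(X_test.min(), X_test.max())  # expect a sane range, e.g. [0, 1]
-----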

However, as soon as I run the model on the image on the *GPU*, things get 
weird with the output.

To summarise the combinations I have tried:

- Model compiled on the *GPU*, weights trained on the *CPU*, predicting on 
the *GPU*: *bad* result.
- Model compiled on the *GPU*, weights trained on the *CPU*, predicting on 
the *CPU*: *good* result.
- Saved model and saved weights from my *CPU*, predicting on the *GPU*: 
*bad* result.
- Weights trained on the *GPU*, model compiled on the *CPU*: *bad* result.
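
For concreteness, one such cross-device test looks roughly like the sketch 
below (I am assuming Keras's save_weights/load_weights here; the .h5 
filename is made up):

-----
# on the CPU machine, after training:
model.save_weights('weights_cpu.h5')

# on the GPU machine, with the same architecture compiled there:
model.load_weights('weights_cpu.h5')
result = model.predict(X_test, batch_size=batch_size, verbose=0)
-----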

There is *no* difference between the saved model.yaml files from the *GPU* 
and the *CPU*. Essentially, it is the finding and the application of 
weights on the *GPU* that is causing the problem, which narrows it down to 
model.fit and model.predict.
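
One numerical check I could run: load the CPU-trained weights into the 
GPU-compiled model, read them straight back, and compare with what the CPU 
model holds, to see whether the loading step itself corrupts them 
(model_cpu and model_gpu are hypothetical handles for the two separately 
compiled models):

-----
import numpy as np

model_gpu.load_weights('weights_cpu.h5')  # the same file the CPU model saved
for a, b in zip(model_cpu.get_weights(), model_gpu.get_weights()):
    print(a.shape, 'identical:', np.allclose(a, b),
          'max diff:', np.abs(a - b).max())
-----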

Relevant code section:
-----
for i in range(1, 360):
    # predict on the validation set and save the first predicted image
    result = model.predict(X_test, batch_size=batch_size, verbose=0)
    imsave(os.path.join(projfolder, 'validationresults', outfolder,
                        'predicted_' + str(i) + '.tif'),
           img_as_uint(result[0]))
    # then train for one more epoch on the first 50 training pairs
    model.fit(X_train[0:50], Y_train[0:50], batch_size=batch_size,
              nb_epoch=1, validation_data=(X_test, Y_test), shuffle=True)
-----    
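
A sanity check I could add right after the predict call (as far as I know, 
img_as_uint expects float input in [-1, 1], so out-of-range values or NaNs 
in result would already explain broken saved images):

-----
import numpy as np

print(result.dtype, result.min(), result.max())
print('NaNs:', np.isnan(result).any(), 'Infs:', np.isinf(result).any())
-----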

My dependencies (running in Python 3):
-----
import cv2
import os    
#os.environ['THEANO_FLAGS'] = "device=gpu1"    
import theano
theano.config.floatX = 'float32'
import numpy as np
from keras.optimizers import Adam
from keras.models import Sequential, model_from_yaml
from keras.layers import Convolution2D, MaxPooling2D, UpSampling2D
import matplotlib.pyplot as plt
from skimage.io import imsave
from skimage import img_as_uint
-----
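
To confirm the GPU is actually being used, there is the standard check from 
the Theano documentation: compile a trivial function and look for Gpu ops 
(e.g. GpuElemwise) in the graph:

-----
import numpy
from theano import function, config, shared
import theano.tensor as T

x = shared(numpy.asarray(numpy.random.rand(1000), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())  # Gpu ops here mean the GPU is in use
-----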

My .theanorc file (I have also tried ldflags = -lblas, but I am not sure 
all of the flags are correct):

-----
[global]
floatX = float32
device = gpu

[blas]
ldflags = -lf77blas -latlas -lgfortran

[nvcc]
fastmath = True

[gcc]
cxxflags = -ID:\MinGW\include

[cuda]
root = /usr/local/cuda/bin/
-----
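
To make sure these settings are actually being picked up on the remote 
machine, a quick check:

-----
import theano

print(theano.config.device)  # expect 'gpu' (or 'gpu1', etc.)
print(theano.config.floatX)  # expect 'float32'
-----

As far as I understand, Theano also ships a script 
(theano/misc/check_blas.py) that can be run directly to verify and 
benchmark the BLAS link.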

Short of reinstalling everything (which took two weeks the first time; I do 
not have admin/sudo rights on the remote GPU machine, but I can ask the 
admin to install things), I don't know what to try. Can anyone point me in 
the right direction to narrow it down? (Please assume very little 
knowledge; I am learning everything for the first time here.) The best 
guess I have right now is either the BLAS library linking or the Theano 
flags, but I don't see how either could produce such a different result. 
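
One thing I could do to separate those two guesses is force the device from 
the environment before importing Theano (mirroring the commented-out line 
in my imports above) and rerun the same script on the same remote machine:

-----
import os

# run once with device=cpu and once with device=gpu on the same machine;
# THEANO_FLAGS must be set before theano is first imported
os.environ['THEANO_FLAGS'] = 'device=cpu,floatX=float32'
import theano
-----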
