<https://lh3.googleusercontent.com/-p9SIQ7yAqHE/V7bi-4Lbz7I/AAAAAAAAAAU/GJE2btf6haYXVQZ-ZUflqnJMFV6uyt-BgCLcB/s1600/307_G1%25281%2529_059_01_testraw.tif>
<https://lh3.googleusercontent.com/-IQcqf8n4C-c/V7bjjMM3TbI/AAAAAAAAAAg/NDva85e_OyMZMBPFdgw4g8TpBZPmsUdfwCLcB/s1600/result_200_32.tif> <https://lh3.googleusercontent.com/-zvuUaFz7wTg/V7biiKHGcFI/AAAAAAAAAAM/7YEbQCybl9AXLs-pBx8o7NO9zNd9MHp4wCLcB/s1600/predicted_4.tif>

Hi folks, I could really use some help with this one. This is my first neural network (Keras on a Theano backend) and my first time posting in a help forum.

The network tries to match sections of images to a ground-truth outlined version, in order to predict the outcome for new data (a super-resolution approach to clean the images up). Everything works fine on my *CPU* and on the *CPU* of the remote Ubuntu machine I am SSH'ing into - I just need to run more iterations with more data, play around with my model, etc., to get better results.

However, when I run on a *GPU* - same code, same datasets, just different folders and the GPU enabled - I get the 'squiggly' results shown in the attached images. It is not an issue with how the images are saved, because if I save an image before I run the model on it, it looks normal, and the image files are the recommended float32 for use with Theano. As soon as the model runs on the image on the *GPU*, though, the output goes wrong.

To narrow it down, I tried every combination of where the weights come from and where prediction runs:

- Model compiled on the *GPU*, weights trained on the *CPU*, predicting on the *GPU*: *bad* result.
- Model compiled on the *GPU*, weights trained on the *CPU*, predicting on the *CPU*: *good* result.
- Saved weights and saved model from my *CPU*, loaded to predict on the *GPU*: *bad* result.
- Weights trained on the *GPU*, used with the model compiled on the *CPU*: *bad* result.

There is *no* difference between the saved model.yaml files from the *GPU* and the *CPU*. Essentially, it is the training and application of the weights on the *GPU* that is causing the problem, which narrows it down to model.fit and model.predict.

Relevant code section:
-----
for i in range(1, 360):
    # Predict on the test data and save this iteration's output image
    result = model.predict(X_test, batch_size=batch_size, verbose=0)
    imsave(os.path.join(projfolder, 'validationresults', outfolder, 'predicted_' + str(i) + '.tif'), img_as_uint(result[0]))
    # Train for one more epoch on the first 50 training pairs
    model.fit(X_train[0:50], Y_train[0:50], batch_size=batch_size, nb_epoch=1, validation_data=(X_test, Y_test), shuffle=True)
-----

My dependencies (running in Python 3):
-----
import cv2
import os
#os.environ['THEANO_FLAGS'] = "device=gpu1"
import theano
theano.config.floatX = 'float32'
import numpy as np
from keras.optimizers import Adam
from keras.models import Sequential, model_from_yaml
from keras.layers import Convolution2D, MaxPooling2D, UpSampling2D
import matplotlib.pyplot as plt
from skimage.io import imsave
from skimage import img_as_uint
-----

My .theanorc file (I have also tried ldflags = -lblas, but I am not sure all of the flags are correct):
-----
[global]
floatX = float32
device = gpu

[blas]
ldflags = -lf77blas -latlas -lgfortran

[nvcc]
fastmath = True

[gcc]
cxxflags = -ID:\MinGW\include

[cuda]
root = /usr/local/cuda/bin/
-----

Short of trying to reinstall everything (which took two weeks the first time, and I do not have admin/sudo rights on the remote GPU machine, though I can ask the administrator to install things), I don't know what to try. Can anyone point me in the right direction to narrow it down? (Please assume very little knowledge - I am learning everything for the first time here.) My best guess right now is either BLAS library linking or the Theano flags, but I don't see how either could produce such a different result.
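For reference, this is roughly how I save and load the model and weights when moving between machines. It is a simplified sketch - the filenames are placeholders, and 'mse' stands in for whatever loss/optimizer settings the real compile call uses:
-----
# Saving on the training machine (filenames are placeholders):
with open('model.yaml', 'w') as f:
    f.write(model.to_yaml())
model.save_weights('weights.h5', overwrite=True)

# Loading on the other machine:
with open('model.yaml') as f:
    model = model_from_yaml(f.read())
model.load_weights('weights.h5')
# The loaded model has to be compiled again before fit/predict
model.compile(optimizer=Adam(), loss='mse')
-----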
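In case it is relevant, this is the check I plan to run on the remote machine to confirm Theano is actually picking up the GPU (adapted from the GPU test script in the Theano documentation):
-----
from theano import function, config, shared
import theano.tensor as T
import numpy

# Build a trivial graph and inspect which ops it compiled to
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(10 * 30 * 768), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
if numpy.any([isinstance(node.op, T.Elemwise)
              for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
-----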
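Finally, to pin down whether it is model.fit or model.predict that goes wrong, I was thinking of comparing the raw prediction arrays from the two machines numerically (same weights loaded on both) instead of eyeballing the saved TIFFs - something like this, where the .npy files are hypothetical dumps made with np.save on each machine:
-----
import numpy as np

# Predictions saved with np.save(...) on each machine (hypothetical files)
cpu_pred = np.load('cpu_pred.npy')
gpu_pred = np.load('gpu_pred.npy')
print(np.allclose(cpu_pred, gpu_pred, atol=1e-4))  # nearly identical?
print(np.abs(cpu_pred - gpu_pred).max())           # largest deviation
-----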
