[theano-users] Re: theano_alexnet "train.py"

Goffredo Giordano Sat, 08 Apr 2017 10:42:32 -0700

Hi Petar,

Thank you very much! I think the TypeError was related 
to the Python 3.x version I'm using, so in layers.py I changed line 
56 to center_margin = int((image_shape[2] - cropsize) / 2). Then I replaced lines 
104 to 107 with


self.filter_shape[0] = self.filter_shape[0] // 2
self.filter_shape[3] = self.filter_shape[3] // 2
self.image_shape[0] = self.image_shape[0] // 2
self.image_shape[3] = self.image_shape[3] // 2

line 125 with input[:self.channel // 2, :, :, :])
line 133 with input[self.channel // 2:, :, :, :])
line 168 with dnn.dnn_conv(img=input_shuffled[:, :int(self.channel / 2),
and line 179 with dnn.dnn_conv(img=input_shuffled[:, self.channel // 2:,
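
The change from / to // matters because in Python 3 the / operator always returns a float, even for two integers, and a float cannot be used as a slice bound. A minimal sketch, with a plain list standing in for the Theano tensor (the names channel and data are only illustrative, they are not taken from layers.py):

```python
# Illustrative value only; in layers.py the real value comes from image_shape.
channel = 96

half_float = channel / 2    # Python 3: true division, always a float
half_int = channel // 2     # floor division, stays an int

data = list(range(channel))  # stand-in for the tensor being sliced

# Slicing with a float bound raises TypeError, much like the Theano index check.
try:
    data[:half_float]
    float_slice_ok = True
except TypeError:
    float_slice_ok = False

# Slicing with the integer bound splits the channels into two halves.
first_half = data[:half_int]
second_half = data[half_int:]
```

For even sizes like these, int(channel / 2) gives the same value as channel // 2, so both forms used in the changes above should behave the same.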

Once I resolved this issue, the traceback showed further errors: a 
TypeError involving a float64 vector. For me this TypeError was related to the 
dtype of the tensor constructor (I referred to 
http://deeplearning.net/software/theano/library/tensor/basic.html). I 
resolved these errors in alex_net.py by changing line 26 to y = 
T.ivector('y')

Training is currently still running. I don't know what kind of results it 
will give me, but for now this is already a great result!
Thank you, everyone, for your advice.
Goffredo


On Friday, 7 April 2017 at 19:04:04 UTC+2, Petar Palasek wrote:
>
> Hi Goffredo,
>
> from the traceback you can see that the "TypeError: index must be 
> integers" is coming from file "./lib\layers.py", line 168, in __init__:
> dnn.dnn_conv(img=input_shuffled[:, :self.channel / 2,
>
> Seems like self.channel / 2 is not an integer.
>
> If you look further in the code you will see that self.channel is set 
> to image_shape[0] in line 89 in the same file.
>
> I would check what you pass as the image_shape parameter when you are 
> creating the ConvPoolLayer around line 62 in 
> "C:\deep_learning\alexnet\alex_net.py".
>
> Best,
> Petar
>
>
>
>
>
>
> On Friday, April 7, 2017 at 1:53:03 PM UTC+1, Goffredo Giordano wrote:
>>
>> Hi Arnold,
>>
>> I have some problems similar to yours, and I'm trying to run train.py on a 
>> Windows 10 machine. Did you find a solution to your problems and to the 
>> incompatibility with pycuda that I understand you describe in this post? 
>> Thanks in advance for your expert help and your time.
>> Greetings,
>> Goffredo
>>
>> ------
>> C:\deep_learning\alexnet>python train.py
>> WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be 
>> removed in the next release (v0.10). Please switch to the gpuarray backend. 
>> You can get more information about how to switch at this URL:
>>
>> https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
>>
>> Using gpu device 0: GeForce GT 740M (CNMeM is enabled with initial size: 
>> 80.0% of memory, cuDNN 5105)
>> WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be 
>> removed in the next release (v0.10). Please switch to the gpuarray backend. 
>> You can get more information about how to switch at this URL:
>>
>> https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
>>
>> ... building the model
>> conv (cudnn) layer with shape_in: (3, 227, 227, 256)
>> Process Process-1:
>> Traceback (most recent call last):
>>   File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\multiprocessing\process.py", line 254, in _bootstrap
>>     self.run()
>>   File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\multiprocessing\process.py", line 93, in run
>>     self._target(*self._args, **self._kwargs)
>>   File "C:\deep_learning\alexnet\train.py", line 52, in train_net
>>     model = AlexNet(config)
>>   File "C:\deep_learning\alexnet\alex_net.py", line 62, in __init__
>>     lib_conv=lib_conv,
>>   File "./lib\layers.py", line 168, in __init__
>>     dnn.dnn_conv(img=input_shuffled[:, :self.channel / 2,
>>   File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\var.py", line 540, in __getitem__
>>     return theano.tensor.subtensor.advanced_subtensor(self, *args)
>>   File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\gof\op.py", line 604, in __call__
>>     node = self.make_node(*inputs, **kwargs)
>>   File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\subtensor.py", line 2140, in make_node
>>     index = tuple(map(as_index_variable, index))
>>   File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\subtensor.py", line 2081, in as_index_variable
>>     return make_slice(idx)
>>   File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\gof\op.py", line 604, in __call__
>>     node = self.make_node(*inputs, **kwargs)
>>   File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\type_other.py", line 39, in make_node
>>     list(map(as_int_none_variable, inp)),
>>   File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\type_other.py", line 20, in as_int_none_variable
>>     raise TypeError('index must be integers')
>> TypeError: index must be integers
>> -------------------------------------------------------------------
>> PyCUDA ERROR: The context stack was not empty upon module cleanup.
>> -------------------------------------------------------------------
>> A context was still active when the context stack was being
>> cleaned up. At this point in our execution, CUDA may already
>> have been deinitialized, so there is no way we can finish
>> cleanly. The program will be aborted now.
>> Use Context.pop() to avoid this problem.
>> -------------------------------------------------------------------
>>
>>
>> On Wednesday, 6 April 2016 at 00:10:06 UTC+2, Arnold Tunick wrote:
>>>
>>> Hello Petar,
>>>
>>>
>>> 1.  I received help from you on or about 15-17 March 2016 through the Google 
>>> group theano-users (topic: theano_alexnet "train.py").
>>> 2.  I have made great progress to install and test the prerequisite 
>>> software to implement Theano-AlexNet on a Windows 10 notebook computer. 
>>> 3.  I have re-installed and tested the newer version of Theano (v0.8.0) 
>>> with CUDA 7.5, MS Visual Studio 12.0, python 2.7.9.4, pycuda 2015.1.3 , 
>>> boost 1.5.9, TDM-GCC (64-bit), numpy, zeromq, hickle and pylearn2.
>>> 4.  I have successfully pre-processed a subset of the ImageNet data 
>>> using the script generate_toy_data.sh, which generated all of the expected 
>>> folders and files.
>>> 5.  After fixing some problems related to TypeErrors, per your 
>>> instruction, I then went ahead and ran theano-alexnet train.py as 
>>> C:\SciSoft\Git\theano_alexnet>python train.py 
>>> THEANO_FLAGS=mode=FAST_RUN, floatX=float32. 
>>> 6.  Now the program initializes fine, but when it starts the training, it 
>>> crashes with an error message that indicates something about the 
>>> operating system (OS) [see messages below].
>>> 7.  I have contacted Weiguang Ding, who co-authored a 06 April 2015 
>>> arXiv paper on theano-alexnet entitled "Theano-based large-scale 
>>> visual recognition with multiple GPUs".
>>> 8.  However, he recommended that I continue to explore the Google group 
>>> theano-users for help.
>>> 9.  Interestingly, both Fred Bastien and Pascal Lamblin advised running 
>>> the code on Linux because they think that the theano-alexnet code may 
>>> use features from CUDA that are only available on Linux.
>>> 10. Nevertheless, I would like to continue to work towards a viable 
>>> solution using the setup that I have already established, so that I can use 
>>> Theano-AlexNet to explore feature recognition from various new images.
>>> 11. Any suggestions or recommendations that you may offer would be 
>>> greatly appreciated.
>>> .
>>> Thanks in advance for your time and expert help.
>>> .
>>> Best,
>>> Arnold Tunick
>>>
>>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> ++++++++++++++++
>>> > C:\SciSoft\Git\theano_alexnet>python train.py 
>>> THEANO_FLAGS=mode=FAST_RUN, floatX=float32
>>> >
>>> > Using gpu device 0: Quadro K4000M (CNMeM is disabled, CuDNN 3007)
>>> .
>>> > ... building the model
>>> .
>>> > conv (cudnn) layer with shape_in: (3, 227, 227, 256)
>>> > conv (cudnn) layer with shape_in: (96, 27, 27, 256)
>>> > conv (cudnn) layer with shape_in: (256, 13, 13, 256)
>>> > conv (cudnn) layer with shape_in: (384, 13, 13, 256)
>>> > conv (cudnn) layer with shape_in: (384, 13, 13, 256)
>>> > fc layer with num_in: 9216 num_out: 4096
>>> > dropout layer with P_drop: 0.5
>>> > fc layer with num_in: 4096 num_out: 4096
>>> > dropout layer with P_drop: 0.5
>>> > softmax layer with num_in: 4096 num_out: 1000
>>> .
>>> > ... training
>>> .
>>> > Process Process-1:
>>> > Traceback (most recent call last):
>>> >   File "C:\SciSoft\WinPython-64bit-2.7.9.4\python-2.7.9.amd64\lib\multiprocessing\process.py", line 266, in _bootstrap
>>> >     self.run()
>>> >   File "C:\SciSoft\WinPython-64bit-2.7.9.4\python-2.7.9.amd64\lib\multiprocessing\process.py", line 120, in run
>>> >     self._target(*self._args, **self._kwargs)
>>> >   File "C:\SciSoft\Git\theano_alexnet\train.py", line 69, in train_net
>>> >     h = drv.mem_get_ipc_handle(gpuarray_batch.ptr)
>>> .
>>> > LogicError: cuIpcGetMemHandle failed: OS call failed or operation not supported on this OS
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>>
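
A note on the command in the oldest quoted message: there THEANO_FLAGS follows train.py as a command-line argument, with a space after the comma. Theano reads THEANO_FLAGS from the process environment as a comma-separated list, so it is possible that those flags were never applied. A minimal sketch, assuming the variable is set before Theano is imported (the flag values are the ones from the quoted command; setting them from inside Python rather than from the shell is only one option, chosen here for illustration):

```python
import os

# Flag values taken from the quoted command, joined with commas and no spaces.
flags = [("mode", "FAST_RUN"), ("floatX", "float32")]
os.environ["THEANO_FLAGS"] = ",".join("{}={}".format(k, v) for k, v in flags)

# Theano reads THEANO_FLAGS when it is first imported, so the import has to
# come after the assignment above. Left commented out so the sketch runs
# without Theano installed.
# import theano
# print(theano.config.floatX)
```

From a Windows command prompt, the equivalent would be to run set THEANO_FLAGS=mode=FAST_RUN,floatX=float32 and then python train.py in the same window.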
