This is an old issue, see: https://groups.google.com/forum/#!topic/theano-users/Q9tD4Af_7ho
On Friday, November 11, 2016 at 10:07:22 AM UTC-8, Amin Farajian wrote:
>
> Hi Fred,
> I just followed your suggestion and hard-coded the changes in my Theano package, then ran multiple experiments with the same settings. What I observed is that, after applying this patch, the non-determinism is reduced to only 2 cases but does not completely disappear. In other words, before applying the changes, each experiment would end up with a different cost, while now there are only 2 points at which each of the experiments ends up. So the behavior is more deterministic, but not 100%.
> Thanks to Ozan Çağlayan <https://github.com/ozancaglayan>, I found that to solve the issue completely (at least for my case), I need a recent version of Theano in which the following change is applied (in theano/scan_module/scan_op.py):
> scan/scan_op: Convert known_grads to OrderedDict <https://github.com/Theano/Theano/commit/8769382ff661aab15dda474a4c74456037f73cc6>
> One can also manually change theano/scan_module/scan_op.py according to what is described in the above link.
>
> I still have not performed any real experiment (with large data sets and a large number of iterations) using this modification, but it sounds promising. At least in 18 runs (on my toy example) I got exactly the same cost after a fixed number of updates, whereas before they would differ.
> So, while my heavier experiments are running, I would like to start working on introducing the *deterministic* flag to Theano, in order to avoid hard-coding the changes, and also to have the option to run different experiments with different determinism behavior.
> May I ask you to point me to the portion of the Theano code in which I can introduce this flag?
>
> Thanks,
> Amin
>
> On Monday, February 1, 2016 at 3:59:43 PM UTC+1, nouiz wrote:
>>
>> Go into the file theano/sandbox/cuda/opt.py. Search for GpuAdvancedIncSubtensor1_dev20 and make sure that GpuAdvancedIncSubtensor1 is used instead. We wanted to make a Theano flag for this; do you want to make it?
>>
>> On Sun, Jan 31, 2016 at 11:33 AM, Zhenyang Li <[email protected]> wrote:
>>
>>> Hi Fred,
>>>
>>> Yes, please, I want to make the results more consistent across different machines.
>>>
>>> Thank you,
>>> Zhenyang
>>>
>>> On Thursday, January 28, 2016 at 8:34:14 PM UTC+1, nouiz wrote:
>>>>
>>>> About cuDNN, you can use a Theano flag to make it use deterministic algorithms.
>>>>
>>>> Theano has a few places where we use the atomic add operation on the GPU. This can cause unordered additions, and as these are done on floats, it can lead to different results. We do this in the grad of advanced subtensor. We have an older version that is deterministic but slower. There is no flag to use it, but if you want to try it out, I can tell you which change is needed in Theano.
>>>>
>>>> Fred
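Fred's atomic-add point comes down to floating-point addition not being associative: the order in which parallel additions commit changes the rounded result. A minimal demonstration in plain Python (no Theano or GPU needed):

    # Floating-point addition is not associative: the grouping (i.e. the
    # order in which partial sums are formed) changes the rounded result.
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0 -- the 1.0 is absorbed when rounding -1e16 + 1.0

The known_grads commit Amin links above targets a second source of ordering differences: iterating over a plain dict had no guaranteed order before Python 3.7, and for keys hashed by object identity the order can differ between runs, which changes the order in which the gradient graph is built. A rough sketch of the idea behind the fix (illustrative only, not the actual Theano code; `outputs` and `grads` are hypothetical names):

    from collections import OrderedDict

    # Illustrative stand-ins: 'outputs' is a list fixing the desired order,
    # 'grads' an unordered {variable: gradient} mapping.
    outputs = ["out_a", "out_b", "out_c"]
    grads = {"out_c": 3.0, "out_a": 1.0, "out_b": 2.0}

    # Build the mapping in a fixed, reproducible order instead of relying
    # on the iteration order of a plain dict.
    known_grads = OrderedDict((v, grads[v]) for v in outputs)
    print(list(known_grads.items()))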
>>>> On Jan 27, 2016 at 04:52, "Zhenyang Li" <[email protected]> wrote:
>>>>
>>>>> Hi Pascal,
>>>>>
>>>>> Thank you very much. In the end I solved it by removing the cuDNN lib; results are now consistent on the same machine again.
>>>>>
>>>>> Another problem I have now: I run the same RNN (standard LSTM) model on the same type of GPU (Titan X) on two machines (basically two nodes on a cluster, so almost the same platform). With proper gradient clipping set up, like what Keras does <https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L48>, I get exactly the same results on the two machines. Without gradient clipping, however, I observe the situation described above, i.e. quite similar mini-batch costs in the beginning, but a difference that becomes larger and larger. Is this expected?
>>>>>
>>>>> Best,
>>>>> Zhenyang
>>>>>
>>>>> On Tuesday, January 26, 2016 at 1:18:44 AM UTC+1, Pascal Lamblin wrote:
>>>>>>
>>>>>> This is possible, depending on what your model is.
>>>>>> More information at https://github.com/Theano/Theano/issues/3029
>>>>>>
>>>>>> On Sun, Jan 24, 2016, Zhenyang Li wrote:
>>>>>> > Hi folks,
>>>>>> >
>>>>>> > I ran my Theano code on the same GPU multiple times and found that different runs give different results (I mean the mini-batch cost here). It is always the same for the first ~15 (parameter-update) rounds, then a difference of ~10e-5 appears and becomes larger and larger; in the end, I get very different results on an evaluation set.
>>>>>> >
>>>>>> > However, I also ran the same code on the CPU multiple times, and I got consistently the same results.
>>>>>> >
>>>>>> > What could be the issue, given that I cannot reproduce the same results when running on the GPU? My Theano GPU config is:
>>>>>> >
>>>>>> > floatX = float32
>>>>>> > device = gpu0
>>>>>> > mode = FAST_RUN
>>>>>> > optimizer = fast_run
>>>>>> > warn_float64 = warn
>>>>>> >
>>>>>> > Any help will be appreciated!
>>>>>> >
>>>>>> > Best,
>>>>>> > Zhenyang
>>>>>>
>>>>>> --
>>>>>> Pascal
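For reference, the deterministic cuDNN algorithms Fred mentions are selected through Theano flags. A sketch, assuming the flag names of Theano 0.8+ (dnn.conv.algo_bwd_filter / dnn.conv.algo_bwd_data), so check them against your version's documentation:

    import os

    # THEANO_FLAGS is read when theano is first imported, so set it before.
    # 'deterministic' forces reproducible (but slower) backward convolutions.
    # The first two entries mirror the config from the original question.
    os.environ["THEANO_FLAGS"] = (
        "floatX=float32,device=gpu0,"
        "dnn.conv.algo_bwd_filter=deterministic,"
        "dnn.conv.algo_bwd_data=deterministic"
    )
    import theano

The same settings can equivalently go in ~/.theanorc.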

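On Zhenyang's gradient-clipping observation: clipping bounds the gradient magnitudes, so small run-to-run rounding differences get less room to snowball through the parameter updates. A minimal sketch of global-norm clipping in the spirit of the linked Keras code (illustrative, not Keras's or Theano's actual implementation; `grads` is a list of Theano gradient expressions and `clipnorm` a user-chosen bound):

    import theano.tensor as T

    def clip_by_global_norm(grads, clipnorm):
        # Rescale all gradients jointly when their combined L2 norm
        # exceeds clipnorm; leave them unchanged otherwise.
        norm = T.sqrt(sum(T.sum(g ** 2) for g in grads))
        scale = T.minimum(1.0, clipnorm / (norm + 1e-7))
        return [g * scale for g in grads]

Note that clipping masks the symptom (diverging trajectories) rather than the cause (non-deterministic reductions), so unclipped runs can still drift apart.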