Just a link to the Theano issue about this: https://github.com/Theano/Theano/issues/3029
If you want to know how to make a flag, from memory there is information about it there.

thanks

Fred

On Fri, Nov 11, 2016 at 4:18 PM, Michael Klachko <[email protected]> wrote:
> This is an old issue, see:
> https://groups.google.com/forum/#!topic/theano-users/Q9tD4Af_7ho
>
> On Friday, November 11, 2016 at 10:07:22 AM UTC-8, Amin Farajian wrote:
>>
>> Hi Fred,
>> I just followed your suggestion and hard-coded the changes in my Theano
>> package, and ran multiple experiments with the same settings. What I
>> observed is that, after applying this patch, the non-determinism is reduced
>> to only 2 cases but does not completely disappear. In other words, before
>> applying the changes each experiment would end up with a different cost,
>> while now there are only 2 values that the experiments end up at. So the
>> behavior is more deterministic, but not 100%.
>> Thanks to Ozan Çağlayan <https://github.com/ozancaglayan>, I found that to
>> solve the issue completely (at least for my case), I need a recent version
>> of Theano in which the following change is applied (in
>> theano/scan_module/scan_op.py):
>> scan/scan_op: Convert known_grads to OrderedDict
>> <https://github.com/Theano/Theano/commit/8769382ff661aab15dda474a4c74456037f73cc6>
>> One can also manually change theano/scan_module/scan_op.py according to
>> what is described in the above link.
>>
>> I still have not performed any real experiment (with large data sets and a
>> large number of iterations) using this modification, but it looks
>> promising. At least in 18 runs (on my toy example) I got exactly the same
>> cost after a fixed number of updates, while before they would differ.
>> So, while my heavier experiments are running, I would like to start working
>> on introducing the *deterministic* flag in Theano, in order to avoid
>> hard-coding the changes, and also to have the option to run different
>> experiments with different determinism behavior.
>> May I ask you to point me to the portion of the Theano code in which I can
>> introduce this flag?
>>
>> Thanks,
>> Amin
>>
>> On Monday, February 1, 2016 at 3:59:43 PM UTC+1, nouiz wrote:
>>>
>>> Go in the file theano/sandbox/cuda/opt.py. Search for
>>> GpuAdvancedIncSubtensor1_dev20 and make sure that it is
>>> GpuAdvancedIncSubtensor1 that is used instead. We wanted to make a Theano
>>> flag for this; do you want to make it?
>>>
>>> On Sun, Jan 31, 2016 at 11:33 AM, Zhenyang Li <[email protected]> wrote:
>>>
>>>> Hi Fred,
>>>>
>>>> Yes, please, I want to make the results more consistent across different
>>>> machines.
>>>>
>>>> Thank you,
>>>> Zhenyang
>>>>
>>>> On Thursday, January 28, 2016 at 8:34:14 PM UTC+1, nouiz wrote:
>>>>>
>>>>> About cudnn, you can use a Theano flag to have it use deterministic
>>>>> algorithms.
>>>>>
>>>>> Theano has a few places where we use the atomic add operation on the
>>>>> GPU. This can cause the additions to happen in an arbitrary order, and
>>>>> since this is done on floats it can lead to a different result. We do
>>>>> this in the grad of advanced subtensor. We have an older version that is
>>>>> deterministic but slower. There is no flag to use it, but if you want to
>>>>> try it out, I can tell you which change is needed in Theano.
>>>>>
>>>>> Fred
>>>>>
>>>>> On Jan 27, 2016, 04:52, "Zhenyang Li" <[email protected]> wrote:
>>>>>
>>>>>> Hi Pascal,
>>>>>>
>>>>>> Thank you very much. In the end I solved it by removing the cudnn lib;
>>>>>> then it's consistent on the same machine again.
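For reference, the deterministic cuDNN algorithms Fred mentions above can be requested through Theano flags instead of removing the library. A minimal sketch, assuming a Theano release that exposes the dnn.conv.algo_bwd_filter / dnn.conv.algo_bwd_data options; the exact flag names vary between releases, so check the output of theano.config for your version.

    # Sketch: ask cuDNN for deterministic (usually slower) algorithms via
    # THEANO_FLAGS instead of removing the cudnn library.
    import os
    os.environ["THEANO_FLAGS"] = ",".join([
        "device=gpu0",
        "floatX=float32",
        # deterministic backward passes for convolutions
        "dnn.conv.algo_bwd_filter=deterministic",
        "dnn.conv.algo_bwd_data=deterministic",
    ])
    import theano  # the flags must be set before this import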
>>>>>> Another problem I have now is that I run the same RNN (standard LSTM)
>>>>>> model, on the same type of GPU (Titan X), on two machines (basically
>>>>>> two nodes of a cluster, so almost the same platform).
>>>>>> With proper gradient clipping set up, like what Keras does
>>>>>> <https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L48>,
>>>>>> I get exactly the same results on the two machines. But without
>>>>>> gradient clipping, I again observe the situation described above, i.e.
>>>>>> very similar mini-batch costs in the beginning, but the difference
>>>>>> becomes larger and larger. Is this expected?
>>>>>>
>>>>>> Best,
>>>>>> Zhenyang
>>>>>>
>>>>>> On Tuesday, January 26, 2016 at 1:18:44 AM UTC+1, Pascal Lamblin wrote:
>>>>>>>
>>>>>>> This is possible, depending on what your model is.
>>>>>>> More information at https://github.com/Theano/Theano/issues/3029
>>>>>>>
>>>>>>> On Sun, Jan 24, 2016, Zhenyang Li wrote:
>>>>>>> > Hi folks,
>>>>>>> >
>>>>>>> > I ran my Theano code on the same GPU multiple times and found that I
>>>>>>> > got different results (I mean the mini-batch cost here) on different
>>>>>>> > runs. It is always the same for the first ~15 (parameter update)
>>>>>>> > rounds, then a difference of about 10e-5 appears and becomes larger
>>>>>>> > and larger; in the end, I got very different results on an
>>>>>>> > evaluation set.
>>>>>>> >
>>>>>>> > However, I also ran the same code on CPU multiple times, and there I
>>>>>>> > consistently got the same results.
>>>>>>> >
>>>>>>> > What could be the issue, given that I cannot reproduce the same
>>>>>>> > results when running on GPU? My Theano GPU config is:
>>>>>>> >
>>>>>>> > floatX = float32
>>>>>>> > device = gpu0
>>>>>>> > mode = FAST_RUN
>>>>>>> > optimizer = fast_run
>>>>>>> > warn_float64 = warn
>>>>>>> >
>>>>>>> > Any help will be appreciated!
>>>>>>> >
>>>>>>> > Best,
>>>>>>> > Zhenyang
>>>>>>>
>>>>>>> --
>>>>>>> Pascal
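A side note on the gradient clipping mentioned above: the norm-based clipping that the linked Keras code performs can be written directly in Theano. The sketch below only illustrates that idea; the helper name and the epsilon constant are made up, not part of Theano or Keras.

    import theano.tensor as T

    def clip_by_global_norm(grads, max_norm, eps=1e-7):
        # Rescale a list of symbolic gradients so that their joint L2 norm
        # does not exceed max_norm (similar in spirit to Keras' clipnorm).
        norm = T.sqrt(sum(T.sum(g ** 2) for g in grads))
        return [T.switch(T.ge(norm, max_norm), g * max_norm / (norm + eps), g)
                for g in grads]

    # usage: clipped_grads = clip_by_global_norm(T.grad(cost, params), 5.0)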
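On the deterministic flag discussed at the top of the thread: Theano's flags are declared in theano/configdefaults.py using the helpers from theano/configparser.py. The following is only a sketch of what such a declaration could look like; the flag name, its allowed values, and the docstring are illustrative, and nothing changes until the relevant ops and optimizations are taught to consult theano.config.deterministic.

    # Sketch of declaring a new Theano flag (would live in
    # theano/configdefaults.py); the name and values are illustrative.
    from theano.configparser import AddConfigVar, EnumStr

    AddConfigVar(
        'deterministic',
        "If 'more', prefer deterministic (usually slower) implementations, "
        "e.g. avoid the GPU atomic add in the grad of advanced indexing.",
        EnumStr('default', 'more'),
        in_c_key=False)

    # Once declared, the value can be set with THEANO_FLAGS=deterministic=more
    # and read inside the code as theano.config.deterministic.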
