Hi Fred,

I just followed your suggestion and hard-coded the changes in my Theano package, then ran multiple experiments with the same settings. What I observed is that, after applying this patch, the non-determinism is reduced but does not completely disappear: before the change, each experiment would end up with a different cost, whereas now every run ends at one of only two final costs. So the behavior is more deterministic, but not 100%.

Thanks to Ozan Çağlayan <https://github.com/ozancaglayan>, I found that to solve the issue completely (at least in my case), I need a recent version of Theano that includes the following change (in theano/scan_module/scan_op.py):

scan/scan_op: Convert known_grads to OrderedDict
<https://github.com/Theano/Theano/commit/8769382ff661aab15dda474a4c74456037f73cc6>

One can also manually change theano/scan_module/scan_op.py according to what is described in the above link.
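For anyone who finds this thread later: the reason the OrderedDict matters is that iterating over a plain Python dict has no guaranteed order, and floating-point addition is not associative, so accumulating the same gradient terms in a different order gives slightly different numbers. A tiny illustration of the numerical effect, not the Theano patch itself:

    # Illustration only, not the Theano patch: float addition is not
    # associative, so summing the same terms in a different order can
    # give a slightly different result -- the root cause of this kind
    # of run-to-run non-determinism.
    terms = [1e16, 1.0, -1e16, 0.1, 0.2]
    print(sum(terms))            # one accumulation order
    print(sum(reversed(terms)))  # another order -> a different float result

The commit above fixes the order in which known_grads is traversed, which removes this source of variation.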
I still have not performed any real experiment (with large data sets and a large number of iterations) using this modification, but it looks promising: in 18 runs on my toy example I got exactly the same cost after a fixed number of updates, whereas before they would differ.

So, while my heavier experiments are running, I would like to start working on introducing the *deterministic* flag to Theano, in order to avoid hard-coding the changes and to be able to run different experiments with different determinism behavior (a rough sketch of what such a flag declaration could look like is at the bottom of this message, after the quoted thread). May I ask you to point me to the portion of the Theano code in which I can introduce this flag?

Thanks,
Amin

On Monday, February 1, 2016 at 3:59:43 PM UTC+1, nouiz wrote:
>
> Go in the file theano/sandbox/cuda/opt.py. Search for
> GpuAdvancedIncSubtensor1_dev20 and make sure that it is
> GpuAdvancedIncSubtensor1 that is used instead. We wanted to make a Theano
> flag for this, do you want to make it?
>
> On Sun, Jan 31, 2016 at 11:33 AM, Zhenyang Li <[email protected]> wrote:
>
>> Hi Fred,
>>
>> Yes, please, I want to make the results more consistent across different
>> machines.
>>
>> Thank you,
>> Zhenyang
>>
>> On Thursday, January 28, 2016 at 8:34:14 PM UTC+1, nouiz wrote:
>>>
>>> About cudnn, you can use a Theano flag to have it use deterministic
>>> algorithms.
>>>
>>> Theano has a few places where we use the atomic add operation on the
>>> GPU. This can make the additions happen in an unpredictable order, and
>>> as this is done on floats it can lead to different results. We do this
>>> in the grad of advanced subtensor. We have an older version that is
>>> deterministic but slower. There is no flag to use it, but if you want
>>> to try it out, I can tell you which change is needed in Theano.
>>>
>>> Fred
>>> On Jan 27, 2016 at 04:52, "Zhenyang Li" <[email protected]> wrote:
>>>
>>>> Hi Pascal,
>>>>
>>>> Thank you very much. In the end I solved it by removing the cudnn lib,
>>>> and now it is consistent on the same machine again.
>>>>
>>>> Another problem I have now is that I run the same RNN (standard LSTM)
>>>> model, on the same type of GPU (Titan X), on two machines (basically
>>>> two nodes of a cluster, so almost the same platform).
>>>> With proper gradient clipping set up, like what Keras does
>>>> <https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L48>,
>>>> I get exactly the same results on the two machines. But without
>>>> gradient clipping, I observe a situation similar to the one above,
>>>> i.e. quite similar mini-batch costs in the beginning, with the
>>>> difference becoming larger and larger. Is that expected?
>>>>
>>>> Best,
>>>> Zhenyang
>>>>
>>>>
>>>> On Tuesday, January 26, 2016 at 1:18:44 AM UTC+1, Pascal Lamblin wrote:
>>>>>
>>>>> This is possible, depending on what your model is.
>>>>> More information at https://github.com/Theano/Theano/issues/3029
>>>>>
>>>>> On Sun, Jan 24, 2016, Zhenyang Li wrote:
>>>>> > Hi folks,
>>>>> >
>>>>> > I ran my Theano code on the same GPU multiple times and found that
>>>>> > different runs gave different results (I mean the mini-batch cost
>>>>> > here). It is always the same for the first ~15 (parameter-update)
>>>>> > rounds; then a difference of about 10e-5 appears and grows larger
>>>>> > and larger, and in the end I get very different results on an
>>>>> > evaluation set.
>>>>> >
>>>>> > However, I also tried the same code on CPU multiple times, and I
>>>>> > got consistently the same results.
>>>>> >
>>>>> > What could the issue be, given that I cannot reproduce the same
>>>>> > results when running on GPU?
>>>>> > And my Theano GPU config is:
>>>>> >
>>>>> > floatX = float32
>>>>> > device = gpu0
>>>>> > mode = FAST_RUN
>>>>> > optimizer = fast_run
>>>>> > warn_float64 = warn
>>>>> >
>>>>> > Any help will be appreciated!
>>>>> >
>>>>> > Best,
>>>>> > Zhenyang
>>>>>
>>>>> --
>>>>> Pascal
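As promised above, here is a rough sketch of how the new flag could be declared, following the pattern of the existing flags in theano/configdefaults.py. The flag name, its values, and the docstring are placeholders of mine, not an agreed-upon API:

    # Rough sketch only: the flag name, values and docstring are placeholders.
    # It follows how existing flags are declared in theano/configdefaults.py.
    from theano.configparser import AddConfigVar, EnumStr

    AddConfigVar(
        'deterministic',
        "If 'more', prefer deterministic (but possibly slower) GPU "
        "implementations, e.g. register GpuAdvancedIncSubtensor1 instead "
        "of GpuAdvancedIncSubtensor1_dev20 in theano/sandbox/cuda/opt.py.",
        EnumStr('default', 'more'),
        in_c_key=False)

The code in theano/sandbox/cuda/opt.py could then check theano.config.deterministic when deciding which of the two ops to register.

For the cuDNN part that Fred mentioned, something along these lines in .theanorc should request deterministic convolution algorithms; the exact dnn.conv.* flag names differ between Theano versions, so treat this as an example to check against the docs of the version you run:

    # Example .theanorc combining Zhenyang's settings quoted above with
    # deterministic cuDNN algorithms (dnn.conv.* names vary by version).
    [global]
    floatX = float32
    device = gpu0
    mode = FAST_RUN
    optimizer = fast_run
    warn_float64 = warn

    [dnn.conv]
    algo_bwd_data = deterministic
    algo_bwd_filter = deterministic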
