Hi, I train for one epoch over 30K samples with batches of 64 and divide the total time spent by the number of updates. So the one-off overhead of the first call to the compiled Theano function should be smoothed out by the averaging, am I wrong?
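For reference, this is roughly what the measurement looks like (a minimal sketch with hypothetical names: `train_fn` stands for the compiled Theano training function, `batches` for the list of ~469 minibatches in one epoch):

```python
import time

def avg_update_time(train_fn, batches):
    """Average wall-clock time per parameter update over one epoch.

    With 30000 samples and batch size 64 there are ~469 updates, so the
    cost of the first call is divided across all of them.
    """
    start = time.time()
    for batch in batches:
        train_fn(*batch)
    return (time.time() - start) / len(batches)
```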
I re-did the test yesterday; the timings are pretty much equivalent between the old and new backends, but the new one is definitely not "faster".

The code is at https://github.com/lium-lst/nmtpy/blob/master/nmtpy/layers.py (you can search for theano.scan inside). Basically we have two gru_layer's for the source encoder and one gru_cond_layer in the decoder. gru_cond_layer is actually two GRUs intertwined with some complex interactions.
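If someone wants to reproduce the comparison without pulling in all of layers.py, a stripped-down recurrence in the same style should exercise the same scan machinery. This is just a plain tanh RNN step, not the actual GRU from the repo:

```python
import numpy as np
import theano
import theano.tensor as T

dim = 5

# One recurrence step: new hidden state from current input and previous state.
def step(x_t, h_tm1, W, U):
    return T.tanh(T.dot(x_t, W) + T.dot(h_tm1, U))

x = T.tensor3('x')  # (n_steps, batch, dim)
W = theano.shared(np.random.randn(dim, dim).astype(theano.config.floatX), name='W')
U = theano.shared(np.random.randn(dim, dim).astype(theano.config.floatX), name='U')
h0 = T.zeros((x.shape[1], dim))  # initial hidden state, one row per batch item

h, _ = theano.scan(step,
                   sequences=x,
                   outputs_info=h0,
                   non_sequences=[W, U])

f = theano.function([x], h)
```

Timing `f` on both backends with the averaging above should show whether the gap (or lack of one) comes from scan itself or from the more complex interactions inside gru_cond_layer.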
