[theano-users] Vanilla Theano replication of a Keras model performs bad. What am I missing?

wondaround Sat, 15 Oct 2016 09:55:45 -0700

I wanted to start using Theano because of the many positive reviews I have 
read. To make my life easier, I decided to start with a wrapper, 
specifically Keras. I find Keras a very useful and well-made tool. It is 
a good way to start using Theano, and it is easy to understand and use.
Now, I have created a Keras model (using the Theano backend) that works 
well, and I would like to replicate it using only Theano code.
Since Keras itself runs on Theano code, this should be possible in 
principle.
The neural net is a convolutional neural network for a single-output 
regression task, with the following layers: conv2d, maxpool2d, conv2d, 
maxpool2d, dense, dense, output. It is trained with the Adam optimizer.
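
For a layer stack like this one, a quick sanity check is to compute the feature-map size after each layer by hand and confirm that both implementations agree on the number of inputs to the first dense layer. The sketch below assumes 'valid' convolutions with stride 1 and non-overlapping pooling. The input size, kernel sizes, and filter count in the usage note are hypothetical placeholders, not the values from the linked code, and the helper names are made up for illustration.

```python
def conv_out(size, kernel):
    """Output length of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool, ignore_border=True):
    """Output length of non-overlapping max pooling.

    With ignore_border=True, an incomplete window at the edge is dropped
    (floor division); with ignore_border=False it is kept (ceiling division).
    """
    if ignore_border:
        return size // pool
    return -(-size // pool)  # ceiling division


def flattened_size(input_hw, layers, n_filters_last):
    """Number of inputs to the first dense layer.

    `layers` is a list of ("conv", kernel) or ("pool", pool) pairs,
    applied in order to a square-kernel, single-stride stack.
    """
    h, w = input_hw
    for kind, k in layers:
        step = conv_out if kind == "conv" else pool_out
        h, w = step(h, k), step(w, k)
    return n_filters_last * h * w
```

For example, with a hypothetical 28x28 input, two 5x5 convolutions, two 2x2 poolings, and 50 filters in the last convolution, `flattened_size((28, 28), [("conv", 5), ("pool", 2), ("conv", 5), ("pool", 2)], 50)` gives the dense-layer input size. The `ignore_border` flag matters only when a feature map is not evenly divisible by the pool size, so it is worth checking that both implementations treat that case the same way.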
Unfortunately, although it seems to me that I implemented exactly the same 
neural network in vanilla Theano code, its performance is consistently 
worse.
So I must have made a mistake somewhere, but I can't see where.


I will link to the code I'm using here, so as not to make the post too 
long. The files are short and simple, do not worry =)

Keras Impl <http://pastebin.com/7eNubwxw>

Theano main <http://pastebin.com/Lvdn6UAc>
Dense layer with MSE loss function <http://pastebin.com/RyUH07Te>
Conv layer + max pooling <http://pastebin.com/VVmXm1Uk>
Updates rule <http://pastebin.com/fp5Draq7>


Keep in mind that the Theano code is mostly adapted from the Theano 
tutorial found on the website, and the update rules for the Adam optimizer 
are adapted from the Keras source code.
I have a large training set, so I usually check the behaviour of the code 
after a few epochs, and this is what I see:
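
For comparison when checking a hand-written update rule, here is a minimal sketch of one Adam step in plain Python, using the standard formulation with the bias correction folded into the step size. It is not copied from the linked code or from Keras. The function name `adam_step` is hypothetical, the parameters are plain floats rather than Theano shared variables, and the defaults are the values recommended in the Adam paper, which may differ from what either implementation actually uses.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """Apply one Adam update to a list of scalar parameters.

    m and v are the running first and second moment estimates (one per
    parameter); t is the number of updates applied so far.
    Returns the updated (params, m, v, t).
    """
    t += 1
    # Bias-corrected step size for this iteration.
    lr_t = lr * math.sqrt(1.0 - beta_2 ** t) / (1.0 - beta_1 ** t)

    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_t = beta_1 * m_i + (1.0 - beta_1) * g        # first moment
        v_t = beta_2 * v_i + (1.0 - beta_2) * g * g    # second moment
        new_params.append(p - lr_t * m_t / (math.sqrt(v_t) + epsilon))
        new_m.append(m_t)
        new_v.append(v_t)
    return new_params, new_m, new_v, t
```

Things worth comparing against such a reference are that the step counter is incremented exactly once per batch, that the moment accumulators start at zero, and that every trainable parameter (biases included) appears in the update list.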
Keras Model: the validation error keeps decreasing, and already after two 
or three epochs I see a very good match between prediction and true values 
(points quite close around the bisector in the plot at the bottom of the 
code)
Vanilla Theano Impl: the validation error decreases at first, but then 
oscillations and signs of overfitting appear (and its absolute value is 10 
times higher than in the Keras impl.), and the match between prediction and 
true values is worse (points quite spread out around the bisector in the 
plot at the bottom of the code)

So, can anyone tell me where the difference between the Keras model and 
my Theano implementation lies? Since I'm new to the field, I would 
really like to understand what I'm doing wrong that affects the 
performance of the network so strongly.
Any help would be appreciated. Thanks

