On Tue, May 15, 2012 at 12:12:34AM +0200, David Marek wrote:
> Hi,
> 
> I have worked on the multilayer perceptron and I've got a basic
> implementation working. You can see it at
> https://github.com/davidmarek/scikit-learn/tree/gsoc_mlp The most
> important part is the sgd implementation, which can be found here
> https://github.com/davidmarek/scikit-learn/blob/gsoc_mlp/sklearn/mlp/mlp_fast.pyx
> 
> I have encountered a few problems and I would like to know your opinion.
> 
> 1) There are classes like SequentialDataset and WeightVector which are
> used in sgd for linear_model, but I am not sure if I should use them
> here as well. I have to do more with samples and weights than just
> multiply and add them together. I wouldn't be able to use numpy
> functions like tanh and do batch updates, would I? What do you think?

I haven't had a look at these classes myself but I think working with raw
NumPy arrays is a better idea in terms of efficiency.
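
For example, with raw arrays the whole minibatch forward pass is just dense
matrix products plus np.tanh (a rough sketch; all of these names are made up,
not taken from your branch):

    import numpy as np

    def forward(X, weights_hidden, bias_hidden, weights_output, bias_output):
        # X has shape (batch_size, n_features); one dot product per layer
        x_hidden = np.tanh(np.dot(X, weights_hidden) + bias_hidden)
        # linear outputs; apply softmax/sigmoid on top as the task requires
        x_output = np.dot(x_hidden, weights_output) + bias_output
        return x_hidden, x_output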

> Am I missing something that would help me do everything I need with
> SequentialDataset? I implemented my own LossFunction because I need a
> vectorized version, I think that is the same problem.
> 
> 2) I used Andreas' implementation as an inspiration and I am not sure
> I understand some parts of it:
>  * Shouldn't the bias vector be initialized with ones instead of
> zeros? I guess there is no difference.

If the training set is mean-centered, then zeros are fine, yes.

Otherwise, the biases in the hidden layer should be initialized to the mean
over the training set of -Wx, where W are the initial weights. This ensures
that the activation function operates near its linear regime.
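
Concretely, something like this (a sketch; X_train and weights_hidden are
placeholder names for the training data and the freshly initialized
hidden-layer weights, with samples as rows):

    # set each hidden bias to the negative mean pre-activation over the
    # training set, so tanh/logistic units start near their linear regime
    bias_hidden = -np.mean(np.dot(X_train, weights_hidden), axis=0)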

>  * I am not sure why the bias is updated with:
>    bias_output += lr * np.mean(delta_o, axis=0)
>    shouldn't it be:
>    bias_output += lr / batch_size * np.mean(delta_o, axis=0)?

As Andy said, the former allows you to set the learning rate without taking
into account the batch size, which makes things a little simpler.
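
Put differently, np.mean already carries the 1/batch_size factor, so adding
another one would scale the gradient down twice (a small self-contained
check; the names are made up):

    import numpy as np

    delta_o = np.random.randn(32, 10)   # hypothetical output-layer deltas
    batch_size = delta_o.shape[0]
    grad = np.mean(delta_o, axis=0)     # == np.sum(delta_o, axis=0) / batch_size
    assert np.allclose(grad, np.sum(delta_o, axis=0) / batch_size)
    # hence: bias_output += lr * grad, with no extra division by batch_size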

>  * Shouldn't the backward step for computing delta_h be:
>    delta_h[:] = np.dot(delta_o, weights_output.T) * hidden.doutput(x_hidden)
>    where hidden.doutput is the derivative of the activation function for
> the hidden layer?

Offhand that sounds right. You can use Theano as a sanity check for your
implementation.
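
For tanh hidden units the derivative can be taken straight from the stored
activations, so the line would look roughly like this (same placeholder names
as in your snippet):

    # d/dz tanh(z) = 1 - tanh(z)**2, and x_hidden already holds tanh(z)
    delta_h = np.dot(delta_o, weights_output.T) * (1.0 - x_hidden ** 2)

A finite-difference check of a few weights against the backprop gradient is
another quick way to catch sign or transpose errors.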

David
