Hi,
I don't have much time these days because I have exams at school. I am
sorry I haven't kept you informed.
I have implemented a multi-class cross-entropy loss and a softmax function,
and I have turned off some of the Cython checks. The result is that the
Cython implementation is only slightly faster; I suspect that is because I
am using objects as output functions, but I will have to benchmark them to
know more.
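For reference, here is a minimal NumPy sketch of what I mean by softmax and
multi-class cross-entropy (the function names and shapes are just
illustrative, not the actual code in my branch):

    import numpy as np

    def softmax(z):
        # Subtract the row-wise maximum before exponentiating for
        # numerical stability, then normalize each row to sum to one.
        e = np.exp(z - z.max(axis=1)[:, np.newaxis])
        return e / e.sum(axis=1)[:, np.newaxis]

    def cross_entropy(y_one_hot, y_proba):
        # Mean negative log-likelihood of the true classes;
        # clip the probabilities to avoid taking log(0).
        p = np.clip(y_proba, 1e-10, 1.0)
        return -np.mean(np.sum(y_one_hot * np.log(p), axis=1))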
The next step is to test that the gradient descent is working correctly. I
am a little unsure how to approach this. One thing I will do is compute one
step of backpropagation by hand and check that the implementation does the
same. Another thing I will try is to compute the gradients numerically; I
am not exactly sure whether it is enough to take a numerical derivative (as
scipy provides) of the forward step and compare it against the
backpropagated gradients.
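To make the numerical check concrete, here is a rough sketch of what I have
in mind; loss is a hypothetical function that runs the forward pass and
returns the scalar loss for a given weight array (scipy.optimize.check_grad
does essentially the same comparison on a flat parameter vector):

    import numpy as np

    def numerical_gradient(loss, w, eps=1e-6):
        # Central finite differences: perturb each weight in turn
        # and measure the resulting change in the loss.
        grad = np.zeros_like(w)
        for i in range(w.size):
            orig = w.flat[i]
            w.flat[i] = orig + eps
            loss_plus = loss(w)
            w.flat[i] = orig - eps
            loss_minus = loss(w)
            w.flat[i] = orig
            grad.flat[i] = (loss_plus - loss_minus) / (2 * eps)
        return grad

    # The gradient from backpropagation should then agree with it:
    # np.allclose(backprop_grad, numerical_gradient(loss, w), atol=1e-4)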
David
On 31.5.2012 21:02, "Andreas Mueller" <[email protected]> wrote:
> Hey David.
> How is it going?
> I haven't heard from you in a while.
> Did you blog anything about your progress?
>
> Cheers,
> Andy
>
> On 16.05.2012 12:15, David Marek wrote:
> > On Tue, May 15, 2012 at 4:59 PM, David Warde-Farley
> > <[email protected]> wrote:
> >> On Tue, May 15, 2012 at 12:12:34AM +0200, David Marek wrote:
> >>> Hi,
> >>>
> >>> I have worked on the multilayer perceptron and I've got a basic
> >>> implementation working. You can see it at
> >>> https://github.com/davidmarek/scikit-learn/tree/gsoc_mlp
> >>> The most important part is the SGD implementation, which can be found
> >>> here:
> >>> https://github.com/davidmarek/scikit-learn/blob/gsoc_mlp/sklearn/mlp/mlp_fast.pyx
> >>>
> >>> I have encountered a few problems and I would like to know your
> >>> opinion.
> >>>
> >>> 1) There are classes like SequentialDataset and WeightVector which are
> >>> used in sgd for linear_model, but I am not sure if I should use them
> >>> here as well. I have to do more with samples and weights than just
> >>> multiply and add them together. I wouldn't be able to use numpy
> >>> functions like tanh and do batch updates, would I? What do you think?
> >> I haven't had a look at these classes myself but I think working with
> >> raw NumPy arrays is a better idea in terms of efficiency.
> >>
> >>> Am I missing something that would help me do everything I need with
> >>> SequentialDataset? I implemented my own LossFunction because I need a
> >>> vectorized version, I think that is the same problem.
> >>>
> >>> 2) I used Andreas' implementation as an inspiration and I am not sure
> >>> I understand some parts of it:
> >>> * Shouldn't the bias vector be initialized with ones instead of
> >>> zeros? I guess there is no difference.
> >> If the training set is mean-centered, then absolutely, yes.
> >>
> >> Otherwise the biases in the hidden layer should be initialized to
> >> the mean over the training set of -Wx, where W are the initial weights.
> >> This ensures that the activation function is near its linear regime.
> > OK, the rule of thumb is that the bias should be initialized so that the
> > activation function starts in its linear regime.
> >
> >>> * I am not sure why is the bias updated with:
> >>> bias_output += lr * np.mean(delta_o, axis=0)
> >>> shouldn't it be:
> >>> bias_output += lr / batch_size * np.mean(delta_o, axis=0)?
> >> As Andy said, the former allows you to set the learning rate without
> >> taking into account the batch size, which makes things a little simpler.
> > I see, it's pretty obvious when I look at it now.
> >
> >>> * Shouldn't the backward step for computing delta_h be:
> >>> delta_h[:] = np.dot(delta_o, weights_output.T) * hidden.doutput(x_hidden)
> >>> where hidden.doutput is the derivative of the activation function for
> >>> the hidden layer?
> >> Offhand that sounds right. You can use Theano as a sanity check for your
> >> implementation.
> > Thank you David and Andreas for answering my questions. I will look at
> > Theano.
> >
> > David