(First-order) GD uses a single learning rate for all features. If the features
have different variability, it's difficult to find a one-size-fits-all
learning rate: the parameters of high-variability features will tend
to oscillate, whereas the parameters of low-variability features will
converge too slowly.

There is a huge amount of literature on the topic - the Neural Network FAQ
[1] is a good (practical) starting point.

[1] ftp://ftp.sas.com/pub/neural/FAQ2.html#A_std
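To make the trade-off concrete, here is a small self-contained sketch (plain
NumPy; the data, learning rates, and step count are my own illustrative
choices, not from the thread) of batch gradient descent on a squared loss with
one large-scale and one small-scale feature:

```python
import numpy as np

rng = np.random.RandomState(0)
n = 200

# Two features on very different scales (cf. the 10-1000 vs 0.1-1 ranges
# discussed below); the target is a noiseless linear combination.
X = np.c_[rng.uniform(10, 1000, n), rng.uniform(0.1, 1, n)]
true_w = np.array([0.5, 2.0])
y = X.dot(true_w)

def gd(X, y, lr, steps=1000):
    """Plain batch gradient descent on the mean squared loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2.0 / len(y) * X.T.dot(X.dot(w) - y)
        w -= lr * grad
    return w

# Unscaled: the largest stable learning rate is dictated by the big
# feature, so the small feature's weight barely moves in 1000 steps.
w_raw = gd(X, y, lr=1e-6)

# Scaled to unit variance: one learning rate suits both coordinates.
sigma = X.std(axis=0)
w_scaled = gd(X / sigma, y, lr=0.05)
w_back = w_scaled / sigma  # map back to the original feature scale

print("unscaled GD: ", w_raw)       # second weight still near 0
print("scaled GD:   ", w_back)      # close to true_w
print("true weights:", true_w)
```

With the raw features, any learning rate large enough to move the small-scale
weight makes the large-scale weight oscillate and diverge; after dividing each
feature by its standard deviation, a single learning rate works for both.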


2013/4/26 Ronnie Ghose <[email protected]>

> AFAIK fits tend to work better, and so do classifiers. It's much easier for a
> classifier to fit between -1 and 1 than between 0 and 10000, so it also
> helps convergence.
>
>
> http://stats.stackexchange.com/questions/41704/how-and-why-do-normalization-and-feature-scaling-work
> and then
>
> http://en.wikipedia.org/wiki/Feature_scaling
>
>
> On Fri, Apr 26, 2013 at 2:57 AM, Shishir Pandey <[email protected]> wrote:
>
>> Even scikit-learn's stochastic gradient descent page
>> (http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use)
>> says one should scale the data. An example showing what really happens to a
>> cost function (say, squared loss) when the data is scaled would be great.
>>
>> On 26-04-2013 04:31, [email protected]
>> wrote:
>> > Date: Fri, 26 Apr 2013 02:37:27 +0530
>> > From: Shishir Pandey <[email protected]>
>> > Subject: Re: [Scikit-learn-general] Effects of shifting and scaling on
>> >       Gradient Descent
>> >
>> > I did not mean the parameters of the cost function. I only want to scale
>> > the input variables. Suppose one of the independent variables has a range
>> > of 10 - 1000 and another has a range of 0.1 - 1. Andrew Ng and others say
>> > in their machine learning lectures that one should rescale the input data
>> > to bring all variables to a similar range
>> > (http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=03.1-LinearRegressionII-FeatureScaling&speed=100).
>> > This will affect how gradient descent behaves.
>> >
>> > For now, we can take the cost function to be the squared loss.
>> >
>> > On 26-04-2013 01:56, [email protected]
>> > wrote:
>> >> Date: Thu, 25 Apr 2013 19:15:59 +0100
>> >> From: Matthieu Brucher <[email protected]>
>> >> Subject: Re: [Scikit-learn-general] Effects of shifting and scaling on
>> >>       Gradient Descent
>> >>
>> >> Hi,
>> >>
>> >> Do you mean scaling the parameters of the cost function? If so, scaling
>> >> will change the surface of the cost function, of course. It's hard to
>> >> say anything general about how the surface will behave; it depends
>> >> entirely on the cost function you are using. A linear cost function will
>> >> have the same scale applied to the surface, but anything fancier
>> >> (squared sum, robust cost, ...) will behave differently. This also means
>> >> that the gradient descent will be different and may converge to a
>> >> different location. As Gaël said, this is a generic optimization-related
>> >> question; it is not machine-learning related.
>> >>
>> >> Matthieu
>> >>
>> >> 2013/4/25 Shishir Pandey <[email protected]>
>> >>>>> Thanks Ronnie for pointing out the exact method in the scikit-learn
>> >>>>> library. Yes, that is exactly what I was asking: how does the
>> >>>>> rescaling of features affect the gradient descent algorithm? Since
>> >>>>> stochastic gradient descent is used in machine learning quite a lot,
>> >>>>> it would be good to understand how its performance is affected by
>> >>>>> rescaling features.
>> >>>>>
>> >>>>> Jaques, I am having some trouble running the example. But yes, it
>> >>>>> would be good if we could have a GUI example.
>> >>>>>
>> >>>>> On 25-04-2013 19:12, [email protected]
>> >>>>> wrote:
>> >>>>>> Date: Thu, 25 Apr 2013 09:10:35 -0400
>> >>>>>> From: Ronnie Ghose <[email protected]>
>> >>>>>> Subject: Re: [Scikit-learn-general] Effects of shifting and scaling
>> >>>>>>       on Gradient Descent
>> >>>>>>
>> >>>>>> I think he means: what benefits do you get from rescaling features,
>> >>>>>> e.g. with min-max scaling or preprocessing.scale?
>> >>>>>> On Thu, Apr 25, 2013 at 02:09:13PM +0200, Jaques Grobler wrote:
>> >>>>>>> I also think it will be great to have this example on the website.
>> >>>>>>> Do you mean an interactive example that works similarly to the SVM
>> >>>>>>> GUI example, but for understanding the effects that shifting and
>> >>>>>>> scaling the data have on the rate of convergence of gradient
>> >>>>>>> descent and on the surface of the cost function?
>> >>>>>>
>> >>>>>> This is out of scope for the project: scikit-learn is a machine
>> >>>>>> learning toolkit. Gradient descent is a general class of
>> >>>>>> optimization algorithms.
>> >>>>>>
>> >>>>>> Gaël
>>
>> --
>> sp
>>
>>
>> ------------------------------------------------------------------------------
>> Try New Relic Now & We'll Send You this Cool Shirt
>> New Relic is the only SaaS-based application performance monitoring
>> service
>> that delivers powerful full stack analytics. Optimize and monitor your
>> browser, app, & servers with just a few lines of code. Try New Relic
>> and get this awesome Nerd Life shirt!
>> http://p.sf.net/sfu/newrelic_d2d_apr
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
>
>
>
>


-- 
Peter Prettenhofer