Even scikit-learn's stochastic gradient descent page
(http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use)
says that one should scale the data. An example showing what actually
happens to a cost function (say, squared loss) when the data is scaled
would be great.
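
Something along these lines could be a starting point. It is only a rough
sketch against a recent scikit-learn (the synthetic data and names below are
made up for illustration), but it shows both halves of the story: the
squared-loss surface is a quadratic bowl shaped by X^T X, scaling changes its
conditioning, and that in turn changes how SGD behaves:

    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.preprocessing import StandardScaler

    rng = np.random.RandomState(0)
    # two features on very different ranges: ~[10, 1000] and ~[0.1, 1]
    X = np.column_stack([rng.uniform(10, 1000, 200),
                         rng.uniform(0.1, 1, 200)])
    y = 0.003 * X[:, 0] + 5.0 * X[:, 1] + 0.1 * rng.randn(200)

    # the squared-loss surface is a quadratic bowl shaped by X^T X;
    # its condition number says how elongated the bowl is
    print(np.linalg.cond(X.T @ X))        # huge -> narrow, zig-zag-prone valley
    X_s = StandardScaler().fit_transform(X)
    print(np.linalg.cond(X_s.T @ X_s))    # near 1 -> nearly round bowl

    for data in (X, X_s):
        sgd = SGDRegressor(max_iter=5, tol=None, random_state=0)
        sgd.fit(data, y)
        print(np.mean((sgd.predict(data) - y) ** 2))  # usually far worse unscaled

On the unscaled data the default step size tends to be far too large for the
big feature, so after the same number of epochs the fit is much worse.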

On 26-04-2013 04:31, [email protected] wrote:
> Date: Fri, 26 Apr 2013 02:37:27 +0530
> From: Shishir Pandey <[email protected]>
> Subject: Re: [Scikit-learn-general] Effects of shifting and scaling on
>       Gradient Descent
> To:[email protected]
> Message-ID:<[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> I did not mean the parameters of the cost function. I only want to scale
> the input variables. Suppose one of the independent variables ranges from
> 10 to 1000 and another from 0.1 to 1. Andrew Ng and others say in their
> machine learning lectures that one should rescale the input data to bring
> all variables into a similar range
> (http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=03.1-LinearRegressionII-FeatureScaling&speed=100).
> This will affect how gradient descent behaves.
>
> For now, let us take the cost function to be the squared loss.
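
With squared loss you can see the problem directly in the gradient: the
component for weight j is proportional to sum_i (w . x_i - y_i) * x_ij, so a
feature ranging over 10-1000 produces gradient components orders of magnitude
larger than one ranging over 0.1-1, and no single learning rate suits both.
A tiny sketch (synthetic data, names of my own):

    import numpy as np

    rng = np.random.RandomState(1)
    X = np.column_stack([rng.uniform(10, 1000, 100),  # feature in 10-1000
                         rng.uniform(0.1, 1, 100)])   # feature in 0.1-1
    y = rng.randn(100)
    w = np.zeros(2)

    # gradient of the mean squared loss (1/n) * sum_i (x_i . w - y_i)^2
    grad = 2.0 / len(y) * X.T @ (X @ w - y)
    print(grad)  # the first component dwarfs the second
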
>
> On 26-04-2013 01:56, [email protected] wrote:
>> Date: Thu, 25 Apr 2013 19:15:59 +0100
>> From: Matthieu Brucher <[email protected]>
>> Subject: Re: [Scikit-learn-general] Effects of shifting and scaling on
>>       Gradient Descent
>>
>> Hi,
>>
>> Do you mean scaling the parameters of the cost function? If so, scaling
>> will of course change the surface of the cost function. It is hard to
>> say anything general about how the surface will behave; it depends
>> entirely on the cost function you are using. A linear cost function
>> will have the same scale applied to its surface, but anything fancier
>> (squared sum, robust cost, ...) will behave differently. This also
>> means that the gradient descent will be different and may converge to
>> a different location. As Gaël said, this is a generic optimization
>> question, not a machine-learning one.
>>
>> Matthieu
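
For the squared-loss case this can be made concrete: rescaling the features
by a diagonal matrix D replaces the Hessian X^T X of the quadratic surface
with D X^T X D, so the bowl is stretched along its axes by the scale factors.
A quick sketch with toy numbers:

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
    D = np.diag([100.0, 0.5])       # stretch feature 0, shrink feature 1

    H = X.T @ X                     # Hessian of squared loss, up to a constant
    H_scaled = (X @ D).T @ (X @ D)  # Hessian after rescaling the features
    print(np.allclose(H_scaled, D @ H @ D))             # True
    print(np.linalg.cond(H), np.linalg.cond(H_scaled))  # conditioning changes
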
>> 2013/4/25 Shishir Pandey <[email protected]>
>>> Thanks Ronnie for pointing out the exact method in the scikit-learn
>>> library. Yes, that is exactly what I was asking: how does rescaling
>>> features affect the gradient descent algorithm? Since stochastic
>>> gradient descent is used in machine learning quite a lot, it would be
>>> good to understand how its performance is affected by rescaling
>>> features.
>>>
>>> Jaques, I am having some trouble running the example. But yes, it
>>> would be good to have a GUI example.
>>>
>>> On 25-04-2013 19:12, [email protected] wrote:
>>>> Date: Thu, 25 Apr 2013 09:10:35 -0400
>>>> From: Ronnie Ghose <[email protected]>
>>>> Subject: Re: [Scikit-learn-general] Effects of shifting and scaling on
>>>>       Gradient Descent
>>>>
>>>> I think he means what benefits you get from rescaling features,
>>>> e.g. minmax or preprocessing.scale.
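
For reference, both live in sklearn.preprocessing; a minimal sketch on the
ranges mentioned earlier in the thread:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, scale

    X = np.array([[10.0, 0.1],
                  [500.0, 0.5],
                  [1000.0, 1.0]])
    print(MinMaxScaler().fit_transform(X))  # each column mapped to [0, 1]
    print(scale(X))                         # each column: zero mean, unit variance
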
>>>> On Thu, Apr 25, 2013 at 02:09:13PM +0200, Jaques Grobler wrote:
>>>>>> I also think it will be great to have this example on the website.
>>>>> Do you mean an interactive example that works similar to the SVM GUI
>>>>> example, but for understanding the effects that shifting and scaling
>>>>> the data have on the rate of convergence of gradient descent and on
>>>>> the surface of the cost function?
>>>> This is out of scope for the project: scikit-learn is a machine
>>>> learning toolkit. Gradient descent is a general class of optimization
>>>> algorithms.
>>>>
>>>> Gaël
>>>
>>> --
>>> sp
>
> --
> sp

-- 
sp

