Even scikit-learn says, on its stochastic gradient descent page (http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use), that one should scale the data. An example showing what actually happens to a cost function (say, the squared loss) when the data is scaled would be great.
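Something along the following lines, perhaps. This is only a rough sketch, not scikit-learn code: plain batch gradient descent on the squared loss for a synthetic two-feature regression, where everything (the data, the learning rates, the iteration counts) is invented for illustration. A second sketch, of the rescaling helpers Ronnie suggested, follows the quoted thread at the bottom of this mail.

import numpy as np

rng = np.random.RandomState(0)
n = 200

# Two features with very different ranges, as discussed below.
x1 = rng.uniform(10, 1000, n)   # range roughly 10-1000
x2 = rng.uniform(0.1, 1, n)     # range roughly 0.1-1
X = np.column_stack([x1, x2, np.ones(n)])  # last column acts as intercept
y = 3.0 * x1 + 50.0 * x2 + rng.normal(0, 1, n)

def gradient_descent(X, y, lr, n_iter):
    # Plain batch gradient descent on the mean squared loss.
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 / len(y) * X.T.dot(X.dot(w) - y)
        w -= lr * grad
    return np.mean((X.dot(w) - y) ** 2)

# Raw data: the loss surface is an extremely elongated valley, so the
# step size has to be tiny (anything much larger diverges) and the flat
# directions make almost no progress.
cost_raw = gradient_descent(X, y, lr=1e-6, n_iter=1000)

# Standardized data (zero mean, unit variance per feature): the surface
# is nearly round, a much larger step is stable, and the same number of
# iterations gets close to the optimum.
X_scaled = X.copy()
X_scaled[:, :2] = (X[:, :2] - X[:, :2].mean(axis=0)) / X[:, :2].std(axis=0)
cost_scaled = gradient_descent(X_scaled, y, lr=0.1, n_iter=1000)

print("final squared loss on raw data:    %.2f" % cost_raw)
print("final squared loss on scaled data: %.2f" % cost_scaled)

The point the comments make is the general one: with disparate feature ranges the contours of the squared loss are stretched into a long narrow valley, so a fixed step size must be chosen for the steepest direction and the flat directions barely move; after standardization the contours are nearly circular and the same iteration budget goes much further.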
On 26-04-2013 02:37, Shishir Pandey wrote:
> I did not mean the parameters of the cost function. I only want to scale
> the input variables. Suppose one of the independent variables has a range
> of 10-1000 and another has a range of 0.1-1. Andrew Ng and others say in
> their machine learning lectures that one should rescale the input data to
> bring all the variables to a similar range
> (http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=03.1-LinearRegressionII-FeatureScaling&speed=100).
> This will affect how gradient descent behaves.
>
> For now, let us take the cost function to be the squared loss.
>
> On 25-04-2013 19:15, Matthieu Brucher wrote:
>> Hi,
>>
>> Do you mean scaling the parameters of the cost function? If so, scaling
>> will change the surface of the cost function, of course. It is hard to
>> say anything general about how the surface will behave; it depends
>> entirely on the cost function you are using. A linear cost function will
>> have the same scale applied to its surface, but anything fancier
>> (squared sum, robust cost, ...) will behave differently. This also means
>> that the gradient descent will behave differently and may converge to a
>> different location. As Gaël said, this is a generic optimization
>> question, not a machine-learning one.
>>
>> Matthieu
>>
>> On 25-04-2013, Shishir Pandey wrote:
>>> Thanks Ronnie for pointing out the exact method in the scikit-learn
>>> library. Yes, that is exactly what I was asking: how does rescaling
>>> features affect the gradient descent algorithm? Since stochastic
>>> gradient descent is used a great deal in machine learning, it would be
>>> good to understand how its performance is affected by rescaling
>>> features.
>>>
>>> Jaques, I am having some trouble running the example, but yes, it
>>> would be good if we could have a GUI example.
>>>
>>> On 25-04-2013 09:10, Ronnie Ghose wrote:
>>>> I think he means: what benefits do you get from rescaling features,
>>>> e.g. with min-max scaling or preprocessing.scale?
>>>
>>> On 25-04-2013, Gaël wrote:
>>>> On Thu, Apr 25, 2013 at 02:09:13PM +0200, Jaques Grobler wrote:
>>>>> I also think it will be great to have this example on the website.
>>>>> Do you mean an interactive example that works similarly to the SVM
>>>>> GUI example, but for understanding the effects that shifting and
>>>>> scaling the data have on the rate of convergence of gradient descent
>>>>> and on the surface of the cost function?
>>>>
>>>> This is out of scope for the project: scikit-learn is a machine
>>>> learning toolkit. Gradient descent is a general class of optimization
>>>> algorithms.
>>>>
>>>> Gaël
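To make Ronnie's suggestion above concrete, here is the second sketch, showing the two rescaling options on features with the ranges from my earlier mail. The data itself is made up for illustration:

import numpy as np
from sklearn import preprocessing

rng = np.random.RandomState(0)
# Feature 1 in roughly 10-1000, feature 2 in roughly 0.1-1 (made-up data).
X = np.column_stack([rng.uniform(10, 1000, 5),
                     rng.uniform(0.1, 1, 5)])

# Standardization: zero mean and unit variance per column.
X_std = preprocessing.scale(X)

# Min-max scaling: each column mapped onto [0, 1].
X_minmax = preprocessing.MinMaxScaler().fit_transform(X)

print(X_std.mean(axis=0))    # approximately [0, 0]
print(X_std.std(axis=0))     # approximately [1, 1]
print(X_minmax.min(axis=0))  # [0, 0]
print(X_minmax.max(axis=0))  # [1, 1]

Either transform brings both columns into a comparable range, which is what the lectures recommend doing before running gradient descent.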
--
sp
