If you have to use SGD, then scaling the features will usually help the algorithm converge faster. If possible, try LinearRegression in the newer ml library instead: http://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression
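
To make the scaling and ranking idea from this thread concrete, here is a minimal plain-Java sketch with no Spark dependency. It standardizes each feature column to zero mean and unit variance, then orders features by the absolute value of their coefficients. The class name FeatureRanking, both helper methods, and the weights in main are hypothetical and only for illustration. In a real job the weights would come from the trained model (for the mllib model, presumably model.weights().toArray()), and the scaling could be done with Spark's own StandardScaler rather than by hand.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper class, not part of Spark.
class FeatureRanking {

    /** Returns a copy of x with each column scaled to zero mean and unit (population) variance. */
    static double[][] standardize(double[][] x) {
        int n = x.length;
        int d = x[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            double mean = 0.0;
            for (double[] row : x) mean += row[j];
            mean /= n;
            double var = 0.0;
            for (double[] row : x) var += (row[j] - mean) * (row[j] - mean);
            double std = Math.sqrt(var / n);
            for (int i = 0; i < n; i++) {
                // A constant column carries no information; map it to 0 instead of dividing by 0.
                out[i][j] = (std == 0.0) ? 0.0 : (x[i][j] - mean) / std;
            }
        }
        return out;
    }

    /** Returns feature indices ordered by decreasing absolute weight. */
    static Integer[] rankByMagnitude(double[] weights) {
        Integer[] idx = new Integer[weights.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> -Math.abs(weights[i])));
        return idx;
    }

    public static void main(String[] args) {
        // Made-up weights, assumed to come from a model trained on standardized features.
        double[] weights = {0.3, -2.1, 0.9};
        System.out.println(Arrays.toString(rankByMagnitude(weights))); // prints [1, 2, 0]
    }
}
```

Ranking by absolute value rather than by the signed coefficient matters: a large negative weight is just as influential as a large positive one. The ranking is only meaningful if the model was trained on the standardized features.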
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action

> On 7 Nov 2016, at 15:47, Carlo.Allocca <carlo.allo...@open.ac.uk> wrote:
>
> Hi Masood,
>
> Thank you very much for the reply. It is a very good point, as I am getting
> very bad results so far.
>
> If I understood correctly, what you suggest is to scale the data below (it is
> part of my dataset) before applying linear regression with SGD.
>
> Is that correct?
>
> Many thanks in advance.
>
> Best Regards,
> Carlo
>
> <Screen Shot 2016-11-07 at 15.44.51.png>
>
>> On 7 Nov 2016, at 15:31, Masood Krohy <masood.kr...@intact.net> wrote:
>>
>> If you go down this route (look at actual coefficients/weights), then make
>> sure your features are scaled first and have more or less the same mean when
>> feeding them into the algo. If not, then actual coefficients/weights
>> wouldn't tell you much. In any case, SGD performs badly with unscaled
>> features, so you gain if you scale the features beforehand.
>>
>> Masood
>>
>> ------------------------------
>> Masood Krohy, Ph.D.
>> Data Scientist, Intact Lab-R&D
>> Intact Financial Corporation
>> http://ca.linkedin.com/in/masoodkh
>>
>> From: Carlo.Allocca <carlo.allo...@open.ac.uk>
>> To: Mohit Jaggi <mohitja...@gmail.com>
>> Cc: Carlo.Allocca <carlo.allo...@open.ac.uk>, "user@spark.apache.org" <user@spark.apache.org>
>> Date: 2016-11-04 03:39
>> Subject: Re: LinearRegressionWithSGD and Rank Features By Importance
>>
>> Hi Mohit,
>>
>> Thank you for your reply.
>> OK, so it means coefficients with a high score are more important than
>> others with a low score…
>>
>> Many thanks,
>> Best Regards,
>> Carlo
>>
>> > On 3 Nov 2016, at 20:41, Mohit Jaggi <mohitja...@gmail.com> wrote:
>> >
>> > For linear regression, it should be fairly easy. Just sort the
>> > coefficients :)
>> >
>> > Mohit Jaggi
>> > Founder,
>> > Data Orchard LLC
>> > www.dataorchardllc.com
>> >
>> >> On Nov 3, 2016, at 3:35 AM, Carlo.Allocca <carlo.allo...@open.ac.uk> wrote:
>> >>
>> >> Hi All,
>> >>
>> >> I am using Spark, and in particular the MLlib library.
>> >>
>> >> import org.apache.spark.mllib.regression.LabeledPoint;
>> >> import org.apache.spark.mllib.regression.LinearRegressionModel;
>> >> import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
>> >>
>> >> For my problem I am using LinearRegressionWithSGD and I would like to
>> >> perform a “Rank Features By Importance”.
>> >>
>> >> I checked the documentation and it seems that it does not provide such
>> >> a method.
>> >>
>> >> Am I missing anything? Please, could you provide any help on this?
>> >> Should I change the approach?
>> >>
>> >> Many thanks in advance,
>> >>
>> >> Best Regards,
>> >> Carlo
>> >>
>> >> -- The Open University is incorporated by Royal Charter (RC 000391), an
>> >> exempt charity in England & Wales and a charity registered in Scotland
>> >> (SC 038302). The Open University is authorised and regulated by the
>> >> Financial Conduct Authority.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org