Re: LinearRegressionWithSGD and Rank Features By Importance

Robin East Mon, 07 Nov 2016 08:21:11 -0800

If you have to use SGD, then scaling the features will usually help the 
algorithm converge more quickly. If possible, you should try using Linear 
Regression in the newer ml library (spark.ml) instead: 
http://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression
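
As a rough illustration of the advice in this thread (scale the features 
first, then compare the magnitudes of the weights), here is a minimal 
plain-Java sketch. It does not use Spark at all. The class FeatureRanking and 
its methods standardize and rankByAbsWeight are hypothetical names made up 
for this example, and the weights in main are invented values, not output 
from any real model.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch only: plain Java, no Spark dependency.
// Class and method names are hypothetical, not part of any Spark API.
final class FeatureRanking {

    // Standardize each column to zero mean and unit (population) standard deviation.
    static double[][] standardize(double[][] rows) {
        int n = rows.length;
        int d = rows[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            double mean = 0.0;
            for (double[] row : rows) {
                mean += row[j];
            }
            mean /= n;
            double var = 0.0;
            for (double[] row : rows) {
                var += (row[j] - mean) * (row[j] - mean);
            }
            double std = Math.sqrt(var / n);
            for (int i = 0; i < n; i++) {
                // A constant column carries no information; map it to 0.
                out[i][j] = (std == 0.0) ? 0.0 : (rows[i][j] - mean) / std;
            }
        }
        return out;
    }

    // Return feature indices ordered by decreasing absolute weight.
    static Integer[] rankByAbsWeight(double[] weights) {
        Integer[] idx = new Integer[weights.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> -Math.abs(weights[i])));
        return idx;
    }

    public static void main(String[] args) {
        // Two features on very different scales.
        double[][] x = {{1, 100}, {2, 300}, {3, 200}};
        System.out.println(Arrays.deepToString(standardize(x)));

        // Invented weights, standing in for a model trained on standardized features.
        double[] w = {0.4, -2.5, 1.1};
        System.out.println(Arrays.toString(rankByAbsWeight(w)));
    }
}
```

The ranking uses the absolute value because a large negative weight matters 
as much as a large positive one. The ordering is only meaningful when the 
model was trained on features that share a common scale.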



-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action 





> On 7 Nov 2016, at 15:47, Carlo.Allocca <carlo.allo...@open.ac.uk> wrote:
> 
> Hi Masood, 
> 
> thank you very much for the reply. It is a very good point, as I am getting 
> very bad results so far. 
> 
> If I understood correctly, what you suggest is to scale the data below (it is 
> part of my dataset) before applying linear regression with SGD.
> 
> Is that correct?
> 
> Many Thanks in advance. 
> 
> Best Regards,
> Carlo 
> 
> <Screen Shot 2016-11-07 at 15.44.51.png>
> 
>> On 7 Nov 2016, at 15:31, Masood Krohy <masood.kr...@intact.net> wrote:
>> 
>> If you go down this route (look at actual coefficients/weights), then make 
>> sure your features are scaled first and have more or less the same mean when 
>> feeding them into the algo. If not, then actual coefficients/weights 
>> wouldn't tell you much. In any case, SGD performs badly with unscaled 
>> features, so you gain if you scale the features beforehand.
>> Masood 
>> 
>> ------------------------------
>> Masood Krohy, Ph.D. 
>> Data Scientist, Intact Lab-R&D 
>> Intact Financial Corporation 
>> http://ca.linkedin.com/in/masoodkh 
>> 
>> 
>> 
>> From:    Carlo.Allocca <carlo.allo...@open.ac.uk> 
>> To:      Mohit Jaggi <mohitja...@gmail.com> 
>> Cc:      Carlo.Allocca <carlo.allo...@open.ac.uk>, 
>>          "user@spark.apache.org" <user@spark.apache.org> 
>> Date:    2016-11-04 03:39 
>> Subject: Re: LinearRegressionWithSGD and Rank Features By Importance 
>> 
>> 
>> 
>> Hi Mohit, 
>> 
>> Thank you for your reply. 
>> OK. So it means that coefficients with a high score are more important than 
>> others with a low score…
>> 
>> Many Thanks,
>> Best Regards,
>> Carlo
>> 
>> 
>> > On 3 Nov 2016, at 20:41, Mohit Jaggi <mohitja...@gmail.com> wrote:
>> > 
>> > For linear regression, it should be fairly easy. Just sort the 
>> > co-efficients :)
>> > 
>> > Mohit Jaggi
>> > Founder,
>> > Data Orchard LLC
>> > www.dataorchardllc.com
>> > 
>> > 
>> > 
>> > 
>> >> On Nov 3, 2016, at 3:35 AM, Carlo.Allocca <carlo.allo...@open.ac.uk> wrote:
>> >> 
>> >> Hi All,
>> >> 
>> >> I am using Spark and in particular the MLlib library.
>> >> 
>> >> import org.apache.spark.mllib.regression.LabeledPoint;
>> >> import org.apache.spark.mllib.regression.LinearRegressionModel;
>> >> import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
>> >> 
>> >> For my problem I am using the LinearRegressionWithSGD and I would like to 
>> >> perform a “Rank Features By Importance”.
>> >> 
>> >> I checked the documentation and it seems that it does not provide such a 
>> >> method.
>> >> 
>> >> Am I missing anything?  Please, could you provide any help on this?
>> >> Should I change the approach?
>> >> 
>> >> Many Thanks in advance,
>> >> 
>> >> Best Regards,
>> >> Carlo
>> >> 
>> >> 
>> >> -- The Open University is incorporated by Royal Charter (RC 000391), an 
>> >> exempt charity in England & Wales and a charity registered in Scotland 
>> >> (SC 038302). The Open University is authorised and regulated by the 
>> >> Financial Conduct Authority.
>> >> 
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
>> >> <mailto:user-unsubscr...@spark.apache.org>
>> >> 
>> > 
>> 
>> 
>> 
>> 
>> 
>
