Re: MLLib regression model weights

Xiangrui Meng Thu, 18 Sep 2014 13:18:57 -0700

Feature importance should be based on some statistic, for example the
standard deviation of the feature column combined with the magnitude of
the weight. If the columns are scaled to unit standard deviation (using
StandardScaler), you can judge importance by the absolute value of the
weight. There are other statistics for feature importance as well. It
would be great if you are interested in working on this. -Xiangrui
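
A minimal sketch of the ranking described above, in plain Scala so it
runs without a Spark cluster. The object FeatureRanking and both of its
methods are hypothetical helpers, not part of MLlib. The second method
assumes that multiplying a weight by its column's standard deviation
approximates the weight you would get after scaling that column to unit
standard deviation; that holds for an unregularized linear fit and is
only a heuristic otherwise.

```scala
// Hypothetical helper, not an MLlib API: rank features by absolute weight.
object FeatureRanking {

  /** Returns the k (featureIndex, weight) pairs with the largest |weight|. */
  def topKByAbsWeight(weights: Array[Double], k: Int): List[(Int, Double)] =
    weights.zipWithIndex
      .map { case (w, i) => (i, w) }                       // keep the feature index
      .sortWith { (a, b) => math.abs(a._2) > math.abs(b._2) } // largest |weight| first
      .take(k)
      .toList

  /**
   * For features that were NOT standardized before training: scale each
   * weight by the standard deviation of its column, then rank.
   * Assumption: stds(i) is the standard deviation of feature column i.
   */
  def topKByScaledWeight(weights: Array[Double],
                         stds: Array[Double],
                         k: Int): List[(Int, Double)] = {
    require(weights.length == stds.length, "one standard deviation per weight")
    val scaled = weights.zip(stds).map { case (w, s) => w * s }
    topKByAbsWeight(scaled, k)
  }
}
```

With a trained model, the input array would come from
model.weights.toArray. Unlike calling top(k) on an RDD of the raw weight
values, this keeps each feature index next to its weight and keeps
large negative weights in the ranking.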


On Thu, Sep 18, 2014 at 12:17 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> sc.parallelize(model.weights.toArray, blocks).top(k) will get that, right?
>
> For logistic you might want both positive and negative features, so just
> map the weights through abs and then pick top(k).
>
>
> On Thu, Sep 18, 2014 at 10:30 AM, Sameer Tilak <ssti...@live.com> wrote:
>>
>> Hi All,
>>
>> I am able to run LinearRegressionWithSGD on a small sample dataset (~60MB
>> Libsvm file of sparse data) with 6700 features.
>>
>> val model = LinearRegressionWithSGD.train(examples, numIterations)
>>
>> At the end I get a model that
>>
>> model.weights.size
>> res6: Int = 6699
>>
>> I am assuming each entry in the model is the weight for the corresponding
>> feature/index. However, if I want to get the top 10 most important features
>> or all features with weights higher than a certain threshold, is that
>> functionality available out of the box? I can implement that on my own, but
>> it seems like a common feature that most people will need when they are
>> working on high-dimensional datasets.
>>

