The importance should be based on some statistics, for example, the
standard deviation of the feature column and the magnitude of the
weight. If the columns are scaled to unit standard deviation (using
StandardScaler), you can tell the importance by the absolute value of
the weight, though other statistics can be used as well.
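A minimal plain-Scala sketch of that statistic, scoring each feature as (column standard deviation) × |weight|. The helper name and the dense column layout are illustrative assumptions, not Spark API:

```scala
object FeatureImportance {
  // Hypothetical helper: importance of feature j is stddev(column j) * |weight(j)|.
  // `columns(j)` holds all values of feature j; `weights(j)` is the fitted weight.
  def importance(columns: Array[Array[Double]], weights: Array[Double]): Array[Double] = {
    columns.zip(weights).map { case (col, w) =>
      val mean = col.sum / col.length
      // Population variance of the column.
      val variance = col.map(x => (x - mean) * (x - mean)).sum / col.length
      math.sqrt(variance) * math.abs(w)
    }
  }
}
```

If the columns were already scaled to unit standard deviation with StandardScaler, the stddev factor is 1 and this reduces to ranking by |weight| alone.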
Will sc.parallelize(model.weights.toArray, blocks).top(k) get that right?
For logistic regression you might want both positive and negative features, so
map the weights through abs first and then pick top(k).
On Thu, Sep 18, 2014 at 10:30 AM, Sameer Tilak wrote:
Hi All,
I am able to run LinearRegressionWithSGD on a small sample dataset (~60MB
Libsvm file of sparse data) with 6700 features.
val model = LinearRegressionWithSGD.train(examples, numIterations)
At the end I get a model, and its weight vector has size:

model.weights.size
res6: Int = 6699
I am assuming each entry in the model weights vector corresponds to one feature. Is that correct?