The problem was that I had a loop like

for i in xrange(len(clf.feature_importances_)):
    print clf.feature_importances_[i]

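which recomputes the whole feature importance array on every iteration:
feature_importances_ is a property that is evaluated on each access, not a
stored array.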
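
A minimal sketch of the fix, for anyone who hits the same thing: read the attribute once into a local variable and work on that copy. SlowModel below is a hypothetical stand-in for a fitted forest (not part of sklearn) whose property counts how often it is evaluated; top_features is likewise just an illustrative helper name.

```python
# Hypothetical stand-in for a fitted forest: the property is recomputed on
# every access, mimicking the behaviour described above.
class SlowModel(object):
    def __init__(self, importances):
        self._raw = list(importances)
        self.n_computations = 0  # how many times the property was evaluated

    @property
    def feature_importances_(self):
        self.n_computations += 1
        total = float(sum(self._raw))
        return [v / total for v in self._raw]


def top_features(clf, k):
    """Return (index, importance) pairs for the k largest importances."""
    importances = clf.feature_importances_  # evaluated exactly once
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i], reverse=True)
    return [(i, importances[i]) for i in order[:k]]


clf = SlowModel([1.0, 4.0, 3.0, 2.0])
top = top_features(clf, 2)
```

With a real fitted forest the same pattern should apply: assign clf.feature_importances_ to a variable once, then sort or index that array (for example with numpy.argsort) instead of touching the attribute inside the loop.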

Obvious in hindsight.

Raphael


On 21 July 2016 at 16:22, Raphael C <drr...@gmail.com> wrote:
> I have a set of feature vectors associated with binary class labels,
> each of which has about 40,000 features. I can train a random forest
> classifier in sklearn which works well. I would however like to see
> the most important features.
>
> I tried simply printing out forest.feature_importances_ but this takes
> about 1 second per feature, making about 40,000 seconds overall. This
> is much longer than the time needed to train the classifier in
> the first place.
>
> Is there a more efficient way to find out which features are most important?
>
> Raphael
>
> On 21 July 2016 at 15:58, Nelson Liu <nf...@uw.edu> wrote:
>> Hi,
>> If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the
>> maintainers don't have control over downtime and issues like the one you're
>> having). Can you connect to GitHub, or any site on GitHub Pages?
>>
>> Thanks
>> Nelson
>>
>> On Thu, Jul 21, 2016, 07:52 Rahul Ahuja <rahul.ah...@live.com> wrote:
>>>
>>> Hi there,
>>>
>>> The sklearn website has been down for a couple of days. Please look into it.
>>>
>>> I am in Karachi, Pakistan.
>>>
>>> Kind regards,
>>> Rahul Ahuja
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn@python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
