Re: [Scikit-learn-general] Why sorted feature_names_ in dict_vectorizer.fit?

Lars Buitinck Thu, 01 May 2014 07:20:31 -0700

2014-05-01 15:59 GMT+02:00 Ian Ozsvald <i...@ianozsvald.com>:
> Hello. I'm looking at feature_extraction.dict_vectorizer and I'm
> wondering why fit() and restrict() use a sorted list of feature names
> rather than their naturally-encountered order?
>
> Is there an algorithmic requirement somewhere for sorted feature names?


Off the top of my head: no, it just makes the output easier to inspect
and compare, so it could be made optional.
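
To illustrate the difference, here is a minimal pure-Python sketch of
building the feature-name-to-column mapping either way. It is not
DictVectorizer's actual code; build_vocabulary and its sort flag are
hypothetical names used only for this example.

```python
def build_vocabulary(records, sort=True):
    """Map each feature name seen in `records` to a column index.

    Hypothetical helper for illustration only: with sort=True the columns
    follow sorted name order, with sort=False they follow the order in
    which names were first encountered.
    """
    feature_names = []
    seen = set()
    for record in records:
        for name in record:
            if name not in seen:
                seen.add(name)
                feature_names.append(name)
    if sort:
        feature_names.sort()
    vocabulary = {name: index for index, name in enumerate(feature_names)}
    return feature_names, vocabulary


records = [{"width": 2.0, "height": 1.0}, {"depth": 3.0, "height": 4.0}]
sorted_names, sorted_vocab = build_vocabulary(records, sort=True)
encountered_names, encountered_vocab = build_vocabulary(records, sort=False)
```

Either mapping is a valid column assignment; only the column order
differs, which is why sorting is a convenience for inspection rather
than something the algorithm needs.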

> Reusing (and probably inheriting) the sklearn vectorizer would be
> nice, rather than rolling a custom solution in numpy. If anyone's
> curious, my best approach to resizing the csr array is via
> http://stackoverflow.com/questions/6844998/is-there-an-efficient-way-of-concatenating-scipy-sparse-matrices/6853880#6853880
> which costs 10 seconds and a temporary +2GB overall.
> (and if you have a better suggestion for growing a csr matrix, I'd
> love to hear it)

Inheriting from sklearn classes that are not marked public base
classes or mixins is probably a bad idea. The final classes in the
hierarchy aren't really designed with inheritance in mind.
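
One alternative to inheritance is composition: hold a vectorizer and
delegate to it. The sketch below assumes only that the wrapped object
has fit() and transform() methods; EncounterOrderWrapper and
StubVectorizer are hypothetical names, and the stub merely stands in for
a real vectorizer so that the example runs on its own.

```python
class StubVectorizer:
    """Stand-in for a real vectorizer; sorts feature names on fit."""

    def fit(self, records):
        self.feature_names_ = sorted({name for r in records for name in r})
        return self

    def transform(self, records):
        # Dense rows for simplicity; a real vectorizer may return sparse data.
        return [[r.get(name, 0.0) for name in self.feature_names_]
                for r in records]


class EncounterOrderWrapper:
    """Hypothetical wrapper that owns a vectorizer instead of subclassing it."""

    def __init__(self, vectorizer):
        self._vectorizer = vectorizer
        self.encounter_order_ = []

    def fit(self, records):
        # Record first-seen order ourselves, then delegate the real work.
        seen = set()
        self.encounter_order_ = []
        for record in records:
            for name in record:
                if name not in seen:
                    seen.add(name)
                    self.encounter_order_.append(name)
        self._vectorizer.fit(records)
        return self

    def transform(self, records):
        return self._vectorizer.transform(records)


records = [{"width": 2.0, "height": 1.0}, {"depth": 3.0}]
wrapper = EncounterOrderWrapper(StubVectorizer()).fit(records)
rows = wrapper.transform(records)
```

The wrapper depends only on the public fit/transform interface, so it
does not break if the wrapped class changes its internals.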

