2014-05-01 15:59 GMT+02:00 Ian Ozsvald <i...@ianozsvald.com>:
> Hello. I'm looking at feature_extraction.dict_vectorizer and I'm
> wondering why fit() and restrict() use a sorted list of feature names
> rather than their naturally-encountered order?
>
> Is there an algorithmic requirement somewhere for sorted feature names?

Off the top of my head: no, it just makes the output easier to inspect
and compare, so it could be made optional.

> Reusing (and probably inheriting) the sklearn vectorizer would be
> nice, rather than rolling a custom solution in numpy. If anyone's
> curious, my best approach to resizing the csr array is via
> http://stackoverflow.com/questions/6844998/is-there-an-efficient-way-of-concatenating-scipy-sparse-matrices/6853880#6853880
> which costs 10 seconds and a temporary +2GB overall.
> (and if you have a better suggestion for growing a csr matrix, I'd
> love to hear it)

Inheriting from sklearn classes that are not marked public base classes
or mixins is probably a bad idea. The final classes in the hierarchy
aren't really designed with inheritance in mind.

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
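
On growing a csr matrix: one common alternative to concatenating matrices is to accumulate the three CSR component lists (data, indices, indptr) in plain Python and build the matrix once at the end. The sketch below is only an illustration of that idea, not code from scikit-learn; the function name build_csr_parts and the sample feature names are made up for the example. It also assigns columns in first-seen order rather than sorted order, and assumes Python 3.7+ so that dicts keep insertion order.

```python
def build_csr_parts(samples):
    """Collect CSR components from an iterable of {feature_name: value} dicts.

    Hypothetical helper for illustration: columns are numbered in the
    order feature names are first encountered, with no sorting.
    """
    vocabulary = {}  # feature name -> column index, in first-seen order
    data = []        # non-zero values, appended row by row
    indices = []     # column index for each entry in `data`
    indptr = [0]     # indptr[i]:indptr[i + 1] delimits the entries of row i

    for sample in samples:
        for name, value in sample.items():
            # Assign the next free column the first time a name is seen.
            col = vocabulary.setdefault(name, len(vocabulary))
            indices.append(col)
            data.append(value)
        # Close the current row.
        indptr.append(len(indices))

    return data, indices, indptr, vocabulary


# Made-up sample data.
samples = [
    {"width": 2.0, "height": 1.0},
    {"height": 3.0, "depth": 4.0},
]
data, indices, indptr, vocabulary = build_csr_parts(samples)
```

The three lists can then be passed to scipy in one go, e.g. scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(indptr) - 1, len(vocabulary))), which avoids repeatedly reallocating an existing matrix. Column indices within a row are not guaranteed to be sorted with this approach, so calling sort_indices() on the result may be needed if downstream code relies on that. Whether this beats the concatenation approach on the 10 seconds / +2GB figures quoted above is untested here.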