[Scikit-learn-general] Vectorizing input

Ark Wed, 13 Mar 2013 20:12:05 -0700

The vectorized input for the same training data set differs between versions
0.13.1 and 0.14-git.


For:
vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1,2),
             smooth_idf=True, sublinear_tf=True, max_df=0.5,
             token_pattern=ur'\b(?!\d)\w\w+\b')

On fit_transform the shape of the input data
- with version 0.13.1 is (12440, 1270712)
- with version 0.14-git is (12440, 484762)

I did not change the code and ran the same script on two different machines in
parallel. Apart from the number of features, the size of the classifier goes
from 8.4 to 26G, but I guess that is due to the number of features. Does this
seem correct?
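
One way to narrow this down might be to break the fitted vocabulary down by
n-gram length on each version and compare the counts. The following is only a
sketch under assumptions: `docs` is hypothetical toy data standing in for the
real training set, `ngram_breakdown` is a helper name made up for illustration,
and the pattern uses r'' instead of ur'' so that it also runs on Python 3.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer


def ngram_breakdown(vectorizer):
    """Count fitted vocabulary entries by token count (1 = unigram, 2 = bigram)."""
    # Terms in vocabulary_ are n-grams whose tokens are joined by a single space.
    return Counter(len(term.split(" ")) for term in vectorizer.vocabulary_)


# Hypothetical toy corpus; replace with the real training documents.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock prices fell sharply today",
    "version 13 differs from version 14",
]

vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2),
                             smooth_idf=True, sublinear_tf=True, max_df=0.5,
                             token_pattern=r'\b(?!\d)\w\w+\b')
X = vectorizer.fit_transform(docs)

print(X.shape)                      # (n_documents, n_features)
print(ngram_breakdown(vectorizer))  # how many unigrams vs. bigrams were kept
```

Running this on both installations and diffing sorted(vectorizer.vocabulary_)
should show whether the missing features are unigrams, bigrams, or both.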


