The vectorized input for the same training data set differs between versions
0.13.1 and 0.14-git.
For:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2),
                             smooth_idf=True, sublinear_tf=True, max_df=0.5,
                             token_pattern=ur'\b(?!\d)\w\w+\b')
On fit_transform the shape of the input data
- with version 0.13.1 is (12440, 1270712)
- with version 0.14-git is (12440, 484762)
I did not change the code; I ran the same code on two different machines in
parallel. Apart from the number of features, the size of the classifier goes
from 8.4 to 26G, but I guess that is due to the number of features. Does this
seem correct?
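
One way to narrow this down might be to compare the two fitted vocabularies and see whether the missing features are mostly unigrams or bigrams. The sketch below is only an illustration: summarize_vocab_diff is a hypothetical helper, not part of scikit-learn, and the toy lists stand in for the keys of vectorizer.vocabulary_ saved on each machine (assuming that attribute is available in both versions).

```python
from collections import Counter


def summarize_vocab_diff(vocab_old, vocab_new):
    """Count features unique to each vocabulary, grouped by n-gram length.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    old, new = set(vocab_old), set(vocab_new)

    def by_ngram_length(terms):
        # N-gram features are whitespace-joined tokens, so "pie crust" is a 2-gram.
        return dict(Counter(len(term.split()) for term in terms))

    return {
        "shared": len(old & new),
        "only_old": by_ngram_length(old - new),
        "only_new": by_ngram_length(new - old),
    }


# Toy vocabularies standing in for list(vectorizer.vocabulary_) from each version.
vocab_0_13 = ["apple", "apple pie", "pie", "pie crust", "crust"]
vocab_0_14 = ["apple", "apple pie", "pie"]

summary = summarize_vocab_diff(vocab_0_13, vocab_0_14)
print(summary)
```

If most of the features found in only one version are bigrams, that would point at n-gram generation or stop-word handling rather than at tokenization, but that is a guess until the real vocabularies are compared.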
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general