2012/9/7 Lars Buitinck <[email protected]>:
> 2012/9/7 Olivier Grisel <[email protected]>:
>> Maybe the default feature extraction has changed and made the matrix
>> much denser than it used to be for this example? Although recent
>> changes to the vectorizer would tend to decrease the number of
>> features (min_df=2) hence make the problem smaller to solve.
>
> What would be the vectorizer settings to make it work? With min_df=1,
> max_df=1.0 the ridge classifier still wants all my RAM.

Maybe `Vectorizer` has changed in other ways between 0.11 and 0.12. The
best thing to do would be to vectorize 20 newsgroups with the default
parameters (as in the example) in each version and compare the `shape`
and `data.nbytes` of each resulting COO matrix.
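A rough sketch of that comparison, to be run once per installed version, follows. It assumes a scikit-learn release that provides `TfidfVectorizer`. This is used here as a stand-in for the thread's `Vectorizer`, and the name may not exist in 0.11, so swap in whichever class that version has. The helpers `matrix_stats`, `vectorize_and_report` and `report_20newsgroups` are hypothetical and are not part of scikit-learn.

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer


def matrix_stats(X):
    """Summarize the size of a sparse document-term matrix (hypothetical helper)."""
    X = sp.coo_matrix(X)  # look at it as a COO matrix, as suggested above
    n_samples, n_features = X.shape
    return {
        "shape": X.shape,
        "nnz": X.nnz,  # number of stored entries
        "density": X.nnz / float(n_samples * n_features),
        "data_nbytes": X.data.nbytes,  # bytes used by the stored values only
    }


def vectorize_and_report(documents, **vectorizer_params):
    """Vectorize with the given parameters (defaults if none) and report sizes.

    Assumption: TfidfVectorizer stands in for the thread's `Vectorizer`.
    """
    vectorizer = TfidfVectorizer(**vectorizer_params)
    X = vectorizer.fit_transform(documents)
    return matrix_stats(X)


def report_20newsgroups():
    """Print the stats for the 20 newsgroups training set (downloads the data)."""
    from sklearn.datasets import fetch_20newsgroups

    data = fetch_20newsgroups(subset="train")
    for key, value in sorted(vectorize_and_report(data.data).items()):
        print(key, value)
```

Calling `report_20newsgroups()` under each version and diffing the printed `shape`, `nnz` and `data_nbytes` should show whether the default feature extraction now produces a larger or denser matrix. Passing e.g. `min_df=2` to `vectorize_and_report` shows how much that setting shrinks the vocabulary.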

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
