Hi Awhan,
Sparse matrix support for random forests is currently under code review.
In the meantime you can pull the branch locally and try it. See
https://github.com/scikit-learn/scikit-learn/pull/3173
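
For illustration, here is a minimal sketch of what fitting on the sparse
output of DictVectorizer could look like once sparse input is accepted.
It assumes a scikit-learn build in which RandomForestClassifier.fit()
takes scipy sparse matrices (for example, the branch above). The toy
records and the feature names "city" and "device" are made up and only
stand in for your data. The last line is a rough estimate of why the
dense array fails at the sizes you quote, assuming float64 entries.

```python
import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Toy stand-in for the categorical records (hypothetical feature names).
records = [
    {"city": "A", "device": "phone"},
    {"city": "B", "device": "tablet"},
    {"city": "A", "device": "tablet"},
    {"city": "C", "device": "phone"},
]
labels = np.array([0, 1, 1, 0])

# The default sparse=True yields a scipy.sparse matrix with one column per
# (feature, value) pair; only the non-zero entries are stored.
vectorizer = DictVectorizer()
X = vectorizer.fit_transform(records)

# Assumption: this scikit-learn build accepts sparse input in forests.
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, labels)
predictions = clf.predict(X)

# Rough dense footprint for the sizes quoted in the thread:
# 1.5 million rows x 4.5k columns x 8 bytes per float64, in gigabytes.
dense_gb = 1.5e6 * 4500 * 8 / 1e9
```

With 13 categorical features, each row has at most 13 non-zero entries
after 1-of-N encoding, so the sparse matrix stays small even though the
dense equivalent would need tens of gigabytes.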
On 15 June 2014 22:40, Awhan Patnaik <[email protected]> wrote:
> Hello all,
>
> I have a 2-class classification problem with 13 features, mostly
> categorical. Some features have 2000, 700, etc. distinct values, so a
> 1-of-N encoding expands the data set to about 4.5k features. The data
> has around 1.5 million samples.
>
> On trying to transform the data using DictVectorizer(sparse=False) I
> get "ValueError: array is too big". If I omit the sparse=False option
> I get a scipy sparse matrix, which the fit() method of
> RandomForestClassifier does not accept. Calling .toarray() on it does
> not work either, as that too results in a huge array.
>
> What is the way out of this?
>
>
> ------------------------------------------------------------------------------
> HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
> Find What Matters Most in Your Big Data with HPCC Systems
> Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
> Leverages Graph Analysis for Fast Processing & Easy Data Exploration
> http://p.sf.net/sfu/hpccsystems
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>