Re: [Scikit-learn-general] Array too big error on using DictVectorizer

Joel Nothman Sun, 15 Jun 2014 06:10:27 -0700

Hi Awhan,

Sparse input support for random forests is currently under code review.
Until it is merged, you can pull the branch locally and build from it. See
https://github.com/scikit-learn/scikit-learn/pull/3173
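
To make the idea concrete, here is a minimal sketch. It assumes a
scikit-learn build in which RandomForestClassifier.fit() accepts SciPy
sparse input (that is, one that includes the sparse support discussed
above). The records and labels are made-up toy data, not the real data
set. The sketch also estimates the size of the dense array for the shape
described in the question, assuming float64 storage.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy records standing in for the real categorical data.
records = [
    {"city": "london", "device": "mobile"},
    {"city": "paris", "device": "desktop"},
    {"city": "london", "device": "desktop"},
    {"city": "tokyo", "device": "mobile"},
]
labels = [0, 1, 0, 1]

# Leave sparse at its default (True): the 1-of-N columns are stored as a
# SciPy sparse matrix, so only the non-zero entries take up memory.
vectorizer = DictVectorizer()
X = vectorizer.fit_transform(records)

# Rough size of the dense equivalent for the shape in the question:
# 1.5 million samples x 4500 features x 8 bytes per float64 value.
dense_bytes = 1500000 * 4500 * np.dtype(np.float64).itemsize
print("dense array would need about %.0f GB" % (dense_bytes / 1e9))

# Assumption: this scikit-learn build accepts sparse X in fit()/predict().
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, labels)
pred = clf.predict(X)
```

With a build that lacks sparse support, the fit() call is where this
would fail. In that case an estimator that already accepts sparse input
is one possible stopgap.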



On 15 June 2014 22:40, Awhan Patnaik <[email protected]> wrote:

> Hello all,
>
> I have a 2-class classification problem with 13 features, mostly
> categorical. Some features take many distinct values (2000, 700, etc.),
> so a 1-of-N encoding expands the data set to 4.5k features. The data
> has around 1.5 million samples.
>
> When I try to transform the data using DictVectorizer(sparse=False), I
> get "ValueError: array is too big". If I omit the sparse=False option,
> I get a SciPy sparse matrix, which the fit() method of
> RandomForestClassifier does not accept. Calling .toarray() does not
> work either, as that also results in a huge array.
>
> What is the way out of this?


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
