Re: [Scikit-learn-general] Array too big error on using DictVectorizer

2014-06-15 Thread Joel Nothman
Hi Awhan,

Sparse support in random forest is currently under code review. You can pull in the branch locally. See https://github.com/scikit-learn/scikit-learn/pull/3173

On 15 June 2014 22:40, Awhan Patnaik wrote:
> Hello all,
>
> 2 class classification problem. 13 features - mostly categorical

[Scikit-learn-general] Array too big error on using DictVectorizer

2014-06-15 Thread Awhan Patnaik
Hello all,

2 class classification problem. 13 features - mostly categorical. Some features have 2000, 700, etc. distinct values, so a 1-of-N encoding expands the data set up to 4.5k features. The data has around 1.5 million samples. On trying to transform the data using DictVectorizer(sparse
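
For a rough sense of scale, the figures quoted above (about 1.5 million samples, about 4.5k encoded features, 13 original features) can be turned into a back-of-the-envelope memory estimate. This is only a sketch under stated assumptions: 8-byte float64 values, 4-byte integer indices in a CSR-style sparse layout, and exactly one stored value per original feature per row. The helper names `dense_bytes` and `csr_bytes` are hypothetical and not part of scikit-learn.

```python
def dense_bytes(n_samples, n_features, itemsize=8):
    """Bytes needed for a dense n_samples x n_features array.

    itemsize=8 assumes float64 values (an assumption, not stated in the thread).
    """
    return n_samples * n_features * itemsize


def csr_bytes(n_samples, nnz_per_row, itemsize=8, index_size=4):
    """Approximate bytes for a CSR-style sparse matrix.

    Counts one value plus one column index per stored entry,
    plus a row-pointer array of length n_samples + 1.
    index_size=4 assumes 32-bit indices (an assumption).
    """
    nnz = n_samples * nnz_per_row
    return nnz * (itemsize + index_size) + (n_samples + 1) * index_size


# Figures taken from the message above (approximate).
n_samples = 1_500_000
n_features = 4_500
# Assumption: each of the 13 original features contributes one stored value per row.
nnz_per_row = 13

print("dense  GB:", dense_bytes(n_samples, n_features) / 1e9)
print("sparse GB:", csr_bytes(n_samples, nnz_per_row) / 1e9)
```

Under these assumptions the dense array would need tens of gigabytes while the sparse form stays well under one gigabyte, which is consistent with a dense transform failing and with the interest in sparse input support for random forests.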