Hi Andreas,

Try setting min_split=10 or higher. With a dataset of that size, there is no point in using min_split=1: you will 1) indeed consume too much memory and 2) overfit.
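As a rough check on the dataset size discussed in the quoted messages below, here is a minimal sketch of the arithmetic. It assumes 60000 samples of 28 x 28 pixels stored as a dense array; the function and variable names are only illustrative and do not come from scikit-learn.

```python
# Back-of-the-envelope size of a dense MNIST training matrix.
# Assumption: 60000 samples, each 28 x 28 pixels, stored densely.

N_SAMPLES = 60000
N_FEATURES = 28 * 28  # 784 pixels per image


def dense_size_bytes(n_samples, n_features, bytes_per_value):
    """Return the size in bytes of a dense n_samples x n_features array."""
    return n_samples * n_features * bytes_per_value


size_uint8 = dense_size_bytes(N_SAMPLES, N_FEATURES, 1)    # original uint8 pixels
size_float64 = dense_size_bytes(N_SAMPLES, N_FEATURES, 8)  # after conversion to 8-byte floats

MIB = 1024 ** 2
print("uint8:   %.1f MiB" % (size_uint8 / MIB))
print("float64: %.1f MiB" % (size_float64 / MIB))
```

Each additional full copy of the float array, if any is made during training, would cost about the same again, which is worth keeping in mind on a 2 GB machine.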
Gilles

PS: I have just started to change the doc. Expect a PR later today :)

On 3 January 2012 09:27, Andreas <[email protected]> wrote:
> Hi Brian.
> The dataset itself is 60000 * 784 * 8 bytes (I converted from uint8 to
> float, which is 8 bytes in NumPy I guess),
> which is ~360 MB (also I can load it ;).
> I trained linear SVMs and neural networks without much trouble. I
> haven't really studied the decision tree code (which I know you made
> quite an effort to optimize), so I don't really have an idea how the
> construction works. Maybe I just had a misconception of the memory
> usage of the algorithm. I just started playing with it.
>
> Thanks for any comments :)
>
> Cheers,
> Andy
>
>
> On 01/03/2012 09:06 AM, [email protected] wrote:
>> Hi Andy,
>>
>> IIRC MNIST is 60000 samples, each with dimension 28x28, so the 2GB limit
>> doesn't seem unreasonable (especially since you don't have all of that at
>> your disposal). Does the dataset fit in mem?
>>
>> Brian
>>
>> -----Original Message-----
>> From: Andreas <[email protected]>
>> Date: Tue, 03 Jan 2012 09:00:47
>> To: <[email protected]>
>> Reply-To: [email protected]
>> Subject: Re: [Scikit-learn-general] Question and comments on RandomForests
>>
>> One other question:
>> I tried to run a forest on MNIST that actually consisted of only one tree.
>> That gave me a memory error. I only have 2 GB of RAM in this machine
>> (this is my desktop at IST Austria !?), which is obviously not that much.
>> Still, this kind of surprised me. Is it expected that a tree takes
>> this "much" RAM? Should I change "min_density"?
>>
>> Thanks :)
>>
>> Andy

------------------------------------------------------------------------------
Write once. Port to many.
Get the SDK and tools to simplify cross-platform app development. Create
new or port existing apps to sell to consumers worldwide. Explore the
Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
http://p.sf.net/sfu/intel-appdev
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
