Hi Andy,

I'll investigate the issue with an artificial dataset of comparable size. To be honest, I suspect that we focused on speed at the cost of memory usage...
As a quick fix you could set `min_density=1`, which will result in fewer memory copies at the cost of runtime.

best,
Peter

2012/1/3 Andreas <[email protected]>:
> Hi Gilles.
> Thanks! Will try that.
>
> Also thanks for working on the docs! :)
>
> Cheers,
> Andy
>
>
> On 01/03/2012 09:30 AM, Gilles Louppe wrote:
>> Hi Andreas,
>>
>> Try setting min_split=10 or higher. With a dataset of that size, there
>> is no point in using min_split=1; you will 1) indeed consume too much
>> memory and 2) overfit.
>>
>> Gilles
>>
>> PS: I have just started to change the doc. Expect a PR later today :)
>>
>> On 3 January 2012 09:27, Andreas<[email protected]> wrote:
>>
>>> Hi Brian.
>>> The dataset itself is 60000 * 784 * 8 bytes (I converted from uint8 to
>>> float, which is 8 bytes in NumPy I guess),
>>> which is ~ 360 MB (also I can load it ;).
>>> I trained linear SVMs and neural networks without much trouble. I
>>> haven't really studied the decision tree code (which I know you made
>>> quite an effort to optimize), so I don't really have an idea how the
>>> construction works. Maybe I just had a misconception of the memory
>>> usage of the algorithm. I just started playing with it.
>>>
>>> Thanks for any comments :)
>>>
>>> Cheers,
>>> Andy
>>>
>>>
>>> On 01/03/2012 09:06 AM, [email protected] wrote:
>>>
>>>> Hi Andy,
>>>>
>>>> IIRC MNIST is 60000 samples, each with dimension 28x28, so the 2GB limit
>>>> doesn't seem unreasonable (especially since you don't have all of that at
>>>> your disposal). Does the dataset fit in mem?
>>>>
>>>> Brian
>>>>
>>>> -----Original Message-----
>>>> From: Andreas<[email protected]>
>>>> Date: Tue, 03 Jan 2012 09:00:47
>>>> To:<[email protected]>
>>>> Reply-To: [email protected]
>>>> Subject: Re: [Scikit-learn-general] Question and comments on RandomForests
>>>>
>>>> One other question:
>>>> I tried to run a forest on MNIST that actually consisted of only one tree.
>>>> That gave me a memory error. I only have 2gb ram in this machine
>>>> (this is my desktop at IST Austria !?) which is obviously not that much.
>>>> Still, this kind of surprised me. Is it expected that a tree takes
>>>> this "much" ram? Should I change "min_density"?
>>>>
>>>> Thanks :)
>>>>
>>>> Andy
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Write once. Port to many.
>>>> Get the SDK and tools to simplify cross-platform app development. Create
>>>> new or port existing apps to sell to consumers worldwide. Explore the
>>>> Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
>>>> http://p.sf.net/sfu/intel-appdev
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

--
Peter Prettenhofer
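For reference, the size estimate quoted in the thread can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the shape (60000 samples, 28x28 = 784 features) is taken from the messages above, the per-element sizes are the standard widths of NumPy's uint8, float32 and float64 dtypes, and the helper name `dense_array_mib` is made up for illustration. It says nothing about what the tree code actually allocates.

```python
# Bytes per element for the NumPy dtypes relevant to the thread
# (uint8 = original pixel data, float64 = NumPy's default float).
ITEMSIZE = {"uint8": 1, "float32": 4, "float64": 8}


def dense_array_mib(n_samples, n_features, dtype):
    """Size in MiB of one dense n_samples x n_features array of `dtype`.

    Hypothetical helper for illustration; not part of scikit-learn or NumPy.
    """
    return n_samples * n_features * ITEMSIZE[dtype] / 2**20


# MNIST training set as described in the thread: 60000 samples of 28x28 pixels.
n_samples, n_features = 60000, 28 * 28

for dtype in ("uint8", "float32", "float64"):
    size = dense_array_mib(n_samples, n_features, dtype)
    print(f"{dtype:8s} {size:7.1f} MiB per copy")
```

A single float64 copy comes to roughly 359 MiB, in line with the "~ 360 MB" figure above. If tree construction makes several full copies of the data, as suspected in this reply, a handful of them would be enough to exhaust a 2 GB machine.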
