Hi, I just checked DecisionTreeClassifier: it basically requires the same amount of memory for its internal data structures (= `X_sorted`, which is also 60000 x 786 x 4 bytes). I haven't checked RandomForest, but you have to make sure that joblib does not fork a new process. If it does, the new process will have the same memory footprint as the parent process (which is 2x the input size because `X_sorted` is precomputed). Furthermore, because of Python's memory management, I assume that the data will be copied once more due to copy-on-write (actually, we don't write X or `X_sorted`, but we increment their reference counts, which should be enough to trigger a copy).
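To make the estimate above concrete, here is a small sketch of the arithmetic for the shapes quoted in this thread (60000 x 786). It computes sizes from dtype item sizes, so nothing large is allocated. It assumes, as the 4-bytes-per-entry figure suggests, that `X_sorted` is a column-wise argsort of `X` stored as 32-bit integer indices. `footprint_mib` is a hypothetical helper for illustration, not part of scikit-learn. The tiny array at the end only shows what a per-feature argsort contains.

```python
import numpy as np

# Shapes quoted in the thread (MNIST training set as Andy loaded it).
N_SAMPLES, N_FEATURES = 60000, 786


def footprint_mib(n_samples, n_features, dtype):
    """Size in MiB of a dense (n_samples, n_features) array of `dtype`."""
    return n_samples * n_features * np.dtype(dtype).itemsize / 2**20


# X after Andy's uint8 -> float64 conversion (8 bytes per entry).
x_mib = footprint_mib(N_SAMPLES, N_FEATURES, np.float64)

# Assumption: X_sorted stores one int32 row index per entry (4 bytes).
x_sorted_mib = footprint_mib(N_SAMPLES, N_FEATURES, np.int32)

print(f"X:        {x_mib:.0f} MiB")
print(f"X_sorted: {x_sorted_mib:.0f} MiB")

# On a tiny array: for each feature (column), the row indices ordered by value.
X_small = np.array([[3.0, 1.0],
                    [1.0, 2.0],
                    [2.0, 0.0]])
X_sorted_small = np.argsort(X_small, axis=0).astype(np.int32)
# column 0 -> rows 1, 2, 0; column 1 -> rows 2, 0, 1
print(X_sorted_small)
```

Andy's "~360 MB" for the float64 matrix corresponds to the 360 MiB computed here. Under the int32 assumption, `X_sorted` adds roughly another 180 MiB, before counting any copies made by forked processes.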
best

2012/1/3 Gilles Louppe <[email protected]>:
> Note also that when using bootstrap=True, copies of X have to be
> created for each tree.
>
> But this should work anyway since you only build 1 tree... Hmm.
>
> Gilles
>
> On 3 January 2012 09:41, Peter Prettenhofer
> <[email protected]> wrote:
>> Hi Andy,
>>
>> I'll investigate the issue with an artificial dataset of comparable
>> size; to be honest, I suspect that we focused on speed at the cost of
>> memory usage...
>>
>> As a quick fix you could set `min_density=1`, which will result in fewer
>> memory copies at the cost of runtime.
>>
>> best,
>> Peter
>>
>> 2012/1/3 Andreas <[email protected]>:
>>> Hi Gilles.
>>> Thanks! Will try that.
>>>
>>> Also thanks for working on the docs! :)
>>>
>>> Cheers,
>>> Andy
>>>
>>>
>>> On 01/03/2012 09:30 AM, Gilles Louppe wrote:
>>>> Hi Andreas,
>>>>
>>>> Try setting min_split=10 or higher. With a dataset of that size, there
>>>> is no point in using min_split=1: you will 1) indeed consume too much
>>>> memory and 2) overfit.
>>>>
>>>> Gilles
>>>>
>>>> PS: I have just started to change the doc. Expect a PR later today :)
>>>>
>>>> On 3 January 2012 09:27, Andreas<[email protected]> wrote:
>>>>
>>>>> Hi Brian.
>>>>> The dataset itself is 60000 * 786 * 8 bytes (I converted from uint8 to
>>>>> float, which is 8 bytes in NumPy I guess), which is ~ 360 MB (and I
>>>>> can load it ;).
>>>>> I trained linear SVMs and neural networks without much trouble. I
>>>>> haven't really studied the decision tree code (which I know you made
>>>>> quite an effort to optimize), so I don't really have an idea how the
>>>>> construction works. Maybe I just had a misconception of the memory
>>>>> usage of the algorithm. I just started playing with it.
>>>>>
>>>>> Thanks for any comments :)
>>>>>
>>>>> Cheers,
>>>>> Andy
>>>>>
>>>>>
>>>>> On 01/03/2012 09:06 AM, [email protected] wrote:
>>>>>
>>>>>> Hi Andy,
>>>>>>
>>>>>> IIRC MNIST is 60000 samples, each with dimension 28x28, so the 2GB limit
>>>>>> doesn't seem unreasonable (especially since you don't have all of that
>>>>>> at your disposal). Does the dataset fit in mem?
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andreas<[email protected]>
>>>>>> Date: Tue, 03 Jan 2012 09:00:47
>>>>>> To:<[email protected]>
>>>>>> Reply-To: [email protected]
>>>>>> Subject: Re: [Scikit-learn-general] Question and comments on
>>>>>> RandomForests
>>>>>>
>>>>>> One other question:
>>>>>> I tried to run a forest on MNIST that actually consisted of only one
>>>>>> tree.
>>>>>> That gave me a memory error. I only have 2 GB of RAM in this machine
>>>>>> (this is my desktop at IST Austria !?), which is obviously not that much.
>>>>>> Still, this kind of surprised me. Is it expected that a tree takes
>>>>>> this "much" RAM? Should I change "min_density"?
>>>>>>
>>>>>> Thanks :)
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Write once. Port to many.
>>>>>> Get the SDK and tools to simplify cross-platform app development. Create
>>>>>> new or port existing apps to sell to consumers worldwide. Explore the
>>>>>> Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
>>>>>> http://p.sf.net/sfu/intel-appdev
>>>>>> _______________________________________________
>>>>>> Scikit-learn-general mailing list
>>>>>> [email protected]
>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>> --
>> Peter Prettenhofer

--
Peter Prettenhofer
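Gilles's note that `bootstrap=True` forces a copy of X for each tree is consistent with how NumPy handles row resampling: indexing with an integer array always allocates a new array, whereas a basic slice returns a view onto the original buffer. The sketch below illustrates this on a small random stand-in for X. It assumes the bootstrap sample is drawn by integer-array indexing. It is not taken from the scikit-learn forest code of that time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small random stand-in for X (the matrix in this thread is 60000 x 786, float64).
X = rng.random((1000, 50))

# A bootstrap sample: n_samples row indices drawn with replacement.
indices = rng.integers(0, X.shape[0], size=X.shape[0])

# Integer-array ("fancy") indexing always returns a new array...
X_boot = X[indices]

# ...whereas a basic slice is a view onto X's buffer.
X_view = X[:500]

print("bootstrap shares memory with X:", np.shares_memory(X, X_boot))
print("slice shares memory with X:    ", np.shares_memory(X, X_view))
print("bootstrap copy size (bytes):   ", X_boot.nbytes)
```

Because the bootstrap sample has the same shape and dtype as X, each copy costs as much as X itself. For the float64 MNIST matrix discussed above, that is roughly another 360 MiB per tree.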
