Hi.

I just switched to DecisionTreeClassifier to make analysis easier.
There should be no joblib there, right?

One thing I noticed is that there is often
``np.argsort(X.T, axis=1).astype(np.int32)``, which always makes a copy.
Still, as these copies should be garbage collected, I don't really see
where all the memory goes...

I'll give it a closer look later, but I'll move to another box for now.

Thanks everybody for the help! And sorry for keeping you, @peter.

Cheers,
Andy

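
A rough sketch of the arithmetic discussed in this thread, using the shapes quoted below (60000 x 786, float64 input, int32 sorted indices). The variable names and the small stand-in array are illustrative assumptions, and this is not the scikit-learn implementation; it only shows that ``np.argsort`` returns a new index array and that ``astype`` then makes a second one.

```python
import numpy as np

# Shapes as quoted in the thread (assumption: float64 input, int32 indices).
n_samples, n_features = 60000, 786


def mebibytes(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / 2.0 ** 20


x_bytes = n_samples * n_features * np.dtype(np.float64).itemsize
x_sorted_bytes = n_samples * n_features * np.dtype(np.int32).itemsize
# argsort returns platform-sized integers (np.intp) before the cast.
argsort_bytes = n_samples * n_features * np.dtype(np.intp).itemsize

print("X:               %.1f MiB" % mebibytes(x_bytes))
print("X_sorted:        %.1f MiB" % mebibytes(x_sorted_bytes))
print("argsort result:  %.1f MiB (temporary)" % mebibytes(argsort_bytes))

# Small stand-in array, so the snippet runs quickly on any machine.
X = np.random.RandomState(0).rand(1000, 20)

idx = np.argsort(X.T, axis=1)     # new array of np.intp indices
X_sorted = idx.astype(np.int32)   # astype copies by default

# The two index arrays do not share memory, so both exist until
# `idx` is released.
print(np.shares_memory(idx, X_sorted))
```

Until the intermediate ``np.intp`` array is released, it sits in memory next to X and the int32 result, so the peak can be noticeably higher than "2x the input size". How much higher depends on the platform's integer width.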
On 01/03/2012 09:52 AM, Peter Prettenhofer wrote:
> Hi,
>
> I just checked DecisionTreeClassifier - it basically requires the same
> amount of memory for its internal data structures (= `X_sorted` which
> is also 60.000 x 786 x 4 bytes). I haven't checked RandomForest but
> you have to make sure that joblib does not fork a new process. If so,
> the new process will have the same memory footprint as the parent
> process (which is 2x the input size because X_sorted is precomputed).
> Furthermore, because of Python's memory management I assume that the
> data will be copied once more due to copy on write (actually, we don't
> write X or X_sorted but we increment their reference counts which
> should be enough to trigger a copy).
>
> best
>
> 2012/1/3 Gilles Louppe<[email protected]>:
>
>> Note also that when using bootstrap=True, copies of X have to be
>> created for each tree.
>>
>> But this should work anyway since you only build 1 tree... Hmmm.
>>
>> Gilles
>>
>> On 3 January 2012 09:41, Peter Prettenhofer
>> <[email protected]> wrote:
>>
>>> Hi Andy,
>>>
>>> I'll investigate the issue with an artificial dataset of comparable
>>> size - to be honest I suspect that we focused on speed at the cost of
>>> memory usage...
>>>
>>> As a quick fix you could set `min_density=1` which will result in fewer
>>> memory copies at the cost of runtime.
>>>
>>> best,
>>> Peter
>>>
>>> 2012/1/3 Andreas<[email protected]>:
>>>
>>>> Hi Gilles.
>>>> Thanks! Will try that.
>>>>
>>>> Also thanks for working on the docs! :)
>>>>
>>>> Cheers,
>>>> Andy
>>>>
>>>>
>>>> On 01/03/2012 09:30 AM, Gilles Louppe wrote:
>>>>
>>>>> Hi Andras,
>>>>>
>>>>> Try setting min_split=10 or higher. With a dataset of that size, there
>>>>> is no point in using min_split=1, you will 1) consume indeed too much
>>>>> memory and 2) overfit.
>>>>>
>>>>> Gilles
>>>>>
>>>>> PS: I have just started to change the doc. Expect a PR later today :)
>>>>>
>>>>> On 3 January 2012 09:27, Andreas<[email protected]> wrote:
>>>>>
>>>>>
>>>>>> Hi Brian.
>>>>>> The dataset itself is 60000 * 786 * 8 bytes (I converted from uint8 to
>>>>>> float which is 8 bytes in Numpy I guess)
>>>>>> which is ~ 360 MB (also I can load it ;).
>>>>>> I trained linear SVMs and Neural networks without much trouble. I
>>>>>> haven't really studied the
>>>>>> decision tree code (which I know you made quite an effort to optimize)
>>>>>> so I don't really
>>>>>> have an idea how the construction works. Maybe I just had a
>>>>>> misconception of the memory
>>>>>> usage of the algorithm. I just started playing with it.
>>>>>>
>>>>>> Thanks for any comments :)
>>>>>>
>>>>>> Cheers,
>>>>>> Andy
>>>>>>
>>>>>>
>>>>>> On 01/03/2012 09:06 AM, [email protected] wrote:
>>>>>>
>>>>>>
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> IIRC MNIST is 60000 samples, each with dimension 28x28, so the 2GB
>>>>>>> limit doesn't seem unreasonable (especially since you don't have all of
>>>>>>> that at your disposal). Does the dataset fit in mem?
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Andreas<[email protected]>
>>>>>>> Date: Tue, 03 Jan 2012 09:00:47
>>>>>>> To:<[email protected]>
>>>>>>> Reply-To: [email protected]
>>>>>>> Subject: Re: [Scikit-learn-general] Question and comments on
>>>>>>> RandomForests
>>>>>>>
>>>>>>> One other question:
>>>>>>> I tried to run a forest on MNIST, that actually consisted of only one
>>>>>>> tree.
>>>>>>> That gave me a memory error. I only have 2gb ram in this machine
>>>>>>> (this is my desktop at IST Austria !?) which is obviously not that much.
>>>>>>> Still this kind of surprised me. Is it expected that a tree takes
>>>>>>> this "much" ram? Should I change "min_density"?
>>>>>>>
>>>>>>> Thanks :)
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Write once. Port to many.
>>>>>>> Get the SDK and tools to simplify cross-platform app development. Create
>>>>>>> new or port existing apps to sell to consumers worldwide. Explore the
>>>>>>> Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
>>>>>>> http://p.sf.net/sfu/intel-appdev
>>>>>>> _______________________________________________
>>>>>>> Scikit-learn-general mailing list
>>>>>>> [email protected]
>>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>> --
>>> Peter Prettenhofer
