Hi,

I just checked DecisionTreeClassifier - it basically requires the same
amount of memory for its internal data structures (= `X_sorted`, which
is also 60,000 x 786 x 4 bytes). I haven't checked RandomForest, but
you have to make sure that joblib does not fork a new process. If it does,
the new process will have the same memory footprint as the parent
process (which is 2x the input size because `X_sorted` is precomputed).
Furthermore, because of Python's memory management, I assume that the
data will be copied once more due to copy-on-write (actually, we don't
write to X or `X_sorted`, but we increment their reference counts, which
should be enough to trigger a copy).
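
To make the arithmetic concrete, here is a rough sketch in Python. It only
multiplies out the sizes quoted in this thread; the helper names
(`array_bytes`, `to_mib`) are mine and not part of scikit-learn, and the
"copied child" case is the pessimistic assumption described above, not a
measurement.

```python
# Rough memory estimate using only the sizes quoted in this thread.
N_SAMPLES = 60000
N_FEATURES = 786       # feature count as quoted in the thread
ITEMSIZE = 4           # bytes per entry, as in the size quoted above


def array_bytes(n_rows, n_cols, itemsize):
    """Bytes needed for a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize


def to_mib(n_bytes):
    """Convert bytes to MiB (2**20 bytes)."""
    return n_bytes / 2**20


x_bytes = array_bytes(N_SAMPLES, N_FEATURES, ITEMSIZE)
x_sorted_bytes = array_bytes(N_SAMPLES, N_FEATURES, ITEMSIZE)

# Parent process: input plus the precomputed sorted structure (2x the input).
parent_bytes = x_bytes + x_sorted_bytes

# Pessimistic assumption: a forked worker ends up with its own full copy.
with_child_bytes = 2 * parent_bytes

for label, n in [
    ("X", x_bytes),
    ("X + X_sorted", parent_bytes),
    ("parent + copied child", with_child_bytes),
]:
    print(f"{label:>22}: {to_mib(n):7.1f} MiB")
```

Under these assumptions the last figure comes to roughly 720 MiB, before
counting the caller's own copy of the data or any per-node temporaries.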

best

2012/1/3 Gilles Louppe <[email protected]>:
> Note also that when using bootstrap=True, copies of X have to be
> created for each tree.
>
> But this should work anyway since you only build 1 tree... Hmmm.
>
> Gilles
>
> On 3 January 2012 09:41, Peter Prettenhofer
> <[email protected]> wrote:
>> Hi Andy,
>>
>> I'll investigate the issue with an artificial dataset of comparable
>> size - to be honest I suspect that we focused on speed at the cost of
>> memory usage...
>>
>> As a quick fix you could set `min_density=1`, which will result in fewer
>> memory copies at the cost of runtime.
>>
>> best,
>>  Peter
>>
>> 2012/1/3 Andreas <[email protected]>:
>>> Hi Gilles.
>>> Thanks! Will try that.
>>>
>>> Also thanks for working on the docs! :)
>>>
>>> Cheers,
>>> Andy
>>>
>>>
>>> On 01/03/2012 09:30 AM, Gilles Louppe wrote:
>>>> Hi Andreas,
>>>>
>>>> Try setting min_split=10 or higher. With a dataset of that size, there
>>>> is no point in using min_split=1: you will 1) indeed consume too much
>>>> memory and 2) overfit.
>>>>
>>>> Gilles
>>>>
>>>> PS: I have just started to change the doc. Expect a PR later today :)
>>>>
>>>> On 3 January 2012 09:27, Andreas<[email protected]>  wrote:
>>>>
>>>>> Hi Brian.
>>>>> The dataset itself is 60000 * 786 * 8 bytes (I converted from uint8 to
>>>>> float, which is 8 bytes in NumPy I guess),
>>>>> which is ~ 360 MB (so I can load it ;).
>>>>> I trained linear SVMs and Neural networks without much trouble. I
>>>>> haven't really studied the
>>>>> decision tree code (which I know you made quite an effort to optimize)
>>>>> so I don't really
>>>>> have an idea how the construction works. Maybe I just had a
>>>>> misconception of the memory
>>>>> usage of the algorithm. I just started playing with it.
>>>>>
>>>>> Thanks for any comments :)
>>>>>
>>>>> Cheers,
>>>>> Andy
>>>>>
>>>>>
>>>>> On 01/03/2012 09:06 AM, [email protected] wrote:
>>>>>
>>>>>> Hi Andy,
>>>>>>
>>>>>> IIRC MNIST is 60000 samples, each with dimension 28x28, so the 2GB limit 
>>>>>> doesn't seem unreasonable (especially since you don't have all of that 
>>>>>> at your disposal). Does the dataset fit in mem?
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andreas<[email protected]>
>>>>>> Date: Tue, 03 Jan 2012 09:00:47
>>>>>> To:<[email protected]>
>>>>>> Reply-To: [email protected]
>>>>>> Subject: Re: [Scikit-learn-general] Question and comments on 
>>>>>> RandomForests
>>>>>>
>>>>>> One other question:
>>>>>> I tried to run a forest on MNIST, that actually consisted of only one 
>>>>>> tree.
>>>>>> That gave me a memory error. I only have 2gb ram in this machine
>>>>>> (this is my desktop at IST Austria !?) which is obviously not that much.
>>>>>> Still this kind of surprised me. Is it expected that a tree takes
>>>>>> this "much" ram? Should I change "min_density"?
>>>>>>
>>>>>> Thanks :)
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Write once. Port to many.
>>>>>> Get the SDK and tools to simplify cross-platform app development. Create
>>>>>> new or port existing apps to sell to consumers worldwide. Explore the
>>>>>> Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
>>>>>> http://p.sf.net/sfu/intel-appdev
>>>>>> _______________________________________________
>>>>>> Scikit-learn-general mailing list
>>>>>> [email protected]
>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Peter Prettenhofer
>>
>



-- 
Peter Prettenhofer

