Hi.
I just switched to DecisionTreeClassifier to make analysis easier.
There should be no joblib there, right?
One thing I noticed is that the code often calls
``np.argsort(X.T, axis=1).astype(np.int32)``,
which always makes a copy.
Still, as these should be garbage collected, I don't really see
where all the memory goes...
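Here is a small sketch of what that one-liner allocates. It assumes a float64 `X` shaped like the MNIST matrix but with fewer rows, so it runs quickly. `np.argsort` returns indices of the platform integer type (`np.intp`, usually 8 bytes), and `astype(np.int32)` then allocates a second array, so both exist at the same time until the temporary is released. The temporary is freed as soon as the expression finishes, but a MemoryError is triggered by the peak, not by what survives afterwards.

```python
import numpy as np

# Stand-in for MNIST: fewer rows than the real 60000 so the example runs quickly.
rng = np.random.default_rng(0)
X = rng.random((1000, 784))  # float64, as after converting from uint8

# Step 1: argsort over samples for each feature; the result uses np.intp indices.
idx = np.argsort(X.T, axis=1)

# Step 2: astype allocates a new int32 array while idx is still alive.
X_sorted = idx.astype(np.int32)

print(idx.dtype, idx.nbytes)            # temporary index array
print(X_sorted.dtype, X_sorted.nbytes)  # the array that is kept
print(np.shares_memory(idx, X_sorted))  # False: the data were copied

# Extra memory at the peak of the expression: both arrays coexist.
peak_extra = idx.nbytes + X_sorted.nbytes
print(f"peak extra memory: {peak_extra / 2**20:.1f} MiB")
```

Scaled up to 60000 rows, the temporary `np.intp` array alone would take 60000 x 784 x 8 bytes (about 359 MiB) on a 64-bit platform, on top of `X` and the kept `X_sorted`.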
I'll give it a closer look later but I'll move to another box
for now.
Thanks everybody for the help!
And sorry for keeping you busy, @peter.
Cheers,
Andy


On 01/03/2012 09:52 AM, Peter Prettenhofer wrote:
> Hi,
>
> I just checked DecisionTreeClassifier - it basically requires the same
> amount of memory for its internal data structures (= `X_sorted`, which
> is also 60,000 x 784 x 4 bytes). I haven't checked RandomForest, but
> you have to make sure that joblib does not fork a new process. If so,
> the new process will have the same memory footprint as the parent
> process (which is 2x the input size because X_sorted is precomputed).
> Furthermore, because of Python's memory management, I assume that the
> data will be copied once more due to copy-on-write (we don't actually
> write to X or X_sorted, but we increment their reference counts, which
> should be enough to trigger a copy).
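A back-of-the-envelope sketch of the footprint described above, using the MNIST shape from this thread (60,000 x 784). Treating X as float64 (Andy's conversion) and assuming that a forked worker ends up with private copies of both X and X_sorted are worst-case assumptions for illustration, not measurements.

```python
import numpy as np

N_SAMPLES, N_FEATURES = 60_000, 784  # MNIST: 60,000 images of 28x28 pixels
MIB = 2**20

def array_bytes(dtype):
    """Bytes for one dense N_SAMPLES x N_FEATURES array of the given dtype."""
    return N_SAMPLES * N_FEATURES * np.dtype(dtype).itemsize

x_bytes = array_bytes(np.float64)       # X after converting uint8 -> float64
x_sorted_bytes = array_bytes(np.int32)  # X_sorted: one int32 index per entry

parent = x_bytes + x_sorted_bytes
# Worst case (assumption): a forked worker holds private copies of both arrays.
with_fork = 2 * parent

for name, size in [("X", x_bytes), ("X_sorted", x_sorted_bytes),
                   ("parent", parent), ("parent + forked copy", with_fork)]:
    print(f"{name:>22}: {size / MIB:7.1f} MiB")
```

Under these assumptions a single tree would already need roughly 1 GiB before any tree nodes are allocated, about half of a 2 GB machine.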
>
> best
>
> 2012/1/3 Gilles Louppe<[email protected]>:
>    
>> Note also that when using bootstrap=True, copies of X have to be
>> created for each tree.
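A minimal sketch of why bootstrapping copies X, assuming the bootstrap sample is drawn with NumPy integer-array indexing (an assumption about the implementation, used only to illustrate the memory effect). Indexing with an integer array always returns a new array of the same size, so each bootstrapped tree holds its own full-size copy of X while it is being built.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 784))  # small stand-in for the training matrix
n_samples = X.shape[0]

# Bootstrap sample: n_samples row indices drawn with replacement.
indices = rng.integers(0, n_samples, size=n_samples)

# Integer-array ("fancy") indexing returns a copy, never a view.
X_boot = X[indices]

print(np.shares_memory(X, X_boot))  # False
print(X_boot.nbytes == X.nbytes)    # True: same size as X

# A basic slice, by contrast, is a view and allocates no new data.
X_view = X[: n_samples // 2]
print(np.shares_memory(X, X_view))  # True
```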
>>
>> But this should work anyway since you only build 1 tree... Hmmm.
>>
>> Gilles
>>
>> On 3 January 2012 09:41, Peter Prettenhofer
>> <[email protected]>  wrote:
>>      
>>> Hi Andy,
>>>
>>> I'll investigate the issue with an artificial dataset of comparable
>>> size - to be honest I suspect that we focused on speed at the cost of
>>> memory usage...
>>>
>>> As a quick fix, you could set `min_density=1`, which will result in fewer
>>> memory copies at the cost of runtime.
>>>
>>> best,
>>>   Peter
>>>
>>> 2012/1/3 Andreas<[email protected]>:
>>>        
>>>> Hi Gilles.
>>>> Thanks! Will try that.
>>>>
>>>> Also thanks for working on the docs! :)
>>>>
>>>> Cheers,
>>>> Andy
>>>>
>>>>
>>>> On 01/03/2012 09:30 AM, Gilles Louppe wrote:
>>>>          
>>>>> Hi Andras,
>>>>>
>>>>> Try setting min_split=10 or higher. With a dataset of that size, there
>>>>> is no point in using min_split=1: you will 1) indeed consume too much
>>>>> memory and 2) overfit.
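To see why a larger min_split saves memory, here is a toy count of internal nodes. It assumes an idealized tree that splits every node exactly in half until a node has fewer than `min_split` samples; real trees split unevenly and also stop at pure nodes. `count_internal_nodes` is a hypothetical helper, not part of scikit-learn.

```python
def count_internal_nodes(n_samples: int, min_split: int) -> int:
    """Count splits in an idealized tree that halves each node.

    A node is split only if it holds at least min_split samples
    (and at least 2, since a single sample cannot be split).
    """
    if n_samples < max(min_split, 2):
        return 0  # leaf
    left = n_samples // 2
    right = n_samples - left
    return 1 + count_internal_nodes(left, min_split) + count_internal_nodes(right, min_split)


for min_split in (1, 10):
    print(min_split, count_internal_nodes(60_000, min_split))
# 1 59999
# 10 8191
```

In this idealized setting, going from min_split=1 to min_split=10 cuts the number of split nodes by roughly a factor of seven; actual savings depend on the data.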
>>>>>
>>>>> Gilles
>>>>>
>>>>> PS: I have just started to change the doc. Expect a PR later today :)
>>>>>
>>>>> On 3 January 2012 09:27, Andreas<[email protected]>    wrote:
>>>>>
>>>>>            
>>>>>> Hi Brian.
>>>>>> The dataset itself is 60000 * 784 * 8 bytes (I converted from uint8 to
>>>>>> float, which is 8 bytes in NumPy, I guess),
>>>>>> which is ~360 MB (and I can load it ;).
>>>>>> I trained linear SVMs and neural networks without much trouble. I
>>>>>> haven't really studied the decision tree code (which I know you made
>>>>>> quite an effort to optimize), so I don't really have an idea how the
>>>>>> construction works. Maybe I just had a misconception of the memory
>>>>>> usage of the algorithm. I just started playing with it.
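The arithmetic behind the ~360 MB figure can be checked with NumPy's dtype sizes. A Python `float` maps to NumPy's float64 (8 bytes), so the conversion from uint8 makes X eight times larger. This sketch only computes sizes; it does not allocate the arrays.

```python
import numpy as np

n_samples, n_features = 60_000, 784  # 28 x 28 = 784 pixels per MNIST image
MIB = 2**20

# Python's float corresponds to NumPy's float64.
assert np.dtype(float) == np.float64

raw_bytes = n_samples * n_features * np.dtype(np.uint8).itemsize
float_bytes = n_samples * n_features * np.dtype(float).itemsize

print(f"uint8:   {raw_bytes / MIB:6.1f} MiB")    # original pixel data
print(f"float64: {float_bytes / MIB:6.1f} MiB")  # after the conversion
print(float_bytes // raw_bytes)                  # 8
```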
>>>>>>
>>>>>> Thanks for any comments :)
>>>>>>
>>>>>> Cheers,
>>>>>> Andy
>>>>>>
>>>>>>
>>>>>> On 01/03/2012 09:06 AM, [email protected] wrote:
>>>>>>
>>>>>>              
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> IIRC MNIST is 60000 samples, each with dimension 28x28, so hitting the
>>>>>>> 2GB limit doesn't seem unreasonable (especially since you don't have all
>>>>>>> of that at your disposal). Does the dataset fit in memory?
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Andreas<[email protected]>
>>>>>>> Date: Tue, 03 Jan 2012 09:00:47
>>>>>>> To:<[email protected]>
>>>>>>> Reply-To: [email protected]
>>>>>>> Subject: Re: [Scikit-learn-general] Question and comments on 
>>>>>>> RandomForests
>>>>>>>
>>>>>>> One other question:
>>>>>>> I tried to run a forest on MNIST that actually consisted of only one
>>>>>>> tree.
>>>>>>> That gave me a memory error. I only have 2 GB of RAM in this machine
>>>>>>> (this is my desktop at IST Austria!?), which is obviously not that much.
>>>>>>> Still, this kind of surprised me. Is it expected that a tree takes
>>>>>>> this "much" RAM? Should I change "min_density"?
>>>>>>>
>>>>>>> Thanks :)
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Write once. Port to many.
>>>>>>> Get the SDK and tools to simplify cross-platform app development. Create
>>>>>>> new or port existing apps to sell to consumers worldwide. Explore the
>>>>>>> Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
>>>>>>> http://p.sf.net/sfu/intel-appdev
>>>>>>> _______________________________________________
>>>>>>> Scikit-learn-general mailing list
>>>>>>> [email protected]
>>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>> --
>>> Peter Prettenhofer
>>>