Hi Andreas,

Try setting min_split=10 or higher. With a dataset of that size, there
is no point in using min_split=1: you will 1) indeed consume too much
memory and 2) overfit.
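
To illustrate why a larger min_split keeps the tree (and its memory use)
small, here is a toy sketch in plain Python. It is not scikit-learn's
implementation: the helpers gini and count_nodes and the random 1-D data
are made up for illustration, and it assumes min_split means "do not split
a node that holds fewer samples than this". As far as I know, later
scikit-learn releases call this parameter min_samples_split.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def count_nodes(points, min_split):
    """Grow a greedy tree on (x, label) pairs and return its node count.

    A node becomes a leaf when it is pure or holds fewer than
    `min_split` samples (the assumed meaning of the parameter).
    """
    labels = [y for _, y in points]
    if len(points) < max(min_split, 2) or len(set(labels)) == 1:
        return 1
    points = sorted(points)
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between equal feature values
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        # Size-weighted impurity of the two children; lower is better.
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, i)
    if best is None:
        return 1
    i = best[1]
    return (1 + count_nodes(points[:i], min_split)
            + count_nodes(points[i:], min_split))


# Hypothetical noisy data: one random feature, random binary labels.
rng = random.Random(0)
data = [(rng.random(), rng.randint(0, 1)) for _ in range(300)]

nodes_small = count_nodes(data, min_split=1)
nodes_large = count_nodes(data, min_split=10)
print("nodes with min_split=1: ", nodes_small)
print("nodes with min_split=10:", nodes_large)
```

With min_split=1 the tree keeps splitting until every leaf is pure, so on
noisy labels it ends up memorising individual samples. With min_split=10
the same greedy procedure stops earlier and yields a pruned version of
that tree.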

Gilles

PS: I have just started to change the doc. Expect a PR later today :)

On 3 January 2012 09:27, Andreas <[email protected]> wrote:
> Hi Brian.
> The dataset itself is 60000 * 784 * 8 bytes (I converted from uint8 to
> float, which is 8 bytes in NumPy I guess), which is ~ 360 MB (also I can
> load it ;).
> I trained linear SVMs and neural networks without much trouble. I haven't
> really studied the decision tree code (which I know you made quite an
> effort to optimize), so I don't really have an idea how the construction
> works. Maybe I just had a misconception of the memory usage of the
> algorithm. I just started playing with it.
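
The size estimate above can be checked with a few lines of arithmetic. This
is only a back-of-the-envelope sketch: it assumes a dense 60000 x 784 array
(28 x 28 = 784 pixels per image) and ignores any copies or temporary
buffers made while the tree is built. The helper size_mib is a made-up
name.

```python
# Rough size of a dense MNIST training matrix for several dtypes.
n_samples = 60000
n_features = 28 * 28  # 784 pixels per image


def size_mib(bytes_per_value):
    """Size in MiB of an n_samples x n_features array."""
    return n_samples * n_features * bytes_per_value / 2**20


for name, width in [("uint8", 1), ("float32", 4), ("float64", 8)]:
    print(f"{name}: {size_mib(width):.1f} MiB")
```

This prints roughly 44.9 MiB for uint8, 179.4 MiB for float32 and
358.9 MiB for float64, so the ~360 MB figure corresponds to 8-byte floats.
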
>
> Thanks for any comments :)
>
> Cheers,
> Andy
>
>
> On 01/03/2012 09:06 AM, [email protected] wrote:
>> Hi Andy,
>>
>> IIRC MNIST is 60000 samples, each with dimension 28x28, so the 2GB limit 
>> doesn't seem unreasonable (especially since you don't have all of that at 
>> your disposal). Does the dataset fit in mem?
>>
>> Brian
>>
>> -----Original Message-----
>> From: Andreas<[email protected]>
>> Date: Tue, 03 Jan 2012 09:00:47
>> To:<[email protected]>
>> Reply-To: [email protected]
>> Subject: Re: [Scikit-learn-general] Question and comments on RandomForests
>>
>> One other question:
>> I tried to run a forest on MNIST that actually consisted of only one tree.
>> That gave me a memory error. I only have 2 GB of RAM in this machine
>> (this is my desktop at IST Austria !?), which is obviously not that much.
>> Still, this kind of surprised me. Is it expected that a tree takes
>> this "much" RAM? Should I change "min_density"?
>>
>> Thanks :)
>>
>> Andy
>>
>> ------------------------------------------------------------------------------
>> Write once. Port to many.
>> Get the SDK and tools to simplify cross-platform app development. Create
>> new or port existing apps to sell to consumers worldwide. Explore the
>> Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
>> http://p.sf.net/sfu/intel-appdev
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
>
