I also get the same error when using max_depth=1.
It's here:
  File "/home/amueller/checkout/scikit-learn/sklearn/tree/tree.py", line 357, in _build_tree
    np.argsort(X.T, axis=1).astype(np.int32).T)
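The failing line allocates its result in two steps: an index array from np.argsort, then an int32 copy of it. Below is a back-of-envelope sketch of what that single line holds in memory at its peak. It assumes a 64-bit platform, where np.argsort returns np.intp (8 bytes), and X stored as float64 with MNIST's 60000 × (28 × 28) shape. It ignores argsort's internal scratch buffers and anything else the process holds.

```python
import numpy as np

n_samples, n_features = 60000, 28 * 28  # MNIST training set, 28x28 pixels per image
n_items = n_samples * n_features


def mib(dtype) -> float:
    """Size in MiB of an (n_samples, n_features) array of the given dtype."""
    return n_items * np.dtype(dtype).itemsize / 2**20


x = mib(np.float64)           # the training data, already converted to float64
argsort_tmp = mib(np.intp)    # np.argsort(...) returns np.intp indices (int64 on 64-bit)
int32_copy = mib(np.int32)    # .astype(np.int32) allocates a second, smaller array

# While .astype(np.int32) runs, X, the intp temporary and its int32 copy coexist.
peak = x + argsort_tmp + int32_copy
print(f"X {x:.0f} MiB + argsort {argsort_tmp:.0f} MiB + int32 {int32_copy:.0f} MiB"
      f" = {peak:.0f} MiB")

# Check the index dtype on a tiny array.
print(np.argsort(np.random.rand(3, 5), axis=1).dtype)  # intp, e.g. int64
```

On a 64-bit machine this comes to roughly 900 MiB for that one line. That may already be a large share of 2 GB once the interpreter and any other copies of the data are counted. If the presort runs once per tree before any node is split, as the traceback suggests, max_depth cannot reduce this cost.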
The parameters of my forest are:
RandomForestClassifier(bootstrap=True, compute_importances=False,
criterion='gini', max_depth=1, max_features=10,
min_density=0.1, min_split=1, n_estimators=1, n_jobs=1,
random_state=<mtrand.RandomState object at 0x7f451ac2b0d8>)
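The following toy node count gives some intuition for the min_split and max_depth parameters above. It is not scikit-learn's tree builder. The hypothetical count_nodes below assumes a perfectly balanced tree that halves every node holding at least min_split samples (and at least 2) until max_depth is reached. Real trees split unevenly, so only the trend is meaningful.

```python
from typing import Optional


def count_nodes(n_samples: int, min_split: int, max_depth: Optional[int] = None,
                depth: int = 0) -> int:
    """Count nodes in a toy, perfectly balanced binary tree.

    A node is split into two halves only if it holds at least `min_split`
    samples, holds at least 2 samples, and has not reached `max_depth`.
    """
    can_split = n_samples >= max(min_split, 2)
    if not can_split or (max_depth is not None and depth >= max_depth):
        return 1  # leaf
    left = n_samples // 2
    right = n_samples - left
    return (1 + count_nodes(left, min_split, max_depth, depth + 1)
              + count_nodes(right, min_split, max_depth, depth + 1))


n = 60000  # MNIST training set size
for min_split in (1, 10, 100):
    print(f"min_split={min_split:>3}: {count_nodes(n, min_split):>6} nodes (unlimited depth), "
          f"{count_nodes(n, min_split, max_depth=1)} nodes (max_depth=1)")
```

In this toy, min_split=1 on 60000 samples gives 119999 nodes, one leaf per sample, and larger min_split values shrink that. With max_depth=1, the tree is capped at 3 nodes whatever min_split is. That suggests the memory error in the run above does not come from growing the tree. The presorting step shown in the traceback is the more likely cause.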
It's quite possible that this is some stupid mistake on my part.
I just want to understand what is happening. I would like to use
this code on much larger datasets (with more RAM ;), where data
copies can really be an issue.
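Since data copies are the concern, np.shares_memory is one way to see which NumPy operations copy. The sketch below uses a toy uint8 array with hypothetical values standing in for MNIST pixels, which the thread below converts from uint8 to float. It shows three cases:

- A dtype conversion always allocates a new array.
- A transpose such as X.T is a view.
- np.asarray skips the copy when the dtype already matches.

```python
import numpy as np

# Small stand-in for an image matrix stored as uint8 (hypothetical data).
X_u8 = np.arange(24, dtype=np.uint8).reshape(6, 4)

X_f64 = X_u8.astype(np.float64)                # dtype change: always a new array
X_T = X_f64.T                                  # transpose: a view, no data copied
X_same = np.asarray(X_f64, dtype=np.float64)   # already float64: returned as-is
X_f32 = X_u8.astype(np.float32)

print(np.shares_memory(X_u8, X_f64))   # False: converting copied the data
print(np.shares_memory(X_f64, X_T))    # True: X.T reuses X's buffer
print(X_same is X_f64)                 # True: no conversion needed, no copy
print(X_f64.nbytes // X_u8.nbytes)     # 8: float64 uses 8x the memory of uint8
print(X_f32.nbytes // X_u8.nbytes)     # 4: float32 uses 4x
```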
Thanks again for your help.
Andy
On 01/03/2012 09:34 AM, Andreas wrote:
> Hi Gilles.
> Thanks! Will try that.
>
> Also thanks for working on the docs! :)
>
> Cheers,
> Andy
>
>
> On 01/03/2012 09:30 AM, Gilles Louppe wrote:
>
>> Hi Andreas,
>>
>> Try setting min_split=10 or higher. With a dataset of that size, there
>> is no point in using min_split=1: you will 1) indeed consume too much
>> memory and 2) overfit.
>>
>> Gilles
>>
>> PS: I have just started to change the doc. Expect a PR later today :)
>>
>> On 3 January 2012 09:27, Andreas<[email protected]> wrote:
>>
>>
>>> Hi Brian.
>>> The dataset itself is 60000 * 784 * 8 bytes (I converted from uint8 to
>>> float, which is 8 bytes in NumPy I guess), which is ~360 MB (also, I
>>> can load it ;).
>>> I trained linear SVMs and neural networks without much trouble. I
>>> haven't really studied the decision tree code (which I know you made
>>> quite an effort to optimize), so I don't really have an idea how the
>>> construction works. Maybe I just had a misconception of the memory
>>> usage of the algorithm. I just started playing with it.
>>>
>>> Thanks for any comments :)
>>>
>>> Cheers,
>>> Andy
>>>
>>>
>>> On 01/03/2012 09:06 AM, [email protected] wrote:
>>>
>>>
>>>> Hi Andy,
>>>>
>>>> IIRC MNIST is 60000 samples, each with dimension 28x28, so hitting the
>>>> 2GB limit doesn't seem unreasonable (especially since you don't have all
>>>> of that at your disposal). Does the dataset fit in memory?
>>>>
>>>> Brian
>>>>
>>>> -----Original Message-----
>>>> From: Andreas<[email protected]>
>>>> Date: Tue, 03 Jan 2012 09:00:47
>>>> To:<[email protected]>
>>>> Reply-To: [email protected]
>>>> Subject: Re: [Scikit-learn-general] Question and comments on RandomForests
>>>>
>>>> One other question:
>>>> I tried to run a forest on MNIST that consisted of only one tree.
>>>> That gave me a memory error. I only have 2 GB of RAM in this machine
>>>> (this is my desktop at IST Austria !?), which is obviously not that much.
>>>> Still, this kind of surprised me. Is it expected that a tree takes
>>>> this "much" RAM? Should I change "min_density"?
>>>>
>>>> Thanks :)
>>>>
>>>> Andy
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Write once. Port to many.
>>>> Get the SDK and tools to simplify cross-platform app development. Create
>>>> new or port existing apps to sell to consumers worldwide. Explore the
>>>> Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
>>>> http://p.sf.net/sfu/intel-appdev
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general