Yes, there are no doubt more efficient ways to store forests, but it
seems unlikely to be a worthwhile investment.

I think this is a documentation rather than an engineering issue. We
frequently get issues raised that relate to "size": runtime, memory
consumption, model size on disk, (in)effectiveness of parallelism.

We could provide methods on models that estimate these costs (analytically
or, indeed, via a pre-fit GP regressor!), but merely documenting them more
clearly up front in the general case (even just "parameters can affect
model size drastically") would be worthwhile.
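
As a minimal sketch of the "analytic" route, assuming a fully grown
classification forest: a tree cannot have more leaves than
n_samples // min_samples_leaf (or 2**max_depth, if that is smaller), and a
binary tree with k leaves has 2k - 1 nodes. The function name and the
per-node byte counts below are hypothetical placeholders, not values taken
from scikit-learn, and the result is a loose upper bound, since real trees
usually stop splitting well before this limit.

```python
def estimate_forest_bytes(n_samples, n_trees, n_classes,
                          min_samples_leaf=1, max_depth=None,
                          bytes_per_node=56, bytes_per_value=8):
    """Rough upper bound on the in-memory size of a classification forest.

    bytes_per_node and bytes_per_value are assumptions about the node
    layout, not values taken from the library; adjust them as needed.
    """
    # A tree cannot have more leaves than n_samples // min_samples_leaf.
    max_leaves = max(1, n_samples // min_samples_leaf)
    if max_depth is not None:
        # A binary tree of depth d has at most 2**d leaves.
        max_leaves = min(max_leaves, 2 ** max_depth)
    # A binary tree in which every split has two children and that has
    # k leaves contains 2k - 1 nodes in total.
    nodes_per_tree = 2 * max_leaves - 1
    # Each node stores a fixed record plus one value per class.
    per_node = bytes_per_node + n_classes * bytes_per_value
    return n_trees * nodes_per_tree * per_node


# Hypothetical numbers, for illustration only.
for leaf in (1, 5, 50):
    size = estimate_forest_bytes(100000, n_trees=50, n_classes=2,
                                 min_samples_leaf=leaf)
    print("min_samples_leaf=%d: at most ~%.1f MB" % (leaf, size / 1e6))
```

Even this crude bound shows how strongly parameters such as
min_samples_leaf and max_depth drive model size, which is the point the
documentation could make up front.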

On 12 April 2016 at 02:47, Sebastian Raschka <se.rasc...@gmail.com> wrote:

> Just curious how it could be made more efficient. ~14.9 MB for 50 trees on
> a 20 MB dataset doesn't sound too bad, actually, since we are not pruning
> the trees in Random Forests. Something I could think of would be to
> summarize similar trees in buckets or to build a "fragment" library of
> shared decision rules. However, I am not sure how much effort it would take
> to implement such a thing, and the computational efficiency may suffer. Hm,
> I am curious: how large would a single, fully grown decision tree be on
> your dataset?
>
>
> On Apr 11, 2016, at 12:17 PM, Piotr Płoński <pplonsk...@gmail.com> wrote:
>
> I am using 0.17.1. Did you consider writing custom save methods for this
> classifier?
>
>
> 2016-04-11 18:11 GMT+02:00 Andreas Mueller <t3k...@gmail.com>:
>
>> Which version of scikit-learn are you using?
>> We recently (0.17) removed the storing of data point indices in trees,
>> which greatly reduced the size in some cases.
>>
>>
>>
>> On 04/10/2016 09:28 AM, Piotr Płoński wrote:
>>
>> Thanks for the comments! I put more details of my problem here:
>> http://stackoverflow.com/questions/36523989/why-sklearn-randomforest-model-take-a-lot-of-disk-space-after-save
>>
>>
>> Indeed, saving with joblib takes less space, but the model still uses a
>> lot of space on the disk.
>>
>> Best,
>> Piotr
>>
>> 2016-04-10 15:24 GMT+02:00 Mathieu Blondel <math...@mblondel.org>:
>>
>>> You may also want to save your model using joblib (possibly with
>>> compression enabled) instead of cPickle.
>>>
>>> Mathieu
>>>
>>> On Sun, Apr 10, 2016 at 9:13 AM, Piotr Płoński <pplonsk...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am saving a RandomForestClassifier model from the sklearn library
>>>> with the code below:
>>>>
>>>> with open('/tmp/rf.model', 'wb') as f:
>>>>     cPickle.dump(RF_model, f)
>>>>
>>>> It takes a lot of space on my hard drive. There are only 50 trees in
>>>> the model, yet it takes over 50 MB on disk (the analyzed dataset is
>>>> ~20 MB, with 21 features). Does anybody have an idea why? I observe
>>>> similar behavior for ExtraTreesClassifier.
>>>>
>>>> Best,
>>>>
>>>> Piotr
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Find and fix application performance issues faster with Applications
>>>> Manager
>>>> Applications Manager provides deep performance insights into multiple
>>>> tiers of
>>>> your business applications. It resolves application problems quickly and
>>>> reduces your MTTR. Get your free trial!
>>>> <http://pubads.g.doubleclick.net/gampad/clk?id=1444514301&iu=/ca-pub-7940484522588532>
>>>> http://pubads.g.doubleclick.net/
>>>> gampad/clk?id=1444514301&iu=/ca-pub-7940484522588532
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> Scikit-learn-general@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>
>>>
>>
>>
>