Hm, that does sound a bit odd.
Maybe the memory_profiler will shed light on it?
https://pypi.python.org/pypi/memory_profiler
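
Something along these lines usually works (just a sketch, with random stand-in data in place of your real matrix; the @profile decorator prints line-by-line memory when the script is run through the memory_profiler module):

    from memory_profiler import profile
    from sklearn.ensemble import GradientBoostingRegressor
    import numpy as np

    @profile
    def fit_gbm(X, y):
        # line-by-line memory is reported when run as:
        #   python -m memory_profiler this_script.py
        est = GradientBoostingRegressor(n_estimators=100, max_depth=3)
        est.fit(X, y)
        return est

    if __name__ == '__main__':
        # stand-in data; swap in the real 550000 x 1800 float32 matrix
        X = np.random.rand(10000, 100).astype(np.float32)
        y = np.random.rand(10000)
        fit_gbm(X, y)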

So if you use fewer than 100 trees it runs through?

Andy


On 10/08/2015 06:12 PM, Peter Rickwood wrote:


Hello all,

I'm puzzled by the memory use of sklearn's GBM implementation. It takes up all available memory and is killed by the OS, and I can't see why it is using as much memory as it does.

Here is the situation:

I have a modest data set of size ~4GB (1800 columns, 550000 rows, all read into a float32 matrix).


I can read this in and start training a GBM with no memory issues, but the memory use climbs rapidly as I add more estimators to the GBM. Once I get to about 100 trees it is using ~50GB of memory, which kills my laptop.
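
(A sketch of one way to watch the growth per batch of trees, using random stand-in data; it assumes a scikit-learn version whose gradient boosting supports warm_start, so the ensemble can be grown incrementally:)

    import resource
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # stand-in data; the real matrix is 550000 x 1800 float32
    X = np.random.rand(10000, 100).astype(np.float32)
    y = np.random.rand(10000)

    gbm = GradientBoostingRegressor(max_depth=3, warm_start=True)
    for n in range(10, 110, 10):
        gbm.set_params(n_estimators=n)
        gbm.fit(X, y)  # grows the ensemble by 10 trees, keeping the old ones
        # ru_maxrss is in kilobytes on Linux (bytes on OS X)
        peak_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
        print("%3d trees -> peak RSS ~ %.0f MB" % (n, peak_mb))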

I don't understand why this is happening. Each tree is shallow (depth 3), so it shouldn't take up much memory. The only way I can explain the behaviour is if the data is somehow getting copied and stored for each instance of the tree.
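
(A quick way to test that hypothesis, sketched below on stand-in data: if each of the 100 shallow trees really held its own copy of the data, serializing the fitted model would produce something on the order of 100x the data size, rather than a few MB:)

    import pickle
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    X = np.random.rand(10000, 100).astype(np.float32)  # stand-in data
    y = np.random.rand(10000)
    est = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X, y)

    # depth-3 trees alone are tiny; a huge pickle would mean the
    # estimators are dragging copies of the training data along
    print("pickled model: %.1f MB" % (len(pickle.dumps(est)) / 1e6))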

What am I missing?



Thanks in advance



Peter
