Hello all,

I'm puzzled by the memory use of scikit-learn's GBM implementation. It takes
up all available memory and is killed by the OS, and I can't think of why it
is using as much memory as it does.

Here is the situation:

I have a modest data set of ~4 GB (1,800 columns, 550,000 rows, all read
into a float32 matrix).


I can read the data in and start training a GBM with no memory issues, but
memory use climbs rapidly as I add more estimators. By the time I reach
about 100 trees it is using ~50 GB, which kills my laptop.

I don't understand why this is happening. Each tree is shallow (depth 3), so
it shouldn't take up much memory. The only way I can make sense of the
behaviour is if the data is somehow getting copied and stored for each
instance of the tree.
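For reference, here is a stripped-down sketch of the kind of loop I'm running, shrunk to toy dimensions. The names and sizes are placeholders, not my real data; I use warm_start to add estimators in stages, which is where one could watch process memory between fits (e.g. with psutil):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy stand-in for the real ~4 GB float32 matrix (1800 cols x 550000 rows).
rng = np.random.RandomState(0)
X = rng.rand(1000, 20).astype(np.float32)
y = rng.rand(1000).astype(np.float32)

# warm_start=True lets fit() resume and add estimators incrementally,
# so memory can be measured between stages instead of after one big fit.
model = GradientBoostingRegressor(max_depth=3, warm_start=True)
for n in (25, 50, 100):
    model.n_estimators = n
    model.fit(X, y)
    # record process RSS here to see how memory grows with n trees

print(len(model.estimators_))
```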

What am I missing?



Thanks in advance



Peter
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
