Hm, that does sound a bit odd.
Maybe the memory_profiler will shed light on it?
https://pypi.python.org/pypi/memory_profiler
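You could try it on a small stand-in problem first, something like this (just a sketch, not tested against your data; the shapes are placeholders and the regressor/parameters are my assumptions):

import numpy as np
from memory_profiler import memory_usage
from sklearn.ensemble import GradientBoostingRegressor

# stand-in data, much smaller than the real 550000 x 1800 matrix
X = np.random.rand(10000, 100).astype(np.float32)
y = np.random.rand(10000).astype(np.float32)

gbm = GradientBoostingRegressor(n_estimators=100, max_depth=3)

# memory_usage samples the process RSS (in MiB) while fit() runs
mem = memory_usage((gbm.fit, (X, y)), interval=1.0)
print("peak memory: %.1f MiB" % max(mem))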
So if you use fewer than 100 trees it runs through?
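If it does, one way to see where the growth kicks in (again just a sketch; warm_start and psutil are assumptions on my part, not something from your mail) is to grow the ensemble in steps and print the RSS after each one:

import os
import numpy as np
import psutil
from sklearn.ensemble import GradientBoostingRegressor

X = np.random.rand(10000, 100).astype(np.float32)
y = np.random.rand(10000).astype(np.float32)

proc = psutil.Process(os.getpid())
# warm_start=True makes each fit() add trees instead of starting over
gbm = GradientBoostingRegressor(max_depth=3, warm_start=True)

for n in range(10, 101, 10):
    gbm.set_params(n_estimators=n)  # raise the target tree count by 10
    gbm.fit(X, y)
    print("%3d trees: %.0f MB" % (n, proc.memory_info().rss / 1e6))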
Andy
On 10/08/2015 06:12 PM, Peter Rickwood wrote:
Hello all,
I'm puzzled by the memory use of sklearn's GBM implementation. It takes
up all available memory and is killed by the OS, and I can't think of
why it is using as much memory as it does.
Here is the situation:
I have a modest data set of size ~4GB (1800 columns, 550000 rows, all
read into a float32 matrix).
I can read this in and start training a GBM with no memory issues, but
the memory use climbs rapidly as I add more estimators.
Once I get to about 100 trees it is using ~50GB of memory, which kills
my laptop.
I don't understand why this is happening. Each tree is shallow (depth
3), so it shouldn't take up much memory. The only way I can explain the
behaviour is if the data is somehow getting copied and stored for each
tree in the ensemble.
What am I missing?
Thanks in advance
Peter
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general