Next question about DecisionTrees:
I am not sure if I understand the documentation correctly. It says:
"Setting min_density to 0 will always use the sample mask to select the subset of samples at each node. This results in little to no additional memory being allocated, making it appropriate for massive datasets or within ensemble learners,
but at the expense of being slower when training deep trees."

This sounds to me as if "min_density=0" is slowest but takes the least memory. Is that what is meant?
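
To make my reading of that paragraph concrete, here is a toy sketch of the rule as I understand it. It is plain Python on lists, not the actual scikit-learn code; the function name select_node_samples, the returned "copied" flag, and the use of a strict "<" comparison are my own assumptions for illustration.

```python
def select_node_samples(X, sample_mask, min_density):
    """Toy illustration of the min_density rule (not scikit-learn's code).

    X is a list of rows; sample_mask is a list of booleans marking the rows
    that belong to the current node. Returns (X, sample_mask, copied).
    """
    n_total = len(sample_mask)
    n_active = sum(sample_mask)
    density = n_active / n_total if n_total else 0.0

    if density < min_density:
        # Mask is too sparse: pack the active rows into a new, smaller list.
        # This costs a copy (extra memory) but later scans touch fewer rows.
        packed = [row for row, keep in zip(X, sample_mask) if keep]
        return packed, [True] * len(packed), True

    # Mask is dense enough: keep the original data and mask, no copy.
    return X, sample_mask, False
```

Under this reading, min_density=0 never copies (the density is never below 0), while min_density=1 copies at every node where at least one sample has been masked out.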

When benchmarking, I found "min_density=0" to be the fastest setting on my dataset.
It has n_samples=6180, n_features=2000, n_class=10.

Then I tried MNIST (n_samples=60000, n_features=786, n_class=10) and found min_density=0 to be slower than .1 (it took twice as long), but .5 was also slower than .1.


On digits, since training is very fast, it was hard to measure any real difference. Still, min_density=0 was fastest and min_density=1 was slowest (1.5 times as slow).

I use the default settings from RandomForest, which has max_depth=None, n_features=auto,
and I am using only one tree (n_estimators=1).
On which data sets did the statement in the documentation hold?
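
In case someone wants to repeat the comparison, a timing loop along these lines should be enough. This is only a minimal sketch: time_fits and make_estimator are names I made up, and it assumes nothing about the estimator beyond a fit(X, y) method.

```python
import time


def time_fits(make_estimator, X, y, densities, repeats=3):
    """Return {min_density: best wall-clock fit time in seconds}.

    make_estimator is a hypothetical factory: it takes a min_density value
    and returns a fresh, unfitted estimator with a fit(X, y) method.
    """
    timings = {}
    for density in densities:
        best = float("inf")
        for _ in range(repeats):
            estimator = make_estimator(density)  # new estimator for each run
            start = time.perf_counter()
            estimator.fit(X, y)
            best = min(best, time.perf_counter() - start)
        timings[density] = best  # keep the fastest of the repeats
    return timings
```

With a scikit-learn version that still accepts min_density, make_estimator could be something like lambda d: RandomForestClassifier(n_estimators=1, min_density=d), called with densities such as [0, .1, .5, 1]. Taking the best of a few repeats reduces noise on small datasets like digits, where a single fit is very fast.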

It seems to me that there is some sweet spot for each dataset and that on the datasets
I tested, low values seem faster. Setting min_density=1 was often very slow.

What are your experiences?

While .1 seems a good default value, it doesn't seem to be a tradeoff between
time and memory on the datasets I tested. Rather, it seems to be
the value that makes the algorithm run fastest.

Any help / comment / remarks very welcome!

Thanks,
Andy

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
