Next question about DecisionTrees:
I am not sure if I understand the documentation correctly. It says:
"Setting min_density to 0 will always use the sample mask to select the
subset of samples at each node.
This results in little to no additional memory being allocated, making
it appropriate for massive datasets or within ensemble learners,
but at the expense of being slower when training deep trees. "
This sounds to me as if "min_density=0" is slowest but takes the least
memory. Is that what is meant?
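One way to read the quoted passage is sketched below. This is only an
assumption about what the heuristic does, not the actual tree code:
pack_if_sparse and its arguments are hypothetical names. The idea is
that a node keeps the full array plus a boolean sample mask until the
fraction of active samples drops below min_density, and only then
copies the active rows into a compact array.

```python
import numpy as np


def pack_if_sparse(X, y, sample_mask, min_density):
    """Hypothetical helper: return the (X, y, sample_mask) a node would use.

    Not the scikit-learn implementation, only one reading of the docs.
    """
    density = sample_mask.sum() / sample_mask.shape[0]
    if density < min_density:
        # Too few active rows: copy them into a compact array, reset the mask.
        X = X[sample_mask]
        y = y[sample_mask]
        sample_mask = np.ones(X.shape[0], dtype=bool)
    # Otherwise keep the full array and let the mask pick the rows.
    return X, y, sample_mask


rng = np.random.RandomState(0)
X = rng.rand(1000, 20)
y = rng.randint(0, 10, size=1000)
mask = np.zeros(1000, dtype=bool)
mask[:50] = True  # a deep node that holds only 5% of the samples

X0, _, m0 = pack_if_sparse(X, y, mask, min_density=0.0)  # never copies
X1, _, m1 = pack_if_sparse(X, y, mask, min_density=0.1)  # 0.05 < 0.1: copies
print(X0.shape, int(m0.sum()))
print(X1.shape, int(m1.sum()))
```

Under this reading, min_density=0 never allocates a copy but scans all
rows at every node, while min_density=1 copies at every node that does
not hold all samples.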
When doing benchmarking, I found "min_density=0" to be the fastest
version on my dataset.
It has n_samples=6180, n_features=2000, n_class=10.
Then I tried with MNIST (n_samples=60000, n_features=786, n_class=10) and
found
min_density=0 to be slower than .1 (it took twice as long), but .5 was
also slower than .1.
On digits, since training is very fast, it was hard to measure any real
difference.
Still, min_density=0 was fastest and min_density=1 was slowest (1.5
times as slow).
I use the default settings of RandomForest, which has max_depth=None and
max_features=auto,
and I am using only one tree (n_estimators=1).
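A timing loop for this kind of comparison might look like the sketch
below. time_fit, make_estimator and DummyEstimator are hypothetical
names; the factory is left as a plain callable so the sketch does not
depend on a particular scikit-learn version, and the dummy estimator
only stands in for a one-tree forest built with the given min_density.
Taking the best of several repeats helps when a single fit is too fast
to time reliably.

```python
import time

import numpy as np


def time_fit(make_estimator, X, y, values, n_repeats=3):
    """Return {value: best fit time in seconds} for each parameter value."""
    timings = {}
    for value in values:
        best = float("inf")
        for _ in range(n_repeats):
            est = make_estimator(value)  # fresh estimator for every run
            start = time.perf_counter()
            est.fit(X, y)
            best = min(best, time.perf_counter() - start)
        timings[value] = best
    return timings


class DummyEstimator:
    """Stand-in with a fit method, used only to make the sketch runnable."""

    def __init__(self, min_density):
        self.min_density = min_density

    def fit(self, X, y):
        self.mean_ = X.mean(axis=0)
        return self


rng = np.random.RandomState(0)
X = rng.rand(200, 10)
y = rng.randint(0, 10, size=200)
timings = time_fit(DummyEstimator, X, y, values=[0.0, 0.1, 0.5, 1.0])
for value, seconds in sorted(timings.items()):
    print("min_density=%.1f: %.6f s" % (value, seconds))
```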
On which data sets did the statement in the documentation hold?
It seems to me that there is some sweet spot for each dataset and that
on the datasets
I tested, low values seem faster. Setting min_density=1 was often very slow.
What are your experiences?
While .1 seems a good default value, it doesn't seem to be a tradeoff
between
time and memory on the datasets I tested. Rather, it seems to be
the value that makes the algorithm run fastest.
Any help, comments, or remarks are very welcome!
Thanks,
Andy
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general