Hi Yang,

I think I understand it better now as well. Here is what I think it
does:

First of all, I think it only affects splits on categorical attributes. It
works as follows in this scenario:
Let us consider a dataset D we want to build a decision tree from.
Let's say the tree has been partially built, and we've reached a
categorical attribute C that we want to split on.

As I understand it, when parametrized = false, on that node we might branch
only on a subset of the possible values of C (the values present in the data
that reached the node).

When parametrized = true, however, we 'force' branching on all possible
values of C from the entire dataset. Each value with no matching data at
that node gets a leaf whose label is computed from the parent node's data
(line 307):

    if (data.getDataset().isNumerical(data.getDataset().getLabelId())) {
      label = sum / data.size();
    } else {
      label = data.majorityLabel(rng);
    }
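
To make the difference concrete, here is a small standalone sketch of how I
picture the two modes. It is not Mahout's code: the class ForcedSplitSketch,
the Row type, and the forceAllValues argument (standing in for the flag
discussed above) are names I made up for illustration. It also only handles
categorical labels and does not recurse, so each branch is reduced to a
single majority label.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ForcedSplitSketch {

  /** Hypothetical row: one categorical attribute value plus a categorical label. */
  static final class Row {
    final String category;
    final String label;

    Row(String category, String label) {
      this.category = category;
      this.label = label;
    }
  }

  /** Most frequent label; ties go to the label seen first (the real code uses an rng). */
  static String majorityLabel(List<Row> rows) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (Row r : rows) {
      counts.merge(r.label, 1, Integer::sum);
    }
    String best = null;
    int bestCount = -1;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      if (e.getValue() > bestCount) {
        best = e.getKey();
        bestCount = e.getValue();
      }
    }
    return best;
  }

  /**
   * Maps each branch value to the label of its (simplified) child.
   * nodeRows: data that reached this node; allValues: every value of C in the dataset.
   */
  static Map<String, String> split(List<Row> nodeRows, List<String> allValues,
                                   boolean forceAllValues) {
    // Group the node's rows by their value of C.
    Map<String, List<Row>> groups = new LinkedHashMap<>();
    for (Row r : nodeRows) {
      groups.computeIfAbsent(r.category, k -> new ArrayList<>()).add(r);
    }

    Map<String, String> branches = new LinkedHashMap<>();
    for (String value : allValues) {
      List<Row> subset = groups.get(value);
      if (subset != null) {
        // Value present at this node: a real builder would recurse here.
        branches.put(value, majorityLabel(subset));
      } else if (forceAllValues) {
        // Value absent at this node: leaf labelled from the parent's data.
        branches.put(value, majorityLabel(nodeRows));
      }
      // Otherwise the absent value simply gets no branch.
    }
    return branches;
  }
}
```

With rows (red, yes), (red, yes), (blue, no) and possible values red, blue,
green, the unforced split has branches for red and blue only, while the
forced split adds a green leaf labelled with the parent's majority label.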


I hope this is correct and helps with understanding it better.


Also, I found this: <https://issues.apache.org/jira/browse/MAHOUT-840>.
It's the Jira task that introduced the DecisionTreeBuilder. Take a
look at the comments; they might help you as well.



Anca
