On Tue, 15 Feb 2011, Thorsten Kranz wrote:
> If I have 4 labels in my data, the tree I want to use might look like:

>             /\
>            /  \
>           /    \
>          3    / \
>              1  /\
>                2 4

so what happens, as you correctly pointed out, is that there is heavy
imbalance at every node except the final (2 vs 4) one; that is why the
SVM, whenever the decision is not obvious, falls into
majority-label-takes-all behavior.

Therefore it first chooses (1,2,4), then (2,4), and only then decides
correctly between the two labels which reach the classifier balanced.
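To make the imbalance explicit, here is a rough sketch (plain Python, with a hypothetical 20 samples per label) of the binary problem each node of that tree actually sees:

```python
# Hypothetical: assume 20 samples per label; the real counts depend on
# your dataset, but any roughly balanced 4-label set behaves the same.
per_label = {1: 20, 2: 20, 3: 20, 4: 20}

# (left group, right group) for each node, top to bottom of the tree
nodes = [({3}, {1, 2, 4}),   # root: one label vs three
         ({1}, {2, 4}),      # inner node: one label vs two
         ({2}, {4})]         # final pair: balanced

# sample counts each node's binary classifier is trained on
counts = [(sum(per_label[l] for l in a), sum(per_label[l] for l in b))
          for a, b in nodes]
print(counts)  # only the last node sees a balanced problem
```

So the root trains on a 20-vs-60 problem and the next node on 20-vs-40, which is exactly where the majority-label bias kicks in.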

The logical fix would be per-label weighting to compensate (check out
weight_label in SVM), or some other classifier which is not prone to
such "race" conditions, e.g. GNB... But your example brought up an
"interesting" use case which exposes problems with the TreeClassifier
assumptions (e.g. there should be no dangling single-class choice)
and with GNB's inability to train on a single label... more tomorrow;
meanwhile you can try something like

# assuming the usual PyMVPA names (GNB, SVM, TreeClassifier) are
# already imported, e.g. via mvpa.suite
clf = GNB
tclf = TreeClassifier(clf(),
                      {"g3": ([3], SVM()),
                       "g6": ([1, 2, 4], TreeClassifier(clf(),
                             {"g1": ([1], SVM()),
                              "g5": ([2, 4], clf())}))})
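And for the weighting route, a minimal NumPy sketch (hypothetical sample counts, not your data) of the inverse-frequency weights that a weight_label/weight-style SVM parameter would take at the root node:

```python
import numpy as np

# Hypothetical counts: 20 samples for each of labels 1, 2, 3, 4.
labels = np.array([3] * 20 + [1] * 20 + [2] * 20 + [4] * 20)

# The root's binary problem: label 3 (class 0) vs {1, 2, 4} (class 1)
binary = (labels != 3).astype(int)
counts = np.bincount(binary)             # [20, 60]

# inverse-frequency weights: minority class up-weighted
weights = counts.sum() / (2.0 * counts)  # [2.0, 0.667]
```

The idea is just that the minority side (label 3) gets roughly 3x the weight of the majority side, cancelling the 20-vs-60 imbalance the classifier would otherwise exploit.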

-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic

_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
