On Tue, 15 Feb 2011, Thorsten Kranz wrote:
> If I have 4 labels in my data, the tree I want to use might look like:
>       /\
>      /  \
>     /    \
>    3    / \
>        1   /\
>           2  4
So, as you correctly pointed out, there is heavy class imbalance at
every node except the final (2 vs. 4) split; that is why the
classifier, whenever the decision is not clear-cut, falls into
majority-label-takes-all behavior with SVM. It first chooses (1,2,4),
then (2,4), and only then decides correctly between the two labels
that finally reach the classifier in balanced proportions.
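To make the imbalance concrete, here is a minimal sketch (assuming a
hypothetical balanced dataset with 10 samples per label) that pools
the labels on each side of every binary node in the tree above:

```python
from collections import Counter

# Hypothetical balanced dataset: 10 samples for each of the 4 labels
labels = [1, 2, 3, 4] * 10
counts = Counter(labels)

def node_balance(left, right, counts):
    """Sample counts seen by the binary classifier at one tree node."""
    n_left = sum(counts[l] for l in left)
    n_right = sum(counts[l] for l in right)
    return n_left, n_right

print(node_balance([3], [1, 2, 4], counts))  # (10, 30) -- 1:3 imbalance
print(node_balance([1], [2, 4], counts))     # (10, 20) -- 1:2 imbalance
print(node_balance([2], [4], counts))        # (10, 10) -- balanced
```

Only the deepest node sees a balanced problem; every node above it is
biased toward the pooled "rest" group.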
The logical remedy would be per-label weighting to compensate (check
out the weight_label parameter of SVM), or some other classifier that
is not prone to such "race" conditions, e.g. GNB... But your example
brought up an "interesting" use case which exposes problems with
TreeClassifier's assumptions (e.g. that there should be no dangling
single-class choice) and GNB's inability to train on a single
label... More tomorrow; meanwhile you can try something like
clf = GNB
tclf = TreeClassifier(clf(),
           {"g3": ([3], SVM()),
            "g6": ([1, 2, 4],
                   TreeClassifier(clf(),
                       {"g1": ([1], SVM()),
                        "g5": ([2, 4], clf())}))})
--
=------------------------------------------------------------------=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic
_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa