@federico haha! thanks for the motivation!

@Josh, I'm not aware of a polling feature in GitHub, but it sounds very 
convenient; it would be a great addition for scikit-learn ;)

@Gael, I also thought that AUC was not suitable for multi-label 
problems, but recent Kaggle competitions such as this one 
`http://www.kaggle.com/c/mlsp-2013-birds/forums` have adopted AUC as the 
evaluation measure for multi-label classification. I thought of a simple 
way to compute it. First, binarize the labels: say y=[[1,2],[1]], which 
means sample 1 belongs to classes 1 and 2 and sample 2 belongs to class 
1; with three classes (0, 1, 2) the binarized form is 
y=[[0,1,1],[0,1,0]]. Then flatten this matrix into a single vector and 
evaluate the flattened predicted probabilities against it using the 
binary AUC metric already implemented in scikit-learn. This could be 
wrong; however, the scores I got were quite close to those on the 
leaderboard.
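
To make the binarize-then-flatten idea concrete, here is a rough sketch 
of what I mean. It is only an illustration under my own assumptions: the 
helper names (binarize, flatten, auc) and the predicted scores are made 
up, and the pairwise AUC below is a plain-Python stand-in for the AUC 
metric in scikit-learn so that the snippet has no dependencies.

```python
def binarize(label_sets, n_classes):
    """Turn per-sample label lists into rows of 0/1 indicators."""
    return [[1 if c in labels else 0 for c in range(n_classes)]
            for labels in label_sets]


def flatten(rows):
    """Concatenate the rows of a matrix into one flat list."""
    return [v for row in rows for v in row]


def auc(y_true, y_score):
    """Pairwise (Mann-Whitney) AUC; a tie counts as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Labels from the example above, assuming three classes: 0, 1, 2.
y = [[1, 2], [1]]
y_bin = binarize(y, n_classes=3)  # [[0, 1, 1], [0, 1, 0]]

# Hypothetical predicted probabilities, one row per sample.
scores = [[0.1, 0.8, 0.4], [0.2, 0.9, 0.5]]

# Pool every (sample, class) pair and score them as one binary problem.
flat_auc = auc(flatten(y_bin), flatten(scores))
print(flat_auc)
```

Pooling all (sample, class) pairs like this gives every pair equal 
weight, so frequent classes dominate the score; averaging a per-class 
AUC would be a different choice.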

There are quite a number of papers that use AUC for multi-label 
problems, for example 
http://www.cse.msu.edu/~rongjin/publications/iccv_camera.pdf

