I've tried to plot the SVM separating hyperplane in two cases. In the first,
the dataset is imbalanced: 50 samples of class 0 and 7 samples of class 1.
In the second, I balance the dataset by simply replicating the samples of
class 1 so that class 1 also has approximately 50 samples (I know that SVM
has a parameter for weighting samples, but my purpose is not to use that).
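
For readers without access to the pastebin, the replication step described
above could look roughly like the sketch below. It is not the code from the
link: the data is a synthetic stand-in, and the helper name
replicate_minority is made up for illustration.

```python
import numpy as np

# Hypothetical stand-in data: 50 samples of class 0 and 7 of class 1.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(7, 2) + 2.0])
y = np.array([0] * 50 + [1] * 7)


def replicate_minority(X, y, minority_label=1):
    """Balance a two-class set by repeating every minority sample."""
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    # Whole number of copies so that the minority count gets close to
    # the majority count (here 50 // 7 = 7 copies of each sample).
    n_copies = max(1, len(majority_idx) // len(minority_idx))
    idx = np.concatenate([majority_idx, np.tile(minority_idx, n_copies)])
    return X[idx], y[idx]


X_bal, y_bal = replicate_minority(X, y)
print(np.bincount(y), np.bincount(y_bal))
```

Because the copies are exact duplicates, every replicated point sits on top
of its original in a scatter plot, so the balanced plot shows the same 7
distinct red positions as the imbalanced one.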

The code is here: http://pastebin.com/P4LhZhHE
The plot is here:
http://imagesup.org/functions/preview.php?file=1368133857-sss.png

Running the code gives:
SVM trained on the imbalanced set:
Overall scores =  91.22  66.66
Scores for minority class =  71.42  83.33

SVM trained on the balanced set:
Overall scores =  91.22  73.68
Scores for minority class =  100.0 100.0

But the hyperplane plotted for the SVM trained on the balanced set does not
correspond to these results. Indeed, the results say that the f1_score is
better for the balanced case, and that all samples of the minority class
are correctly classified (which is not the case for the imbalanced set).
However, we can see from the plotted hyperplane that not all red samples
are correctly classified.
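
One way to check whether a plotted line and the reported scores agree is to
count, for the same fitted model and the same samples, how many minority
points fall on the wrong side of the line and compare that with the recall.
The sketch below assumes a linear kernel (the kernel used in the pastebin
code is not stated here) and uses synthetic stand-in data, so the numbers it
produces are not the ones reported above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import recall_score

# Hypothetical stand-in data: 50 samples of class 0 and 7 of class 1.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(7, 2) + 1.5])
y = np.array([0] * 50 + [1] * 7)

clf = SVC(kernel="linear").fit(X, y)

# For a linear kernel the separating line is w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]

# Two endpoints of the line for plotting (assumes w[1] != 0).
x1 = np.array([X[:, 0].min(), X[:, 0].max()])
x2 = -(w[0] * x1 + b) / w[1]

# Side of the line for every sample; positive means class 1.
side = X @ w + b
pred_from_line = (side > 0).astype(int)
pred = clf.predict(X)

minority = y == 1
n_wrong_side = int(np.sum(pred_from_line[minority] != 1))
minority_recall = recall_score(y, pred, pos_label=1)

print("minority samples on the wrong side of the line:", n_wrong_side)
print("minority recall from predict():", minority_recall)
```

If the line is drawn from one model but the scores come from another, or
from a different set of samples, the two counts can disagree, which would
produce the kind of mismatch described above.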