Jack, sorry for the late answer. I agree that my last post was misleading.
Here is a new try:

"Increasing the value of C (...) forces the creation of a more accurate
model" that may not generalise well. (Try to imagine the feature space with
the two mapped sets very far from each other.) A model that fits the
training data better is obtained by adding more SVs (until we get the
convex hull of the data); this is done by relaxing the "soft margin", i.e.
decreasing C, and again that may not generalise well. Maybe you can write a
program with cross-validation to check this; a rough sketch follows.
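Something along these lines (an untested sketch using tune() from e1071,
which does 10-fold cross-validation by default; the toy data are the same
as in my plotting example further down the thread):

library(e1071)

# toy data: 20 points in two classes
m1 <- matrix( c(
  0, 0, 0, 1, 1, 2, 1, 2, 3, 2, 3, 3, 0, 1, 2, 3, 0, 1, 2, 3,
  1, 2, 3, 2, 3, 3, 0, 0, 0, 1, 1, 2, 4, 4, 4, 4, 0, 1, 2, 3,
  1, 1, 1, 1, 1, 1, -1,-1,-1,-1,-1,-1, 1, 1, 1, 1, 1, 1,-1,-1
), ncol = 3 )
df <- data.frame( X1 = m1[,1], X2 = m1[,2], Y = factor(m1[,3]) )

# cross-validate a linear SVM over a grid of costs; tune() reports the
# CV error for each cost and keeps the best one (on 20 points the
# estimate is of course very noisy)
tuned <- tune( svm, Y ~ ., data = df,
               kernel = "linear", scale = FALSE,
               ranges = list( cost = 10^(-3:3) ) )
summary( tuned )        # CV error for every cost in the grid
tuned$best.parameters   # the cost that generalised best here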
> Here is another question: is the complexity of the boundary determined by
> the number of total SVs (bounded SVs + free SVs) or by free SVs only?

What do you mean by the complexity of the boundary? In any case, you can
check which SVs are bounded directly in a fitted e1071 model; see the
snippet below.
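If I remember right, e1071 stores y_i * alpha_i for each support vector in
model$coefs, so the bounded SVs are the ones whose coefficient has absolute
value equal to the cost C, and the free SVs are the rest. A rough, untested
sketch (using the df defined above):

C0 <- 1
model <- svm( Y ~ ., data = df, type = "C-classification",
              kernel = "linear", cost = C0, scale = FALSE )
alpha <- abs( model$coefs )           # model$coefs holds y_i * alpha_i
bounded <- sum( alpha >= C0 - 1e-8 )  # alpha_i == C : bounded SVs
free    <- sum( alpha <  C0 - 1e-8 )  # 0 < alpha_i < C : free SVs
c( total = nrow(model$SV), bounded = bounded, free = free )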
Regards,
Pau

2010/7/28 Jack Luo <jluo.rh...@gmail.com>

> Pau,
>
> Sorry for getting back to you on this again. I am getting confused about
> your interpretation of 3). It is obvious from your code that increasing C
> results in a smaller number of SVs, which seems to contradict your
> interpretation "Increasing the value of C (...) forces the creation of a
> more accurate model. A more accurate model is obtained by adding more
> SVs." In addition, I have learned that the number of SVs increases as C
> decreases because there are many bounded SVs (whose alpha = C; remember
> 0 < alpha <= C); the SVs with alpha smaller than C are called free SVs.
> Here is another question: is the complexity of the boundary determined by
> the number of total SVs (bounded SVs + free SVs) or by free SVs only?
>
> Thanks a bunch,
>
> -Jack
>
> On Thu, Jul 15, 2010 at 4:17 AM, Pau Carrio Gaspar
> <paucar...@gmail.com> wrote:
>
>> Hi Jack,
>>
>> To 1) and 2): they are telling you the same thing. I recommend you read
>> the first sections of the article; it is very well written and clear.
>> There you will read about duality.
>>
>> To 3): I interpret the scatter plots like this: "Increasing the value
>> of C (...) forces the creation of a more accurate model." A more
>> accurate model is obtained by adding more SVs (until we get the convex
>> hull of the data).
>>
>> Hope it helps.
>> Regards,
>> Pau
>>
>> 2010/7/14 Jack Luo <jluo.rh...@gmail.com>
>>
>>> Pau,
>>>
>>> Thanks a lot for your email, I found it very helpful. Please see below
>>> for my reply, thanks.
>>>
>>> -Jack
>>>
>>> On Wed, Jul 14, 2010 at 10:36 AM, Pau Carrio Gaspar
>>> <paucar...@gmail.com> wrote:
>>>
>>>> Hello Jack,
>>>>
>>>> 1) Why did you think that "larger C is more prone to overfitting than
>>>> smaller C"?
>>>
>>> There is a statement at the link http://www.dtreg.com/svm.htm :
>>>
>>> "To allow some flexibility in separating the categories, SVM models
>>> have a cost parameter, C, that controls the trade-off between allowing
>>> training errors and forcing rigid margins. It creates a soft margin
>>> that permits some misclassifications. Increasing the value of C
>>> increases the cost of misclassifying points and forces the creation of
>>> a more accurate model that may not generalize well."
>>>
>>> My understanding is that this means larger C may not generalize well
>>> (prone to overfitting).
>>>
>>>> 2) If you look at the formulation of the quadratic programming
>>>> problem, you will see that C rules the error of the "cutting plane"
>>>> (and overfitting). Therefore for high C you allow the "cutting plane"
>>>> to cut the set worse, so the SVM needs fewer points to build it. A
>>>> proper explanation is in Kristin P. Bennett and Colin Campbell,
>>>> "Support Vector Machines: Hype or Hallelujah?", SIGKDD Explorations,
>>>> 2, 2, 2000, 1-13.
>>>> http://www.idi.ntnu.no/emner/it3704/lectures/papers/Bennett_2000_Support.pdf
>>>
>>> Could you be more specific about this? I don't quite understand.
>>>
>>>> 3) You might find these plots useful:
>>>>
>>>> library(e1071)
>>>>
>>>> # toy data: columns 1-2 are the features, column 3 the class label
>>>> m1 <- matrix( c(
>>>>   0, 0, 0, 1, 1, 2, 1, 2, 3, 2, 3, 3, 0, 1, 2, 3, 0, 1, 2, 3,
>>>>   1, 2, 3, 2, 3, 3, 0, 0, 0, 1, 1, 2, 4, 4, 4, 4, 0, 1, 2, 3,
>>>>   1, 1, 1, 1, 1, 1, -1,-1,-1,-1,-1,-1, 1, 1, 1, 1, 1, 1,-1,-1
>>>> ), ncol = 3 )
>>>>
>>>> Y <- m1[,3]
>>>> X <- m1[,1:2]
>>>>
>>>> df <- data.frame( X, Y )
>>>>
>>>> # one panel per cost: plot both classes and mark the SVs in red
>>>> par( mfcol = c(4,2) )
>>>> for ( cost in c( 1e-3, 1e-2, 1e-1, 1e0, 1e+1, 1e+2, 1e+3 ) ) {
>>>>   model.svm <- svm( Y ~ ., data = df, type = "C-classification",
>>>>                     kernel = "linear", cost = cost, scale = FALSE )
>>>>
>>>>   plot( x = 0, ylim = c(0,5), xlim = c(0,3),
>>>>         main = paste( "cost:", cost, "#SV:", nrow(model.svm$SV) ) )
>>>>   points( m1[m1[,3] > 0, 1], m1[m1[,3] > 0, 2], pch = 3, col = "green" )
>>>>   points( m1[m1[,3] < 0, 1], m1[m1[,3] < 0, 2], pch = 4, col = "blue" )
>>>>   points( model.svm$SV[,1], model.svm$SV[,2], pch = 18, col = "red" )
>>>> }
>>>
>>> Thanks a lot for the code, I really appreciate it. I have run it, but
>>> I am not sure how I should interpret the scatter plots, although it is
>>> obvious that the number of SVs decreases as the cost increases.
>>>
>>>> Regards,
>>>> Pau
>>>>
>>>> 2010/7/14 Jack Luo <jluo.rh...@gmail.com>
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a question about the parameter C (cost) in the svm function
>>>>> in e1071. I thought larger C is more prone to overfitting than
>>>>> smaller C, and hence leads to more support vectors. However, using
>>>>> the Wisconsin breast cancer example at
>>>>> http://planatscher.net/svmtut/svmtut.html
>>>>> I found that the largest cost has the fewest support vectors, which
>>>>> is contrary to what I thought. Please see the scripts below. Am I
>>>>> misunderstanding something here?
>>>>>
>>>>> Thanks a bunch,
>>>>>
>>>>> -Jack
>>>>>
>>>>> > model1 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 0.01)
>>>>> > model2 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 1)
>>>>> > model3 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 100)
>>>>> > model1
>>>>>
>>>>> Call:
>>>>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>>>>     cost = 0.01)
>>>>>
>>>>> Parameters:
>>>>>    SVM-Type:  C-classification
>>>>>  SVM-Kernel:  linear
>>>>>        cost:  0.01
>>>>>       gamma:  0.1111111
>>>>>
>>>>> Number of Support Vectors:  99
>>>>>
>>>>> > model2
>>>>>
>>>>> Call:
>>>>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>>>>     cost = 1)
>>>>>
>>>>> Parameters:
>>>>>    SVM-Type:  C-classification
>>>>>  SVM-Kernel:  linear
>>>>>        cost:  1
>>>>>       gamma:  0.1111111
>>>>>
>>>>> Number of Support Vectors:  46
>>>>>
>>>>> > model3
>>>>>
>>>>> Call:
>>>>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>>>>     cost = 100)
>>>>>
>>>>> Parameters:
>>>>>    SVM-Type:  C-classification
>>>>>  SVM-Kernel:  linear
>>>>>        cost:  100
>>>>>       gamma:  0.1111111
>>>>>
>>>>> Number of Support Vectors:  44
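P.S. To connect this with your observation that the number of SVs drops as
the cost grows: you can count the bounded and free SVs separately for each
cost. Another untested sketch (same df and the same model$coefs reading as
above); if your description of bounded SVs is right, the drop in the total
should come mostly from bounded SVs disappearing:

for ( cost in 10^(-3:3) ) {
  m <- svm( Y ~ ., data = df, type = "C-classification",
            kernel = "linear", cost = cost, scale = FALSE )
  alpha <- abs( m$coefs )
  cat( "cost:", cost,
       " total:",   nrow( m$SV ),
       " bounded:", sum( alpha >= cost - 1e-8 ),
       " free:",    sum( alpha <  cost - 1e-8 ), "\n" )
}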