Jack, sorry for the late answer. I agree that my last post was misleading.
Here is a new try:

"Increasing the value of C (...) forces the creation of a more accurate
model" that may not generalise well. (Try to imagine the feature space with
the two mapped sets very far from each other.) A model that fits the
training data better is obtained by adding more SVs (until we get the
convex hull of the data); this is done by relaxing the "soft margin", i.e.
decreasing C, and again that may not generalise well. Maybe you can write a
program with cross-validation to check this; a rough sketch follows.
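Something along these lines (an untested sketch using tune() from e1071,
which does 10-fold cross-validation by default; the toy data are the same
as in my plotting example further down the thread):

library(e1071)

# toy data: 20 points in two classes
m1 <- matrix( c(
  0, 0, 0, 1, 1, 2, 1, 2, 3, 2, 3, 3, 0, 1, 2, 3, 0, 1, 2, 3,
  1, 2, 3, 2, 3, 3, 0, 0, 0, 1, 1, 2, 4, 4, 4, 4, 0, 1, 2, 3,
  1, 1, 1, 1, 1, 1, -1,-1,-1,-1,-1,-1, 1, 1, 1, 1, 1, 1,-1,-1
), ncol = 3 )
df <- data.frame( X1 = m1[,1], X2 = m1[,2], Y = factor(m1[,3]) )

# cross-validate a linear SVM over a grid of costs; tune() reports the
# CV error for each cost and keeps the best one (on 20 points the
# estimate is of course very noisy)
tuned <- tune( svm, Y ~ ., data = df,
               kernel = "linear", scale = FALSE,
               ranges = list( cost = 10^(-3:3) ) )
summary( tuned )        # CV error for every cost in the grid
tuned$best.parameters   # the cost that generalised best here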
> Here is another question: is the complexity of the boundary determined by
> the number of total SVs (bounded SVs + free SVs) or by free SVs only?

What do you mean by the complexity of the boundary? In any case, you can
check which SVs are bounded directly in a fitted e1071 model; see the
snippet below.
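If I remember right, e1071 stores y_i * alpha_i for each support vector in
model$coefs, so the bounded SVs are the ones whose coefficient has absolute
value equal to the cost C, and the free SVs are the rest. A rough, untested
sketch (using the df defined above):

C0 <- 1
model <- svm( Y ~ ., data = df, type = "C-classification",
              kernel = "linear", cost = C0, scale = FALSE )
alpha <- abs( model$coefs )           # model$coefs holds y_i * alpha_i
bounded <- sum( alpha >= C0 - 1e-8 )  # alpha_i == C : bounded SVs
free    <- sum( alpha <  C0 - 1e-8 )  # 0 < alpha_i < C : free SVs
c( total = nrow(model$SV), bounded = bounded, free = free )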
Regards,
Pau

2010/7/28 Jack Luo <jluo.rh...@gmail.com>

> Pau,
>
> Sorry for getting back to you on this again. I am getting confused about
> your interpretation of 3). It is obvious from your code that increasing C
> results in a smaller number of SVs, which seems to contradict your
> interpretation "Increasing the value of C (...) forces the creation of a
> more accurate model. A more accurate model is obtained by adding more
> SVs." In addition, I have learned that the number of SVs increases as C
> decreases because there are many bounded SVs (whose alpha = C; remember
> 0 < alpha <= C); the SVs with alpha smaller than C are called free SVs.
> Here is another question: is the complexity of the boundary determined by
> the number of total SVs (bounded SVs + free SVs) or by free SVs only?
>
> Thanks a bunch,
>
> -Jack
>
> On Thu, Jul 15, 2010 at 4:17 AM, Pau Carrio Gaspar
> <paucar...@gmail.com> wrote:
>
>> Hi Jack,
>>
>> To 1) and 2): they are telling you the same thing. I recommend you read
>> the first sections of the article; it is very well written and clear.
>> There you will read about duality.
>>
>> To 3): I interpret the scatter plots like this: "Increasing the value
>> of C (...) forces the creation of a more accurate model." A more
>> accurate model is obtained by adding more SVs (until we get the convex
>> hull of the data).
>>
>> Hope it helps.
>> Regards,
>> Pau
>>
>> 2010/7/14 Jack Luo <jluo.rh...@gmail.com>
>>
>>> Pau,
>>>
>>> Thanks a lot for your email, I found it very helpful. Please see below
>>> for my reply, thanks.
>>>
>>> -Jack
>>>
>>> On Wed, Jul 14, 2010 at 10:36 AM, Pau Carrio Gaspar
>>> <paucar...@gmail.com> wrote:
>>>
>>>> Hello Jack,
>>>>
>>>> 1) Why did you think that "larger C is more prone to overfitting than
>>>> smaller C"?
>>>
>>> There is a statement at the link http://www.dtreg.com/svm.htm :
>>>
>>> "To allow some flexibility in separating the categories, SVM models
>>> have a cost parameter, C, that controls the trade-off between allowing
>>> training errors and forcing rigid margins. It creates a soft margin
>>> that permits some misclassifications. Increasing the value of C
>>> increases the cost of misclassifying points and forces the creation of
>>> a more accurate model that may not generalize well."
>>>
>>> My understanding is that this means larger C may not generalize well
>>> (prone to overfitting).
>>>
>>>> 2) If you look at the formulation of the quadratic programming
>>>> problem, you will see that C rules the error of the "cutting plane"
>>>> (and overfitting). Therefore for high C you allow the "cutting plane"
>>>> to cut the set worse, so the SVM needs fewer points to build it. A
>>>> proper explanation is in Kristin P. Bennett and Colin Campbell,
>>>> "Support Vector Machines: Hype or Hallelujah?", SIGKDD Explorations,
>>>> 2, 2, 2000, 1-13.
>>>> http://www.idi.ntnu.no/emner/it3704/lectures/papers/Bennett_2000_Support.pdf
>>>
>>> Could you be more specific about this? I don't quite understand.
>>>
>>>> 3) You might find these plots useful:
>>>>
>>>> library(e1071)
>>>>
>>>> # toy data: columns 1-2 are the features, column 3 the class label
>>>> m1 <- matrix( c(
>>>>   0, 0, 0, 1, 1, 2, 1, 2, 3, 2, 3, 3, 0, 1, 2, 3, 0, 1, 2, 3,
>>>>   1, 2, 3, 2, 3, 3, 0, 0, 0, 1, 1, 2, 4, 4, 4, 4, 0, 1, 2, 3,
>>>>   1, 1, 1, 1, 1, 1, -1,-1,-1,-1,-1,-1, 1, 1, 1, 1, 1, 1,-1,-1
>>>> ), ncol = 3 )
>>>>
>>>> Y <- m1[,3]
>>>> X <- m1[,1:2]
>>>>
>>>> df <- data.frame( X, Y )
>>>>
>>>> # one panel per cost: plot both classes and mark the SVs in red
>>>> par( mfcol = c(4,2) )
>>>> for ( cost in c( 1e-3, 1e-2, 1e-1, 1e0, 1e+1, 1e+2, 1e+3 ) ) {
>>>>   model.svm <- svm( Y ~ ., data = df, type = "C-classification",
>>>>                     kernel = "linear", cost = cost, scale = FALSE )
>>>>
>>>>   plot( x = 0, ylim = c(0,5), xlim = c(0,3),
>>>>         main = paste( "cost:", cost, "#SV:", nrow(model.svm$SV) ) )
>>>>   points( m1[m1[,3] > 0, 1], m1[m1[,3] > 0, 2], pch = 3, col = "green" )
>>>>   points( m1[m1[,3] < 0, 1], m1[m1[,3] < 0, 2], pch = 4, col = "blue" )
>>>>   points( model.svm$SV[,1], model.svm$SV[,2], pch = 18, col = "red" )
>>>> }
>>>
>>> Thanks a lot for the code, I really appreciate it. I have run it, but
>>> I am not sure how I should interpret the scatter plots, although it is
>>> obvious that the number of SVs decreases as the cost increases.
>>>
>>>> Regards,
>>>> Pau
>>>>
>>>> 2010/7/14 Jack Luo <jluo.rh...@gmail.com>
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a question about the parameter C (cost) in the svm function
>>>>> in e1071. I thought larger C is more prone to overfitting than
>>>>> smaller C, and hence leads to more support vectors. However, using
>>>>> the Wisconsin breast cancer example at
>>>>> http://planatscher.net/svmtut/svmtut.html
>>>>> I found that the largest cost has the fewest support vectors, which
>>>>> is contrary to what I thought. Please see the scripts below. Am I
>>>>> misunderstanding something here?
>>>>>
>>>>> Thanks a bunch,
>>>>>
>>>>> -Jack
>>>>>
>>>>> > model1 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 0.01)
>>>>> > model2 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 1)
>>>>> > model3 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 100)
>>>>> > model1
>>>>>
>>>>> Call:
>>>>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>>>>     cost = 0.01)
>>>>>
>>>>> Parameters:
>>>>>    SVM-Type:  C-classification
>>>>>  SVM-Kernel:  linear
>>>>>        cost:  0.01
>>>>>       gamma:  0.1111111
>>>>>
>>>>> Number of Support Vectors:  99
>>>>>
>>>>> > model2
>>>>>
>>>>> Call:
>>>>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>>>>     cost = 1)
>>>>>
>>>>> Parameters:
>>>>>    SVM-Type:  C-classification
>>>>>  SVM-Kernel:  linear
>>>>>        cost:  1
>>>>>       gamma:  0.1111111
>>>>>
>>>>> Number of Support Vectors:  46
>>>>>
>>>>> > model3
>>>>>
>>>>> Call:
>>>>> svm.default(x = databctrain, y = classesbctrain, kernel = "linear",
>>>>>     cost = 100)
>>>>>
>>>>> Parameters:
>>>>>    SVM-Type:  C-classification
>>>>>  SVM-Kernel:  linear
>>>>>        cost:  100
>>>>>       gamma:  0.1111111
>>>>>
>>>>> Number of Support Vectors:  44
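P.S. To connect this with your observation that the number of SVs drops as
the cost grows: you can count the bounded and free SVs separately for each
cost. Another untested sketch (same df and the same model$coefs reading as
above); if your description of bounded SVs is right, the drop in the total
should come mostly from bounded SVs disappearing:

for ( cost in 10^(-3:3) ) {
  m <- svm( Y ~ ., data = df, type = "C-classification",
            kernel = "linear", cost = cost, scale = FALSE )
  alpha <- abs( m$coefs )
  cat( "cost:", cost,
       " total:",   nrow( m$SV ),
       " bounded:", sum( alpha >= cost - 1e-8 ),
       " free:",    sum( alpha <  cost - 1e-8 ), "\n" )
}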