Hi, 
I created a dataset of 100 points, with X ranging from 1.0 to 100.0. I
set y to 0.0 if X < 51.0 and 1.0 otherwise. I then fit an SVMWithSGD
model. When I predict y for the same values of X as in the training
sample, I get back 1.0 for every point!
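
In case the file format matters, I generated the input roughly like
this (label first, then X, tab-separated, which is what parsePoint
below expects):

====================================
# Sketch of how the input file was built: label first, then X,
# tab-separated, one point per line.
with open("c:/python27/classifier.txt", "w") as f:
    for i in range(1, 101):
        x = float(i)
        y = 0.0 if x < 51.0 else 1.0
        f.write("%f\t%f\n" % (y, x))
====================================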

Incidentally, I don't get perfect separation when I replace SVMWithSGD
with LogisticRegressionWithSGD or NaiveBayes either.

Here's the code: 

==================================== 
import sys
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithSGD, LogisticRegressionModel
from pyspark.mllib.classification import NaiveBayes, NaiveBayesModel
from pyspark.mllib.classification import SVMWithSGD, SVMModel
from pyspark.mllib.regression import LabeledPoint
import numpy as np

# Initialize Spark
sc = SparkContext(appName="Prem")

# Load and parse the data 
def parsePoint(line):
    # First column is the label, the remaining columns are the features.
    values = [float(x) for x in line.split('\t')]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("c:/python27/classifier.txt") 
parsedData = data.map(parsePoint) 
print parsedData.take(5)  # peek at a few parsed points (printing the RDD only shows its repr)

# Build the model
model = SVMWithSGD.train(parsedData, iterations=100)
model.setThreshold(0.5)  # classify as 1.0 when the raw score exceeds 0.5
print model

### Build the model
##model = LogisticRegressionWithSGD.train(parsedData, iterations=100, intercept=True)
##print model

### Build the model 
##model = NaiveBayes.train(parsedData) 
##print model 

# Predict for the same X values (1.0 .. 100.0) used in training
for i in range(100):
    print i+1, model.predict(np.array([float(i+1)]))
    
================================================= 
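
For a quicker sanity check, one could also compute the training error
over the whole RDD instead of looping point by point. A sketch,
assuming the same parsedData and model as above:

====================================
# Compare each label against the model's prediction for its features.
labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
trainErr = labelsAndPreds.filter(lambda (v, p): v != p).count() \
    / float(parsedData.count())
print "Training Error = " + str(trainErr)
====================================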

Incidentally, the weight I observe in MLlib is 0.8949991, while the
scikit-learn version of a linear support vector machine trained on the
same data gives 0.05417109 (my scikit-learn run is sketched below). Is
this difference indicative of the problem?
Can you please let me know what I am doing wrong?
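
For reference, the scikit-learn comparison was roughly along these
lines (a sketch, assuming a linear-kernel SVC fit on the same 100
points):

====================================
import numpy as np
from sklearn import svm

X = np.arange(1.0, 101.0).reshape(-1, 1)  # X = 1.0 .. 100.0
y = (X.ravel() >= 51.0).astype(float)     # 0.0 below 51.0, else 1.0
clf = svm.SVC(kernel='linear')
clf.fit(X, y)
print clf.coef_  # weight for the single feature
====================================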

Thanks, 
Prem


