Hi, I created a dataset of 100 points, ranging from X=1.0 to X=100.0. I let the y variable be 0.0 if X < 51.0 and 1.0 otherwise. I then fit an SVMWithSGD model. When I predict the y values for the same values of X as in the sample, I get back 1.0 for every predicted y!
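For reference, here is a sketch of how I generated the file (tab-separated, label first, matching what parsePoint expects; the exact formatting of my actual file may differ slightly):

```python
# Sketch of the dataset generation: one line per point, written as
# "<label>\t<feature>" so that the label comes first on each line.
lines = []
for i in range(1, 101):
    x = float(i)
    y = 0.0 if x < 51.0 else 1.0       # boundary between x=50.0 and x=51.0
    lines.append("%.1f\t%.1f" % (y, x))
with open("classifier.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```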
Incidentally, I don't get perfect separation when I replace SVMWithSGD with LogisticRegressionWithSGD or NaiveBayes either. Here's the code:

====================================
import sys
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithSGD, LogisticRegressionModel
from pyspark.mllib.classification import NaiveBayes, NaiveBayesModel
from pyspark.mllib.classification import SVMWithSGD, SVMModel
from pyspark.mllib.regression import LabeledPoint
import numpy as np

sc = SparkContext(appName="Prem")

# Load the text file and convert each tab-separated line
# (label first, then features) to a LabeledPoint.
def parsePoint(line):
    values = [float(x) for x in line.split('\t')]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("c:/python27/classifier.txt")
parsedData = data.map(parsePoint)
print parsedData

# Build the model
model = SVMWithSGD.train(parsedData, iterations=100)
model.setThreshold(0.5)
print model

## Build the model
##model = LogisticRegressionWithSGD.train(parsedData, iterations=100, intercept=True)
##print model

## Build the model
##model = NaiveBayes.train(parsedData)
##print model

# Predict on the same X values used for training.
for i in range(100):
    print i+1, model.predict(np.array([float(i+1)]))
=================================================

Incidentally, the single weight I observe in MLlib is 0.8949991, while if I run the scikit-learn version of a linear support vector machine on the same data, I get 0.05417109. Is this indicative of the problem?

Can you please let me know what I am doing wrong?

Thanks,
Prem

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/support-vector-machine-does-not-classify-properly-tp26216.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
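One thing I notice (just a guess, in case it matters): if the trained model has no intercept term, then with a single positive feature the decision function sign(w*x) predicts the same class for every x > 0, no matter what w is. A quick pure-Python check using the weight I observe, assuming a zero intercept:

```python
# Decision rule with the weight reported by MLlib and no intercept:
# predict 1.0 whenever w*x exceeds the threshold 0, else 0.0.
w = 0.8949991
xs = [float(i) for i in range(1, 101)]
preds = [1.0 if w * x > 0.0 else 0.0 for x in xs]
# With w > 0 and every x > 0, all 100 predictions come out 1.0,
# which matches exactly what I see from the model.
print(preds.count(1.0))  # prints 100
```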