Ah, right. It is important to call clearThreshold() in that example in
order to get margins, because the AUC metric needs the predictions to
be ranked by some relative strength, rather than just 0/1. Those
outputs are not probabilities, and that is not what SVMs give you in
general. There are techniques for estimating probabilities from SVM
output (Platt scaling, for example), but they aren't implemented here.
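
For example, here's a minimal sketch of how the margins feed into AUC,
reusing the model and testing RDD from your code below (this follows
the MLlib BinaryClassificationMetrics evaluation API and assumes Java 8
lambdas):

    import scala.Tuple2;
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics;

    // With the threshold cleared, predict() returns the raw margin,
    // which is exactly the ranking score that AUC needs.
    model.clearThreshold();
    JavaRDD<Tuple2<Object, Object>> scoreAndLabels = testing.map(
        p -> new Tuple2<Object, Object>(model.predict(p.features()), p.label()));
    BinaryClassificationMetrics metrics =
        new BinaryClassificationMetrics(scoreAndLabels.rdd());
    double auc = metrics.areaUnderROC();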

If you just want 0/1, you do not want to call clearThreshold().
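
That is, leave the default threshold in place (a sketch, using the same
variables as in your code below; binaryModel is just an illustrative
name):

    // The default threshold of 0.0 is applied to the margin, so
    // predict() returns 0.0 or 1.0 directly.
    SVMModel binaryModel = SVMWithSGD.train(training.rdd(), 100);
    double label = binaryModel.predict(point.features()); // 0.0 or 1.0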

Linear regression is not a classifier, so probabilities don't enter
into it. Logistic regression, however, does give you a probability if
you apply the logistic function to the raw margin directly.
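
A rough sketch (variable names mirror your code; computing the margin
by hand from weights() and intercept() is just one way to get at it):

    // The probability of class 1 is the logistic (sigmoid) function
    // of the raw margin w.x + b.
    LogisticRegressionModel lrModel =
        LogisticRegressionWithSGD.train(training.rdd(), 100);
    double[] w = lrModel.weights().toArray();
    double[] x = point.features().toArray();
    double margin = lrModel.intercept();
    for (int i = 0; i < w.length; i++) {
      margin += w[i] * x[i];
    }
    double probability = 1.0 / (1.0 + Math.exp(-margin));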

On Sun, Oct 19, 2014 at 3:00 PM, Nick Pomfret
<nick-nab...@snowmonkey.co.uk> wrote:
> Thanks.
>
> The example I used is here
> https://spark.apache.org/docs/latest/mllib-linear-methods.html see
> SVMClassifier
>
> So there's no way to get a probability-based output? What about from
> linear regression, or logistic regression?
>
> On 19 October 2014 19:52, Sean Owen <so...@cloudera.com> wrote:
>>
>> The problem is that you called clearThreshold(). The result becomes the
>> SVM margin, not a 0/1 class prediction. There is no probability output.
>>
>> There was a very similar question last week. Is there an example out there
>> suggesting clearThreshold()? I also wonder if it is good to overload the
>> meaning of the output indirectly this way.
>>
>> On Oct 19, 2014 6:53 PM, "npomfret" <nick-nab...@snowmonkey.co.uk> wrote:
>>>
>>> Hi, I'm new to Spark and just trying to make sense of the SVMWithSGD
>>> example. I ran my dataset through it and built a model. When I call
>>> predict() on the testing data (after clearThreshold()) I was expecting
>>> to get answers in the range of 0 to 1, but all predictions seem to be
>>> negative numbers between -0 and -2. I guess my question is: what do
>>> these predictions mean? How are they of use? The outcome I need is a
>>> probability rather than a binary label. Here's my Java code:
>>>
>>>     SparkConf conf = new SparkConf()
>>>         .setAppName("name")
>>>         .set("spark.cores.max", "1");
>>>     JavaSparkContext sc = new JavaSparkContext(conf);
>>>     JavaRDD<LabeledPoint> points =
>>>         sc.textFile(path).map(new ParsePoint()).cache();
>>>     JavaRDD<LabeledPoint> training = points.sample(false, 0.8, 0L).cache();
>>>     JavaRDD<LabeledPoint> testing = points.subtract(training);
>>>     SVMModel model = SVMWithSGD.train(training.rdd(), 100);
>>>     model.clearThreshold();
>>>     for (LabeledPoint point : testing.toArray()) {
>>>         Double score = model.predict(point.features());
>>>         // <- all these are negative numbers, seemingly between 0 and -2
>>>         System.out.println("score = " + score);
>>>     }
>
>

