Re: LinearRegressionWithSGD accuracy

2015-01-28 Thread DB Tsai
Hi Robin,

You can try this PR out. It has built-in feature scaling and
ElasticNet regularization (an L1/L2 mix), and it converges stably to
the same model that R's glmnet package produces.

https://github.com/apache/spark/pull/4259
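
For anyone who wants a feel for the interface, here is a minimal
sketch of elastic-net linear regression against the spark.ml API this
PR targets (as it eventually shipped in Spark 1.4; treat the method
names as an assumption if your build predates the merge):

    // Hedged sketch: elastic-net linear regression via spark.ml.
    // Assumes `training` is a DataFrame with "label" and "features" columns.
    import org.apache.spark.ml.regression.LinearRegression

    val enet = new LinearRegression()
      .setElasticNetParam(0.5) // 0.0 = pure L2 (ridge), 1.0 = pure L1 (lasso)
      .setRegParam(0.01)       // overall regularization strength
    val enetModel = enet.fit(training)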

Sincerely,

DB Tsai
---
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai



On Thu, Jan 15, 2015 at 9:42 AM, Robin East  wrote:
> -dev, +user
>
> You’ll need to set the gradient descent step size to something small - a bit 
> of trial and error shows that 0.0001 works.
>
> You’ll need to create a LinearRegressionWithSGD instance and set the step 
> size explicitly:
>
> val lr = new LinearRegressionWithSGD()
> lr.optimizer.setStepSize(0.0001)
> lr.optimizer.setNumIterations(100)
> val model = lr.run(parsedData)
>
> On 15 Jan 2015, at 16:46, devl.development  wrote:
>
>> […]




Re: LinearRegressionWithSGD accuracy

2015-01-17 Thread DB Tsai
I'm working on LinearRegressionWithElasticNet using OWLQN now. It will
do the data standardization internally, so it's transparent to users,
and with OWLQN you don't have to choose a stepSize manually. I'll send
out a PR next week.
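
To make "standardization that is transparent to users" concrete, here
is a minimal sketch (my illustration, not the PR's actual code) of the
usual trick: fit on z = (x - mu) / sigma, then fold the scaling back
into the returned coefficients so callers never see it:

    // For a model y = w*z + b fitted on standardized features, the
    // equivalent model on the original scale is
    //   y = (w / sigma) * x + (b - w * mu / sigma)
    def unscale(w: Double, b: Double, mu: Double, sigma: Double): (Double, Double) =
      (w / sigma, b - w * mu / sigma)

    // With the scaled fit reported later in this thread (w ~ 2886.9,
    // b ~ 5000.5) and mu = 5000.5, sigma ~ 2886.9 for x = 1..10000:
    println(unscale(2886.9, 5000.5, 5000.5, 2886.9)) // ~ (1.0, 0.0), i.e. y = x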

Sincerely,

DB Tsai
---
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai



On Thu, Jan 15, 2015 at 8:46 AM, devl.development  wrote:
> […]




Fwd: LinearRegressionWithSGD accuracy

2015-01-16 Thread Robin East



Begin forwarded message:

> From: Robin East 
> Date: 16 January 2015 11:35:23 GMT
> To: Joseph Bradley 
> Cc: Yana Kadiyska , Devl Devel 
> 
> Subject: Re: LinearRegressionWithSGD accuracy
> 
> Yes, with scaled data the intercept would be ~5000, but the code as it
> stands is running a model where the intercept will be 0.0. You need to
> call setIntercept(true) to include the intercept in the model.
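
(For example, using the same builder call that appears further down this thread:)

    val lr = new LinearRegressionWithSGD().setIntercept(true)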
> 
> Robin
> 
>> On 16 Jan 2015, at 02:01, Joseph Bradley  wrote:
>> 
>> Good point about using the intercept. When scaling uses the mean
>> (shifting the feature values), the "true" model now has an intercept of
>> 5000.5, whereas the original data's "true" model has an intercept of 0.
>> I think that's the issue.
>> 
>>> On Thu, Jan 15, 2015 at 5:16 PM, Yana Kadiyska  wrote:
>>> I can actually reproduce his MSE -- with the scaled data only (non-scaled 
>>> works out just fine)
>>> 
>>> import org.apache.spark.mllib.regression._
>>> import org.apache.spark.mllib.linalg.{Vector, Vectors}
>>> import org.apache.spark.mllib.feature.StandardScaler
>>>
>>> val t = (1 to 10000).map(x => (x, x))
>>> val rdd = sc.parallelize(t)
>>> val parsedData =
>>>   rdd.map(q => LabeledPoint(q._1.toDouble, Vectors.dense(q._2.toDouble)))
>>>
>>> val lr = new LinearRegressionWithSGD()
>>> lr.optimizer.setStepSize(0.0001)
>>> lr.optimizer.setNumIterations(100)
>>>
>>> val scaler = new StandardScaler(withMean = true, withStd = true)
>>>   .fit(parsedData.map(_.features))
>>> val scaledData = parsedData.map(x => LabeledPoint(x.label,
>>>   scaler.transform(Vectors.dense(x.features.toArray))))
>>> val model = lr.run(scaledData)
>>> 
>>> val valuesAndPreds = scaledData.map { point =>
>>>   val prediction = model.predict(point.features)
>>>   (prediction,point.label)
>>> }
>>> val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
>>> Last few lines read as:
>>> 
>>> 15/01/15 16:16:40 INFO GradientDescent: GradientDescent.runMiniBatchSGD 
>>> finished. Last 10 stochastic losses 3.3338313007386144E7, 
>>> 3.333831299679853E7, 3.333831298621632E7, 3.333831297563938E7, 
>>> 3.3338312965067785E7, 3.3338312954501465E7, 3.333831294394051E7, 
>>> 3.3338312933384743E7, 3.33383129228344E7, 3.3338312912289333E7
>>> 15/01/15 16:16:40 WARN LinearRegressionWithSGD: The input data was not 
>>> directly cached, which may hurt performance if its parent RDDs are also 
>>> uncached.
>>> model: org.apache.spark.mllib.regression.LinearRegressionModel = 
>>> (weights=[0.00356790226811], intercept=0.0)
>>> 
>>> So I am a bit puzzled, as I was under the impression that a scaled model
>>> would only converge faster. The non-scaled version produced near-perfect
>>> results at alpha=0.0001, numIterations=100.
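
A plausible way to square the two regimes, assuming y = x for x = 1..10000
as the R snippet below suggests: on the raw data the no-intercept model
y = w*x is exactly right, so w can head straight for 1.0, while on
mean-centered features the label still has mean 5000.5, so no slope alone
can fit it well:

    // MSE floor for a no-intercept model on mean-centered features
    // (assumption: y = x for x = 1..10000).
    val xs = (1 to 10000).map(_.toDouble)
    val mu = xs.sum / xs.size // 5000.5
    println(mu * mu)          // ~2.5005e7: the best any y = w*z model can do here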
>>> 
>>> According to R, the weights should be a lot higher:
>>>
>>> y = seq(1, 10000)
>>> X = scale(y, center = TRUE, scale = TRUE)
>>> dt = data.frame(y, X)
>>> names(dt) = c("y", "x")
>>> model = lm(y ~ x, data = dt)
>>> # intercept: 5000.5, slope: 2886.896
>>> new <- data.frame(x = dt$x)
>>> preds = predict(model, new)
>>> mean((preds - dt$y)^2, na.rm = TRUE)
>>>
>>> Coefficients:
>>> (Intercept)            x
>>>      5000.5     2886.896
>>> 
>>> I did have success with the following model and scaled features as shown in 
>>> the original code block:
>>> 
>>> val lr = new LinearRegressionWithSGD().setIntercept(true)
>>> lr.optimizer.setStepSize(0.1)
>>> lr.optimizer.setNumIterations(1000)
>>> 
>>> scala> model
>>> res12: org.apache.spark.mllib.regression.LinearRegressionModel = 
>>> (weights=[2886.885094323781], intercept=5000.48169121784)
>>> MSE: Double = 4.472548743491049E-4
>>> 
>>> Not sure that this is a question for the dev list as much as for someone
>>> who understands ML well -- I'd appreciate any insight on why the small
>>> alpha/numIters did so poorly on the scaled data (I've removed the dev
>>> list).
>>> 
>>> 
>>> 
>>> 
>>>> On Thu, Jan 15, 2015 at 3:23 PM, Joseph Bradley  wrote:
>>>> […]

Re: LinearRegressionWithSGD accuracy

2015-01-15 Thread Devl Devel
It was a bug in the code; adding the step parameter got the results to
work: Mean Squared Error = 2.610379825794694E-5.

I've also opened a JIRA to add the step parameter to the examples, so
that people new to MLlib have a way to improve the MSE:

https://issues.apache.org/jira/browse/SPARK-5273
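
For reference, LinearRegressionWithSGD.train has an overload that takes
the step size directly, so the example fix can be as small as (using the
values from this thread):

    val stepSize = 0.0001
    val numIterations = 100
    val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize)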

On Thu, Jan 15, 2015 at 8:23 PM, Joseph Bradley  wrote:

> It looks like you're training on the non-scaled data but testing on the
> scaled data.  Have you tried this training & testing on only the scaled
> data?
>
> […]


Re: LinearRegressionWithSGD accuracy

2015-01-15 Thread Joseph Bradley
It looks like you're training on the non-scaled data but testing on the
scaled data.  Have you tried this training & testing on only the scaled
data?
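
Concretely, the consistent version of the snippet under discussion
trains and evaluates on the same RDD (a sketch):

    // Train on the scaled data, not parsedData...
    val model = LinearRegressionWithSGD.train(scaledData, numIterations)
    // ...and evaluate against the same scaled data.
    val MSE = scaledData
      .map { p => math.pow(model.predict(p.features) - p.label, 2) }
      .mean()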

On Thu, Jan 15, 2015 at 10:42 AM, Devl Devel  wrote:

> Thanks, that helps a bit at least with the NaN but the MSE is still very
> high even with that step size and 10k iterations:
>
> training Mean Squared Error = 3.3322561285919316E7
>
> Does this method need say 100k iterations?
>
> […]


Re: LinearRegressionWithSGD accuracy

2015-01-15 Thread Devl Devel
Thanks, that helps a bit, at least with the NaN, but the MSE is still
very high even with that step size and 10k iterations:

training Mean Squared Error = 3.3322561285919316E7

Does this method need, say, 100k iterations?
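
(One hedged observation, going by the numbers rather than a verified
diagnosis: 3.33e7 is roughly E[y^2] for y = 1..10000, which is the error
you get while the weight is still near zero, so the fix is more likely
the intercept discussed elsewhere in this thread than extra iterations.)

    // E[y^2] for y = 1..10000:
    val ys = (1 to 10000).map(_.toDouble)
    println(ys.map(y => y * y).sum / ys.size) // 3.33383335E7, close to the MSE above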

On Thu, Jan 15, 2015 at 5:42 PM, Robin East  wrote:

> […]


Re: LinearRegressionWithSGD accuracy

2015-01-15 Thread Robin East
-dev, +user

You’ll need to set the gradient descent step size to something small - a bit of 
trial and error shows that 0.0001 works.

You’ll need to create a LinearRegressionWithSGD instance and set the step size 
explicitly:

val lr = new LinearRegressionWithSGD()
lr.optimizer.setStepSize(0.0001)
lr.optimizer.setNumIterations(100)
val model = lr.run(parsedData)
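
Why such a small step? A back-of-the-envelope sketch (plain Scala, my
own illustration on a smaller range, not from the thread): for y = x the
squared-error gradient scales with x^2, so a large step amplifies the
error every iteration until the loss overflows:

    // Full-batch gradient descent for one-feature least squares on y = x, x = 1..100.
    val xs = (1 to 100).map(_.toDouble)
    def fit(step: Double, iters: Int): Double = {
      var w = 0.0
      for (_ <- 1 to iters) {
        // gradient of (1/2) * mean squared error with respect to w
        val grad = xs.map(x => (w * x - x) * x).sum / xs.size
        w -= step * grad
      }
      w
    }
    println(fit(1.0, 100))    // blows up towards Infinity/NaN
    println(fit(0.0001, 100)) // converges towards w = 1.0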

On 15 Jan 2015, at 16:46, devl.development  wrote:

> […]



LinearRegressionWithSGD accuracy

2015-01-15 Thread devl.development
From what I gather, you use LinearRegressionWithSGD to predict y, the
response variable, given a feature vector x.

In a simple example I used a perfectly linear dataset such that x = y:

y,x
1,1
2,2
...
10000,10000

Using the out-of-the-box example from the website (with and without scaling):

import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.feature.StandardScaler

val data = sc.textFile(file)

val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
}

val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(parsedData.map(x => x.features))
val scaledData = parsedData.map(x =>
  LabeledPoint(x.label, scaler.transform(Vectors.dense(x.features.toArray))))

// Building the model
val numIterations = 100
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate the model on training examples and compute the training error
// (* tried using both scaledData and parsedData)
val valuesAndPreds = scaledData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
println("training Mean Squared Error = " + MSE)

Both scaled and unscaled attempts give:

training Mean Squared Error = NaN

I've even tried x, y + sample noise from a normal with mean 0 and stddev 1;
it still comes up with the same thing.

Is this not supposed to work for x and y, i.e. two-dimensional data? Is
there something I'm missing or wrong in the code above, or is there a
limitation in the method?

Thanks for any advice.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
