Re: spark linear regression error training dataset is empty

2016-12-25 Thread Yuhao Yang
Hi Xiaomeng,

Have you tried confirming the DataFrame contents before fitting, e.g. with
assembleddata.show()?
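
A minimal pre-flight check could look like this (just a sketch, assuming the DataFrame you pass to fit is the assembleddata below):

// The "Training dataset is empty" assertion fires when no rows reach the optimizer,
// so verify the schema and row count before training.
assembleddata.printSchema()   // "features" should be a vector column, "label" a double
println(s"rows available for training: ${assembleddata.count()}")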

Regards,
Yuhao

2016-12-21 10:05 GMT-08:00 Xiaomeng Wan :

> Hi,
>
> I am running linear regression on a dataframe and get the following error:
>
> Exception in thread "main" java.lang.AssertionError: assertion failed:
> Training dataset is empty.
>
> at scala.Predef$.assert(Predef.scala:170)
>
> at org.apache.spark.ml.optim.WeightedLeastSquares$Aggregator.validate(WeightedLeastSquares.scala:247)
>
> at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:82)
>
> at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)
>
> at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:70)
>
> at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
>
> here is the data and code:
>
> {"label":79.3,"features":{"type":1,"values":[6412.14350001,888.0,1407.0,1.5844594594594594,10.614,12.07,0.12062966031483012,0.9991237664152219,6.065,0.49751449875724935]}}
>
> {"label":72.3,"features":{"type":1,"values":[6306.04450001,1084.0,1451.0,1.338560885608856,7.018,12.04,0.41710963455149497,0.9992054343916128,6.05,0.4975083056478405]}}
>
> {"label":76.7,"features":{"type":1,"values":[6142.9203,1494.0,1437.0,0.9618473895582329,7.939,12.06,0.34170812603648426,0.9992216101762574,6.06,0.49751243781094534]}}
>
> val lr = new LinearRegression().setMaxIter(300).setFeaturesCol("features")
>
> val lrModel = lr.fit(assembleddata)
>
> Any clues or input would be appreciated.
>
>
> Regards,
>
> Shawn
>
>
>


spark linear regression error training dataset is empty

2016-12-21 Thread Xiaomeng Wan
Hi,

I am running linear regression on a dataframe and get the following error:

Exception in thread "main" java.lang.AssertionError: assertion failed:
Training dataset is empty.

at scala.Predef$.assert(Predef.scala:170)

at org.apache.spark.ml.optim.WeightedLeastSquares$Aggregator.validate(WeightedLeastSquares.scala:247)

at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:82)

at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)

at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:70)

at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)

here is the data and code:

{"label":79.3,"features":{"type":1,"values":[6412.14350001,888.0,1407.0,1.5844594594594594,10.614,12.07,0.12062966031483012,0.9991237664152219,6.065,0.49751449875724935]}}

{"label":72.3,"features":{"type":1,"values":[6306.04450001,1084.0,1451.0,1.338560885608856,7.018,12.04,0.41710963455149497,0.9992054343916128,6.05,0.4975083056478405]}}

{"label":76.7,"features":{"type":1,"values":[6142.9203,1494.0,1437.0,0.9618473895582329,7.939,12.06,0.34170812603648426,0.9992216101762574,6.06,0.49751243781094534]}}

val lr = new LinearRegression().setMaxIter(300).setFeaturesCol("features")

val lrModel = lr.fit(assembleddata)
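
For reference, a sketch of one way JSON in this shape can be loaded so that features ends up as a real ML Vector column (the file path and this loading approach are assumptions, not necessarily what produced assembleddata):

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.functions.{col, udf}

// Read the raw JSON; "features" comes back as a struct, so convert its nested
// values array into a dense Vector that LinearRegression can consume.
val raw = spark.read.json("data.json")  // hypothetical path
val toVector = udf((xs: Seq[Double]) => Vectors.dense(xs.toArray))
val assembleddata = raw.select(col("label"), toVector(col("features.values")).as("features"))
println(s"rows: ${assembleddata.count()}")  // should be non-zero before calling fit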

Any clues or input would be appreciated.


Regards,

Shawn


Re: Linear Regression Error

2016-10-12 Thread Sean Owen
See https://issues.apache.org/jira/browse/SPARK-17588
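
For context: the assertion is thrown by the Cholesky solve inside WeightedLeastSquares, and lapack.dppsv reports a non-zero code when the normal-equations matrix is not positive definite, which is what happens when the feature columns are linearly dependent. In the code quoted below, lat_plus_long is exactly lat + long. Two possible workarounds, sketched against that code (assuming Spark 2.0.x):

// Option 1: drop the exactly collinear column from the third model.
val m3 = lr.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms", "bathrooms",
  "lat", "long", "bed_bath_rooms", "bedrooms_squared", "log_sqft_living")))

// Option 2: force the iterative solver, which does not go through the Cholesky
// factorisation used by the normal-equations path. Note that with a redundant
// feature the coefficients are not uniquely determined.
val lrIterative = new LinearRegression().setSolver("l-bfgs").setMaxIter(100)
val m3b = lrIterative.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms",
  "bathrooms", "lat", "long", "bed_bath_rooms", "bedrooms_squared",
  "log_sqft_living", "lat_plus_long")))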

On Wed, Oct 12, 2016 at 9:07 PM Meeraj Kunnumpurath <mee...@servicesymphony.com> wrote:

> If I drop the last feature on the third model, the error seems to go away.
>
> On Wed, Oct 12, 2016 at 11:52 PM, Meeraj Kunnumpurath <mee...@servicesymphony.com> wrote:
>
> Hello,
>
> I have some code trying to compare linear regression coefficients with
> three sets of features, as shown below. On the third one, I get an
> assertion error.
>
> This is the code,
>
> object MultipleRegression extends App {
>
>
>
>   val spark = SparkSession.builder().appName("Regression Model Builder").master("local").getOrCreate()
>
>   import spark.implicits._
>
>   val training = build("kc_house_train_data.csv", "train", spark)
>   val test = build("kc_house_test_data.csv", "test", spark)
>
>   val lr = new LinearRegression()
>
>   val m1 = lr.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms", "bathrooms", "lat", "long")))
>   println(s"Coefficients: ${m1.coefficients}, Intercept: ${m1.intercept}")
>
>   val m2 = lr.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms", "bathrooms", "lat", "long", "bed_bath_rooms")))
>   println(s"Coefficients: ${m2.coefficients}, Intercept: ${m2.intercept}")
>
>   val m3 = lr.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms", "bathrooms", "lat", "long", "bed_bath_rooms", "bedrooms_squared", "log_sqft_living", "lat_plus_long")))
>   println(s"Coefficients: ${m3.coefficients}, Intercept: ${m3.intercept}")
>
>
>   def build(path: String, view: String, spark: SparkSession) = {
>
> val toDouble = udf((x: String) => x.toDouble)
> val product = udf((x: Double, y: Double) => x * y)
> val sum = udf((x: Double, y: Double) => x + y)
> val log = udf((x: Double) => scala.math.log(x))
>
> spark.read.
>   option("header", "true").
>   csv(path).
>   withColumn("sqft_living", toDouble('sqft_living)).
>   withColumn("price", toDouble('price)).
>   withColumn("bedrooms", toDouble('bedrooms)).
>   withColumn("bathrooms", toDouble('bathrooms)).
>   withColumn("lat", toDouble('lat)).
>   withColumn("long", toDouble('long)).
>   withColumn("bedrooms_squared", product('bedrooms, 'bedrooms)).
>   withColumn("bed_bath_rooms", product('bedrooms, 'bathrooms)).
>   withColumn("lat_plus_long", sum('lat, 'long)).
>   withColumn("log_sqft_living", log('sqft_living))
>
>   }
>
>   def buildLp(r: Row, input: String*) = {
> var features = input.map(r.getAs[Double](_)).toArray
> new LabeledPoint(r.getAs[Double]("price"), Vectors.dense(features))
>   }
>
> }
>
>
> This is the error I get.
>
> Exception in thread "main" java.lang.AssertionError: assertion failed:
> lapack.dppsv returned 9.
> at scala.Predef$.assert(Predef.scala:170)
> at org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:40)
> at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:140)
> at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)
> at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:70)
> at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> at com.ss.ml.regression.MultipleRegression$.delayedEndpoint$com$ss$ml$regression$MultipleRegression$1(MultipleRegression.scala:36)
> at com.ss.ml.regression.MultipleRegression$delayedInit$body.apply(MultipleRegression.scala:12)
> at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
> at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
> at scala.App$$anonfun$main$1.apply(App.scala:76)
> at scala.App$$anonfun$main$1.apply(App.scala:76)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
> at scala.App$class.main(App.scala:76)
> at com.ss.ml.regression.MultipleRegression$.main(MultipleRegression.scala:12)
> at com.ss.ml.regression.MultipleRegression.main(MultipleRegression.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
>
>
> Does anyone know what is going wrong here?
>
> Many thanks
>
> --
> *Meeraj Kunnumpurath*
> *Director and Executive Principal*
> *Service Symphony Ltd*
> *00 44 7702 693597*
> *00 971 50 409 0169*
> *mee...@servicesymphony.com*
>
>
>
>
> --
> *Meeraj Kunnumpurath*
> *Director and Executive Principal*
> *Service Symphony Ltd*
> *00 44 7702 693597*
> *00 971 50 409 0169*
> *mee...@servicesymphony.com*
>


Re: Linear Regression Error

2016-10-12 Thread Meeraj Kunnumpurath
If I drop the last feature on the third model, the error seems to go away.
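
Presumably that is because lat_plus_long is computed as lat + long, so with lat and long already included the feature matrix is rank deficient and the Cholesky factorisation behind the normal-equations solver cannot complete. A quick way to confirm the redundancy (a sketch against the training DataFrame built in the code below):

// The dropped column is an exact linear combination of two other columns.
training.selectExpr("max(abs(lat + long - lat_plus_long)) as max_diff").show()
// max_diff of 0.0 means lat_plus_long adds no information beyond lat and long.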

On Wed, Oct 12, 2016 at 11:52 PM, Meeraj Kunnumpurath <mee...@servicesymphony.com> wrote:

> Hello,
>
> I have some code trying to compare linear regression coefficients with
> three sets of features, as shown below. On the third one, I get an
> assertion error.
>
> This is the code,
>
> object MultipleRegression extends App {
>
>
>
>   val spark = SparkSession.builder().appName("Regression Model Builder").master("local").getOrCreate()
>
>   import spark.implicits._
>
>   val training = build("kc_house_train_data.csv", "train", spark)
>   val test = build("kc_house_test_data.csv", "test", spark)
>
>   val lr = new LinearRegression()
>
>   val m1 = lr.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms", "bathrooms", "lat", "long")))
>   println(s"Coefficients: ${m1.coefficients}, Intercept: ${m1.intercept}")
>
>   val m2 = lr.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms", "bathrooms", "lat", "long", "bed_bath_rooms")))
>   println(s"Coefficients: ${m2.coefficients}, Intercept: ${m2.intercept}")
>
>   val m3 = lr.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms", "bathrooms", "lat", "long", "bed_bath_rooms", "bedrooms_squared", "log_sqft_living", "lat_plus_long")))
>   println(s"Coefficients: ${m3.coefficients}, Intercept: ${m3.intercept}")
>
>
>   def build(path: String, view: String, spark: SparkSession) = {
>
> val toDouble = udf((x: String) => x.toDouble)
> val product = udf((x: Double, y: Double) => x * y)
> val sum = udf((x: Double, y: Double) => x + y)
> val log = udf((x: Double) => scala.math.log(x))
>
> spark.read.
>   option("header", "true").
>   csv(path).
>   withColumn("sqft_living", toDouble('sqft_living)).
>   withColumn("price", toDouble('price)).
>   withColumn("bedrooms", toDouble('bedrooms)).
>   withColumn("bathrooms", toDouble('bathrooms)).
>   withColumn("lat", toDouble('lat)).
>   withColumn("long", toDouble('long)).
>   withColumn("bedrooms_squared", product('bedrooms, 'bedrooms)).
>   withColumn("bed_bath_rooms", product('bedrooms, 'bathrooms)).
>   withColumn("lat_plus_long", sum('lat, 'long)).
>   withColumn("log_sqft_living", log('sqft_living))
>
>   }
>
>   def buildLp(r: Row, input: String*) = {
> var features = input.map(r.getAs[Double](_)).toArray
> new LabeledPoint(r.getAs[Double]("price"), Vectors.dense(features))
>   }
>
> }
>
>
> This is the error I get.
>
> Exception in thread "main" java.lang.AssertionError: assertion failed:
> lapack.dppsv returned 9.
> at scala.Predef$.assert(Predef.scala:170)
> at org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:40)
> at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:140)
> at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)
> at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:70)
> at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> at com.ss.ml.regression.MultipleRegression$.delayedEndpoint$com$ss$ml$regression$MultipleRegression$1(MultipleRegression.scala:36)
> at com.ss.ml.regression.MultipleRegression$delayedInit$body.apply(MultipleRegression.scala:12)
> at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
> at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
> at scala.App$$anonfun$main$1.apply(App.scala:76)
> at scala.App$$anonfun$main$1.apply(App.scala:76)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
> at scala.App$class.main(App.scala:76)
> at com.ss.ml.regression.MultipleRegression$.main(MultipleRegression.scala:12)
> at com.ss.ml.regression.MultipleRegression.main(MultipleRegression.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
>
>
> Does anyone know what is going wrong here?
>
> Many thanks
>
> --
> *Meeraj Kunnumpurath*
> *Director and Executive Principal*
> *Service Symphony Ltd*
> *00 44 7702 693597*
> *00 971 50 409 0169*
> *mee...@servicesymphony.com*
>



-- 
*Meeraj Kunnumpurath*
*Director and Executive Principal*
*Service Symphony Ltd*
*00 44 7702 693597*
*00 971 50 409 0169*
*mee...@servicesymphony.com*


Linear Regression Error

2016-10-12 Thread Meeraj Kunnumpurath
Hello,

I have some code trying to compare linear regression coefficients with
three sets of features, as shown below. On the third one, I get an
assertion error.

This is the code,

object MultipleRegression extends App {



  val spark = SparkSession.builder().appName("Regression Model Builder").master("local").getOrCreate()

  import spark.implicits._

  val training = build("kc_house_train_data.csv", "train", spark)
  val test = build("kc_house_test_data.csv", "test", spark)

  val lr = new LinearRegression()

  val m1 = lr.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms", "bathrooms", "lat", "long")))
  println(s"Coefficients: ${m1.coefficients}, Intercept: ${m1.intercept}")

  val m2 = lr.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms", "bathrooms", "lat", "long", "bed_bath_rooms")))
  println(s"Coefficients: ${m2.coefficients}, Intercept: ${m2.intercept}")

  val m3 = lr.fit(training.map(r => buildLp(r, "sqft_living", "bedrooms", "bathrooms", "lat", "long", "bed_bath_rooms", "bedrooms_squared", "log_sqft_living", "lat_plus_long")))
  println(s"Coefficients: ${m3.coefficients}, Intercept: ${m3.intercept}")


  def build(path: String, view: String, spark: SparkSession) = {

val toDouble = udf((x: String) => x.toDouble)
val product = udf((x: Double, y: Double) => x * y)
val sum = udf((x: Double, y: Double) => x + y)
val log = udf((x: Double) => scala.math.log(x))

spark.read.
  option("header", "true").
  csv(path).
  withColumn("sqft_living", toDouble('sqft_living)).
  withColumn("price", toDouble('price)).
  withColumn("bedrooms", toDouble('bedrooms)).
  withColumn("bathrooms", toDouble('bathrooms)).
  withColumn("lat", toDouble('lat)).
  withColumn("long", toDouble('long)).
  withColumn("bedrooms_squared", product('bedrooms, 'bedrooms)).
  withColumn("bed_bath_rooms", product('bedrooms, 'bathrooms)).
  withColumn("lat_plus_long", sum('lat, 'long)).
  withColumn("log_sqft_living", log('sqft_living))

  }

  def buildLp(r: Row, input: String*) = {
var features = input.map(r.getAs[Double](_)).toArray
new LabeledPoint(r.getAs[Double]("price"), Vectors.dense(features))
  }

}


This is the error I get.

Exception in thread "main" java.lang.AssertionError: assertion failed:
lapack.dppsv returned 9.
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:40)
at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:140)
at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)
at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:70)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
at com.ss.ml.regression.MultipleRegression$.delayedEndpoint$com$ss$ml$regression$MultipleRegression$1(MultipleRegression.scala:36)
at com.ss.ml.regression.MultipleRegression$delayedInit$body.apply(MultipleRegression.scala:12)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.ss.ml.regression.MultipleRegression$.main(MultipleRegression.scala:12)
at com.ss.ml.regression.MultipleRegression.main(MultipleRegression.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)


Does anyone know what is going wrong here?

Many thanks

-- 
*Meeraj Kunnumpurath*
*Director and Executive Principal*
*Service Symphony Ltd*
*00 44 7702 693597*
*00 971 50 409 0169*
*mee...@servicesymphony.com*