IsotonicRegression can handle feature column of vector type. It will extract the a certain index (controlled by param "featureIndex") of this feature vector and feed it into model training. It will perform Pool adjacent violators algorithms on each partition, so it's distributed and the data is not necessary to fit into memory of a single machine. The following code snippets can work well on my machine:
val labels = Seq(1, 2, 3, 1, 6, 17, 16, 17, 18) val dataset = spark.createDataFrame( labels.zipWithIndex.map { case (label, i) => (label, Vectors.dense(Array(i.toDouble, i.toDouble + 1.0)), 1.0) } ).toDF("label", "features", "weight") val ir = new IsotonicRegression().setIsotonic(true) val model = ir.fit(dataset) val predictions = model .transform(dataset) .select("prediction").rdd.map { case Row(pred) => pred }.collect() assert(predictions === Array(1, 2, 2, 2, 6, 16.5, 16.5, 17, 18)) Thanks Yanbo 2016-07-11 6:14 GMT-07:00 Fridtjof Sander <fridtjof.san...@googlemail.com>: > Hi Swaroop, > > from my understanding, Isotonic Regression is currently limited to data > with 1 feature plus weight and label. Also the entire data is required to > fit into memory of a single machine. > I did some work on the latter issue but discontinued the project, because > I felt no one really needed it. I'd be happy to resume my work on Spark's > IR implementation, but I fear there won't be a quick for your issue. > > Fridtjof > > > Am 08.07.2016 um 22:38 schrieb dsp: > >> Hi I am trying to perform Isotonic Regression on a data set with 9 >> features >> and a label. >> When I run the algorithm similar to the way mentioned on MLlib page, I get >> the error saying >> >> /*error:* overloaded method value run with alternatives: >> (input: org.apache.spark.api.java.JavaRDD[(java.lang.Double, >> java.lang.Double, >> >> java.lang.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel >> <and> >> (input: org.apache.spark.rdd.RDD[(scala.Double, scala.Double, >> scala.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel >> cannot be applied to (org.apache.spark.rdd.RDD[(scala.Double, >> scala.Double, >> scala.Double, scala.Double, scala.Double, scala.Double, scala.Double, >> scala.Double, scala.Double, scala.Double, scala.Double, scala.Double, >> scala.Double)]) >> val model = new >> IsotonicRegression().setIsotonic(true).run(training)/ >> >> For the may given in the sample code, it looks like it can be done only >> for >> dataset with a single feature because run() method can accept only three >> parameters leaving which already has a label and a default value leaving >> place for only one variable. >> So, How can this be done for multiple variables ? >> >> Regards, >> Swaroop >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/Isotonic-Regression-run-method-overloaded-Error-tp27313.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >> --------------------------------------------------------------------- >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org >> >> > > --------------------------------------------------------------------- > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > >