Re: Isotonic Regression, run method overloaded Error

2016-07-11 Thread Fridtjof Sander
Spark's implementation does perform PAVA on each partition only to then 
collect each result to the driver and to perform PAVA again on the 
collected results. The hope of that is, that enough data is pooled, so 
that the the last step does not exceed the drivers memory limits. This 
assumption does of course not generally hold. Just consider what 
happens, if the data is already correctly sorted. In that case nothing 
is pooled and model size roughly equals data size. Spark's IR model 
saves boundaries and predictions as double arrays instead, so the 
(unpooled) data has to fit into memory.


https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala

Am 11.07.2016 um 17:06 schrieb Yanbo Liang:
IsotonicRegression can handle feature column of vector type. It will 
extract the a certain index (controlled by param "featureIndex") of 
this feature vector and feed it into model training. It will perform 
Pool adjacent violators algorithms on each partition, so it's 
distributed and the data is not necessary to fit into memory of a 
single machine.

The following code snippets can work well on my machine:

|val labels = Seq(1, 2, 3, 1, 6, 17, 16, 17, 18) val dataset = 
spark.createDataFrame( labels.zipWithIndex.map { case (label, i) => 
(label, Vectors.dense(Array(i.toDouble, i.toDouble + 1.0)), 1.0) } 
).toDF("label", "features", "weight") val ir = new 
IsotonicRegression().setIsotonic(true) val model = ir.fit(dataset) val 
predictions = model .transform(dataset) .select("prediction").rdd.map 
{ case Row(pred) => pred }.collect() assert(predictions === Array(1, 
2, 2, 2, 6, 16.5, 16.5, 17, 18)) |


Thanks
Yanbo

2016-07-11 6:14 GMT-07:00 Fridtjof Sander 
<fridtjof.san...@googlemail.com <mailto:fridtjof.san...@googlemail.com>>:


Hi Swaroop,

from my understanding, Isotonic Regression is currently limited to
data with 1 feature plus weight and label. Also the entire data is
required to fit into memory of a single machine.
I did some work on the latter issue but discontinued the project,
because I felt no one really needed it. I'd be happy to resume my
work on Spark's IR implementation, but I fear there won't be a
quick for your issue.

Fridtjof


Am 08.07.2016 um 22:38 schrieb dsp:

Hi I am trying to perform Isotonic Regression on a data set
with 9 features
and a label.
When I run the algorithm similar to the way mentioned on MLlib
page, I get
the error saying

/*error:* overloaded method value run with alternatives:
(input: org.apache.spark.api.java.JavaRDD[(java.lang.Double,
java.lang.Double,

java.lang.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel

   (input: org.apache.spark.rdd.RDD[(scala.Double, scala.Double,
scala.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel
  cannot be applied to
(org.apache.spark.rdd.RDD[(scala.Double, scala.Double,
scala.Double, scala.Double, scala.Double, scala.Double,
scala.Double,
scala.Double, scala.Double, scala.Double, scala.Double,
scala.Double,
scala.Double)])
  val model = new
IsotonicRegression().setIsotonic(true).run(training)/

For the may given in the sample code, it looks like it can be
done only for
dataset with a single feature because run() method can accept
only three
parameters leaving which already has a label and a default
value leaving
place for only one variable.
So, How can this be done for multiple variables ?

Regards,
Swaroop



--
View this message in context:
    
http://apache-spark-user-list.1001560.n3.nabble.com/Isotonic-Regression-run-method-overloaded-Error-tp27313.html
Sent from the Apache Spark User List mailing list archive at
Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
<mailto:user-unsubscr...@spark.apache.org>



-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
<mailto:user-unsubscr...@spark.apache.org>






Re: Isotonic Regression, run method overloaded Error

2016-07-11 Thread Yanbo Liang
IsotonicRegression can handle feature column of vector type. It will
extract the a certain index (controlled by param "featureIndex") of this
feature vector and feed it into model training. It will perform Pool
adjacent violators algorithms on each partition, so it's distributed and
the data is not necessary to fit into memory of a single machine.
The following code snippets can work well on my machine:

val labels = Seq(1, 2, 3, 1, 6, 17, 16, 17, 18)
val dataset = spark.createDataFrame(
  labels.zipWithIndex.map { case (label, i) =>
(label, Vectors.dense(Array(i.toDouble, i.toDouble + 1.0)), 1.0)
  }
).toDF("label", "features", "weight")

val ir = new IsotonicRegression().setIsotonic(true)

val model = ir.fit(dataset)

val predictions = model
  .transform(dataset)
  .select("prediction").rdd.map { case Row(pred) =>
  pred
}.collect()

assert(predictions === Array(1, 2, 2, 2, 6, 16.5, 16.5, 17, 18))


Thanks
Yanbo

2016-07-11 6:14 GMT-07:00 Fridtjof Sander <fridtjof.san...@googlemail.com>:

> Hi Swaroop,
>
> from my understanding, Isotonic Regression is currently limited to data
> with 1 feature plus weight and label. Also the entire data is required to
> fit into memory of a single machine.
> I did some work on the latter issue but discontinued the project, because
> I felt no one really needed it. I'd be happy to resume my work on Spark's
> IR implementation, but I fear there won't be a quick for your issue.
>
> Fridtjof
>
>
> Am 08.07.2016 um 22:38 schrieb dsp:
>
>> Hi I am trying to perform Isotonic Regression on a data set with 9
>> features
>> and a label.
>> When I run the algorithm similar to the way mentioned on MLlib page, I get
>> the error saying
>>
>> /*error:* overloaded method value run with alternatives:
>> (input: org.apache.spark.api.java.JavaRDD[(java.lang.Double,
>> java.lang.Double,
>>
>> java.lang.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel
>> 
>>(input: org.apache.spark.rdd.RDD[(scala.Double, scala.Double,
>> scala.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel
>>   cannot be applied to (org.apache.spark.rdd.RDD[(scala.Double,
>> scala.Double,
>> scala.Double, scala.Double, scala.Double, scala.Double, scala.Double,
>> scala.Double, scala.Double, scala.Double, scala.Double, scala.Double,
>> scala.Double)])
>>   val model = new
>> IsotonicRegression().setIsotonic(true).run(training)/
>>
>> For the may given in the sample code, it looks like it can be done only
>> for
>> dataset with a single feature because run() method can accept only three
>> parameters leaving which already has a label and a default value leaving
>> place for only one variable.
>> So, How can this be done for multiple variables ?
>>
>> Regards,
>> Swaroop
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Isotonic-Regression-run-method-overloaded-Error-tp27313.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Isotonic Regression, run method overloaded Error

2016-07-11 Thread Fridtjof Sander

Hi Swaroop,

from my understanding, Isotonic Regression is currently limited to data 
with 1 feature plus weight and label. Also the entire data is required 
to fit into memory of a single machine.
I did some work on the latter issue but discontinued the project, 
because I felt no one really needed it. I'd be happy to resume my work 
on Spark's IR implementation, but I fear there won't be a quick for your 
issue.


Fridtjof

Am 08.07.2016 um 22:38 schrieb dsp:

Hi I am trying to perform Isotonic Regression on a data set with 9 features
and a label.
When I run the algorithm similar to the way mentioned on MLlib page, I get
the error saying

/*error:* overloaded method value run with alternatives:
(input: org.apache.spark.api.java.JavaRDD[(java.lang.Double,
java.lang.Double,
java.lang.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel

   (input: org.apache.spark.rdd.RDD[(scala.Double, scala.Double,
scala.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel
  cannot be applied to (org.apache.spark.rdd.RDD[(scala.Double, scala.Double,
scala.Double, scala.Double, scala.Double, scala.Double, scala.Double,
scala.Double, scala.Double, scala.Double, scala.Double, scala.Double,
scala.Double)])
  val model = new
IsotonicRegression().setIsotonic(true).run(training)/

For the may given in the sample code, it looks like it can be done only for
dataset with a single feature because run() method can accept only three
parameters leaving which already has a label and a default value leaving
place for only one variable.
So, How can this be done for multiple variables ?

Regards,
Swaroop



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Isotonic-Regression-run-method-overloaded-Error-tp27313.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org




-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Isotonic Regression, run method overloaded Error

2016-07-10 Thread Yanbo Liang
Hi Swaroop,

Would you mind to share your code that others can help you to figure out
what caused this error?
I can run the isotonic regression examples well.

Thanks
Yanbo

2016-07-08 13:38 GMT-07:00 dsp <durgaswar...@gmail.com>:

> Hi I am trying to perform Isotonic Regression on a data set with 9 features
> and a label.
> When I run the algorithm similar to the way mentioned on MLlib page, I get
> the error saying
>
> /*error:* overloaded method value run with alternatives:
> (input: org.apache.spark.api.java.JavaRDD[(java.lang.Double,
> java.lang.Double,
>
> java.lang.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel
> 
>   (input: org.apache.spark.rdd.RDD[(scala.Double, scala.Double,
> scala.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel
>  cannot be applied to (org.apache.spark.rdd.RDD[(scala.Double,
> scala.Double,
> scala.Double, scala.Double, scala.Double, scala.Double, scala.Double,
> scala.Double, scala.Double, scala.Double, scala.Double, scala.Double,
> scala.Double)])
>  val model = new
> IsotonicRegression().setIsotonic(true).run(training)/
>
> For the may given in the sample code, it looks like it can be done only for
> dataset with a single feature because run() method can accept only three
> parameters leaving which already has a label and a default value leaving
> place for only one variable.
> So, How can this be done for multiple variables ?
>
> Regards,
> Swaroop
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Isotonic-Regression-run-method-overloaded-Error-tp27313.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Isotonic Regression, run method overloaded Error

2016-07-08 Thread dsp
Hi I am trying to perform Isotonic Regression on a data set with 9 features
and a label. 
When I run the algorithm similar to the way mentioned on MLlib page, I get
the error saying

/*error:* overloaded method value run with alternatives:
(input: org.apache.spark.api.java.JavaRDD[(java.lang.Double,
java.lang.Double,
java.lang.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel

  (input: org.apache.spark.rdd.RDD[(scala.Double, scala.Double,
scala.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel
 cannot be applied to (org.apache.spark.rdd.RDD[(scala.Double, scala.Double,
scala.Double, scala.Double, scala.Double, scala.Double, scala.Double,
scala.Double, scala.Double, scala.Double, scala.Double, scala.Double,
scala.Double)])
 val model = new
IsotonicRegression().setIsotonic(true).run(training)/

For the may given in the sample code, it looks like it can be done only for
dataset with a single feature because run() method can accept only three
parameters leaving which already has a label and a default value leaving
place for only one variable. 
So, How can this be done for multiple variables ? 

Regards,
Swaroop



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Isotonic-Regression-run-method-overloaded-Error-tp27313.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org