Re: Isotonic Regression, run method overloaded Error

Fridtjof Sander Mon, 11 Jul 2016 08:20:44 -0700

Spark's implementation does perform PAVA on each partition only to thencollect each result to the driver and to perform PAVA again on thecollected results. The hope of that is, that enough data is pooled, sothat the the last step does not exceed the drivers memory limits. Thisassumption does of course not generally hold. Just consider whathappens, if the data is already correctly sorted. In that case nothingis pooled and model size roughly equals data size. Spark's IR modelsaves boundaries and predictions as double arrays instead, so the(unpooled) data has to fit into memory.


https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala


Am 11.07.2016 um 17:06 schrieb Yanbo Liang:

IsotonicRegression can handle feature column of vector type. It willextract the a certain index (controlled by param "featureIndex") ofthis feature vector and feed it into model training. It will performPool adjacent violators algorithms on each partition, so it'sdistributed and the data is not necessary to fit into memory of asingle machine.

The following code snippets can work well on my machine:

|val labels = Seq(1, 2, 3, 1, 6, 17, 16, 17, 18) val dataset =spark.createDataFrame( labels.zipWithIndex.map { case (label, i) =>(label, Vectors.dense(Array(i.toDouble, i.toDouble + 1.0)), 1.0) }).toDF("label", "features", "weight") val ir = newIsotonicRegression().setIsotonic(true) val model = ir.fit(dataset) valpredictions = model .transform(dataset) .select("prediction").rdd.map{ case Row(pred) => pred }.collect() assert(predictions === Array(1,2, 2, 2, 6, 16.5, 16.5, 17, 18)) |


Thanks
Yanbo

2016-07-11 6:14 GMT-07:00 Fridtjof Sander<fridtjof.san...@googlemail.com <mailto:fridtjof.san...@googlemail.com>>:


    Hi Swaroop,

    from my understanding, Isotonic Regression is currently limited to
    data with 1 feature plus weight and label. Also the entire data is
    required to fit into memory of a single machine.
    I did some work on the latter issue but discontinued the project,
    because I felt no one really needed it. I'd be happy to resume my
    work on Spark's IR implementation, but I fear there won't be a
    quick for your issue.

    Fridtjof


    Am 08.07.2016 um 22:38 schrieb dsp:

        Hi I am trying to perform Isotonic Regression on a data set
        with 9 features
        and a label.
        When I run the algorithm similar to the way mentioned on MLlib
        page, I get
        the error saying

        /*error:* overloaded method value run with alternatives:
        (input: org.apache.spark.api.java.JavaRDD[(java.lang.Double,
        java.lang.Double,
        
java.lang.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel
        <and>
           (input: org.apache.spark.rdd.RDD[(scala.Double, scala.Double,
        scala.Double)])org.apache.spark.mllib.regression.IsotonicRegressionModel
          cannot be applied to
        (org.apache.spark.rdd.RDD[(scala.Double, scala.Double,
        scala.Double, scala.Double, scala.Double, scala.Double,
        scala.Double,
        scala.Double, scala.Double, scala.Double, scala.Double,
        scala.Double,
        scala.Double)])
                  val model = new
        IsotonicRegression().setIsotonic(true).run(training)/

        For the may given in the sample code, it looks like it can be
        done only for
        dataset with a single feature because run() method can accept
        only three
        parameters leaving which already has a label and a default
        value leaving
        place for only one variable.
        So, How can this be done for multiple variables ?

        Regards,
        Swaroop



        --
        View this message in context:
        
http://apache-spark-user-list.1001560.n3.nabble.com/Isotonic-Regression-run-method-overloaded-Error-tp27313.html
        Sent from the Apache Spark User List mailing list archive at
        Nabble.com.

        ---------------------------------------------------------------------
        To unsubscribe e-mail: user-unsubscr...@spark.apache.org
        <mailto:user-unsubscr...@spark.apache.org>



    ---------------------------------------------------------------------
    To unsubscribe e-mail: user-unsubscr...@spark.apache.org
    <mailto:user-unsubscr...@spark.apache.org>

Re: Isotonic Regression, run method overloaded Error

Reply via email to