Github user LeoIV commented on the issue: https://github.com/apache/spark/pull/17373

@WeichenXu123 I deleted my last comment because I wasn't sure whether I had made a mistake somewhere else. As described above, I applied your changes to version 2.1. For **small** datasets, I get raw predictions that are not in [0, 1]. You should be able to check it using this small test case:

```
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{IntegerType, StructType}
import org.apache.spark.sql.{Row, SparkSession}

/**
 * Created by Leonard Hövelmann (leonard.hoevelm...@adesso.de) on 14.07.2017.
 */
object TestProb {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    val rowSchema = new StructType()
      .add("class", IntegerType)
      .add("features", org.apache.spark.ml.linalg.SQLDataTypes.VectorType)

    val testData: RDD[Row] = spark.sparkContext.parallelize(Seq(
      new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row]
    ))

    val testDataDf = spark.sqlContext.createDataFrame(testData, rowSchema)

    val mlp = new MultilayerPerceptronClassifier()
      .setFeaturesCol("features")
      .setLabelCol("class")
      .setLayers(Array(5, 4, 3))

    val mlpModel = mlp.fit(testDataDf)
    mlpModel.transform(testDataDf).show(6)
  }
}
```

Using this, I get the following results:

```
+-----+--------------------+--------------------+--------------------+----------+
|class|            features|       rawPrediction|         probability|prediction|
+-----+--------------------+--------------------+--------------------+----------+
|    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
|    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
|    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
|    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
|    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
|    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
+-----+--------------------+--------------------+--------------------+----------+
```

Does this work in your code?
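For reference, a valid `probability` column should contain softmax-normalized entries in [0, 1] that sum to 1 — the property the output above appears to violate. A minimal plain-Scala sketch (no Spark needed, the `Softmax` object and its input values are just illustrations, not the actual MLlib code path) of what that check looks like:

```scala
// Minimal sketch of the property a probability vector must satisfy:
// softmax maps raw scores to values in [0, 1] that sum to 1.
object Softmax {
  def apply(raw: Array[Double]): Array[Double] = {
    val max = raw.max                        // subtract max for numerical stability
    val exps = raw.map(x => math.exp(x - max))
    val sum = exps.sum
    exps.map(_ / sum)
  }
}

object SoftmaxCheck {
  def main(args: Array[String]): Unit = {
    // Magnitudes loosely modeled on the rawPrediction column above.
    val probs = Softmax(Array(52.5, 22.9, 6.4))
    println(probs.forall(p => p >= 0.0 && p <= 1.0)) // every entry in [0, 1]
    println(math.abs(probs.sum - 1.0) < 1e-9)        // entries sum to 1
  }
}
```

If the transformed DataFrame's `probability` vectors fail either of these two conditions (as the `1.0,1.188...` row suggests), the normalization step is being skipped or applied to the wrong vector.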