Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17373
RawPrediction is not a probability.
Its range is (-inf, inf).
Softmax(rawPrediction) gives the probabilities.
Their range is [0, 1].
Thanks!
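
For illustration only (this is not code from the PR), a minimal Scala sketch of what softmax does to a raw prediction vector; the object name and the input values are made up:

object SoftmaxSketch {
  // Softmax maps unbounded raw scores to values in [0, 1] that sum to 1.
  def softmax(raw: Array[Double]): Array[Double] = {
    val max = raw.max                              // subtract max for numerical stability
    val exps = raw.map(v => math.exp(v - max))
    val sum = exps.sum
    exps.map(_ / sum)
  }

  def main(args: Array[String]): Unit = {
    val raw = Array(52.5, 22.9, 6.4)               // hypothetical rawPrediction entries, outside [0, 1]
    println(softmax(raw).mkString("[", ", ", "]")) // every entry now lies in [0, 1]
  }
}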
On 14 Jul 2017, at 6:38 AM, Leonard Hövelmann <[email protected]> wrote:

@WeichenXu123 <https://github.com/weichenxu123> I deleted my last comment
because I wasn't sure whether I had made mistakes elsewhere. As I described
above, I applied your changes to version 2.1. For small datasets, I get raw
predictions that are not in [0, 1]. You should be able to check this with the
small test case below:
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{IntegerType, StructType}
import org.apache.spark.sql.{Row, SparkSession}

/**
 * Created by Leonard Hövelmann ([email protected]) on 14.07.2017.
 */
object TestProb {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    val rowSchema = new StructType()
      .add("class", IntegerType)
      .add("features", org.apache.spark.ml.linalg.SQLDataTypes.VectorType)

    val testData: RDD[Row] = spark.sparkContext.parallelize(Seq(
      new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row]
    ))

    val testDataDf = spark.sqlContext.createDataFrame(testData, rowSchema)

    val mlp = new MultilayerPerceptronClassifier()
      .setFeaturesCol("features")
      .setLabelCol("class")
      .setLayers(Array(5, 4, 3))

    val mlpModel = mlp.fit(testDataDf)
    mlpModel.transform(testDataDf).show(6)
  }
}
Using this, I get the following results:
+-----+--------------------+--------------------+--------------------+----------+
|class|            features|       rawPrediction|         probability|prediction|
+-----+--------------------+--------------------+--------------------+----------+
|    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
|    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
|    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
|    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
|    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
|    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
+-----+--------------------+--------------------+--------------------+----------+
Does this work in your code?
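
(Purely as an illustrative check of the output above, not part of the test case: each probability vector should have entries in [0, 1] that sum to roughly 1, while rawPrediction is not required to be bounded. The isValidProb helper below is an assumption of mine, assuming mlpModel and testDataDf from the test case are in scope.)

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, udf}

// Illustrative helper (not in the original test case): a probability vector
// should have entries in [0, 1] that sum to approximately 1.
val isValidProb = udf { (v: Vector) =>
  val arr = v.toArray
  arr.forall(p => p >= 0.0 && p <= 1.0) && math.abs(arr.sum - 1.0) < 1e-6
}

mlpModel.transform(testDataDf)
  .select(col("probability"), isValidProb(col("probability")).as("isValidProbability"))
  .show(6, truncate = false)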