Github user LeoIV commented on the issue: https://github.com/apache/spark/pull/17373

@WeichenXu123 I deleted my last comment because I wasn't sure whether I had made a mistake somewhere else. As described above, I applied your changes to version 2.1. For **small** datasets, I get raw predictions that are not in [0, 1]. You should be able to check it using this small test case:

```
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{IntegerType, StructType}
import org.apache.spark.sql.{Row, SparkSession}

/**
 * Created by Leonard Hövelmann (leonard.hoevelm...@adesso.de) on 14.07.2017.
 */
object TestProb {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    val rowSchema = new StructType()
      .add("class", IntegerType)
      .add("features", org.apache.spark.ml.linalg.SQLDataTypes.VectorType)

    val testData: RDD[Row] = spark.sparkContext.parallelize(Seq(
      new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row]
    ))

    val testDataDf = spark.sqlContext.createDataFrame(testData, rowSchema)

    val mlp = new MultilayerPerceptronClassifier()
      .setFeaturesCol("features")
      .setLabelCol("class")
      .setLayers(Array(5, 4, 3))

    val mlpModel = mlp.fit(testDataDf)
    mlpModel.transform(testDataDf).show(6)
  }
}
```

Using this, I get the following results:

```
+-----+--------------------+--------------------+--------------------+----------+
|class|            features|       rawPrediction|         probability|prediction|
+-----+--------------------+--------------------+--------------------+----------+
|    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
|    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
|    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
|    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
|    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
|    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
+-----+--------------------+--------------------+--------------------+----------+
```

Does this work in your code?
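For reference, a valid `probability` column should contain softmax-normalized entries in [0, 1] that sum to 1 — the property the output above appears to violate. A minimal plain-Scala sketch (no Spark needed, the `Softmax` object and its input values are just illustrations, not the actual MLlib code path) of what that check looks like:

```scala
// Minimal sketch of the property a probability vector must satisfy:
// softmax maps raw scores to values in [0, 1] that sum to 1.
object Softmax {
  def apply(raw: Array[Double]): Array[Double] = {
    val max = raw.max                        // subtract max for numerical stability
    val exps = raw.map(x => math.exp(x - max))
    val sum = exps.sum
    exps.map(_ / sum)
  }
}

object SoftmaxCheck {
  def main(args: Array[String]): Unit = {
    // Magnitudes loosely modeled on the rawPrediction column above.
    val probs = Softmax(Array(52.5, 22.9, 6.4))
    println(probs.forall(p => p >= 0.0 && p <= 1.0)) // every entry in [0, 1]
    println(math.abs(probs.sum - 1.0) < 1e-9)        // entries sum to 1
  }
}
```

If the transformed DataFrame's `probability` vectors fail either of these two conditions (as the `1.0,1.188...` row suggests), the normalization step is being skipped or applied to the wrong vector.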