Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17373
RawPrediction is not a probability.
Its range is (-inf, inf).
Softmax(rawPrediction) gives the probabilities.
Their range is [0, 1].
Thanks!
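
For illustration only (this is not code from the PR), a minimal Scala sketch of what softmax does to a raw prediction vector; the object name and the input values are made up:

object SoftmaxSketch {
  // Softmax maps unbounded raw scores to values in [0, 1] that sum to 1.
  def softmax(raw: Array[Double]): Array[Double] = {
    val max = raw.max                              // subtract max for numerical stability
    val exps = raw.map(v => math.exp(v - max))
    val sum = exps.sum
    exps.map(_ / sum)
  }

  def main(args: Array[String]): Unit = {
    val raw = Array(52.5, 22.9, 6.4)               // hypothetical rawPrediction entries, outside [0, 1]
    println(softmax(raw).mkString("[", ", ", "]")) // every entry now lies in [0, 1]
  }
}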
On 14 Jul 2017, at 6:38 AM, Leonard Hövelmann <[email protected]> wrote:

@WeichenXu123 <https://github.com/weichenxu123> I deleted my last comment
because I wasn't sure whether I had made mistakes elsewhere. As I described
above, I applied your changes to version 2.1. For small datasets, I get raw
predictions that are not in [0, 1]. You should be able to check this with the
small test case below:
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{IntegerType, StructType}
import org.apache.spark.sql.{Row, SparkSession}

/**
 * Created by Leonard Hövelmann ([email protected]) on 14.07.2017.
 */
object TestProb {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    val rowSchema = new StructType()
      .add("class", IntegerType)
      .add("features", org.apache.spark.ml.linalg.SQLDataTypes.VectorType)

    val testData: RDD[Row] = spark.sparkContext.parallelize(Seq(
      new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(0, Vectors.dense(Array(0.1, 0.2, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(1, Vectors.dense(Array(0.1, 0.5, 0.3, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row],
      new GenericRowWithSchema(Array(2, Vectors.dense(Array(0.1, 0.2, 0.8, 0.4, 0.5))), rowSchema).asInstanceOf[Row]
    ))

    val testDataDf = spark.sqlContext.createDataFrame(testData, rowSchema)

    val mlp = new MultilayerPerceptronClassifier()
      .setFeaturesCol("features")
      .setLabelCol("class")
      .setLayers(Array(5, 4, 3))

    val mlpModel = mlp.fit(testDataDf)
    mlpModel.transform(testDataDf).show(6)
  }
}
Using this, I get the following results:
+-----+--------------------+--------------------+--------------------+----------+
|class|            features|       rawPrediction|         probability|prediction|
+-----+--------------------+--------------------+--------------------+----------+
|    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
|    0|[0.1,0.2,0.3,0.4,...|[52.5097295377110...|[1.0,1.1880726027...|       0.0|
|    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
|    1|[0.1,0.5,0.3,0.4,...|[22.9478511752010...|[4.03649486150668...|       1.0|
|    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
|    2|[0.1,0.2,0.8,0.4,...|[6.36424366031029...|[4.39122384367774...|       2.0|
+-----+--------------------+--------------------+--------------------+----------+
Does this work in your code?
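
(Purely as an illustrative check of the output above, not part of the test case: each probability vector should have entries in [0, 1] that sum to roughly 1, while rawPrediction is not required to be bounded. The isValidProb helper below is an assumption of mine, assuming mlpModel and testDataDf from the test case are in scope.)

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, udf}

// Illustrative helper (not in the original test case): a probability vector
// should have entries in [0, 1] that sum to approximately 1.
val isValidProb = udf { (v: Vector) =>
  val arr = v.toArray
  arr.forall(p => p >= 0.0 && p <= 1.0) && math.abs(arr.sum - 1.0) < 1e-6
}

mlpModel.transform(testDataDf)
  .select(col("probability"), isValidProb(col("probability")).as("isValidProbability"))
  .show(6, truncate = false)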