GitHub user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18281#discussion_r125442535
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
    @@ -101,6 +101,36 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
         assert(expectedMetrics.confusionMatrix ~== ovaMetrics.confusionMatrix absTol 400)
       }
     
    +  test("one-vs-rest: tuning parallelism does not change output") {
    +    val numClasses = 3
    +    val ovaPar1 = new OneVsRest()
    +      .setClassifier(new LogisticRegression)
    +
    +    val ovaModelPar1 = ovaPar1.fit(dataset)
    +
    +    val transformedDatasetPar1 = ovaModelPar1.transform(dataset)
    +
    +    val ovaResultsPar1 = transformedDatasetPar1.select("prediction", "label").rdd.map {
    +      row => (row.getDouble(0), row.getDouble(1))
    +    }
    +
    +    val ovaPar2 = new OneVsRest()
    +      .setClassifier(new LogisticRegression)
    +      .setParallelism(2)
    +
    +    val ovaModelPar2 = ovaPar2.fit(dataset)
    +
    +    val transformedDatasetPar2 = ovaModelPar2.transform(dataset)
    +
    +    val ovaResultsPar2 = transformedDatasetPar2.select("prediction", "label").rdd.map {
    +      row => (row.getDouble(0), row.getDouble(1))
    +    }
    +
    +    val metricsPar1 = new MulticlassMetrics(ovaResultsPar1)
    +    val metricsPar2 = new MulticlassMetrics(ovaResultsPar2)
    +    assert(metricsPar1.confusionMatrix ~== metricsPar2.confusionMatrix absTol 400)
    --- End diff --
    
    Just wondering what the scale of the confusion matrix is. I would actually expect the predictions made by the two models to be exactly the same, since the fitted models should be identical regardless of parallelism.
    
    Can we also check that the model coefficients are identical?
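    
    A rough sketch of what such a coefficient check might look like, assuming the sub-models are binary `LogisticRegressionModel`s as in the test above. The helper name `sameSubModels` is hypothetical and not part of the PR; it only relies on `OneVsRestModel.models` and the `coefficients` / `intercept` members of `LogisticRegressionModel`:

```scala
import org.apache.spark.ml.classification.{LogisticRegressionModel, OneVsRestModel}

// Hypothetical helper (not part of the PR): returns true when both OneVsRest
// models hold the same number of sub-models and each pair of sub-models has
// identical coefficients and intercept.
def sameSubModels(a: OneVsRestModel, b: OneVsRestModel): Boolean =
  a.models.length == b.models.length &&
    a.models.zip(b.models).forall {
      case (m1: LogisticRegressionModel, m2: LogisticRegressionModel) =>
        // Exact comparison: the fitted models are expected to be identical,
        // independent of the parallelism setting.
        m1.coefficients == m2.coefficients && m1.intercept == m2.intercept
      case _ =>
        // Any other classifier type is treated as a mismatch in this sketch.
        false
    }
```

    Inside the test this could be used as `assert(sameSubModels(ovaModelPar1, ovaModelPar2))`. If exact equality turns out to be too strict in practice, a tolerance-based comparison would be the fallback, but that is an assumption to verify rather than something this thread establishes.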

