Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15149#discussion_r79805782
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala ---
    @@ -56,6 +56,21 @@ class ProbabilisticClassifierSuite extends SparkFunSuite {
         val testModel = new TestProbabilisticClassificationModel("myuid", 2, 2)
         assert(testModel.friendlyPredict(Vectors.dense(Array(1.0, 2.0))) === 1.0)
       }
    +
    +  test("test tiebreak") {
    +    val testModel = new TestProbabilisticClassificationModel("myuid", 2, 2)
    +      .setThresholds(Array(0.4, 0.4))
    +    assert(testModel.friendlyPredict(Vectors.dense(Array(0.6, 0.6))) === 0.0)
    +  }
    +
    +  test("bad thresholds") {
    +    intercept[IllegalArgumentException] {
    --- End diff --
    
    Honestly, I don't really see the theoretical justification even for the `cutoff` approach - a hard threshold works in the binary case but not really in multi-class. Anything threshold-related I've seen mostly arises in OneVsRest scenarios, where the threshold can be adjusted per class...
    
    So yes, my point here was that if we are just copying the approach from the R package, let's be consistent with it.
    
    But yes, the main issue is ensuring the thresholds are all positive, so I'm ok with removing that constraint.
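
    For context, the tiebreak test above exercises threshold-scaled prediction: the predicted class is the argmax of `probability(i) / threshold(i)`, with ties broken toward the lowest index. A minimal sketch of that rule (hypothetical helper, not Spark's actual implementation) might look like:

    ```scala
    // Sketch of per-class threshold scaling: scale each probability by the
    // inverse of its class threshold, then take the argmax. On a tie,
    // indexOf returns the first (lowest) class index, matching the test.
    def predictWithThresholds(probs: Array[Double], thresholds: Array[Double]): Double = {
      require(probs.length == thresholds.length, "one threshold per class")
      val scaled = probs.zip(thresholds).map { case (p, t) =>
        // A zero threshold would mean the class always wins; guard it here.
        if (t == 0.0) Double.PositiveInfinity else p / t
      }
      scaled.indexOf(scaled.max).toDouble
    }
    ```

    With `thresholds = Array(0.4, 0.4)` and probabilities `(0.6, 0.6)`, both scaled scores are equal, so class `0.0` wins - exactly the behavior the `test tiebreak` case asserts.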


