[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

srowen Sat, 22 Sep 2018 14:05:44 -0700

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/13440
  
    Yeah I take your point that the trait Impurity already defines two methods, 
only one of which is implemented for each of the subclasses. It's already a 
funky design that probably should have been generalized differently. I think a 
rewrite for Spark 3 would be worthwhile, personally. I'm also not quite sure of 
the difference between the Impurity and ImpurityCalculator class; it seems like 
Impurity should fold into ImpurityCalculator. 
    
    Is the single method we really want to define something like 
`computeInformationGain(ImpurityCalculator, ImpurityCalculator)`? even the new 
method you've added is not directly computing info gain, nor were the existing 
ones in Impurity. But that's the thing we need and abstraction for over several 
implementations, it seems.
    
    Well, I think either this gets a bigger redesign in 3.0, or we try to get 
it into 2.5 and accept some API changes. I think I lean towards a bolder 
breaking change to fix it up in 3.0, unless there's a pressing need for this 
metric.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

Reply via email to