Github user srowen commented on the issue: https://github.com/apache/spark/pull/13440 Yeah I take your point that the trait Impurity already defines two methods, only one of which is implemented for each of the subclasses. It's already a funky design that probably should have been generalized differently. I think a rewrite for Spark 3 would be worthwhile, personally. I'm also not quite sure of the difference between the Impurity and ImpurityCalculator class; it seems like Impurity should fold into ImpurityCalculator. Is the single method we really want to define something like `computeInformationGain(ImpurityCalculator, ImpurityCalculator)`? even the new method you've added is not directly computing info gain, nor were the existing ones in Impurity. But that's the thing we need and abstraction for over several implementations, it seems. Well, I think either this gets a bigger redesign in 3.0, or we try to get it into 2.5 and accept some API changes. I think I lean towards a bolder breaking change to fix it up in 3.0, unless there's a pressing need for this metric.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org