srowen commented on a change in pull request #21632: [SPARK-19591][ML][MLlib]
Add sample weights to decision trees
URL: https://github.com/apache/spark/pull/21632#discussion_r243600096
##########
File path: mllib/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala
##########
@@ -37,4 +37,9 @@ case class LabeledPoint(@Since("2.0.0") label: Double,
@Since("2.0.0") features:
override def toString: String = {
s"($label,$features)"
}
+
+ private[spark] def toInstance(weight: Double): Instance = {
+ Instance(label, weight, features)
Review comment:
LabeledPoint is an old class and used everywhere weights don't matter.
Terminology inconsistency aside, it makes sense to have a separate class with
the weight (rather than carry it around where not needed) and that arose with
the newer .ml package. I'd generally prefer to "point" dependencies from .ml to
.mllib and not the other way.
I suppose Instance could or should have subclassed LabeledPoint to add
weights, and likewise OffsetInstance, so that the conversion is optional one
way, but I wouldn't touch that here.
I'd also see if you can minimize the amount of conversion between
LabeledPoint and Instance in the test code. It's necessary in most places but
we can probably factor out some of it so every caller doesn't have to convert.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]