[GitHub] srowen commented on a change in pull request #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

GitBox Fri, 21 Dec 2018 06:49:31 -0800

srowen commented on a change in pull request #21632: [SPARK-19591][ML][MLlib] 
Add sample weights to decision trees
URL: https://github.com/apache/spark/pull/21632#discussion_r243600096


 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala
 ##########
 @@ -37,4 +37,9 @@ case class LabeledPoint(@Since("2.0.0") label: Double, 
@Since("2.0.0") features:
   override def toString: String = {
     s"($label,$features)"
   }
+
+  private[spark] def toInstance(weight: Double): Instance = {
+    Instance(label, weight, features)
 
 Review comment:
   LabeledPoint is an old class and used everywhere weights don't matter. 
Terminology inconsistency aside, it makes sense to have a separate class with 
the weight (rather than carry it around where not needed) and that arose with 
the newer .ml package. I'd generally prefer to "point" dependencies from .ml to 
.mllib and not the other way.
   
   I suppose Instance could or should have subclassed LabeledPoint to add 
weights, and likewise OffsetInstance, so that the conversion is optional one 
way, but I wouldn't touch that here.
   
   I'd also see if you can minimize the amount of conversion between 
LabeledPoint and Instance in the test code. It's necessary in most places but 
we can probably factor out some of it so every caller doesn't have to convert.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] srowen commented on a change in pull request #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

Reply via email to