imatiach-msft commented on a change in pull request #21632: 
[SPARK-19591][ML][MLlib] Add sample weights to decision trees
URL: https://github.com/apache/spark/pull/21632#discussion_r243369229
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala
 ##########
 @@ -37,4 +37,9 @@ case class LabeledPoint(@Since("2.0.0") label: Double, 
@Since("2.0.0") features:
   override def toString: String = {
     s"($label,$features)"
   }
+
+  private[spark] def toInstance(weight: Double): Instance = {
+    Instance(label, weight, features)
 
 Review comment:
   hmm, interesting, I'm really not sure why we have both LabeledPoint and 
Instance classes, to me they seem very similar.  In addition, we also have 
OffsetInstance which has a toInstance method, but no method to convert an 
Instance to an OffsetInstance or to a LabeledPoint.  It seems it might make 
more sense to have an Instance(label, features), WeightedInstance(label, 
features, weight) and OffsetInstance(label, features, weight, offset), at least 
that would be more consistent in the system and with the terminology.  Or, we 
could have just a single instance with some optional parameters, but I can see 
how it would not make sense to waste memory with two optional params.
   In terms of where the logic to convert LabeledPoint/Instance should live, I 
really don't have an opinion.  I think the most important thing is to be 
consistent, but with these 3 instance/labeledpoint classes I don't see any 
consistency currently.
   However, for this specific PR I think it's more convenient to go from 
LabeledPoints to Instance objects than to create an Instance from a 
LabeledPoint.  Isn't calling a method on an object cleaner than having to call 
a static method?  I agree about the default value though, will make that change.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to