imatiach-msft commented on a change in pull request #21632:
[SPARK-19591][ML][MLlib] Add sample weights to decision trees
URL: https://github.com/apache/spark/pull/21632#discussion_r247382545
##########
File path: mllib/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala
##########
@@ -37,4 +37,9 @@ case class LabeledPoint(@Since("2.0.0") label: Double, @Since("2.0.0") features:
   override def toString: String = {
     s"($label,$features)"
   }
+
+  private[spark] def toInstance(weight: Double): Instance = {
+    Instance(label, weight, features)
Review comment:
> I'd generally prefer to "point" dependencies from .ml to .mllib and not the other way.
It seems there are actually two LabeledPoint classes:
  /mllib/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala
  /mllib/src/main/scala/org/apache/spark/mllib/regression/LabeledPoint.scala
I'm not sure why the mllib one is under regression, but it is the older class and the one used everywhere.
In this PR the toInstance methods are on the new ml version, not the mllib one, so no dependency from mllib to ml is introduced.
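A minimal standalone sketch of the approach in the diff, assuming simplified stand-ins: the `MlFeatureSketch` wrapper is hypothetical, `Seq[Double]` replaces Spark's `Vector`, and the `private[spark]` modifier is dropped so it compiles outside Spark. It shows that `toInstance` only refers to `Instance`, which in this sketch sits next to the ml `LabeledPoint`, so the legacy mllib class is not touched.

```scala
// Standalone stand-ins for org.apache.spark.ml.feature.{LabeledPoint, Instance}.
// Assumptions: Seq[Double] replaces Spark's ml.linalg.Vector, and the
// private[spark] modifier from the diff is dropped so this compiles outside Spark.
object MlFeatureSketch {
  // Same field order as the Instance(label, weight, features) call in the diff.
  case class Instance(label: Double, weight: Double, features: Seq[Double])

  case class LabeledPoint(label: Double, features: Seq[Double]) {
    override def toString: String = s"($label,$features)"

    // The method added in the diff: it only refers to Instance, which sits
    // next to this class, so the legacy mllib LabeledPoint is not involved.
    def toInstance(weight: Double): Instance = Instance(label, weight, features)
  }
}
```

For example, `LabeledPoint(1.0, Seq(0.5, 2.0)).toInstance(3.0)` yields an `Instance` with weight `3.0` and the same label and features.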
However, I could move the method to Instance if you prefer. I tried adding the code below locally, but the call sites ended up looking worse than before: I had to write Instance.convert(...) instead of just .toInstance, which actually seemed cleaner. Would it be better to add implicit methods instead? The call sites would then probably look the same as now, using .toInstance, except that an extra import would likely be needed. I'm not sure which is the better way to handle this.
Here's the conversion method I defined:
```scala
private[ml] object Instance {
  /**
   * Convert a LabeledPoint into an Instance.
   * @param labeledPoint LabeledPoint to convert.
   * @param weight Optional weight for the instance.
   * @return Instance representation for this LabeledPoint.
   */
  def convert(labeledPoint: LabeledPoint, weight: Double = 1.0): Instance =
    Instance(labeledPoint.label, weight, labeledPoint.features)
}
```
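To compare the call sites of the two alternatives discussed in this comment, here is a standalone sketch under the same simplifications (`Seq[Double]` instead of Spark's `Vector`, no access modifiers). `ConversionSketch`, `InstanceSyntax`, `LabeledPointToInstance`, and `Usage` are hypothetical names, and the implicit class is only one possible shape of the "implicit methods" idea, not code from the PR.

```scala
// Simplified stand-ins: Seq[Double] replaces Spark's ml.linalg.Vector and the
// private[ml]/private[spark] modifiers are dropped so the sketch runs standalone.
object ConversionSketch {
  case class LabeledPoint(label: Double, features: Seq[Double])

  case class Instance(label: Double, weight: Double, features: Seq[Double])

  // Option 1: the companion-object helper from the comment.
  object Instance {
    def convert(labeledPoint: LabeledPoint, weight: Double = 1.0): Instance =
      Instance(labeledPoint.label, weight, labeledPoint.features)
  }

  // Option 2 (hypothetical): an implicit class that adds .toInstance to
  // LabeledPoint without defining it on LabeledPoint itself. Callers only
  // see the method after importing ConversionSketch.InstanceSyntax._
  object InstanceSyntax {
    implicit class LabeledPointToInstance(point: LabeledPoint) {
      def toInstance(weight: Double = 1.0): Instance = Instance.convert(point, weight)
    }
  }
}

object Usage {
  import ConversionSketch._
  import ConversionSketch.InstanceSyntax._ // the extra import option 2 needs

  val points = Seq(LabeledPoint(1.0, Seq(0.0, 2.0)), LabeledPoint(0.0, Seq(1.0, 1.0)))

  // Option 1 call site: explicit helper on the companion object.
  val viaConvert: Seq[Instance] = points.map(Instance.convert(_))

  // Option 2 call site: same shape as the method in the diff.
  val viaSyntax: Seq[Instance] = points.map(_.toInstance())
}
```

Both call sites produce the same instances; the implicit version keeps the `.toInstance` form at the cost of the extra `InstanceSyntax._` import, which is the trade-off described above.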
----------------------------------------------------------------
This is an automated message from the Apache Git Service.