GitHub user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/4906#discussion_r26439865
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala ---
@@ -47,6 +47,18 @@ trait Loss extends Serializable {
* @param data Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]].
* @return Measure of model error on data
*/
- def computeError(model: TreeEnsembleModel, data: RDD[LabeledPoint]): Double
+ def computeError(model: TreeEnsembleModel, data: RDD[LabeledPoint]): Double = {
+ data.map(point => computeError(model.predict(point.features), point)).mean()
+ }
+
+ /**
+ * Method to calculate loss when the predictions are already known.
+ * Note: This method is used in the method evaluateEachIteration to avoid recomputing the
+ * predicted values from previously fit trees.
+ * @param prediction Predicted label.
+ * @param datum LabeledPoint
+ * @return Measure of model error on datapoint.
+ */
+ def computeError(prediction: Double, datum: LabeledPoint): Double
--- End diff ---
This should really take "label" instead of "datum", since the features should never be needed.
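To make the suggestion concrete, here is a minimal Spark-free sketch in Scala, assuming a per-point loss that receives only `prediction` and `label`. `LossSketch`, `Point`, `Ensemble` and `SquaredError` are hypothetical stand-ins, not MLlib classes, and a local `Seq` replaces `RDD[LabeledPoint]`. The `evaluateEachIteration` shown here is likewise an illustrative assumption (an ensemble whose prediction is a weighted sum of tree outputs), not the code in this PR; it only demonstrates why a label-only signature is enough when cached predictions are reused.

```scala
// Spark-free sketch: Point, Ensemble, SquaredError and LossSketch are
// hypothetical stand-ins, not MLlib classes; Seq replaces RDD[LabeledPoint].
object LossSketch {
  // Hypothetical stand-in for LabeledPoint.
  final case class Point(label: Double, features: Array[Double])

  // Hypothetical ensemble whose prediction is a weighted sum of tree outputs.
  final case class Ensemble(trees: Seq[Array[Double] => Double], weights: Seq[Double]) {
    require(trees.size == weights.size, "one weight per tree")
    def predict(features: Array[Double]): Double =
      trees.zip(weights).map { case (tree, w) => w * tree(features) }.sum
  }

  trait Loss extends java.io.Serializable {
    // Per-point error: only the label is needed, never the features.
    def computeError(prediction: Double, label: Double): Double

    // Mean error over a dataset, delegating to the per-point method.
    def computeError(model: Ensemble, data: Seq[Point]): Double = {
      require(data.nonEmpty, "data must be non-empty")
      data.map(p => computeError(model.predict(p.features), p.label)).sum / data.size
    }
  }

  object SquaredError extends Loss {
    def computeError(prediction: Double, label: Double): Double = {
      val err = label - prediction
      err * err
    }
  }

  // Mean error after each iteration: a cached running prediction per point is
  // updated with one tree at a time, so earlier trees are never re-evaluated.
  def evaluateEachIteration(model: Ensemble, data: Seq[Point], loss: Loss): Vector[Double] = {
    require(data.nonEmpty, "data must be non-empty")
    val pts = data.toIndexedSeq
    val running = Array.fill(pts.size)(0.0)
    model.trees.zip(model.weights).toVector.map { case (tree, w) =>
      for (i <- pts.indices) running(i) += w * tree(pts(i).features)
      pts.indices.map(i => loss.computeError(running(i), pts(i).label)).sum / pts.size
    }
  }
}
```

Because the per-point method never sees the features, `evaluateEachIteration` can pass it a cached prediction and a label directly. In this sketch the last entry it returns should agree with `computeError(model, data)` for the full ensemble, up to floating-point rounding.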