Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/21719#discussion_r202805710
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala ---
@@ -19,45 +19,60 @@ package org.apache.spark.ml.util
import java.util.UUID
-import scala.reflect.ClassTag
+import scala.util.{Failure, Success, Try}
+import scala.util.control.NonFatal
import org.json4s._
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._
import org.apache.spark.internal.Logging
-import org.apache.spark.ml.{Estimator, Model}
-import org.apache.spark.ml.param.Param
+import org.apache.spark.ml.{Estimator, Model, PipelineStage}
+import org.apache.spark.ml.param.{Param, Params}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Dataset
import org.apache.spark.util.Utils
/**
* A small wrapper that defines a training session for an estimator, and
some methods to log
* useful information during this session.
- *
- * A new instance is expected to be created within fit().
- *
- * @param estimator the estimator that is being fit
- * @param dataset the training dataset
- * @tparam E the type of the estimator
*/
-private[spark] class Instrumentation[E <: Estimator[_]] private (
- val estimator: E,
- val dataset: RDD[_]) extends Logging {
+private[spark] class Instrumentation extends Logging {
private val id = UUID.randomUUID()
- private val prefix = {
+ private val shortId = id.toString.take(8)
+ private val prefix = s"[$shortId] "
+
+ // TODO: update spark.ml to use new Instrumentation APIs and remove this
constructor
+ var stage: Params = _
--- End diff --
I'd recommend we either plan to remove "stage" or change "logPipelineStage"
so it only allows setting "stage" once. If we go with the former, how about
leaving a note to remove "stage" once spark.ml code is migrated to use the new
logParams() method?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]