[GitHub] spark pull request #21719: [SPARK-24747][ML] Make Instrumentation class more...

jkbradley Mon, 16 Jul 2018 13:02:01 -0700

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21719#discussion_r202805710
  
    --- Diff: 
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala ---
    @@ -19,45 +19,60 @@ package org.apache.spark.ml.util
     
     import java.util.UUID
     
    -import scala.reflect.ClassTag
    +import scala.util.{Failure, Success, Try}
    +import scala.util.control.NonFatal
     
     import org.json4s._
     import org.json4s.JsonDSL._
     import org.json4s.jackson.JsonMethods._
     
     import org.apache.spark.internal.Logging
    -import org.apache.spark.ml.{Estimator, Model}
    -import org.apache.spark.ml.param.Param
    +import org.apache.spark.ml.{Estimator, Model, PipelineStage}
    +import org.apache.spark.ml.param.{Param, Params}
     import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.Dataset
     import org.apache.spark.util.Utils
     
     /**
      * A small wrapper that defines a training session for an estimator, and 
some methods to log
      * useful information during this session.
    - *
    - * A new instance is expected to be created within fit().
    - *
    - * @param estimator the estimator that is being fit
    - * @param dataset the training dataset
    - * @tparam E the type of the estimator
      */
    -private[spark] class Instrumentation[E <: Estimator[_]] private (
    -    val estimator: E,
    -    val dataset: RDD[_]) extends Logging {
    +private[spark] class Instrumentation extends Logging {
     
       private val id = UUID.randomUUID()
    -  private val prefix = {
    +  private val shortId = id.toString.take(8)
    +  private val prefix = s"[$shortId] "
    +
    +  // TODO: update spark.ml to use new Instrumentation APIs and remove this 
constructor
    +  var stage: Params = _
    --- End diff --
    
    I'd recommend we either plan to remove "stage" or change "logPipelineStage" 
so it only allows setting "stage" once.  If we go with the former, how about 
leaving a note to remove "stage" once spark.ml code is migrated to use the new 
logParams() method?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #21719: [SPARK-24747][ML] Make Instrumentation class more...

Reply via email to