Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21719#discussion_r200497209
  
    --- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
    @@ -500,7 +500,7 @@ class LogisticRegression @Since("1.2.0") (
     
         if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
     
    -    val instr = Instrumentation.create(this, dataset)
    +    instr.logContext(this, dataset)
    --- End diff --
    
    It doesn't log anything. I think we should auto-generate `prefix` and keep 
it as a constant. So logs would appear as:
    
    ~~~
    [PREFIX]: instrumentation started
    [PREFIX]: using estimator logReg-abc128
    [PREFIX]: using dataset some hashcode
    [PREFIX]: param maxIter=10
    [PREFIX]: ...
    [PREFIX]: run succeeded/failed
    [PREFIX]: instrumentation ended
    ~~~
    
    We can generate 8 random chars as the PREFIX. This is sufficient for 
correlate metrics from the same run. The issue with making it mutable is that 
we do not have a way to guarantee `logContext` is always called.
    
    So I would suggest replacing logContext with the following:
    
    * logEstimator or logPipelineStage
    * logDataset
    
    Btw, we can by default log call site. It provides more info for 
instrumentation, not necessary in this PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to