Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21719#discussion_r200497209
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -500,7 +500,7 @@ class LogisticRegression @Since("1.2.0") (
if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
- val instr = Instrumentation.create(this, dataset)
+ instr.logContext(this, dataset)
--- End diff --
It doesn't log anything. I think we should auto-generate `prefix` and keep
it as a constant. So logs would appear as:
~~~
[PREFIX]: instrumentation started
[PREFIX]: using estimator logReg-abc128
[PREFIX]: using dataset some hashcode
[PREFIX]: param maxIter=10
[PREFIX]: ...
[PREFIX]: run succeeded/failed
[PREFIX]: instrumentation ended
~~~
We can generate 8 random chars as the PREFIX. This is sufficient for
correlate metrics from the same run. The issue with making it mutable is that
we do not have a way to guarantee `logContext` is always called.
So I would suggest replacing logContext with the following:
* logEstimator or logPipelineStage
* logDataset
Btw, we can by default log call site. It provides more info for
instrumentation, not necessary in this PR.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]