Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20837#discussion_r174996170
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -517,6 +517,9 @@ class LogisticRegression @Since("1.2.0") (
(new MultivariateOnlineSummarizer, new MultiClassSummarizer)
)(seqOp, combOp, $(aggregationDepth))
}
+ instr.logNamedValue(Instrumentation.loggerTags.numExamples, summarizer.count)
+ instr.logNamedValue("lowestLabelWeight", labelSummarizer.histogram.min.toString)
+ instr.logNamedValue("highestLabelWeight", labelSummarizer.histogram.min.toString)
--- End diff --
Why not log the whole histogram (each label -> its weightSum)?
Logging only the min/max weightSum seems of little use, and users would
not even know which labels those values belong to.
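To illustrate the suggestion, here is a minimal sketch of rendering the full histogram as a single string. It assumes `labelSummarizer.histogram` is an `Array[Double]` whose index is the label and whose value is that label's summed weight. `HistogramLogging`, `formatHistogram`, and the `"labelHistogram"` key are hypothetical names used only for this example; they are not part of Spark.

```scala
// Hypothetical helper (not part of Spark): renders a label histogram as
// "label -> weightSum" pairs so that every label's total weight is visible
// in one logged value, instead of only the min and max.
object HistogramLogging {
  // Assumption: histogram(i) holds the summed instance weight for label index i.
  def formatHistogram(histogram: Array[Double]): String =
    histogram.zipWithIndex
      .map { case (weightSum, label) => s"$label -> $weightSum" }
      .mkString("{", ", ", "}")
}

// Possible call site, assuming the String overload of logNamedValue used in the diff
// and a hypothetical "labelHistogram" key:
//   instr.logNamedValue("labelHistogram",
//     HistogramLogging.formatHistogram(labelSummarizer.histogram))
```

With many classes the string could get long, so a real change might want to truncate it or log only when the number of labels is small.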