[GitHub] [spark] zhengruifeng commented on a change in pull request #30009: [SPARK-32907][ML] adaptively blockify instances - LinearSVC

GitBox Thu, 15 Oct 2020 20:49:24 -0700


zhengruifeng commented on a change in pull request #30009:
URL: https://github.com/apache/spark/pull/30009#discussion_r506032899




##########
File path: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala
##########
@@ -199,14 +193,11 @@ class LinearSVC @Since("2.2.0") (
     instr.logNamedValue("lowestLabelWeight", labelSummarizer.histogram.min.toString)
     instr.logNamedValue("highestLabelWeight", labelSummarizer.histogram.max.toString)
     instr.logSumOfWeights(summarizer.weightSum)
-    if ($(blockSize) > 1) {
-      val scale = 1.0 / summarizer.count / numFeatures
-      val sparsity = 1 - summarizer.numNonzeros.toArray.map(_ * scale).sum
-      instr.logNamedValue("sparsity", sparsity.toString)
-      if (sparsity > 0.5) {
-        instr.logWarning(s"sparsity of input dataset is $sparsity, " +
-          s"which may hurt performance in high-level BLAS.")
-      }
+    if (actualBlockSizeInMB == 0) {
+      val avgNNZ = summarizer.numNonzeros.activeIterator.map(_._2 / summarizer.count).sum

Review comment:
   Yes, one more metric, `numNonZeros`, will be computed.
   Since it still needs only one pass over the data, I think the additional time should not be significant.
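
   To illustrate the single-pass point, here is a minimal sketch in plain Scala (no Spark dependency). `SparseRow`, `AvgNnzSketch`, `summarize`, and `avgNnz` are hypothetical names made up for this example and are not part of the PR; the sketch only assumes that the row count and the per-feature nonzero counts can be accumulated in the same traversal, and that the average number of nonzeros per instance is then derived from them as in the `avgNNZ` line above.

```scala
// Hypothetical stand-in for a sparse feature vector: (index, value) pairs.
final case class SparseRow(entries: Seq[(Int, Double)])

object AvgNnzSketch {
  /** One traversal: accumulates the row count and per-feature nonzero counts together. */
  def summarize(rows: Iterator[SparseRow], numFeatures: Int): (Long, Array[Long]) = {
    val nnz = new Array[Long](numFeatures)
    var count = 0L
    rows.foreach { row =>
      count += 1
      // Explicitly stored zeros are not counted as nonzeros.
      row.entries.foreach { case (i, v) => if (v != 0.0) nnz(i) += 1 }
    }
    (count, nnz)
  }

  /** Average number of nonzeros per instance, mirroring the shape of the avgNNZ expression. */
  def avgNnz(count: Long, nnz: Array[Long]): Double =
    if (count == 0L) 0.0 else nnz.iterator.map(_.toDouble / count).sum
}
```

   The extra cost per row is one counter increment per stored entry, which is why adding this metric to an existing pass should be cheap compared with a second scan of the dataset.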




