sririshindra commented on a change in pull request #31477:
URL: https://github.com/apache/spark/pull/31477#discussion_r570720842



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala
##########
@@ -89,13 +90,20 @@ case class BroadcastNestedLoopJoinExec(
     }
   }
 
-  @transient private lazy val boundCondition = {
+  private val numMatchedPairs = longMetric("numMatchedPairs")
+
+  @transient private lazy val boundCondition: InternalRow => Boolean =
     if (condition.isDefined) {

Review comment:
       - Yes, the metric is very useful especially when there is a condition. 
It helps the users to recognize skew. But even if there is no condition, it 
could help the users to see the number of matched rows. For instance, in case 
of outer joins users will be able to see just how many rows matched between the 
stream side and the build side without explicitly doing another inner join.   
   
   - I also faced an implementation problem when I originally tried to show the 
metric only when there is a condition. If there is a metric defined, but if the 
metric is not used in some cases (For example if we don't increment the metric 
when there is no condition), then the tests in SQLMetricsSuite will fail 
because the accumulatorIds corresponding the metrics would get messed up. The 
size of metric values in `val metricValues = 
statusStore.executionMetrics(executionId)` and the metrics in `node.metrics` 
would become inconsistent. 
   
   - It might also confuse the users to see the metric in some cases and not 
see the metric in other cases. It would probably be best to show the metric 
everytime there is a join node in the UI to avoid any confusion.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to