juliuszsompolski commented on code in PR #51091:
URL: https://github.com/apache/spark/pull/51091#discussion_r2153892692
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/MergeRowsExec.scala:
##########
@@ -69,6 +71,13 @@ case class MergeRowsExec(
copy(child = newChild)
}
+ override lazy val metrics: Map[String, SQLMetric] = Map(
+ "numTargetRowsCopied" -> SQLMetrics.createMetric(sparkContext,
+ "Number of target rows copied over because they did not match any
condition."),
+ "numTargetRowsUnmatched" -> SQLMetrics.createMetric(sparkContext,
+ "Number of target rows processed that do not match any condition. " +
+ "These will be dropped for delta-based merge and retained for
group-based merge."))
Review Comment:
Hm, I think the "Unmatched" in the name is a bit confusing because of
overload with the "MATCHED", "NOT MATCHED", "NOT MATCHED BY SOURCE" of the
whole MERGE.
When looking at it from the perspective of the whole MERGE:
- you are already after the target-source join, so these are rows that can
be MATCHED (or NOT MATCHED BY SOURCE), but then didn't pass any of the extra
conditions of "WHEN MATCHED AND", and "WHEN NOT MATCHED BY SOURCE AND".
- these are not really all target rows "processed" in the MERGE, because
there are more rows scanned from the target that then may have been dropped in
the join.
But from the perspective of the MergeRows operator:
- "processed" is fine, because these are rows processed by the MergeRows
operator.
- I asked my teammates what name could be better than "Unmatched" and I
think "Unchanged". Because these are the rows that passed into the MergeRows
operator, but then fell through all the instructions.
So I think "numTargetRowsUnchanged". Maybe while at it also have a
corresponding "numSourceRowsUnchanged"?
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala:
##########
@@ -87,7 +88,22 @@ object MergeRows {
override def dataType: DataType = NullType
}
- case class Keep(condition: Expression, output: Seq[Expression]) extends
Instruction {
+ // A special case of Keep where the row is kept as is.
+ case class CarryOver(output: Seq[Expression]) extends Instruction {
Review Comment:
nit: since then we name it "copied rows" in other places, maybe name it
`Copy`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]