ZiyaZa commented on code in PR #55711:
URL: https://github.com/apache/spark/pull/55711#discussion_r3196276108
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala:
##########
@@ -44,6 +47,20 @@ case class BatchScanExec(
@transient lazy val batch: Batch = if (scan == null) null else scan.toBatch
+ override protected lazy val sparkMetrics: Map[String, SQLMetric] = {
+ val name = "number of output rows"
+ val metric = table match {
+ // Use SLAM for the scan-output count when this scan reads on behalf of
a row-level DELETE,
+ // so that the driver-side derivation `numDeletedRows = numScannedRows -
numCopiedRows` in
+ // `ReplaceDataExec.getWriteSummary` stays correct under stage retries.
+ case rlot: RowLevelOperationTable if rlot.operation.command() == DELETE
=>
+ SQLLastAttemptMetrics.createMetric(sparkContext, name)
+ case _ =>
+ SQLMetrics.createMetric(sparkContext, name)
Review Comment:
Any reason for not always using SLAM here? Is it riskier than regular
metrics?
IMO it would look better if this code was in `DataSourceV2ScanExecBase`, but
there we can't check if it's a DELETE command.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]