yadavay-amzn commented on code in PR #56550:
URL: https://github.com/apache/spark/pull/56550#discussion_r3432858643
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -383,6 +459,7 @@ object JdbcUtils extends Logging with SQLConfHelper {
if (rs.wasNull) mutableRow.setNullAt(i)
i = i + 1
}
+ inputMetrics.incBytesRead(estimateInternalRowSize(mutableRow,
schema))
Review Comment:
Thanks for the steer; agreed an estimate does not belong in the physical-I/O
metrics. Reworked it to dedicated, clearly-estimated SQL metrics and dropped
the bytesRead/bytesWritten changes entirely (recordsRead/recordsWritten
unchanged):
- Read: an `estimatedDataSizeBytes` SQLMetric on the JDBC scan, registered
in `JDBCRDD.getMetrics` next to `fetchAndTransformToInternalRowsNs`. Because it
rides on JDBCRDD it covers both the v1 (`spark.read.jdbc`) and v2/catalog read
paths.
- Write: a v2 `CustomMetric` on `JDBCWriteBuilder` (via
`reportDriverMetrics`). This appears only on the v2/catalog write path; plain
`df.write.jdbc(...)` goes through `SaveIntoDataSourceCommand`, which has no
operator-metrics surface, so it is not covered. I noted that as a known
limitation; if you would like v1 write coverage too, I am happy to take it as a
separate follow-up since the v1 wiring is more invasive.
Both are labeled as estimates of decoded Spark-side row size (read measures
UTF-8 bytes, write measures char count to stay allocation-free), so they are
clearly not physical I/O. On the non-blocking note: with the dedicated metric
the per-row estimate is the metric's own work, O(numColumns), and negligible
next to JDBC fetch latency. Updated the PR description to match. PTAL.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]