yadavay-amzn commented on code in PR #56550:
URL: https://github.com/apache/spark/pull/56550#discussion_r3432858643


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -383,6 +459,7 @@ object JdbcUtils extends Logging with SQLConfHelper {
             if (rs.wasNull) mutableRow.setNullAt(i)
             i = i + 1
           }
+          inputMetrics.incBytesRead(estimateInternalRowSize(mutableRow, 
schema))

Review Comment:
   Thanks for the steer; agreed an estimate does not belong in the physical-I/O 
metrics. Reworked it to dedicated, clearly-estimated SQL metrics and dropped 
the bytesRead/bytesWritten changes entirely (recordsRead/recordsWritten 
unchanged):
   - Read: an `estimatedDataSizeBytes` SQLMetric on the JDBC scan, registered 
in `JDBCRDD.getMetrics` next to `fetchAndTransformToInternalRowsNs`. Because it 
rides on JDBCRDD it covers both the v1 (`spark.read.jdbc`) and v2/catalog read 
paths.
   - Write: a v2 `CustomMetric` on `JDBCWriteBuilder` (via 
`reportDriverMetrics`). This appears only on the v2/catalog write path; plain 
`df.write.jdbc(...)` goes through `SaveIntoDataSourceCommand`, which has no 
operator-metrics surface, so it is not covered. I noted that as a known 
limitation; if you would like v1 write coverage too, I am happy to take it as a 
separate follow-up since the v1 wiring is more invasive.
   
   Both are labeled as estimates of decoded Spark-side row size (read measures 
UTF-8 bytes, write measures char count to stay allocation-free), so they are 
clearly not physical I/O. On the non-blocking note: with the dedicated metric 
the per-row estimate is the metric's own work, O(numColumns), and negligible 
next to JDBC fetch latency. Updated the PR description to match. PTAL.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to