chenhao-db commented on code in PR #56505:
URL: https://github.com/apache/spark/pull/56505#discussion_r3431310671
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/SparkShreddingUtils.scala:
##########
@@ -836,8 +865,26 @@ case object SparkShreddingUtils {
val resultRow = new GenericInternalRow(numFields)
var fieldIdx = 0
while (fieldIdx < numFields) {
- resultRow.update(fieldIdx, extractField(inputRow, topLevelMetadata, schema,
- fields(fieldIdx).path, fields(fieldIdx).reader))
+ val field = fields(fieldIdx)
+ if (field.isCastError) {
+ // Filled by the paired data field on failure; left null otherwise.
+ } else if (field.castErrorOrdinal >= 0) {
+ try {
+ val value = extractField(inputRow, topLevelMetadata, schema, field.path, field.reader)
+ resultRow.update(fieldIdx, value)
+ } catch {
+ case e: SparkRuntimeException if e.getCondition == "INVALID_VARIANT_CAST" =>
+ // Recover the offending value from the error's `value` message parameter so the
Review Comment:
The observation is correct, but it may not be worth the complexity to retain
the data type in the error message (we would need to add more information to
the companion field). The whole type is more informative in some sense.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]