chenhao-db commented on code in PR #56505:
URL: https://github.com/apache/spark/pull/56505#discussion_r3431310671


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/SparkShreddingUtils.scala:
##########
@@ -836,8 +865,26 @@ case object SparkShreddingUtils {
     val resultRow = new GenericInternalRow(numFields)
     var fieldIdx = 0
     while (fieldIdx < numFields) {
-      resultRow.update(fieldIdx, extractField(inputRow, topLevelMetadata, schema,
-        fields(fieldIdx).path, fields(fieldIdx).reader))
+      val field = fields(fieldIdx)
+      if (field.isCastError) {
+        // Filled by the paired data field on failure; left null otherwise.
+      } else if (field.castErrorOrdinal >= 0) {
+        try {
+          val value = extractField(inputRow, topLevelMetadata, schema, field.path, field.reader)
+          resultRow.update(fieldIdx, value)
+        } catch {
+          case e: SparkRuntimeException if e.getCondition == "INVALID_VARIANT_CAST" =>
+            // Recover the offending value from the error's `value` message parameter so the

Review Comment:
   The observation is correct, but it may not be worth the complexity to retain the data type in the error message, because that would require storing more information in the companion field. In some sense, the whole type is mostly informational.
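   For readers following the trade-off, here is a minimal, self-contained sketch of the paired-field pattern in the hunk, written without Spark dependencies. `CastFailure`, `FieldSpec`, `PairedFieldSketch`, and `fillRow` are hypothetical stand-ins (for `SparkRuntimeException`, the per-field descriptor, and the row-building loop), and it assumes that on an `INVALID_VARIANT_CAST` failure the data slot stays null while the companion slot receives only the `value` message parameter.

```scala
// Hypothetical stand-in for SparkRuntimeException: an error condition plus message parameters.
final class CastFailure(val condition: String, val messageParameters: Map[String, String])
    extends RuntimeException(condition)

// Hypothetical per-field descriptor mirroring `isCastError` / `castErrorOrdinal` in the hunk.
final case class FieldSpec(
    isCastError: Boolean,  // true for a companion slot that only records cast failures
    castErrorOrdinal: Int, // index of the paired companion slot, or -1 if there is none
    extract: () => Any)    // stand-in for extractField(...)

object PairedFieldSketch {
  def fillRow(fields: IndexedSeq[FieldSpec]): Array[Any] = {
    val row = new Array[Any](fields.length) // every slot starts as null
    var i = 0
    while (i < fields.length) {
      val f = fields(i)
      if (f.isCastError) {
        // Filled by the paired data field on failure; left null otherwise.
      } else if (f.castErrorOrdinal >= 0) {
        try {
          row(i) = f.extract()
        } catch {
          case e: CastFailure if e.condition == "INVALID_VARIANT_CAST" =>
            // Assumption: the data slot stays null and the companion slot gets only the
            // offending value, so the target data type is not retained.
            row(f.castErrorOrdinal) = e.messageParameters.getOrElse("value", null)
        }
      } else {
        row(i) = f.extract()
      }
      i += 1
    }
    row
  }
}
```

   In this sketch the companion slot holds a single string; keeping the target data type as well would mean widening that slot (for example to a value/type pair), which is the extra complexity the comment weighs against.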



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
