This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 70b4b1d1f69 [SPARK-38979][SQL] Improve error log readability in OrcUtils.requestedColumnIds
70b4b1d1f69 is described below

commit 70b4b1d1f69be3a15eadb0e798139982c152b7bb
Author: sychen <syc...@ctrip.com>
AuthorDate: Wed Apr 27 08:38:28 2022 -0500

    [SPARK-38979][SQL] Improve error log readability in OrcUtils.requestedColumnIds

    ### What changes were proposed in this pull request?
    Add a more detailed assertion message in `OrcUtils#requestedColumnIds`.

    ### Why are the changes needed?
    `OrcUtils#requestedColumnIds` sometimes fails because `orcFieldNames.length > dataSchema.length`, and the resulting log is not very clear:

    ```
    java.lang.AssertionError: assertion failed: The given data schema struct<field1:int> has less fields than the actual ORC physical schema, no idea which columns were dropped, fail to read.
    ```

    After the change, the message includes both schemas and their lengths:

    ```
    java.lang.AssertionError: assertion failed: The given data schema struct<field1:int> (length:1) has fewer 1 fields than the actual ORC physical schema struct<field1:int,field2:int> (length:2), no idea which columns were dropped, fail to read.
    ```

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Existing UTs and a local test.

    Closes #36296 from cxzl25/SPARK-38979.

    Authored-by: sychen <syc...@ctrip.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 .../org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
index f07573beae6..1783aadaa78 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
@@ -224,7 +224,9 @@ object OrcUtils extends Logging {
         // the physical schema doesn't match the data schema).
         // In these cases we map the physical schema to the data schema by index.
         assert(orcFieldNames.length <= dataSchema.length, "The given data schema " +
-          s"${dataSchema.catalogString} has less fields than the actual ORC physical schema, " +
+          s"${dataSchema.catalogString} (length:${dataSchema.length}) " +
+          s"has fewer ${orcFieldNames.length - dataSchema.length} fields than " +
+          s"the actual ORC physical schema $orcSchema (length:${orcFieldNames.length}), " +
           "no idea which columns were dropped, fail to read.")
         // for ORC file written by Hive, no field names
         // in the physical schema, there is a need to send the
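
The new assertion message can be reproduced outside Spark with a small standalone sketch. This is not the code in `OrcUtils`: the object `OrcSchemaCheckSketch`, the `Field` alias, and the helpers `catalogString`, `mismatchMessage`, and `check` are hypothetical names, and both schemas are modelled as plain `(name, type)` pairs rather than Spark's `StructType` and ORC's `TypeDescription`. Only the message template is taken from the diff.

```scala
// Hypothetical stand-in for the check in OrcUtils.requestedColumnIds.
// Schemas are plain (name, type) pairs, so no Spark or ORC dependency is needed.
object OrcSchemaCheckSketch {
  type Field = (String, String)

  // Renders fields in the struct<name:type,...> style shown in the commit message.
  def catalogString(fields: Seq[Field]): String =
    fields.map { case (name, tpe) => s"$name:$tpe" }.mkString("struct<", ",", ">")

  // Builds the message using the template introduced by this commit.
  def mismatchMessage(dataSchema: Seq[Field], orcSchema: Seq[Field]): String =
    "The given data schema " +
      s"${catalogString(dataSchema)} (length:${dataSchema.length}) " +
      s"has fewer ${orcSchema.length - dataSchema.length} fields than " +
      s"the actual ORC physical schema ${catalogString(orcSchema)} " +
      s"(length:${orcSchema.length}), " +
      "no idea which columns were dropped, fail to read."

  // Fails when the physical schema has more fields than the given data schema.
  def check(dataSchema: Seq[Field], orcSchema: Seq[Field]): Unit =
    assert(orcSchema.length <= dataSchema.length, mismatchMessage(dataSchema, orcSchema))

  def main(args: Array[String]): Unit = {
    val dataSchema = Seq("field1" -> "int")
    val orcSchema = Seq("field1" -> "int", "field2" -> "int")
    try check(dataSchema, orcSchema)
    catch {
      // Scala's assert prefixes the message with "assertion failed: ".
      case e: AssertionError => println(e.getMessage)
    }
  }
}
```

With a one-field data schema and a two-field physical schema, as in the commit message example, the subtraction `orcSchema.length - dataSchema.length` yields 1, which is where the `has fewer 1 fields` wording comes from.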