Bruce Robbins created SPARK-26990:
-------------------------------------

             Summary: Difference in handling of mixed-case partition columns after SPARK-26188
                 Key: SPARK-26990
                 URL: https://issues.apache.org/jira/browse/SPARK-26990
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.1
            Reporter: Bruce Robbins
I noticed that the [PR for SPARK-26188|https://github.com/apache/spark/pull/23165] changed how mixed-case partition columns are handled when the user provides a schema.

Say I have this file structure (note that each instance of {{pS}} is mixed case):
{noformat}
bash-3.2$ find partitioned5 -type d
partitioned5
partitioned5/pi=2
partitioned5/pi=2/pS=foo
partitioned5/pi=2/pS=bar
partitioned5/pi=1
partitioned5/pi=1/pS=foo
partitioned5/pi=1/pS=bar
bash-3.2$
{noformat}
If I load the files with a user-provided schema on 2.4 (before the PR was committed) or on 2.3, I see:
{noformat}
scala> val df = spark.read.schema("intField int, pi int, ps string").parquet("partitioned5")
df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]

scala> df.printSchema
root
 |-- intField: integer (nullable = true)
 |-- pi: integer (nullable = true)
 |-- ps: string (nullable = true)

scala>
{noformat}
However, using 2.4 after the PR was committed, I see:
{noformat}
scala> val df = spark.read.schema("intField int, pi int, ps string").parquet("partitioned5")
df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]

scala> df.printSchema
root
 |-- intField: integer (nullable = true)
 |-- pi: integer (nullable = true)
 |-- pS: string (nullable = true)

scala>
{noformat}
Spark is picking up the mixed-case column name {{pS}} from the directory name rather than the lower-case {{ps}} from my specified schema. In all tests, {{spark.sql.caseSensitive}} is set to the default (false).

I am not sure whether this is a bug, but it is a difference.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
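For illustration, the pre-PR behavior the report expects can be sketched in plain Scala as a case-insensitive lookup in which the user-specified spelling wins over the directory-derived one. This is a hypothetical sketch, not Spark's actual internal code; the names {{userSchema}}, {{partitionCols}}, and {{resolved}} are assumptions introduced here:

```scala
// Hypothetical sketch (not Spark internals) of resolving partition column
// names against a user-provided schema when spark.sql.caseSensitive is false.
val userSchema = Seq("intField", "pi", "ps")   // fields from the user's schema
val partitionCols = Seq("pi", "pS")            // names inferred from directory paths
// Pre-SPARK-26188 behavior: prefer the user's spelling when a field
// matches the directory-derived name case-insensitively.
val resolved = partitionCols.map { p =>
  userSchema.find(_.equalsIgnoreCase(p)).getOrElse(p)
}
// resolved == Seq("pi", "ps"): the user's lower-case "ps" wins over "pS".
```

Under this reading, the post-PR 2.4 behavior corresponds to skipping the case-insensitive lookup and keeping the directory-derived spelling.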