Bruce Robbins created SPARK-26990:
-------------------------------------

             Summary: Difference in handling of mixed-case partition columns after SPARK-26188
                 Key: SPARK-26990
                 URL: https://issues.apache.org/jira/browse/SPARK-26990
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.1
            Reporter: Bruce Robbins


I noticed that the [PR for SPARK-26188|https://github.com/apache/spark/pull/23165] changed how mixed-case partition columns are handled when the user provides a schema.

Say I have this file structure (note that each instance of {{pS}} is mixed case):
{noformat}
bash-3.2$ find partitioned5 -type d
partitioned5
partitioned5/pi=2
partitioned5/pi=2/pS=foo
partitioned5/pi=2/pS=bar
partitioned5/pi=1
partitioned5/pi=1/pS=foo
partitioned5/pi=1/pS=bar
bash-3.2$
{noformat}
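For reference, here is a minimal sketch of how such a layout can be produced in spark-shell (the {{intField}} values are arbitrary; only the mixed-case {{pS}} partition column matters):
{noformat}
scala> val data = Seq((1, 1, "foo"), (2, 1, "bar"), (3, 2, "foo"), (4, 2, "bar"))
scala> data.toDF("intField", "pi", "pS").write.partitionBy("pi", "pS").parquet("partitioned5")
{noformat}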
If I load the file with a user-provided schema in 2.4 (before the PR was 
committed) or 2.3, I see:
{noformat}
scala> val df = spark.read.schema("intField int, pi int, ps string").parquet("partitioned5")
df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]
scala> df.printSchema
root
 |-- intField: integer (nullable = true)
 |-- pi: integer (nullable = true)
 |-- ps: string (nullable = true)
scala>
{noformat}
However, using 2.4 after the PR was committed, I see:
{noformat}
scala> val df = spark.read.schema("intField int, pi int, ps string").parquet("partitioned5")
df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]
scala> df.printSchema
root
 |-- intField: integer (nullable = true)
 |-- pi: integer (nullable = true)
 |-- pS: string (nullable = true)
scala>
{noformat}
Spark is picking up the mixed-case column name {{pS}} from the directory name, 
not the lower-case {{ps}} from my specified schema.
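
The resolved name can also be checked programmatically; given the {{printSchema}} output above, on 2.4 after the PR this should report {{pS}} rather than {{ps}}:
{noformat}
scala> df.schema.fieldNames
res1: Array[String] = Array(intField, pi, pS)
{noformat}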

In all tests, {{spark.sql.caseSensitive}} is set to the default (false).
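For completeness, the setting can be confirmed in the same shell session:
{noformat}
scala> spark.conf.get("spark.sql.caseSensitive")
res2: String = false
{noformat}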

Not sure if this is a bug, but it is a difference.


