bersprockets commented on issue #23165: [SPARK-26188][SQL] FileIndex: don't infer data types of partition columns if user specifies schema
URL: https://github.com/apache/spark/pull/23165#issuecomment-467192011

Hi @gengliangwang @cloud-fan

I noticed this PR changed how mixed-case partition columns are handled when the user provides a schema. Say I have this file structure (note that each instance of ```pS``` is mixed case):
<pre>
bash-3.2$ find partitioned5 -type d
partitioned5
partitioned5/pi=2
partitioned5/pi=2/pS=foo
partitioned5/pi=2/pS=bar
partitioned5/pi=1
partitioned5/pi=1/pS=foo
partitioned5/pi=1/pS=bar
bash-3.2$
</pre>
If I load the data with a user-provided schema in 2.4 (before this PR was committed) or in 2.3, I see:
<pre>
scala> val df = spark.read.schema("intField int, pi int, ps string").parquet("partitioned5")
df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]

scala> df.printSchema
root
 |-- intField: integer (nullable = true)
 |-- pi: integer (nullable = true)
 |-- ps: string (nullable = true)

scala>
</pre>
However, with this PR I see:
<pre>
scala> val df = spark.read.schema("intField int, pi int, ps string").parquet("partitioned5")
df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]

scala> df.printSchema
root
 |-- intField: integer (nullable = true)
 |-- pi: integer (nullable = true)
 |-- pS: string (nullable = true)

scala>
</pre>
Spark is picking up the mixed-case column name ```pS``` from the directory name rather than the lower-case ```ps``` from my specified schema. In all cases, ```spark.sql.caseSensitive``` is set to the default (false).

Not sure if this is an issue, but it is a difference.
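For reference, here is a minimal sketch of how a layout like the one above can be produced; the ```intField``` values are made up for illustration, and only the mixed-case ```pS``` partition column matters:
<pre>
// Sketch only: run in spark-shell, where spark.implicits._ is already
// imported (needed for toDF on a Seq of tuples). Writing with
// partitionBy("pi", "pS") creates the pi=.../pS=... directories shown above.
Seq((1, 1, "foo"), (2, 1, "bar"), (3, 2, "foo"), (4, 2, "bar"))
  .toDF("intField", "pi", "pS")
  .write
  .partitionBy("pi", "pS")
  .parquet("partitioned5")
</pre>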
