Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16030#discussion_r90694615

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -189,7 +189,15 @@ case class DataSource(
               throw new AnalysisException(
                 s"Unable to infer schema for $format. It must be specified manually.")
             }
    -        (dataSchema, partitionSchema)
    +
    +        // Override the fields of the partition schema if the data schema has the same field
    +        val resolvedPartitionSchema = partitionSchema.map { partitionField =>
    --- End diff --

    I don't think we need this. Otherwise, the value you return in the `if (justPartitioning)` branch is inconsistent with this one.

    In a real-world setting, as in the example you provided, for:
    `case class A(a: Long, b: Int)`
    if `a` is in fact a `Long` and is also the partitioning column, there will be examples of `a` being a `Long` among the partition columns, so things should work. No one should ever have more than 2 billion partition columns anyway. Spark wouldn't be able to resolve that many columns all in memory right now anyway.
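
To illustrate the argument above, here is a minimal sketch, assuming the type of a partition column is inferred from its directory values by trying `Int` first and widening to `Long` only when some value does not fit. The names `InferredType`, `PartitionTypeSketch`, `inferOne`, `widen`, and `inferColumn` are hypothetical and are not Spark APIs; this is a simplified model, not Spark's actual inference code.

```scala
import scala.util.Try

// Hypothetical, simplified model of partition-value type inference.
// None of these names are Spark APIs.
sealed trait InferredType
case object IntType extends InferredType
case object LongType extends InferredType
case object StringType extends InferredType

object PartitionTypeSketch {
  // Infer the narrowest type that can hold a single raw directory value.
  def inferOne(raw: String): InferredType =
    if (Try(raw.toInt).isSuccess) IntType
    else if (Try(raw.toLong).isSuccess) LongType
    else StringType

  // Widen two inferred types to the narrowest type that holds both.
  def widen(x: InferredType, y: InferredType): InferredType = (x, y) match {
    case (StringType, _) | (_, StringType) => StringType
    case (LongType, _) | (_, LongType)     => LongType
    case _                                 => IntType
  }

  // Infer one type for a partition column from all of its directory values.
  // Falls back to StringType when there are no values at all (an assumption).
  def inferColumn(rawValues: Seq[String]): InferredType =
    rawValues.map(inferOne).reduceOption(widen).getOrElse(StringType)
}
```

Under these assumptions, directories such as `a=1` and `a=2` alone would infer `a` as an `Int`, while adding a directory such as `a=3000000000` would widen the column to `Long`, which is the situation the comment expects when `a` really holds `Long` values.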