Github user brkyvz commented on a diff in the pull request:
https://github.com/apache/spark/pull/16030#discussion_r90694615
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -189,7 +189,15 @@ case class DataSource(
throw new AnalysisException(
s"Unable to infer schema for $format. It must be specified manually.")
}
- (dataSchema, partitionSchema)
+
+ // Override the fields of the partition schema if the data schema has the same field
+ val resolvedPartitionSchema = partitionSchema.map { partitionField =>
--- End diff --
I don't think we need this. Otherwise, the value you return in the
`if (justPartitioning)` case is inconsistent.
In a real-world setting, as in the example you provided:
`case class A(a: Long, b: Int)`
if `a` is in fact a `Long` and is also the partitioning column, there will
be examples of `a` being a `Long` among the partition columns, so things
should work. No one should ever have more than 2 billion partition columns
anyway; Spark wouldn't be able to resolve that many columns in memory right
now.
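
One reading of the argument above can be sketched as follows. This is a simplified, hypothetical model of partition-value type inference, not Spark's actual code: the names `PartitionTypeSketch`, `inferOne`, `widen`, and `inferColumn` are made up for illustration, and it assumes each raw partition value is first tried as an `Int`, then as a `Long`, with the column type widened across all observed values. Under that assumption, a column that genuinely holds `Long` data ends up as `Long` as soon as one value falls outside the `Int` range.

```scala
import scala.util.Try

// Hypothetical stand-ins for the inferred column types (not Spark's DataType).
sealed trait InferredType
case object InferredInt extends InferredType
case object InferredLong extends InferredType
case object InferredString extends InferredType

object PartitionTypeSketch {
  // Infer the narrowest type that can hold a single raw partition value.
  def inferOne(raw: String): InferredType =
    if (Try(raw.toInt).isSuccess) InferredInt
    else if (Try(raw.toLong).isSuccess) InferredLong
    else InferredString

  // Widen two inferred types to one that can hold both.
  def widen(x: InferredType, y: InferredType): InferredType = (x, y) match {
    case (InferredString, _) | (_, InferredString) => InferredString
    case (InferredLong, _) | (_, InferredLong)     => InferredLong
    case _                                         => InferredInt
  }

  // Infer the column type from all observed partition values.
  // Assumes at least one value is present.
  def inferColumn(values: Seq[String]): InferredType = {
    require(values.nonEmpty, "need at least one partition value")
    values.map(inferOne).reduce(widen)
  }
}
```

In this sketch, `inferColumn(Seq("1", "2"))` yields `InferredInt`, while `inferColumn(Seq("1", "3000000000"))` yields `InferredLong`, because the second value does not fit in an `Int`.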