Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16030#discussion_r90694615
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -189,7 +189,15 @@ case class DataSource(
           throw new AnalysisException(
             s"Unable to infer schema for $format. It must be specified manually.")
         }
    -    (dataSchema, partitionSchema)
    +
    +    // Override the fields of the partition schema if the data schema has the same field
    +    val resolvedPartitionSchema = partitionSchema.map { partitionField =>
    --- End diff --
    
    I don't think we need this. Otherwise, the value returned in the `if (justPartitioning)` branch would be inconsistent with what is returned here.
    In a real-world setting, as in the example you provided, for:
    `case class A(a: Long, b: Int)`
    if `a` is in fact a `Long` and is also the partitioning column, there will be examples of `a` being a `Long` among the partition columns, and therefore things should work. No one should ever have more than 2 billion partition columns anyway; Spark wouldn't be able to resolve that many columns in memory right now.
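
    To illustrate the argument above, here is a minimal, self-contained sketch of inferring a partition column's type from its observed values. It is not Spark's implementation: `ColType`, `PartitionTypeSketch`, `inferOne`, `widen`, and `inferColumn` are hypothetical names, and the Int/Long/String ladder is an assumed simplification of the real inference rules.

```scala
import scala.util.Try

// Hypothetical stand-ins for Spark SQL data types; not Spark's actual classes.
sealed trait ColType
case object IntType extends ColType
case object LongType extends ColType
case object StringType extends ColType

object PartitionTypeSketch {
  // Infer the narrowest type that can hold a single raw partition value.
  def inferOne(raw: String): ColType =
    if (Try(raw.toInt).isSuccess) IntType
    else if (Try(raw.toLong).isSuccess) LongType
    else StringType

  // Widen two types to the narrowest type that can hold both.
  def widen(x: ColType, y: ColType): ColType = (x, y) match {
    case (StringType, _) | (_, StringType) => StringType
    case (LongType, _) | (_, LongType)     => LongType
    case _                                 => IntType
  }

  // Infer a column type from all observed partition values
  // (assumption: an empty column falls back to StringType).
  def inferColumn(values: Seq[String]): ColType =
    values.map(inferOne).reduceOption(widen).getOrElse(StringType)
}
```

    Under these assumptions, `PartitionTypeSketch.inferColumn(Seq("1", "3000000000"))` widens to `LongType` because one value does not fit in an `Int`, while `Seq("1", "2")` stays `IntType`.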


