[GitHub] spark pull request #16030: [SPARK-18108][SQL] Fix a bug to fail partition sc...

brkyvz Fri, 02 Dec 2016 10:33:07 -0800

Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16030#discussion_r90694615
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -189,7 +189,15 @@ case class DataSource(
           throw new AnalysisException(
             s"Unable to infer schema for $format. It must be specified manually.")
         }
    -    (dataSchema, partitionSchema)
    +
    +    // Override the fields of the partition schema if the data schema has the same field
    +    val resolvedPartitionSchema = partitionSchema.map { partitionField =>
    --- End diff --
    
    I don't think we need this; with it, the value returned under `if (justPartitioning)` would be inconsistent.
    In a real-world setting, as in the example you provided, for:
    `case class A(a: Long, b: Int)`
    if `a` is in fact a `Long` and is also the partitioning column, there will be examples of `a` being a `Long` among the partition columns, so things should work. No one should ever have more than 2 billion partition columns anyway; Spark wouldn't be able to resolve that many columns in memory right now.
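
The argument above relies on partition column types being inferred from the values actually seen in the partition directories. The following is a minimal sketch of that idea, not Spark's actual implementation: the `InferredType` hierarchy, `PartitionTypeSketch`, and its methods are hypothetical names introduced only for illustration, and the real inference handles more types than the three modeled here.

```scala
import scala.util.Try

// Hypothetical stand-ins for Spark's integer, long, and string types.
sealed trait InferredType
case object IntType extends InferredType
case object LongType extends InferredType
case object StringType extends InferredType

object PartitionTypeSketch {
  // Pick the narrowest type that can hold one raw partition value.
  def inferOne(raw: String): InferredType =
    if (Try(raw.toInt).isSuccess) IntType
    else if (Try(raw.toLong).isSuccess) LongType
    else StringType

  // Widen two inferred types to one that can hold both.
  def widen(a: InferredType, b: InferredType): InferredType = (a, b) match {
    case (StringType, _) | (_, StringType) => StringType
    case (LongType, _) | (_, LongType)     => LongType
    case _                                 => IntType
  }

  // Infer the column type from every value seen for that partition column.
  // Falls back to StringType when no values are available (an assumption).
  def inferColumn(rawValues: Seq[String]): InferredType =
    rawValues.map(inferOne).reduceOption(widen).getOrElse(StringType)
}
```

Under this model, a column whose values all fit in an `Int` is inferred as `IntType`, while a single value outside the `Int` range (for example `a=2147483648`) widens the whole column to `LongType`.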



