[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...

gengliangwang Fri, 30 Nov 2018 10:51:01 -0800

Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23186#discussion_r237963856
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -345,15 +346,18 @@ object PartitioningUtils {
        */
       def resolvePartitions(
           pathsWithPartitionValues: Seq[(Path, PartitionValues)],
    +      caseSensitive: Boolean,
           timeZone: TimeZone): Seq[PartitionValues] = {
         if (pathsWithPartitionValues.isEmpty) {
           Seq.empty
         } else {
    -      // TODO: Selective case sensitivity.
    -      val distinctPartColNames =
    -        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase())).distinct
    +      val distinctPartColNames = if (caseSensitive) {
    +        pathsWithPartitionValues.map(_._2.columnNames)
    +      } else {
    +        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase()))
    +      }
           assert(
    -        distinctPartColNames.size == 1,
    +        distinctPartColNames.distinct.size == 1,
             listConflictingPartitionColumns(pathsWithPartitionValues))
    --- End diff --
    
    The method `listConflictingPartitionColumns` also shows the suspicious paths.
    If case sensitive, the method works fine.
    If case insensitive, it lists all column names without any transformation, e.g.
    ```
        Partition column name list #0: a
        Partition column name list #1: A
        Partition column name list #2: B
    ```
    I can fix `listConflictingPartitionColumns`, but that seems a bit trivial: we would have to display the original column names instead of transforming them all to lower case.
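
To make the trade-off above concrete, here is a minimal, self-contained sketch of the idea. It is not the Spark implementation: `PartitionColumns`, `PartitionNameCheck`, `normalized`, `isConsistent`, and `conflictingLists` are hypothetical names introduced only for illustration, and the real code works on `Path` and `PartitionValues`. The sketch compares names after optional lower-casing, but reports the original spellings, one per distinct normalized list.

```scala
import java.util.Locale

// Hypothetical stand-in for (Path, PartitionValues): only the column names matter here.
final case class PartitionColumns(path: String, columnNames: Seq[String])

object PartitionNameCheck {
  // Lower-case the names only when the comparison is case insensitive.
  def normalized(names: Seq[String], caseSensitive: Boolean): Seq[String] =
    if (caseSensitive) names else names.map(_.toLowerCase(Locale.ROOT))

  // True when every path agrees on the (normalized) partition column names.
  def isConsistent(parts: Seq[PartitionColumns], caseSensitive: Boolean): Boolean =
    parts.map(p => normalized(p.columnNames, caseSensitive)).distinct.size <= 1

  // One entry per distinct normalized name list, shown with the first
  // original spelling seen for that list. Empty when there is no conflict.
  def conflictingLists(
      parts: Seq[PartitionColumns],
      caseSensitive: Boolean): Seq[Seq[String]] = {
    val keys = parts.map(p => normalized(p.columnNames, caseSensitive)).distinct
    if (keys.size <= 1) Seq.empty
    else keys.flatMap { key =>
      parts.map(_.columnNames).find(n => normalized(n, caseSensitive) == key)
    }
  }

  // The three column name lists from the example above, with made-up paths.
  val example: Seq[PartitionColumns] = Seq(
    PartitionColumns("/data/a=1", Seq("a")),
    PartitionColumns("/data/A=2", Seq("A")),
    PartitionColumns("/data/B=3", Seq("B")))
}
```

With the `a`, `A`, `B` example, the case-insensitive call reports two lists (`a` and `B`), while the case-sensitive call reports all three. Whether collapsing `a` and `A` in the message is worth the extra code is exactly the judgment call raised in the comment.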



