[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

gatorsmile Tue, 07 Aug 2018 15:48:38 -0700

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21608#discussion_r208408858
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
    @@ -49,4 +51,11 @@ object DataSourceUtils {
           }
         }
       }
    +
    +  // SPARK-15895: Metadata files (e.g. Parquet summary files) and temporary files should not be
    +  // counted as data files, so that they shouldn't participate partition discovery.
    +  private[sql] def isDataPath(path: Path): Boolean = {
    +    val name = path.getName
    +    !((name.startsWith("_") && !name.contains("=")) || name.startsWith("."))
    --- End diff --
    
    I'm not sure what your earlier implementation was. I would prefer to keep the
    original code in `PartitioningAwareFileIndex.scala` unchanged and just add a
    utility function `isDataPath` in `CommandUtils.scala`. Does this sound good to you?
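
For reference, a minimal sketch of how the quoted predicate classifies a few file names. It is not the code from the PR: the object name `IsDataPathSketch` and the sample names are hypothetical, and it takes the last path component as a plain `String` instead of a Hadoop `Path` so that it runs without Hadoop on the classpath.

```scala
// Sketch only: mirrors the boolean expression from the quoted diff, but operates on
// a file name String rather than org.apache.hadoop.fs.Path (an assumption made here
// to keep the example self-contained).
object IsDataPathSketch {
  def isDataPath(name: String): Boolean =
    !((name.startsWith("_") && !name.contains("=")) || name.startsWith("."))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample names covering each branch of the predicate.
    val samples = Seq(
      "part-00000.parquet", // ordinary data file
      "_SUCCESS",           // leading underscore, no '=': skipped
      "_metadata",          // Parquet summary file: skipped
      "_col=1",             // leading underscore but contains '=': kept
      ".hidden",            // leading dot: skipped
      "year=2018"           // partition directory name: kept
    )
    samples.foreach(n => println(s"$n -> ${isDataPath(n)}"))
  }
}
```

The `!name.contains("=")` clause is what keeps a name such as `_col=1` from being treated as a metadata file, presumably so that partition columns whose names start with an underscore are not dropped.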


