[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

gatorsmile Sun, 20 May 2018 22:07:39 -0700

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21320#discussion_r189491217
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
    @@ -286,7 +286,19 @@ case class FileSourceScanExec(
           } getOrElse {
             metadata
           }
    -    withOptPartitionCount
    +    val withOptColumnCount = relation.fileFormat match {
    +      case columnar: ColumnarFileFormat =>
    +        SparkSession
    +          .getActiveSession
    +          .map { sparkSession =>
    +            val columnCount = columnar.columnCountForSchema(sparkSession, requiredSchema)
    +            withOptPartitionCount + ("ColumnCount" -> columnCount.toString)
    --- End diff --
    
    This needs to be in a separate PR as I suggested above. BTW, we could
    easily lose this metadata if this change does not have a test case.
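
To illustrate the kind of regression check meant here, a minimal Spark-free sketch follows. It is not the PR's code: `ScanMetadataSketch`, `withOptColumnCount` as a standalone function, and the `"Format"` entry are hypothetical stand-ins, and the fallback to the unchanged map when no count is available is an assumption, since the diff above is truncated before that branch.

```scala
// Hypothetical, Spark-free stand-ins; none of these definitions come from the PR.
object ScanMetadataSketch {
  // Models the step in the diff: add "ColumnCount" to the scan metadata
  // when a count is available. Returning the base map otherwise is an
  // assumption, because the diff is cut off before the fallback branch.
  def withOptColumnCount(
      base: Map[String, String],
      columnCount: Option[Int]): Map[String, String] =
    columnCount
      .map(n => base + ("ColumnCount" -> n.toString))
      .getOrElse(base)

  def main(args: Array[String]): Unit = {
    val base = Map("Format" -> "Parquet")

    // With a count available, the key must appear in the metadata.
    val withCount = withOptColumnCount(base, Some(3))
    assert(withCount.get("ColumnCount").contains("3"))

    // Without a count, the original metadata must be left untouched.
    assert(withOptColumnCount(base, None) == base)
  }
}
```

A real test would presumably assert on the metadata of an actual `FileSourceScanExec` plan node rather than on a plain map; the sketch only shows the shape of the assertion that would catch the key being dropped.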


