Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/20933#discussion_r178234982
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -187,6 +189,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
         "read files of Hive data source directly.")
    }
+    // SPARK-23817: since data source V2 doesn't support reading multiple files yet,
+    // ORC V2 is only used when loading a single file path.
+    val allPaths = CaseInsensitiveMap(extraOptions.toMap).get("path") ++ paths
+    val orcV2 = OrcDataSourceV2.satisfy(sparkSession, source, allPaths.toSeq)
+    if (orcV2.isDefined) {
+      option("path", allPaths.head)
+      source = orcV2.get
+    }
--- End diff ---
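
For readers skimming the thread, a minimal standalone sketch of the gating this diff appears to add. `OrcDataSourceV2.satisfy` comes from the diff, but its body is not shown here, so the condition below (source is "orc" and exactly one path) is an assumption based on the code comment, and all names are illustrative rather than the actual Spark internals:

    // Hypothetical, self-contained stand-in for the gating in the diff above.
    // Assumption: satisfy returns a V2 source name only for "orc" with exactly one path.
    object OrcV2GateSketch {
      def satisfy(source: String, allPaths: Seq[String]): Option[String] =
        if (source.equalsIgnoreCase("orc") && allPaths.size == 1)
          Some("orcV2Placeholder") // placeholder, not the real V2 provider class
        else
          None

      // Mirrors the DataFrameReader snippet: merge the "path" option with the load()
      // paths (the real code does this case-insensitively via CaseInsensitiveMap),
      // then swap the source only when the single-path condition holds.
      def resolveSource(source: String, options: Map[String, String], paths: Seq[String]): String = {
        val pathOption = options.collectFirst { case (k, v) if k.equalsIgnoreCase("path") => v }
        val allPaths = pathOption.toSeq ++ paths
        satisfy(source, allPaths).getOrElse(source)
      }
    }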
It seems weird that DataFrameReader is modified here. Will DataSourceV2
implementations generally need to modify DataFrameReader, or is it just a
temporary hack because of the mentioned lack of support? In the latter case, is
there a plan to add this support soon?
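
To make the scope of the special case concrete, a hedged usage sketch (e.g. in spark-shell; the file paths are made up, and the behavior assumes the single-path gating described in the diff):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("orc-v2-gate-example").master("local[*]").getOrCreate()

    // Single path: under the diff above, the source would be rewritten to ORC V2.
    val single = spark.read.format("orc").load("/tmp/data/part-00000.orc")

    // Multiple paths: data source V2 can't read them yet, so this stays on the V1 ORC reader.
    val multi = spark.read.format("orc").load("/tmp/data/a.orc", "/tmp/data/b.orc")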