Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/20933#discussion_r178238660
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -187,6 +189,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        "read files of Hive data source directly.")
    }
+    // SPARK-23817 Since datasource V2 didn't support reading multiple files yet,
+    // ORC V2 is only used when loading single file path.
+    val allPaths = CaseInsensitiveMap(extraOptions.toMap).get("path") ++ paths
+    val orcV2 = OrcDataSourceV2.satisfy(sparkSession, source, allPaths.toSeq)
+    if (orcV2.isDefined) {
+      option("path", allPaths.head)
+      source = orcV2.get
+    }
--- End diff ---
This is a temporary hack. I think @cloud-fan will open a PR to support reading multiple files soon.
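The single-path gating in the diff hinges on how `Option ++ Seq` concatenates the `path` option (if set) with the paths passed to `load()`: the option value, when present, comes first, so `allPaths.head` prefers it. A minimal sketch of that behavior, with a plain `Map` standing in for Spark's `CaseInsensitiveMap` and made-up paths:

```scala
// Hypothetical sketch, not the Spark code itself: shows how the `allPaths`
// concatenation in the diff behaves. A plain Map stands in for
// CaseInsensitiveMap; the file paths are illustrative only.
object AllPathsSketch {
  def allPaths(options: Map[String, String], paths: Seq[String]): Seq[String] =
    // Option ++ Seq prepends the "path" option (when set) to the load() paths,
    // so allPaths.head is the option value if both sources are present.
    (options.get("path") ++ paths).toSeq

  def main(args: Array[String]): Unit = {
    // Both the option and an explicit path: option value comes first.
    assert(allPaths(Map("path" -> "/data/a.orc"), Seq("/data/b.orc"))
      == Seq("/data/a.orc", "/data/b.orc"))
    // No option set: only the explicit load() paths remain.
    assert(allPaths(Map.empty, Seq("/data/b.orc")) == Seq("/data/b.orc"))
    // Only the option: a single-element path list, eligible for ORC V2
    // under the single-file restriction described in the diff.
    assert(allPaths(Map("path" -> "/data/a.orc"), Nil).size == 1)
  }
}
```

With this, the `if (orcV2.isDefined)` branch in the diff only fires when the combined path list satisfies whatever single-file check `OrcDataSourceV2.satisfy` applies.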
---