Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20933#discussion_r179507749
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -187,6 +189,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
"read files of Hive data source directly.")
}
+ // SPARK-23817 Since datasource V2 didn't support reading multiple files yet,
+ // ORC V2 is only used when loading single file path.
+ val allPaths = CaseInsensitiveMap(extraOptions.toMap).get("path") ++ paths
+ val orcV2 = OrcDataSourceV2.satisfy(sparkSession, source, allPaths.toSeq)
+ if (orcV2.isDefined) {
+ option("path", allPaths.head)
+ source = orcV2.get
+ }
--- End diff --
We only support bucketing with tables, and data source v2 can't work with
tables yet.
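
To illustrate the point about bucketing and tables, here is a minimal sketch, assuming a local Spark session with the ORC source available. The table name `bucketed_ids` and the output path are hypothetical and are not taken from the PR. It shows that `bucketBy` is accepted by `saveAsTable` but rejected by a path-based `save`:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object BucketingSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("bucketing-sketch")
      .getOrCreate()

    val df = spark.range(100).toDF("id")

    // Bucketing is accepted when the write targets a table.
    // "bucketed_ids" is a hypothetical table name.
    df.write
      .format("orc")
      .mode("overwrite")
      .bucketBy(4, "id")
      .saveAsTable("bucketed_ids")

    // A path-based write with bucketBy is rejected with an AnalysisException.
    // The path below is a hypothetical location.
    try {
      df.write
        .format("orc")
        .bucketBy(4, "id")
        .save("/tmp/bucketed_ids_path")
    } catch {
      case e: AnalysisException =>
        println(s"Path-based bucketed write rejected: ${e.getMessage}")
    }

    spark.stop()
  }
}
```

Since bucketing goes through tables, a path-only read such as the one in this diff would not be expected to hit bucketed data.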