Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/21618#discussion_r197836518
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -724,4 +726,35 @@ object DataSource extends Logging {
""".stripMargin)
}
}
+
+
+  /**
+   * Return all paths represented by the wildcard string.
+   * Use a local thread pool to do this when there are too many paths.
+   */
+  private def getGlobbedPaths(
+      sparkSession: SparkSession,
+      fs: FileSystem,
+      hadoopConf: Configuration,
+      qualified: Path): Seq[Path] = {
+    val getGlobbedPathThreshold =
+      sparkSession.sessionState.conf.parallelGetGlobbedPathThreshold
+    val paths = SparkHadoopUtil.get.expandGlobPath(fs, qualified,
+      getGlobbedPathThreshold)
--- End diff ---
em... It's hard to achieve this in the current implementation. Maybe we
could use a boolean config like `spark.sql.sources.parallelGetGlobbedPath.enable`
to switch between this and the previous code path? That might be safer, given
your concerns.
---
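For context, a minimal sketch of the boolean-gate idea discussed above. This is illustrative only: the conf accessor `parallelGetGlobbedPathEnabled` and the sequential fallback `globPathIfNecessary` are assumed names following the comment's suggestion and may not match the final implementation.

```scala
// Sketch: gate the new parallel glob expansion behind a boolean SQL conf,
// falling back to the previous sequential code path when disabled.
private def getGlobbedPaths(
    sparkSession: SparkSession,
    fs: FileSystem,
    hadoopConf: Configuration,
    qualified: Path): Seq[Path] = {
  if (sparkSession.sessionState.conf.parallelGetGlobbedPathEnabled) {
    // New path: expand the glob with a local thread pool once the number
    // of candidate paths exceeds the configured threshold.
    val threshold =
      sparkSession.sessionState.conf.parallelGetGlobbedPathThreshold
    SparkHadoopUtil.get.expandGlobPath(fs, qualified, threshold)
  } else {
    // Previous behavior: a single sequential globStatus-based expansion.
    SparkHadoopUtil.get.globPathIfNecessary(fs, qualified)
  }
}
```

Gating new behavior behind an off-by-default flag is a common pattern in Spark for de-risking changes to hot code paths.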
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]