Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/17745#discussion_r212822877
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
---
@@ -196,29 +191,29 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
logDebug(s"Getting new files for time $currentTime, " +
s"ignoring files older than $modTimeIgnoreThreshold")
-    val newFileFilter = new PathFilter {
-      def accept(path: Path): Boolean = isNewFile(path, currentTime, modTimeIgnoreThreshold)
-    }
-    val directoryFilter = new PathFilter {
-      override def accept(path: Path): Boolean = fs.getFileStatus(path).isDirectory
-    }
-    val directories = fs.globStatus(directoryPath, directoryFilter).map(_.getPath)
+    val directories = Option(fs.globStatus(directoryPath)).getOrElse(Array.empty[FileStatus])
--- End diff --
Still a lot; I think we can do a new one.
The latest version of this code is
[here](https://github.com/hortonworks-spark/cloud-integration/tree/master/spark-cloud-integration);
I think it's time to set up a module in Bahir for this
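For context, the `+` line in the diff above wraps `fs.globStatus` in `Option(...)` because Hadoop's `FileSystem.globStatus` can return `null` when nothing matches the pattern, which would make a direct `.map(_.getPath)` throw a `NullPointerException`. A minimal, self-contained sketch of that null-safety pattern (`listMatches` here is a hypothetical stand-in for `globStatus`, not Hadoop API):

```scala
object GlobSafety {
  // Hypothetical stand-in for FileSystem.globStatus: returns null
  // when the pattern matches nothing, mimicking the Java API quirk.
  def listMatches(pattern: String): Array[String] =
    if (pattern.isEmpty) null else Array("dir1", "dir2")

  // Option(...) converts a possible null into None, and getOrElse
  // substitutes an empty array, so downstream .filter/.map calls
  // are safe without an explicit null check.
  def safeList(pattern: String): Array[String] =
    Option(listMatches(pattern)).getOrElse(Array.empty[String])
}
```

With this shape, callers can chain collection operations (`.filter`, `.map`) on the result unconditionally, which is what the rewritten `directories` line in the diff relies on.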
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]