org.apache.spark.deploy.SparkHadoopUtil has a method:

  /**
   * Get [[FileStatus]] objects for all leaf children (files) under the given base path. If the
   * given path points to a file, return a single-element collection containing [[FileStatus]] of
   * that file.
   */
  def listLeafStatuses(fs: FileSystem, basePath: Path): Seq[FileStatus] = {
    def recurse(path: Path) = {
      val (directories, leaves) = fs.listStatus(path).partition(_.isDir)
      leaves ++ directories.flatMap(f => listLeafStatuses(fs, f.getPath))
    }

    val baseStatus = fs.getFileStatus(basePath)
    if (baseStatus.isDir) recurse(basePath) else Array(baseStatus)
  }
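A minimal sketch of how that helper could be used to load Parquet files scattered across nested directories. This is untested; it assumes an existing SparkContext named sc, a SQLContext named sqlContext, the Spark 1.3+ parquetFile(paths: String*) signature, a hypothetical base directory /data, and that the part files carry a .parquet extension:

  import org.apache.hadoop.fs.{FileSystem, Path}
  import org.apache.spark.deploy.SparkHadoopUtil

  // Enumerate every file under /data, however deeply nested.
  val fs = FileSystem.get(sc.hadoopConfiguration)
  val leaves = SparkHadoopUtil.get.listLeafStatuses(fs, new Path("/data"))

  // Keep only the Parquet part files and load them all in one call.
  // (Adjust the filter if your part files are named differently.)
  val parquetPaths = leaves.map(_.getPath.toString).filter(_.endsWith(".parquet"))
  val df = sqlContext.parquetFile(parquetPaths: _*)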
— Best Regards!
Yijie Shen

On March 12, 2015 at 2:35:49 PM, Akhil Das (ak...@sigmoidanalytics.com) wrote:

Hi,

We have a custom build that reads directories recursively. We currently use it with fileStream like this:

  val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
    "/datadumps/", (t: Path) => true, true, true)

Setting the 4th argument to true enables recursive reading. You could give it a try:
https://s3.amazonaws.com/sigmoidanalytics-builds/spark-1.2.0-bin-spark-1.2.0-hadoop2.4.0.tgz

Thanks
Best Regards

On Wed, Mar 11, 2015 at 9:45 PM, Masf <masfwo...@gmail.com> wrote:

Hi all,

Is it possible to read folders recursively in order to read Parquet files?

Thanks.

--
Regards.
Miguel Ángel
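For anyone on a stock Spark build without the recursive fileStream patch above, a rough batch-mode alternative is to flip Hadoop's recursive-listing flag before creating the RDD. This is an untested sketch; it assumes an existing SparkContext named sc and that the new-API FileInputFormat honors the flag (Hadoop 2.x):

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

  // Ask FileInputFormat to descend into subdirectories when listing input.
  sc.hadoopConfiguration.set(
    "mapreduce.input.fileinputformat.input.dir.recursive", "true")

  // Read every text file under /datadumps/, including nested folders.
  val lines = sc
    .newAPIHadoopFile[LongWritable, Text, TextInputFormat]("/datadumps/")
    .map(_._2.toString)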