org.apache.spark.deploy.SparkHadoopUtil has a method:

  /**
   * Get [[FileStatus]] objects for all leaf children (files) under the given base path. If the
   * given path points to a file, return a single-element collection containing [[FileStatus]] of
   * that file.
   */
  def listLeafStatuses(fs: FileSystem, basePath: Path): Seq[FileStatus] = {
    def recurse(path: Path) = {
      val (directories, leaves) = fs.listStatus(path).partition(_.isDir)
      leaves ++ directories.flatMap(f => listLeafStatuses(fs, f.getPath))
    }

    val baseStatus = fs.getFileStatus(basePath)
    if (baseStatus.isDir) recurse(basePath) else Array(baseStatus)
  }
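A minimal sketch of how that helper could be used to load Parquet files scattered across nested directories. This is untested; it assumes an existing SparkContext named sc, a SQLContext named sqlContext, the Spark 1.3+ parquetFile(paths: String*) signature, a hypothetical base directory /data, and that the part files carry a .parquet extension:

  import org.apache.hadoop.fs.{FileSystem, Path}
  import org.apache.spark.deploy.SparkHadoopUtil

  // Enumerate every file under /data, however deeply nested.
  val fs = FileSystem.get(sc.hadoopConfiguration)
  val leaves = SparkHadoopUtil.get.listLeafStatuses(fs, new Path("/data"))

  // Keep only the Parquet part files and load them all in one call.
  // (Adjust the filter if your part files are named differently.)
  val parquetPaths = leaves.map(_.getPath.toString).filter(_.endsWith(".parquet"))
  val df = sqlContext.parquetFile(parquetPaths: _*)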
— Best Regards!
Yijie Shen

On March 12, 2015 at 2:35:49 PM, Akhil Das (ak...@sigmoidanalytics.com) wrote:

Hi,

We have a custom build that reads directories recursively. We currently use it with fileStream like this:

  val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
    "/datadumps/", (t: Path) => true, true, true)

Setting the 4th argument to true enables recursive reading. You could give it a try:
https://s3.amazonaws.com/sigmoidanalytics-builds/spark-1.2.0-bin-spark-1.2.0-hadoop2.4.0.tgz

Thanks
Best Regards

On Wed, Mar 11, 2015 at 9:45 PM, Masf <masfwo...@gmail.com> wrote:

Hi all,

Is it possible to read folders recursively in order to read Parquet files?

Thanks.

--
Regards.
Miguel Ángel
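For anyone on a stock Spark build without the recursive fileStream patch above, a rough batch-mode alternative is to flip Hadoop's recursive-listing flag before creating the RDD. This is an untested sketch; it assumes an existing SparkContext named sc and that the new-API FileInputFormat honors the flag (Hadoop 2.x):

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

  // Ask FileInputFormat to descend into subdirectories when listing input.
  sc.hadoopConfiguration.set(
    "mapreduce.input.fileinputformat.input.dir.recursive", "true")

  // Read every text file under /datadumps/, including nested folders.
  val lines = sc
    .newAPIHadoopFile[LongWritable, Text, TextInputFormat]("/datadumps/")
    .map(_._2.toString)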