With fileStream you are free to plug in any InputFormat; in your case, you
can easily plug in ParquetInputFormat. Here are some Parquet Hadoop examples:
<https://github.com/Parquet/parquet-mr/tree/master/parquet-hadoop/src/main/java/parquet/hadoop/example>
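One way to connect the recursive listing in the quoted listLeafStatuses snippet with parquetFile is to list every leaf file under a base directory, keep only the Parquet files, and then hand those paths to the reader. Below is a minimal sketch of that idea. It assumes the files are on the local filesystem, so java.io.File stands in for Hadoop's FileSystem/FileStatus. The object name LeafFiles, the parquetFiles helper, and its ".parquet" suffix check are hypothetical and are not Spark or Parquet APIs.

```scala
import java.io.{File, FileNotFoundException}

// Hypothetical helper: a local-filesystem analogue of SparkHadoopUtil.listLeafStatuses,
// using java.io.File instead of Hadoop's FileSystem/FileStatus.
object LeafFiles {

  // Returns all leaf files under `base`; if `base` is itself a file,
  // returns a single-element Seq containing it (same contract as listLeafStatuses).
  def listLeafFiles(base: File): Seq[File] = {
    // Hadoop's getFileStatus throws FileNotFoundException for a missing path; do the same.
    if (!base.exists) throw new FileNotFoundException(base.getPath)
    if (!base.isDirectory) Seq(base)
    else {
      // listFiles() returns null on an I/O error; treat that as an empty directory.
      val children = Option(base.listFiles()).map(_.toSeq).getOrElse(Seq.empty[File])
      // Split children into sub-directories and plain files, keep the files,
      // and recurse into each sub-directory.
      val (directories, leaves) = children.partition(_.isDirectory)
      leaves ++ directories.flatMap(listLeafFiles)
    }
  }

  // Assumption: Parquet files are recognized purely by a ".parquet" file-name suffix,
  // which also skips marker files such as _SUCCESS.
  def parquetFiles(base: File): Seq[File] =
    listLeafFiles(base).filter(_.getName.endsWith(".parquet"))
}
```

For example, LeafFiles.parquetFiles(new File("/datadumps")).map(_.getPath) would return the Parquet paths, which you could then pass to SQLContext.parquetFile one at a time. Whether parquetFile also accepts several paths in a single call depends on your Spark version, so check its scaladoc. On HDFS you would call listLeafStatuses itself with a Hadoop FileSystem rather than using this local analogue.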

Thanks
Best Regards

On Thu, Mar 12, 2015 at 5:51 PM, Masf <masfwo...@gmail.com> wrote:

> Hi.
>
> Thanks for your answers, but to read Parquet files, is it necessary to use
> the parquetFile method in org.apache.spark.sql.SQLContext?
>
> How can I combine your solution with a call to this method?
>
> Thanks!!
> Regards
>
> On Thu, Mar 12, 2015 at 8:34 AM, Yijie Shen <henry.yijies...@gmail.com> wrote:
>
>> org.apache.spark.deploy.SparkHadoopUtil has a method:
>>
>>   /**
>>    * Get [[FileStatus]] objects for all leaf children (files) under the given base path. If the
>>    * given path points to a file, return a single-element collection containing [[FileStatus]] of
>>    * that file.
>>    */
>>   def listLeafStatuses(fs: FileSystem, basePath: Path): Seq[FileStatus] = {
>>     def recurse(path: Path) = {
>>       val (directories, leaves) = fs.listStatus(path).partition(_.isDir)
>>       leaves ++ directories.flatMap(f => listLeafStatuses(fs, f.getPath))
>>     }
>>
>>     val baseStatus = fs.getFileStatus(basePath)
>>     if (baseStatus.isDir) recurse(basePath) else Array(baseStatus)
>>   }
>>
>> —
>> Best Regards!
>> Yijie Shen
>>
>> On March 12, 2015 at 2:35:49 PM, Akhil Das (ak...@sigmoidanalytics.com) wrote:
>>
>> Hi
>>
>> We have a custom build that reads directories recursively. Currently we use
>> it with fileStream like this:
>>
>>   val lines = ssc.fileStream[LongWritable, Text, TextInputFormat]("/datadumps/",
>>     (t: Path) => true, true, true)
>>
>>
>> Set the 4th argument to true to read directories recursively.
>>
>>
>> You could give it a try:
>> https://s3.amazonaws.com/sigmoidanalytics-builds/spark-1.2.0-bin-spark-1.2.0-hadoop2.4.0.tgz
>>
>>  Thanks
>> Best Regards
>>
>> On Wed, Mar 11, 2015 at 9:45 PM, Masf <masfwo...@gmail.com> wrote:
>>
>>> Hi all
>>>
>>> Is it possible to read folders recursively to load Parquet files?
>>>
>>>
>>> Thanks.
>>>
>>> --
>>>
>>>
>>> Regards.
>>> Miguel Ángel
>>>
>>
>>
>
>
> --
>
>
> Regards.
> Miguel Ángel
>
