Have a look at Spark Streaming. You can make use of ssc.fileStream.

E.g.:

val avroStream = ssc.fileStream[AvroKey[GenericRecord], NullWritable,
      AvroKeyInputFormat[GenericRecord]](input)

You can also pass a filter function as the second argument; see the
StreamingContext API docs:
<http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext>
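
A minimal sketch of that, assuming the fileStream overload that takes a
Hadoop Path => Boolean filter plus a newFilesOnly flag (the directory name
`input` and the `.avro` suffix check are illustrative assumptions):

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.NullWritable

    // Hypothetical filter: only pick up completed .avro files,
    // skipping e.g. in-progress temp files the writer may create.
    def avroFilter(path: Path): Boolean = path.getName.endsWith(".avro")

    val avroStream = ssc.fileStream[AvroKey[GenericRecord], NullWritable,
        AvroKeyInputFormat[GenericRecord]](input, avroFilter _, newFilesOnly = true)

With newFilesOnly = true, each batch only sees files that appeared in the
directory since the last batch, which matches your process-once requirement
as long as the external process doesn't rewrite existing files.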

Thanks
Best Regards

On Wed, Aug 19, 2015 at 10:46 PM, Masf <masfwo...@gmail.com> wrote:

> Hi.
>
> I'd like to read Avro files using this library
> https://github.com/databricks/spark-avro
>
> I need to load several files from a folder, not all files. Is there some
> functionality to filter the files to load?
>
> And... Is it possible to know the names of the files loaded from a folder?
>
> My problem is that I have a folder where an external process is inserting
> files every X minutes. I need to process these files exactly once, and I
> can't move, rename or copy the source files.
>
>
> Thanks
> --
>
> Regards
> Miguel Ángel
>
