That looks like what I've been searching for. I'll give it a try.

Thanks!
Dennis


PS: Thanks Daniel as well. The loader might also be very useful.

On 02.01.12 20:16, "Dmitriy Ryaboy" <[email protected]> wrote:

>Dennis,
>Hadoop and Pig support globs, which may be sufficient for what you want.
>The glob matching rules are described here:
>http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
>
>If those aren't sufficient, it's possible to write a custom loader to do
>more advanced regex handling in the input format, or you could
>alter your file naming conventions / directory structure so that globs do
>become sufficient.
>
>Hope this helps.
>-Dmitriy
>
>On Mon, Jan 2, 2012 at 6:27 AM, Meyer, Dennis
><[email protected]> wrote:
>
>> Hi,
>>
>> We have a use-case where it would be beneficial to "select" multiple files
>> to process by a regex pattern (or a loop-like functionality to dynamically
>> adjust which files to pick). We have files of different types and inside
>> one type they have versions where we add new data to the records, but we do
>> not remove info. As the files of the same type would be very similar, this
>> would be a UNION. The files are stored in a directory and look like:
>>
>> type-A-v1‹1.avro
>> type-A-v1‹2.avro
>> type-A-v1‹3.avro
>> type-A-v1‹4.avro
>> type-A-v2‹1.avro
>> type-A-v2‹2.avro
>> type-A-v2‹3.avro
>> type-A-v2‹4.avro
>> type-A-v2‹5.avro
>> type-B-v1‹1.avro
>> type-B-v1‹2.avro
>> type-B-v1‹3.avro
>> ...
>> Same with C etc...
>>
>> As you can guess, v1 stands for version #1, so higher versions will have
>> new fields in them. Different types contain different data.
>>
>> It would be great if there were a way to address only certain files
>> (aggregate all files of type "A" for "v1" and "v2"). What would be the
>> technique of choice here?
>> The aim is to increment the version (adding fields to the records
>> dynamically) without changing the aggregation itself. Of course the new
>> fields will just be ignored.
>>
>> Thanks,
>> Dennis
>>
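
To make the glob suggestion concrete, here is a minimal sketch of how a character-class glob could pick out all type "A" files for v1 and v2. It uses Python's `fnmatch` purely as a stand-in for Hadoop's matcher, which is an assumption: the two agree on `*` and `[12]`, but Hadoop's `{a,b}` alternation is not supported by `fnmatch`, and `fnmatch` lets `*` cross `/`. The file names are hypothetical, with `-` assumed as the separator before the part number because the archived message shows a garbled character there.

```python
from fnmatch import fnmatchcase

# Hypothetical file names modelled on the listing in the original question.
# The "-" before the part number is an assumption (garbled in the archive).
files = [
    "type-A-v1-1.avro", "type-A-v1-2.avro",
    "type-A-v2-1.avro", "type-A-v2-5.avro",
    "type-A-v3-1.avro",
    "type-B-v1-1.avro",
]

# "[12]" matches exactly one character, "1" or "2";
# "*" matches any run of characters (the part number here).
pattern = "type-A-v[12]-*.avro"

# Keep only the names the glob accepts, preserving input order.
selected = [name for name in files if fnmatchcase(name, pattern)]
print(selected)
```

Under these assumptions, the same pattern string could be given as the input path in a Pig LOAD statement. Adding a new version would then mean widening the character class rather than rewriting the aggregation.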
