That looks like what I've been searching for. I'll give it a try.

Thanks!
Dennis

PS: Thanks, Daniel, as well. The loader might also be very useful.

On 02.01.12 20:16, "Dmitriy Ryaboy" <[email protected]> wrote:

>Dennis,
>Hadoop and Pig support globs, which may be sufficient for what you want.
>The glob matching rules are described here:
>http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
>
>If those aren't sufficient, it's possible to write a custom loader to do
>more advanced regex handling in the input format, or you could
>alter your file naming conventions / directory structure so that globs do
>become sufficient.
>
>Hope this helps.
>-Dmitriy
>
>On Mon, Jan 2, 2012 at 6:27 AM, Meyer, Dennis <[email protected]> wrote:
>
>> Hi,
>>
>> We have a use case where it would be beneficial to "select" multiple
>> files to process by a regex pattern (or a loop-like functionality to
>> dynamically adjust which files to pick). We have files of different
>> types, and within one type they have versions where we add new data to
>> the records, but we do not remove info. As the files of the same type
>> would be very similar, this would be a UNION. The files are stored in a
>> directory and look like:
>>
>> type-A-v1-1.avro
>> type-A-v1-2.avro
>> type-A-v1-3.avro
>> type-A-v1-4.avro
>> type-A-v2-1.avro
>> type-A-v2-2.avro
>> type-A-v2-3.avro
>> type-A-v2-4.avro
>> type-A-v2-5.avro
>> type-B-v1-1.avro
>> type-B-v1-2.avro
>> type-B-v1-3.avro
>> ...
>> Same with C etc.
>>
>> As you can guess, the v1 stands for version #1, so higher versions will
>> have new fields in them. Different types contain different data.
>>
>> It would be great if there is a possibility to address only certain
>> files (aggregate all files of type "A" for "v1" and "v2"). What would be
>> the technique of choice here?
>> The aim is to increment the version (adding fields to the records
>> dynamically) without changing the aggregation itself. Of course the new
>> fields will just be ignored.
>>
>> Thanks,
>> Dennis
>>
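
For anyone following the thread, here is a rough sketch of how a glob with `{a,b}` alternation would pick out the type "A" files for v1 and v2. It is plain Python that only approximates Hadoop's glob rules with the standard `fnmatch` module; it is not Hadoop's or Pig's implementation. The helper names (`expand_braces`, `select_files`) are made up for illustration, and the file names assume that the separator before the sequence number is a plain hyphen.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups into a list of brace-free glob patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that patterns with several brace groups are handled too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def select_files(names, pattern):
    """Return the names matching the glob, keeping the input order.

    Approximation only: fnmatch lets '*' match any character, whereas
    Hadoop's '*' does not cross directory separators. The names used
    here contain no directories, so the difference does not show up.
    """
    plain_patterns = expand_braces(pattern)
    return [name for name in names
            if any(fnmatch.fnmatchcase(name, p) for p in plain_patterns)]


# Hypothetical directory listing modelled on the file names in the question.
files = [
    "type-A-v1-1.avro", "type-A-v1-2.avro",
    "type-A-v2-1.avro", "type-A-v2-5.avro",
    "type-B-v1-1.avro",
]

# All type "A" files for v1 and v2; type "B" is left out.
selected = select_files(files, "type-A-v{1,2}-*.avro")
print(selected)
```

A looser pattern such as `type-A-v*` would also pick up future versions (v3 and so on) without touching the script, which seems to match the goal of adding versions without changing the aggregation. In Pig, a pattern like this would go directly into the LOAD path; which loader to use for the Avro files is outside this sketch.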
