Hi,

We have a use case where it would be helpful to "select" multiple files for 
processing by a regex pattern (or some loop-like mechanism that dynamically 
adjusts which files are picked). We have files of different types, and within 
one type there are versions: each new version adds data to the records, but we 
never remove anything. Since files of the same type are very similar, combining 
them would be a UNION. The files are stored in a directory and are named like 
this:

type-A-v1—1.avro
type-A-v1—2.avro
type-A-v1—3.avro
type-A-v1—4.avro
type-A-v2—1.avro
type-A-v2—2.avro
type-A-v2—3.avro
type-A-v2—4.avro
type-A-v2—5.avro
type-B-v1—1.avro
type-B-v1—2.avro
type-B-v1—3.avro
….
The same applies to type C, etc.
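A minimal, engine-agnostic sketch of such a selection: parse each file name with a regex and filter on type and version. This is plain Python, not the API of any processing framework. The pattern and the helper names `parse`, `select_files` and `group_by_type` are hypothetical, and so is the assumption that the character between version and part number is an em-dash or one or two hyphens.

```python
import re

# Hypothetical naming pattern for files like "type-A-v1\u20141.avro".
# Assumption: the separator between version and part number may be an
# em-dash (\u2014) or one or two hyphens, since the exact character is unclear.
FILE_RE = re.compile(
    r"^type-(?P<type>[^-]+)-v(?P<version>\d+)(?:\u2014|-{1,2})(?P<part>\d+)\.avro$"
)


def parse(name):
    """Return (type, version, part) for a matching file name, else None."""
    m = FILE_RE.match(name)
    if m is None:
        return None
    return m.group("type"), int(m.group("version")), int(m.group("part"))


def select_files(names, file_type, versions=None):
    """Return files of one type, optionally limited to a set of versions.

    With versions=None every version is included, so a newly added v3
    is picked up without changing the call.
    """
    hits = []
    for name in names:
        parsed = parse(name)
        if parsed is None:
            continue  # ignore files outside the naming scheme
        ftype, version, part = parsed
        if ftype == file_type and (versions is None or version in versions):
            hits.append((version, part, name))
    # Sort numerically by (version, part) so that v10 comes after v9.
    return [name for _, _, name in sorted(hits)]


def group_by_type(names):
    """Loop-like selection: map every type found to its files."""
    types = {p[0] for p in map(parse, names) if p is not None}
    return {t: select_files(names, t) for t in sorted(types)}


if __name__ == "__main__":
    names = [
        "type-A-v1\u20141.avro",
        "type-A-v2\u20141.avro",
        "type-B-v1\u20141.avro",
        "notes.txt",
    ]
    print(select_files(names, "A", versions={1, 2}))
    print(group_by_type(names))
```

Because the version set is a parameter (or left open with `versions=None`), adding a new version of type A does not require changing the pattern itself.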

As you can guess, v1 stands for version 1, so a higher version contains 
additional fields. Different types contain different data.

It would be great if there were a way to address only certain files (e.g. 
aggregate all files of type "A" for both "v1" and "v2"). What would be the 
technique of choice here?
The aim is to be able to increment the version (adding fields to the records 
dynamically) without changing the aggregation itself. The new fields would, of 
course, simply be ignored.
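Since versions only ever add fields, one way to keep the aggregation fixed is to project every record onto the fields the aggregation uses before combining the versions. The sketch below assumes the records have already been read into Python dicts (for example by an Avro reader); `AGG_FIELDS`, the field names `id`, `amount` and `currency`, and the helper names are made up for illustration.

```python
# Hypothetical: the fields the aggregation needs, e.g. those of type A, v1.
# Because later versions only add fields, every version still has these.
AGG_FIELDS = ("id", "amount")


def project(record, fields=AGG_FIELDS):
    """Keep only the aggregation fields; fields added by newer versions are dropped."""
    return {f: record[f] for f in fields}


def union_all(*sources):
    """Concatenate projected records from every source (UNION ALL semantics)."""
    for source in sources:
        for record in source:
            yield project(record)


def total_amount(records):
    """Example aggregation that only knows about AGG_FIELDS."""
    return sum(r["amount"] for r in records)


if __name__ == "__main__":
    v1 = [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]
    v2 = [{"id": 3, "amount": 7, "currency": "EUR"}]  # v2 adds "currency"
    print(total_amount(union_all(v1, v2)))  # prints 22
```

This keeps duplicates, like UNION ALL; for a deduplicating UNION the projected records could instead be collected as tuples in a set.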

Thanks,
Dennis