thanks doug

On Fri, Feb 3, 2012 at 3:58 PM, Doug Cutting <[email protected]> wrote:
> On 02/02/2012 08:03 PM, Koert Kuipers wrote:
> > i have many avro files with similar data (same meaning, same type, etc.)
> > but different names for the fields.
> > can i create a reader schema that for each field that i am interested in
> > maps it to all the different possible fields in the files by using
> > aliases, and then run map-reduce over the files using this schema?
> > i am talking about tens of aliases per field, and this number will only
> > grow as more data comes in.
> > is this acceptable use of the alias concept, or is it abuse?
>
> This seems like a reasonable use of aliases to me. Note that aliases
> are limited to elements at the same level of nesting and cannot perform
> arbitrary structural manipulations. But beyond that, they're meant to
> be a general-purpose mechanism for mapping data from one schema to another.
>
> > and is the
> > alias implementation in avro efficient for such usage?
>
> They should be efficient. Aliases are implemented by rewriting the old
> schema to have the new names prior to reading. The rewriting is
> performed once and cached so performance should not be impacted.
>
> Doug
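
For illustration, the rewriting step described above can be sketched in plain Python. This is only a rough model of the idea, not Avro's actual implementation, and it does not use the Avro library. The schema, the field names (user_id, uid, amt, ...) and the helper functions are all hypothetical. It handles only top-level record fields, in line with the note that aliases work at a single level of nesting.

```python
import json
from functools import lru_cache

# Hypothetical reader schema: each field lists the names it may carry
# in older files via "aliases".
READER_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "user_id", "type": "string", "aliases": ["uid", "userId"]},
        {"name": "amount", "type": "double", "aliases": ["amt", "value"]},
    ],
}


def apply_aliases(writer_schema, reader_schema):
    """Return a copy of writer_schema whose fields use the reader's names."""
    rename = {}
    for field in reader_schema["fields"]:
        for alias in field.get("aliases", []):
            rename[alias] = field["name"]
    fields = [
        dict(f, name=rename.get(f["name"], f["name"]))
        for f in writer_schema["fields"]
    ]
    return dict(writer_schema, fields=fields)


@lru_cache(maxsize=None)
def _rewrite_cached(writer_json, reader_json):
    # Keyed on the JSON text so each schema pair is rewritten only once.
    return apply_aliases(json.loads(writer_json), json.loads(reader_json))


def rewritten_schema(writer_schema, reader_schema):
    """Look up (or compute once) the alias-rewritten writer schema."""
    return _rewrite_cached(
        json.dumps(writer_schema, sort_keys=True),
        json.dumps(reader_schema, sort_keys=True),
    )


def read_record(record, writer_schema, reader_schema):
    """Rename the keys of one decoded record to the reader's field names."""
    resolved = rewritten_schema(writer_schema, reader_schema)
    # Field order is preserved by the rewrite, so old and new names line up.
    pairs = zip(writer_schema["fields"], resolved["fields"])
    return {new["name"]: record[old["name"]] for old, new in pairs}


# One of the many older files, using different field names.
OLD_WRITER_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "uid", "type": "string"},
        {"name": "amt", "type": "double"},
    ],
}

print(read_record({"uid": "k1", "amt": 2.5}, OLD_WRITER_SCHEMA, READER_SCHEMA))
```

In this sketch the cost of many aliases is paid once per distinct writer schema, when the rename table is built. Reading each record afterwards is a plain lookup, which is the reason caching the rewritten schema keeps per-record overhead low.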
