In theory, any PType that supports GenericRecord will work, even a dummy one whose schema isn't the same as the one you're actually using.
I don't recommend doing that, of course, but it will work.

On Wed, Feb 24, 2016 at 12:18 AM Marcin Michalski <[email protected]> wrote:

> Hi, is there an easy way to pass GenericData.Record between Fns in crunch
> without specifically stating the schema? Since I want to pass multiple avro
> files that have various schemas as input to a single DoFn which will
> enhance the data into a Pair and later I want to do an aggregation
> (deduping) Fn on that data but don't want to specify the Schema in between
> (I just want to work with GenericData.Record instances. Here is an example
>
> PCollection<Record> messages =
> pipeline.read(From.avroFile("/events/*/20160223/"));
>
> // I don't want pass the schema instance but rather just work with
> GenericData.Record, is that possible? Or do I need to store use Avros.bytes
> instead and then reconstruct the Record later in the next Fn?
> messages.parallellDo(new EventEnhancerDoFn(),
> Avros.generics(messageSchema)).groupByKey...
>
> Thanks,
> Marcin
