The problem was solved by David's GenericAvroFunction solution. Thanks again.
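
For anyone reading this in the archive, here is a minimal sketch of the pattern, not the exact code we ended up with. To keep it compilable on its own, Emitter, DoFn, RecordBase and User below are simplified stand-ins (assumptions of this sketch) for Crunch's Emitter and DoFn, Avro's SpecificRecordBase, and a generated record class. With the real libraries the bound would be SpecificRecordBase, and the fields could be read through getSchema().getFields() and get(int).

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.crunch.Emitter (assumption: only emit() is modelled).
interface Emitter<T> {
    void emit(T value);
}

// Stand-in for org.apache.crunch.DoFn, modelled here as a Serializable base class.
abstract class DoFn<S, T> implements Serializable {
    public abstract void process(S input, Emitter<T> emitter);
}

// Stand-in for Avro's SpecificRecordBase: positional field access only.
abstract class RecordBase {
    public abstract int fieldCount();
    public abstract Object get(int pos);
}

// Hypothetical generated record with two fields.
class User extends RecordBase {
    private final String name;
    private final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int fieldCount() {
        return 2;
    }

    @Override
    public Object get(int pos) {
        switch (pos) {
            case 0: return name;
            case 1: return age;
            default: throw new IndexOutOfBoundsException("No field at position " + pos);
        }
    }
}

// Named generic class with a bounded type parameter, as David suggested.
// It is a top-level class rather than an anonymous DoFn<? extends ...>,
// so the type parameter is fixed where the function is instantiated.
class GenericAvroFunction<T extends RecordBase> extends DoFn<T, String> {
    @Override
    public void process(T input, Emitter<String> emitter) {
        // Join every field of the record into one comma-separated line.
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < input.fieldCount(); i++) {
            if (i > 0) {
                csv.append(',');
            }
            csv.append(input.get(i));
        }
        emitter.emit(csv.toString());
    }
}

class GenericAvroDemo {
    // Parameterize the function with a concrete record type and collect its output.
    static List<String> run() {
        List<String> out = new ArrayList<>();
        DoFn<User, String> fn = new GenericAvroFunction<User>();
        fn.process(new User("ann", 30), out::add);
        return out;
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line); // prints "ann,30"
        }
    }
}
```

Each pipeline picks its own type argument, for example new GenericAvroFunction<User>(), while the function body only relies on what the bound provides.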

On Tue, Jun 23, 2015 at 1:57 AM, Josh Wills <[email protected]> wrote:

> Hey Sankash,
>
> I don't understand a couple of things here:
>
> 1) The init() error in SpecificRecord from your original email: I could
> see that sort of thing being a problem if you were trying to create a
> PType<SpecificRecord> vs. a PType<SomeImplOfSpecificRecord>, but I don't
> get why it would be a problem in defining an ordinary DoFn.
> 2) Why David's suggestion of GenericAvroFunction<T extends
> SpecificRecordBase> wouldn't be serializable.
>
> J
>
> On Mon, Jun 22, 2015 at 3:15 PM, David Ortiz <[email protected]>
> wrote:
>
>> How are you getting it into a PCollection? Whatever you're doing there
>> should work for the function shouldn't it?
>>
>> *Sent from my Verizon Wireless 4G LTE DROID*
>>
>> On Jun 22, 2015 6:09 PM, Sankash Shankar <[email protected]>
>> wrote:
>>
>> Hello,
>>
>> With regards to your question, we will know the class will be one of a
>> pre-defined list of classes, but the exact class will not be known until
>> runtime. In addition, the generic class GenericAvroFunction cannot be
>> defined in a static manner and a generic type, which keeps it from being
>> serializable.
>>
>> Thanks.
>>
>> On Mon, Jun 22, 2015 at 1:23 PM, David Ortiz <[email protected]>
>> wrote:
>>
>>> When you actually write the code will you know what the avro record
>>> is? I’ve been able to do something along the lines of
>>>
>>> public class GenericAvroFunction<T extends SpecificRecordBase> extends DoFn<T, String> {
>>>
>>>     …
>>>
>>>     public void process(T input, Emitter<String> emitter) {
>>>         …
>>>     }
>>> }
>>>
>>> then parameterizing it in the various pipelines that use it. Not sure
>>> with regards to making it work at run time though.
>>>
>>> *From:* Sankash Shankar [mailto:[email protected]]
>>> *Sent:* Monday, June 22, 2015 4:18 PM
>>> *To:* [email protected]
>>> *Subject:* How to write a generic transform method that will act upon
>>> generated avro objects in a generic fashion
>>>
>>> Hello.
>>>
>>> I am writing a Crunch job that takes in an arbitrary class that extends
>>> SpecificRecord and performs a transformation on the fields in the class. I
>>> am attempting to write a parallelDo function on these classes, but
>>>
>>> public static PCollection<String> function(PCollection<? extends SpecificRecord> coll) {
>>>     coll.parallelDo(new DoFn<? extends SpecificRecord, String>() {
>>>         ...
>>>     }, Avros.strings());
>>> }
>>>
>>> will not compile given it expects a type at compile time, while
>>>
>>> public static PCollection<String> transformAvroToCsv(PCollection<SpecificRecord> coll) {
>>>     coll.parallelDo(new DoFn<SpecificRecord, String>() {
>>>         @Override
>>>         public void process(SpecificRecord input, Emitter<String> emitter) {
>>>         }
>>>     }, Avros.strings());
>>>     return null;
>>> }
>>>
>>> will fail at run-time due to SpecificRecord not having an init
>>> constructor.
>>>
>>> What is the standard way for taking in generic avro records and
>>> having a generic transform method to call on them?
>>>
>>> Thanks.
>>>
>>> *This email is intended only for the use of the individual(s) to
>>> whom it is addressed. If you have received this communication in error,
>>> please immediately notify the sender and delete the original email.*
