When you actually write the code, will you know what the Avro record type is?
I’ve been able to do something along the lines of

public class GenericAvroFunction<T extends SpecificRecordBase> extends DoFn<T, String> {
…

public void process(T input, Emitter<String> emitter) {
…
}
}

then parameterizing it in the various pipelines that use it.  I'm not sure 
about making it work at run time, though.
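
A minimal sketch of the same idea, written against stand-in types so it 
compiles without Crunch or Avro on the classpath. StubRecord, StubEmitter, 
StubDoFn, User, and GenericTransform are hypothetical names invented for this 
illustration; they only mimic the shape of SpecificRecord, Emitter, and DoFn. 
The point is the Java generics rule: a wildcard cannot be used as a type 
argument when instantiating a class, but a bounded method type parameter can.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Avro's SpecificRecord (positional getter only).
interface StubRecord {
    Object get(int field);
}

// Hypothetical stand-in for Crunch's Emitter<T>.
interface StubEmitter<T> {
    void emit(T value);
}

// Hypothetical stand-in for Crunch's DoFn<S, T>.
abstract class StubDoFn<S, T> {
    public abstract void process(S input, StubEmitter<T> emitter);
}

// A concrete record type, standing in for an Avro-generated class.
final class User implements StubRecord {
    private final String name;
    private final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public Object get(int field) {
        return field == 0 ? name : (Object) age;
    }
}

final class GenericTransform {
    // T is a method type parameter, so `new StubDoFn<T, String>()` is legal.
    // `new StubDoFn<? extends StubRecord, String>()` would be rejected by javac.
    static <T extends StubRecord> StubDoFn<T, String> firstFieldAsString() {
        return new StubDoFn<T, String>() {
            @Override
            public void process(T input, StubEmitter<String> emitter) {
                emitter.emit(String.valueOf(input.get(0)));
            }
        };
    }

    // Applies fn to every element; a local stand-in for parallelDo.
    static <T extends StubRecord> List<String> apply(List<T> in, StubDoFn<T, String> fn) {
        List<String> out = new ArrayList<>();
        for (T record : in) {
            fn.process(record, out::add);
        }
        return out;
    }
}
```

Each caller then fixes T at its own call site, for example 
GenericTransform.<User>firstFieldAsString(). This only addresses the 
compile-time side; it does not by itself settle how the concrete type is 
resolved at run time.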

From: Sankash Shankar [mailto:[email protected]]
Sent: Monday, June 22, 2015 4:18 PM
To: [email protected]
Subject: How to write a generic transform method that will act upon generated 
avro objects in a generic fashion

Hello.

I am writing a Crunch job that takes in an arbitrary class that extends 
SpecificRecord and performs a transformation on the fields in the class. I am 
attempting to write a parallelDo function on these classes, but

public static PCollection<String> function(PCollection<? extends SpecificRecord> coll) {
  coll.parallelDo(new DoFn<? extends SpecificRecord, String>() {
    ...
  }, Avros.strings());
}

will not compile given it expects a type at compile time, while

public static PCollection<String> transformAvroToCsv(PCollection<SpecificRecord> coll) {
  coll.parallelDo(new DoFn<SpecificRecord, String>() {
    @Override
    public void process(SpecificRecord input, Emitter<String> emitter) {
    }
  }, Avros.strings());
  return null;
}

will fail at run time because SpecificRecord is an interface and has no 
constructor (no <init> method) to instantiate.
What is the standard way to take in generic Avro records and apply a generic
transform method to them?

Thanks.