Hey Micah,

It seems like having the AvroKeyConverter return AvroKey instead of AvroWrapper is the easiest way to solve this. Since AvroKey is a subclass of AvroWrapper, anything that expects an AvroWrapper should still accept it. That said, I agree that it's a thorny problem. We're just getting ready for the 0.6.0 release, but I'd be fine with getting the switch in there if that solves this problem for you.
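To make the subclass argument concrete, here is a minimal sketch of why the cast fails today and why returning AvroKey would avoid it. The AvroWrapper and AvroKey classes below are hypothetical stand-ins that only mirror the inheritance relationship of the real org.apache.avro.mapred classes, so the snippet compiles without Avro on the classpath. ConverterSketch and its methods are likewise made up for illustration and are not Crunch's actual converter or Trevni's record writer.

```java
// Hypothetical stand-ins for org.apache.avro.mapred.AvroWrapper and AvroKey.
// Only the inheritance relationship matters for this sketch.
class AvroWrapper<T> {
    private final T datum;
    AvroWrapper(T datum) { this.datum = datum; }
    T datum() { return datum; }
}

class AvroKey<T> extends AvroWrapper<T> {
    AvroKey(T datum) { super(datum); }
}

// Illustrative only: not the real AvroKeyConverter or record writer.
class ConverterSketch {
    // Behaviour described in the thread: the converter produces a plain AvroWrapper.
    static AvroWrapper<String> convertToWrapper(String value) {
        return new AvroWrapper<>(value);
    }

    // Suggested change: build an AvroKey. It is still an AvroWrapper,
    // so callers typed against AvroWrapper are unaffected.
    static AvroWrapper<String> convertToKey(String value) {
        return new AvroKey<>(value);
    }

    // Mimics an output format that insists on AvroKey by downcasting its input.
    static String writeAsKey(Object key) {
        @SuppressWarnings("unchecked")
        AvroKey<String> avroKey = (AvroKey<String>) key;
        return avroKey.datum();
    }

    // Returns true when the downcast inside writeAsKey throws.
    static boolean failsWithClassCast(Object key) {
        try {
            writeAsKey(key);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("plain wrapper fails: " + failsWithClassCast(convertToWrapper("a")));
        System.out.println("key fails: " + failsWithClassCast(convertToKey("a")));
    }
}
```

The point of the sketch is that the declared return type can stay AvroWrapper; only the runtime class of the object changes, which is what the downcast in the writer checks.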
J

On Wed, Apr 24, 2013 at 3:23 PM, Micah Whitacre <[email protected]> wrote:

> As an alternative to the standard AvroInput/OutputFormat, I've been
> playing around with how to support alternate Avro file types like
> Trevni[1], which give benefits when we want to only retrieve a subset of
> the Avro object.
>
> Picking one of the implementations
> (AvroTrevniKeyInputFormat/AvroTrevniKeyOutputFormat)[2], I implemented the
> various Source/Target/SourceTarget implementations. When I started trying
> to test it out (to see if I did any of it right), I hit the issue that the
> AvroKeyConverter only produces AvroWrapper objects and the output format
> requires AvroKey. So I get ClassCastExceptions CrunchOutputs.write(...)
> method.
>
> Caused by: java.lang.ClassCastException:
> org.apache.avro.mapred.AvroWrapper cannot be cast to
> org.apache.avro.mapred.AvroKey
>     at org.apache.trevni.avro.mapreduce.AvroTrevniKeyRecordWriter.write(AvroTrevniKeyRecordWriter.java:34)
>     at org.apache.crunch.io.CrunchOutputs.write(CrunchOutputs.java:129)
>
> I was hoping that the target would be able to take any PCollection<?
> extends AvroType> but it looks like I'd need to implement my own PType and
> force consumers to use that just to change the converter to produce AvroKey
> instead.
>
> Is implementing a custom PType the only way to inject an alternate
> converter? That seems like a high cost on the implementation side and
> forcing a restriction onto others in the pipeline who are generally happy
> with the standard AvroType and shouldn't be burdened with how the data
> might be stored later on in the processing.
>
> Thoughts?
>
> [1] - http://avro.apache.org/docs/current/trevni/spec.html
> [2] - http://avro.apache.org/docs/current/api/java/org/apache/trevni/avro/mapreduce/AvroTrevniKeyOutputFormat.html

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
