> I think it's just that. It seems relatively low-risk to me (e.g., we already use AvroKey in the AvroPairConverter for PTables).
OK, sounds good. Do you want me to log a bug for this?

> I'm also curious if you're looking at Parquet for this use case?
Yeah, I was going to look at it after Trevni. Its Avro support is not as far
along (looks like ~16 days). The goal was to hopefully help get support for
both into Crunch eventually, so we can choose whichever is better for our job.

On Wed, Apr 24, 2013 at 5:52 PM, Josh Wills <[email protected]> wrote:
>
> On Wed, Apr 24, 2013 at 3:49 PM, Micah Whitacre <[email protected]> wrote:
>
>> Is the change simply:
>>
>> private AvroWrapper<K> getWrapper() {
>>   if (wrapper == null) {
>>     // wrapper = new AvroWrapper<K>();
>>     wrapper = new AvroKey<K>();
>>   }
>>   return wrapper;
>> }
>>
>> Or are there more changes I might be missing? Doing that got me past the
>> ClassCastException (though I'm still trying to get my code working).
>>
>> As I indicated, I'm still just trying to prove out my code, and if it pans
>> out we can probably wait till the 0.7.0 release (assuming the current ~2
>> month release cycle). I'll leave it to you to evaluate the risk.
>>
>
> I think it's just that. It seems relatively low-risk to me (e.g., we
> already use AvroKey in the AvroPairConverter for PTables).
>
>>
>> I'm guessing the issue of injecting a converter will be more significant if
>> I try out the other Trevni format[1], where I'd need the converter to
>> support AvroValue instead of NullWritable. So I'm fine with holding off on a
>> rushed change before a release in favor of a more holistic solution to both
>> parts.
>>
>> [1] -
>> http://avro.apache.org/docs/current/api/java/org/apache/trevni/avro/mapreduce/AvroTrevniKeyValueOutputFormat.html
>>
>
> I'm also curious if you're looking at Parquet for this use case?
>
>>
>> On Wed, Apr 24, 2013 at 5:29 PM, Josh Wills <[email protected]> wrote:
>>
>>> Hey Micah,
>>>
>>> It seems like having the AvroKeyConverter use the AvroKey as the return
>>> type instead of AvroWrapper is the easiest way to solve this, since AvroKey
>>> is a subclass of AvroWrapper. That said, I agree, that's a thorny problem.
>>> We're just getting ready for the 0.6.0 release, but I'd be fine to get the
>>> switch in there if that solved this problem for you.
>>>
>>> J
>>>
>>>
>>> On Wed, Apr 24, 2013 at 3:23 PM, Micah Whitacre <[email protected]> wrote:
>>>
>>>> As an alternative to the standard AvroInput/OutputFormat, I've been
>>>> playing around with how to support alternate Avro file types like
>>>> Trevni[1], which give benefits when we want to only retrieve a subset of
>>>> the Avro object.
>>>>
>>>> Picking one of the implementations
>>>> (AvroTrevniKeyInputFormat/AvroTrevniKeyOutputFormat)[2], I implemented the
>>>> various Source/Target/SourceTarget implementations. When I started trying
>>>> to test it out (to see if I did any of it right), I hit the issue that the
>>>> AvroKeyConverter only produces AvroWrapper objects and the output format
>>>> requires AvroKey. So I get ClassCastExceptions in the
>>>> CrunchOutputs.write(...) method.
>>>>
>>>> Caused by: java.lang.ClassCastException:
>>>> org.apache.avro.mapred.AvroWrapper cannot be cast to
>>>> org.apache.avro.mapred.AvroKey
>>>> at
>>>> org.apache.trevni.avro.mapreduce.AvroTrevniKeyRecordWriter.write(AvroTrevniKeyRecordWriter.java:34)
>>>> at org.apache.crunch.io.CrunchOutputs.write(CrunchOutputs.java:129)
>>>>
>>>> I was hoping that the target would be able to take any PCollection<?
>>>> extends AvroType>, but it looks like I'd need to implement my own PType and
>>>> force consumers to use that just to change the converter to produce AvroKey
>>>> instead.
>>>>
>>>> Is implementing a custom PType the only way to inject an alternate
>>>> converter?
>>>> That seems like a high cost on the implementation side, and it
>>>> forces a restriction onto others in the pipeline who are generally happy
>>>> with the standard AvroType and shouldn't be burdened with how the data
>>>> might be stored later on in the processing.
>>>>
>>>> Thoughts?
>>>>
>>>> [1] - http://avro.apache.org/docs/current/trevni/spec.html
>>>> [2] -
>>>> http://avro.apache.org/docs/current/api/java/org/apache/trevni/avro/mapreduce/AvroTrevniKeyOutputFormat.html
>>>>
>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>
>>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
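
A minimal sketch of the ClassCastException discussed in this thread, and of why
the one-line switch to AvroKey gets past it. Everything below is a simplified
stand-in, not the real Avro, Trevni, or Crunch code: AvroWrapper and AvroKey
mimic only the subclass relationship, KeyOnlyWriter is a hypothetical writer
that casts its key to AvroKey (as the stack trace suggests
AvroTrevniKeyRecordWriter does), and KeyConverter with its useAvroKey flag is
invented purely for illustration.

```java
// Stand-in for org.apache.avro.mapred.AvroWrapper (simplified, assumption).
class AvroWrapper<T> {
    private T datum;
    T datum() { return datum; }
    void datum(T datum) { this.datum = datum; }
}

// Stand-in for org.apache.avro.mapred.AvroKey: a subclass of AvroWrapper.
class AvroKey<T> extends AvroWrapper<T> {}

// Hypothetical writer that only accepts AvroKey, like the Trevni key writer.
class KeyOnlyWriter<T> {
    @SuppressWarnings("unchecked")
    T write(Object key) {
        // Fails at runtime if 'key' is a plain AvroWrapper.
        AvroKey<T> avroKey = (AvroKey<T>) key;
        return avroKey.datum();
    }
}

// Hypothetical converter; 'useAvroKey' toggles between the old and new line.
class KeyConverter<K> {
    private final boolean useAvroKey;
    private AvroWrapper<K> wrapper;

    KeyConverter(boolean useAvroKey) { this.useAvroKey = useAvroKey; }

    AvroWrapper<K> outputKey(K value) {
        getWrapper().datum(value);
        return wrapper;
    }

    private AvroWrapper<K> getWrapper() {
        if (wrapper == null) {
            // The declared type stays AvroWrapper; only the instance changes.
            wrapper = useAvroKey ? new AvroKey<K>() : new AvroWrapper<K>();
        }
        return wrapper;
    }
}

class AvroKeySketch {
    // Returns the datum on success, or the exception name on a failed cast.
    static String roundTrip(boolean useAvroKey) {
        KeyConverter<String> converter = new KeyConverter<>(useAvroKey);
        KeyOnlyWriter<String> writer = new KeyOnlyWriter<>();
        try {
            return writer.write(converter.outputKey("record"));
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(false));
        System.out.println(roundTrip(true));
    }
}
```

Because AvroKey is a subclass of AvroWrapper, callers that expect an
AvroWrapper are unaffected by the switch, which is consistent with the
"relatively low-risk" assessment above.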
