Hi Harsh, I'd be happy to do that. Thank you for your help!
Best,
Ed

On Thu, Jan 16, 2014 at 10:05 PM, Harsh J wrote:
> Thanks Ed! Can you also file an improvement JIRA under
> https://issues.apache.org/jira/browse/AVRO with a patch that changes
> it to make more sense?
>
> On Thu, Jan 16, 2014 at 5:14 PM, ed wrote:
> > Hi Harsh,
> >
> > Thank you for your response, which was invaluable in helping me to figure
> > out my issue. The Java-Doc is in fact incorrect when it states that
> > AvroJob.setOutputSchema cannot accept non-Pair configs; as it turns out, it
> > can. What was throwing me off is that if you use AvroJob.setOutputSchema to
> > set a non-Pair config, then you also need to call AvroJob.setMapOutputSchema
> > (which does require the use of Pair). Otherwise, by default, the map output
> > schema gets set to whatever you set in setOutputSchema, and if that is
> > non-Pair, you'll get an error at runtime.
> >
> > Maybe the JavaDoc should say something along the lines of:
> >
> >> Configure a job's output schema. If this is not a Pair schema, then you
> >> must explicitly set the job's map output schema using setMapOutputSchema.
> >
> > Thank you!
> >
> > Best Regards,
> >
> > Ed
> >
> > On Thu, Jan 16, 2014 at 6:47 PM, Harsh J wrote:
> >>
> >> Hello Ed,
> >>
> >> The AvroReducer per
> >> http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapred/AvroReducer.html
> >> has a simple spec of <K,V,OUT>, where OUT can be any record type and
> >> not necessarily a Pair<KO,VO> type.
> >>
> >> AvroJob.setOutputSchema(…) should accept non-Pair configs. I think its
> >> java-doc is incorrect, though. I wrote a test case yesterday at
> >> http://issues.apache.org/jira/browse/AVRO-1439, in which I set a
> >> non-Pair schema via the same call without any trouble. We could get
> >> the java-doc fixed, if it is indeed wrong.
> >>
> >> On Thu, Jan 16, 2014 at 2:14 PM, ed wrote:
> >> > Hello,
> >> >
> >> > I am currently reading in lots of small Avro files and then writing them
> >> > out into one large Avro file using MapReduce (MR1). I'm trying to do this
> >> > using the AvroMapper and AvroReducer, and it's almost working how I want.
> >> >
> >> > The problem right now is that it looks like I have to use
> >> > "org.apache.avro.mapred.Pair" if I use "AvroJob.setOutputSchema". Is there
> >> > a way to output a Pair schema from AvroReducer and have the "key" in that
> >> > schema be ignored (i.e., not included in the output from the reducer)?
> >> > Right now when I check the Reducer output, there is an added field in each
> >> > record called "key", which I'd like to not have there.
> >> >
> >> > Essentially I'm looking for something like NullWritable, where the key will
> >> > just be ignored in the final output.
> >> >
> >> > Thank you for any assistance or guidance you can provide!
> >> >
> >> > Best Regards,
> >> >
> >> > Ed
> >>
> >> --
> >> Harsh J
> >
>
> --
> Harsh J
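For reference, here is a minimal sketch of the configuration Ed describes: the map output schema stays a Pair, while setOutputSchema gets the plain record schema, so the reducer's output carries no extra "key" field. It assumes the Avro 1.7.x `org.apache.avro.mapred` API and Hadoop MR1 on the classpath. The class name `ConcatAvroJob`, the `configure` helper, the hypothetical `Event` schema, and the choice of a `null` Pair key (so that every record lands in one reduce group) are illustrative assumptions. None of them come from the thread or from AVRO-1439.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class ConcatAvroJob {

  // Hypothetical record schema; substitute the schema of your input files.
  public static final Schema RECORD_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"long\"},"
      + "{\"name\":\"body\",\"type\":\"string\"}]}");

  // Key schema for the intermediate Pair: "null" carries no data, so every
  // record sorts into the same reduce group.
  public static final Schema NULL_SCHEMA = Schema.create(Schema.Type.NULL);

  /** Wraps each input record in a Pair with a null key (map output must be a Pair). */
  public static class ConcatMapper
      extends AvroMapper<GenericRecord, Pair<Void, GenericRecord>> {
    @Override
    public void map(GenericRecord record,
                    AvroCollector<Pair<Void, GenericRecord>> collector,
                    Reporter reporter) throws IOException {
      collector.collect(new Pair<Void, GenericRecord>(
          null, NULL_SCHEMA, record, record.getSchema()));
    }
  }

  /** Emits the bare records, so the final output has no extra "key" field. */
  public static class ConcatReducer
      extends AvroReducer<Void, GenericRecord, GenericRecord> {
    @Override
    public void reduce(Void key, Iterable<GenericRecord> records,
                       AvroCollector<GenericRecord> collector,
                       Reporter reporter) throws IOException {
      for (GenericRecord record : records) {
        collector.collect(record);
      }
    }
  }

  public static void configure(JobConf job, Schema recordSchema) {
    AvroJob.setInputSchema(job, recordSchema);
    AvroJob.setMapperClass(job, ConcatMapper.class);
    AvroJob.setReducerClass(job, ConcatReducer.class);
    // The map output (shuffle) schema must still be a Pair...
    AvroJob.setMapOutputSchema(job, Pair.getPairSchema(NULL_SCHEMA, recordSchema));
    // ...but the job's final output schema can be the plain record schema.
    // Without the setMapOutputSchema call above, the map output schema would
    // default to this non-Pair schema and the job would fail at runtime.
    AvroJob.setOutputSchema(job, recordSchema);
    job.setNumReduceTasks(1); // a single reducer writes a single output file
  }

  public static void main(String[] args) throws IOException {
    JobConf job = new JobConf(ConcatAvroJob.class);
    job.setJobName("concat-avro");
    configure(job, RECORD_SCHEMA);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    JobClient.runJob(job);
  }
}
```

The null key and single reducer suit the "many small files into one large file" use case in this thread. Because everything funnels through one reduce task, this layout would likely become a bottleneck for much larger inputs.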
