On Sat, Jan 29, 2011 at 1:59 AM, felix gao <gre1...@gmail.com> wrote:
> Thanks for the quick reply. I am interested in doing this through the
> Java implementation, and I would like to do it in parallel, utilizing
> the MapReduce framework.

That operation is pretty similar to writing a normal output data file.
You can use Avro's MapReduce API (which provides Input/Output Format
classes to use, given a Schema), or write your own custom record-writing
classes that convert your input format's record representation into
Avro-serialized records and write those out to an open DataFile for a
given schema. Alternatively, you can also write Avro-serialized data
bytes into SequenceFiles. I believe the Hadoop MapReduce trunk may have
some good code on Avro serialization classes and their use in MapReduce.
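Something in that direction would look roughly like the map-only job
below. This is an untested sketch against the org.apache.avro.mapred
API; the "LogLine" schema, its fields, and the line parsing are invented
for illustration, so adapt them to your actual log format:

  import java.io.IOException;

  import org.apache.avro.Schema;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericRecord;
  import org.apache.avro.mapred.AvroJob;
  import org.apache.avro.mapred.AvroOutputFormat;
  import org.apache.avro.mapred.AvroWrapper;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.*;

  public class LogToAvro {

    // Hypothetical schema: one record per log line, two string fields.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
        + "{\"name\":\"host\",\"type\":\"string\"},"
        + "{\"name\":\"message\",\"type\":\"string\"}]}";

    public static class ConvertMapper extends MapReduceBase
        implements Mapper<LongWritable, Text,
                          AvroWrapper<GenericRecord>, NullWritable> {

      private Schema schema;

      @Override
      public void configure(JobConf job) {
        schema = Schema.parse(SCHEMA_JSON);
      }

      @Override
      public void map(LongWritable offset, Text line,
          OutputCollector<AvroWrapper<GenericRecord>, NullWritable> out,
          Reporter reporter) throws IOException {
        // Assumes a "host message" line layout; replace with your parser.
        String[] parts = line.toString().split(" ", 2);
        GenericRecord record = new GenericData.Record(schema);
        record.put("host", parts[0]);
        record.put("message", parts.length > 1 ? parts[1] : "");
        out.collect(new AvroWrapper<GenericRecord>(record),
            NullWritable.get());
      }
    }

    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(LogToAvro.class);
      conf.setJobName("log-to-avro");

      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));

      conf.setInputFormat(TextInputFormat.class);
      conf.setMapperClass(ConvertMapper.class);
      conf.setNumReduceTasks(0); // map-only; mappers write the data files

      // Tells AvroOutputFormat which schema to write records against.
      AvroJob.setOutputSchema(conf, Schema.parse(SCHEMA_JSON));
      conf.setOutputFormat(AvroOutputFormat.class);

      JobClient.runJob(conf);
    }
  }

With zero reduces, each map task writes its own Avro data file through
AvroOutputFormat; if you want fewer, larger files, run a reduce phase
instead of making the job map-only.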
> On Fri, Jan 28, 2011 at 12:22 PM, Harsh J <qwertyman...@gmail.com> wrote:
>>
>> Based on the language you're targeting, have a look at the test cases
>> available in the project's version control:
>> http://svn.apache.org/repos/asf/avro/trunk/lang/ [You can check it out
>> via SVN, or via Git mirrors]
>>
>> Another good resource on both ends of Avro (Data and RPC) is by phunt
>> at http://github.com/phunt/avro-rpc-quickstart#readme
>>
>> I had written a Python data-file-centric snippet for Avro a while ago
>> on my blog; it may help if you're looking to get started with Python
>> (although it does not cover all aspects, which the functions in the
>> available test cases for lang/python do):
>>
>> http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/
>>
>> On Sat, Jan 29, 2011 at 1:34 AM, felix gao <gre1...@gmail.com> wrote:
>> > Hi all,
>> > I am trying to convert a lot of our existing logs into Avro format
>> > in Hadoop. I am not sure if there are any examples to follow.
>> > Thanks,
>> > Felix
>>
>> --
>> Harsh J
>> www.harshj.com

--
Harsh J
www.harshj.com