On Sat, Jan 29, 2011 at 1:59 AM, felix gao <gre1...@gmail.com> wrote:
> Thanks for the quick reply.  I am interested in doing this through the Java
> implementation, and I would like to do it in parallel using the
> MapReduce framework.

That operation is pretty similar to writing a normal output data file.

You can use the MapReduce API of Avro (which provides Input/Output
Format classes to use, given a Schema) to do so, or write your own
custom record-writing classes that convert your input format's record
representation into Avro serialized records and append them to an open
DataFile for a given schema. Alternatively, you can also write
Avro-serialized data bytes into SequenceFiles.
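Purely as an illustration of the DataFile route, here is a minimal,
untested standalone Java sketch; the "LogLine" schema, its field names
and the tab-delimited parsing are placeholders I've made up, not
anything tied to your actual logs:

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class LogToAvro {
  public static void main(String[] args) throws IOException {
    // Made-up schema for a "timestamp<TAB>message" log line.
    // (Avro 1.4 offers Schema.parse(); newer releases use new Schema.Parser().parse().)
    String schemaJson = "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
        + "{\"name\":\"timestamp\",\"type\":\"long\"},"
        + "{\"name\":\"message\",\"type\":\"string\"}]}";
    Schema schema = Schema.parse(schemaJson);

    DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
        new GenericDatumWriter<GenericRecord>(schema));
    writer.create(schema, new File("logs.avro"));

    // In a real converter this loop would iterate over your existing log records.
    String line = "1296239940000\tGET /index.html 200";
    String[] parts = line.split("\t", 2);

    GenericRecord record = new GenericData.Record(schema);
    record.put("timestamp", Long.parseLong(parts[0]));
    record.put("message", parts[1]);
    writer.append(record);

    writer.close();
  }
}

The same append loop can sit inside a custom RecordWriter if you want
MapReduce to drive it.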

I believe the Hadoop MapReduce trunk also carries some good code for
Avro serialization classes and examples of their use within MapReduce.
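And as a rough, untested sketch of wiring such a conversion into a
map-only job with the org.apache.avro.mapred classes (this assumes a
plain Mapper emitting AvroWrapper keys plus NullWritable values plays
well with the Avro output format configured by AvroJob.setOutputSchema(),
and it reuses the made-up LogLine schema from above):

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class TextToAvroJob {

  // Same made-up log schema as before; replace with your real one.
  static final Schema SCHEMA = Schema.parse(
      "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
      + "{\"name\":\"timestamp\",\"type\":\"long\"},"
      + "{\"name\":\"message\",\"type\":\"string\"}]}");

  // A plain (non-Avro) mapper that emits AvroWrapper keys; with zero
  // reducers the records go straight into Avro data files.
  public static class ConvertMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroWrapper<GenericRecord>, NullWritable> {
    public void map(LongWritable offset, Text line,
        OutputCollector<AvroWrapper<GenericRecord>, NullWritable> out,
        Reporter reporter) throws IOException {
      String[] parts = line.toString().split("\t", 2);
      GenericRecord record = new GenericData.Record(SCHEMA);
      record.put("timestamp", Long.parseLong(parts[0]));
      record.put("message", parts[1]);
      out.collect(new AvroWrapper<GenericRecord>(record), NullWritable.get());
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(TextToAvroJob.class);
    conf.setJobName("text-to-avro");

    conf.setInputFormat(TextInputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    conf.setMapperClass(ConvertMapper.class);
    conf.setNumReduceTasks(0);  // map-only conversion

    // Configures the Avro output format and serialization for this schema.
    AvroJob.setOutputSchema(conf, SCHEMA);

    JobClient.runJob(conf);
  }
}

With zero reducers, each map task writes its converted records straight
into an Avro data file under the job's output directory.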

> On Fri, Jan 28, 2011 at 12:22 PM, Harsh J <qwertyman...@gmail.com> wrote:
>>
>> Based on the language you're targeting, have a look at its test cases
>> available in the project's version control:
>> http://svn.apache.org/repos/asf/avro/trunk/lang/ [You can check it out
>> via SVN, or via Git mirrors]
>>
>> Another good resource covering both ends of Avro (Data and RPC) is by phunt
>> at http://github.com/phunt/avro-rpc-quickstart#readme
>>
>> I had written a Python data-file-centric snippet for Avro a while ago
>> on my blog; it may help if you're looking to get started with Python
>> (although it does not cover all the aspects that the test cases under
>> lang/python do):
>>
>> http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/
>>
>> On Sat, Jan 29, 2011 at 1:34 AM, felix gao <gre1...@gmail.com> wrote:
>> > Hi all,
>> > I am trying to convert a lot of our existing logs into Avro format in
>> > Hadoop.  I am not sure if there are any examples to follow.
>> > Thanks,
>> > Felix
>>
>>
>>
>> --
>> Harsh J
>> www.harshj.com
>
>



-- 
Harsh J
www.harshj.com
