The Colossal Pipe (https://github.com/ThinkBigAnalytics/colossal-pipe)
framework uses Avro as its native format for Java map-reduce, but it also
lets you read JSON or text files as input to mappers, making it fairly easy
to use for this kind of conversion job. For example, the heart of the
program would be just this:
// Input: hourly JSON logs; hr is the hour's path segment, e.g. 2011/01/28/03
ColFile inlogs = ColFile.at("/dfs/logs/json/" + hr)
                        .of(LogFormat.class).jsonFormat();
// Output: the same logs as Avro records
ColFile outlogs = ColFile.at("/dfs/logs/avro/" + hr).of(Log.class);
ColPhase copy = new ColPhase().reads(inlogs).writes(outlogs)
                              .map(IdentityMapper.class)
                              .groupBy("timestamp")
                              .reduce(IdentityReducer.class);
ColPipe conversion = new ColPipe(getClass()).named("log conversion");
conversion.produces(outlogs);
You'd currently define an identity mapper and reducer (soon it will default
to those):
public static class IdentityMapper extends BaseMapper<Log, Log> {
    @Override
    public void map(Log in, Log out, ColContext<Log> context) {
        super.map(in, out, context);
    }
}

public static class IdentityReducer extends BaseReducer<Log, Log> {
    @Override
    public void reduce(Iterable<Log> in, Log out, ColContext<Log> context) {
        super.reduce(in, out, context);
    }
}
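For comparison, if you just want to see Avro's plain Java API at work
outside any framework, a minimal sketch of writing a data file with the
generic API looks roughly like this (the log.avsc schema, the timestamp
and line field names, and the logs.avro path are all hypothetical):

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class LogFileWriter {
    public static void main(String[] args) throws Exception {
        // Parse the record schema from a schema file (hypothetical log.avsc).
        Schema schema = new Schema.Parser().parse(new File("log.avsc"));

        // Build one record; field names must match the schema.
        GenericRecord record = new GenericData.Record(schema);
        record.put("timestamp", System.currentTimeMillis());
        record.put("line", "GET /index.html 200");

        // Write an Avro container file; each append() adds one record.
        DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, new File("logs.avro"));
        writer.append(record);
        writer.close();
    }
}

Reading the file back is symmetric, via DataFileReader and
GenericDatumReader.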
Ron
Ron Bodkin
CEO
Think Big Analytics
m: +1 (415) 509-2895
From: Philip Zeyliger <[email protected]>
Reply-To: <[email protected]>
Date: Fri, 28 Jan 2011 13:44:42 -0800
To: <[email protected]>
Subject: Re: How to get started with examples on avro
Felix,
After you've figured out how to work it for your application, I do encourage
you to contribute (https://cwiki.apache.org/AVRO/how-to-contribute.html)
examples to the open source project. We'll find a place for them!
-- Philip
On Fri, Jan 28, 2011 at 12:29 PM, felix gao <[email protected]> wrote:
> Thanks for the quick reply. I am interested in doing this through the Java
> implementation, and I would like to do it in parallel using the MapReduce
> framework.
>
>
> On Fri, Jan 28, 2011 at 12:22 PM, Harsh J <[email protected]> wrote:
>> Based on the language you're targeting, have a look at its test cases
>> available in the project's version control:
>> http://svn.apache.org/repos/asf/avro/trunk/lang/ [You can check it out
>> via SVN, or via Git mirrors]
>>
>> Another good resource covering both ends of Avro (Data and RPC) is phunt's
>> quickstart at http://github.com/phunt/avro-rpc-quickstart#readme
>>
>> A while ago I wrote a Python data-file-centric snippet for Avro on my
>> blog; it may help if you're looking to get started with Python (although
>> it does not cover all the aspects that the test cases in lang/python do):
>> http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/
>>
>> On Sat, Jan 29, 2011 at 1:34 AM, felix gao <[email protected]> wrote:
>>> Hi all,
>>> I am trying to convert a lot of our existing logs into Avro format in
>>> Hadoop. I am not sure if there are any examples to follow.
>>> Thanks,
>>> Felix
>>
>>
>>
>> --
>> Harsh J
>> www.harshj.com
>