Yes, you would have to distribute Ruby (though it's typically
installed by default) as well as the wukong and json libraries to all
the nodes in the cluster. Unfortunately this isn't something Wukong
gives you for free at the moment, though it is planned.
As far as I know, Pig doesn't do anything more complex than launch a
Hadoop streaming job and use the output in the subsequent steps.
BTW, I write 90% of my MR jobs using either Wukong or Pig. Only when
it's absolutely required do I use a language with as much overhead as
Java :)
--jacob
@thedatachef
Sent from my iPhone
On Jan 30, 2011, at 2:09 PM, Alex McLintock <[email protected]>
wrote:
On 29 January 2011 13:43, Jacob Perkins <[email protected]>
wrote:
Write a map-only Wukong script that parses the JSON as you want it.
See the example here:
http://thedatachef.blogspot.com/2011/01/processing-json-records-with-hadoop-and.html
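The linked post has the full Wukong version; as a rough sketch of the same idea in plain Hadoop-streaming Ruby (not Wukong's actual API, and with made-up field names `user` and `text` purely for illustration), a map-only JSON parser looks something like:

```ruby
require 'json'

# Turn one JSON line into a tab-separated output record,
# or nil if the line isn't valid JSON.
def emit_fields(line)
  record = JSON.parse(line)
  # 'user' and 'text' are assumed field names for illustration
  [record['user'], record['text']].join("\t")
rescue JSON::ParserError
  nil  # skip malformed records instead of failing the whole task
end

# Mapper loop: Hadoop streaming hands the script one record per line
# on STDIN and collects whatever it prints to STDOUT.
if __FILE__ == $PROGRAM_NAME
  STDIN.each_line do |line|
    out = emit_fields(line)
    puts out if out
  end
end
```

Run under streaming you'd ship it with something like `-mapper parse_json.rb -file parse_json.rb` (plus the json gem on each node, per the above), with no reducer for a map-only job.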
Hi Jacob,
Thanks very much for helping me out. I haven't heard of Wukong before.
I am a bit concerned, though, about adding Ruby to my tool stack as
well as Pig. It seems like a step too far.
Presumably I have to distribute Ruby and Wukong across all my job
nodes in
the same way as if I were writing Perl or C++ streaming programs.
With STREAMing, the script is launched once per file, right, not
once per record?
Alex