I started a Google Group. You can find it here: http://groups.google.com/group/piglet-dsl

T#

On Fri, Jan 15, 2010 at 1:18 PM, Theo Hultberg <[email protected]> wrote:
> Sorry, no mailing list yet. Up until this week it's only been me, so
> the need hasn't arisen =) I should probably start a Google group or
> something.
>
> T#
>
> On Fri, Jan 15, 2010 at 11:56 AM, Mridul Muralidharan
> <[email protected]> wrote:
>>
>> This looks really promising Theo !
>> Is there some mailing list where discussions & queries related to piglet are
>> discussed ?
>>
>> Thanks,
>> Mridul
>>
>>
>> Theo Hultberg wrote:
>>>
>>> Hi,
>>>
>>> I've written a Ruby DSL for writing Pig scripts, which I hope might
>>> interest some of you. It makes it possible to do a lot of things you
>>> can't do in Pig Latin, like loops, reuse code through functions, and
>>> introspection on relation schemas. Basically you write some Ruby code
>>> that looks a lot like Pig Latin, and you get the equivalent Pig Latin
>>> as output. Loops are unrolled, functions are inlined, and so on.
>>>
>>> There's a lot of documentation and examples on GitHub:
>>> http://github.com/iconara/piglet, and here are a few examples too:
>>>
>>> If you run this Ruby code through Piglet
>>>
>>>     a = load 'input', :schema => [:x, :y]
>>>     b = a.group :x
>>>     store b, 'output'
>>>
>>> you will get the following Pig Latin code:
>>>
>>>     relation_2 = LOAD 'input' AS (x, y);
>>>     relation_1 = GROUP relation_2 BY x;
>>>     STORE relation_1 INTO 'output';
>>>
>>> More or less the same, don't you think? (Piglet can't determine the
>>> names of the variables, unfortunately, thus the relation names are not
>>> fantastic, I might get that working in a future version).
>>>
>>> I wrote Piglet when some Pig scripts I was working on started to get
>>> very repetitive. I had a relation with a few fields that were keys and
>>> a few that were numbers and I wanted to get the sums for each value of
>>> each of the key fields. This meant having to repeat the same GROUP and
>>> FOREACH operations once for each key, even though the only thing that
>>> changed was the name of the field that I grouped by. Having to repeat
>>> the same code again and again for every key was frustrating, and I
>>> dreamed up a way of doing the same thing in Ruby. With Piglet I can
>>> now do something like this:
>>>
>>>     input = load('input', :schema => %w(country browser site pages_visited visit_duration))
>>>
>>>     %w(country browser site).each do |dimension|
>>>       grouped = input.group(dimension).foreach do |r|
>>>         [
>>>           r[0],
>>>           r[1].pages_visited.avg,
>>>           r[1].visit_duration.sum
>>>         ]
>>>       end
>>>
>>>       store(grouped, "output-#{dimension}")
>>>     end
>>>
>>> which will be translated to this Pig Latin code:
>>>
>>>     relation_1 = LOAD 'input' AS (country, browser, site, pages_visited, visit_duration);
>>>     relation_3 = GROUP relation_1 BY country;
>>>     relation_2 = FOREACH relation_3 GENERATE $0, AVG($1.pages_visited),
>>>       SUM($1.visit_duration);
>>>     STORE relation_2 INTO 'output-country';
>>>     relation_5 = GROUP relation_1 BY browser;
>>>     relation_4 = FOREACH relation_5 GENERATE $0, AVG($1.pages_visited),
>>>       SUM($1.visit_duration);
>>>     STORE relation_4 INTO 'output-browser';
>>>     relation_7 = GROUP relation_1 BY site;
>>>     relation_6 = FOREACH relation_7 GENERATE $0, AVG($1.pages_visited),
>>>       SUM($1.visit_duration);
>>>     STORE relation_6 INTO 'output-site';
>>>
>>> where you can see how the loop has been unrolled and all the
>>> operations repeated for each key.
>>>
>>> I hope that Piglet will help some of you write DRYer code. It doesn't
>>> solve all problems, and there are things which are not supported at
>>> all yet, but with your help I think it can be a very good companion to
>>> Pig.
>>>
>>> If you want to know more read the documentation on GitHub:
>>> http://github.com/iconara/piglet, or send me a mail either through
>>> GitHub, to my e-mail ([email protected]), or via Twitter (@iconara).
>>>
>>> yours,
>>> Theo
>>
>>
>
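
For anyone wondering how a Ruby loop can end up "unrolled" into Pig Latin, here is a rough sketch of the general idea. It is not Piglet's actual implementation: the ToyScript class and its methods are hypothetical names made up for illustration, and the relation numbering is simply sequential, so it will not match Piglet's output. Each DSL call just appends one Pig Latin statement and returns the alias it defined, so an ordinary Ruby loop naturally produces one set of statements per iteration.

```ruby
# Toy illustration only; ToyScript is a hypothetical class, not part of Piglet.
class ToyScript
  attr_reader :lines

  def initialize
    @lines = []    # generated Pig Latin statements, in order
    @counter = 0   # used to invent relation aliases
  end

  # Each operation records one statement and returns the alias it defined.
  def load(path, schema)
    emit("LOAD '#{path}' AS (#{schema.join(', ')})")
  end

  def group(relation, field)
    emit("GROUP #{relation} BY #{field}")
  end

  # STORE defines no new alias, so it only records the statement.
  def store(relation, path)
    @lines << "STORE #{relation} INTO '#{path}';"
    nil
  end

  private

  def emit(expression)
    @counter += 1
    name = "relation_#{@counter}"
    @lines << "#{name} = #{expression};"
    name
  end
end

script = ToyScript.new
input = script.load('input', %w(country browser site))

# A plain Ruby loop: every iteration appends its own GROUP and STORE.
%w(country browser).each do |dimension|
  grouped = script.group(input, dimension)
  script.store(grouped, "output-#{dimension}")
end

puts script.lines
```

Running this prints one LOAD followed by a GROUP/STORE pair per dimension. Ruby executes the loop while the script is being built, so the generated Pig Latin never needs a loop construct of its own.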
