Thanks !

- Mridul

Theo Hultberg wrote:
I started a Google Group, you can find it here:

http://groups.google.com/group/piglet-dsl

T#

On Fri, Jan 15, 2010 at 1:18 PM, Theo Hultberg <[email protected]> wrote:
Sorry, no mailing list yet. Up until this week it's only been me, so
the need hasn't arisen =) I should probably start a Google group or
something.

T#

On Fri, Jan 15, 2010 at 11:56 AM, Mridul Muralidharan
<[email protected]> wrote:
This looks really promising, Theo!
Is there a mailing list for discussions and queries related to Piglet?

Thanks,
Mridul


Theo Hultberg wrote:
Hi,

I've written a Ruby DSL for writing Pig scripts, which I hope might
interest some of you. It makes it possible to do a lot of things you
can't do in Pig Latin, like writing loops, reusing code through functions,
and introspecting relation schemas. Basically, you write some Ruby code
that looks a lot like Pig Latin, and you get the equivalent Pig Latin
as output. Loops are unrolled, functions are inlined, and so on.

There's a lot of documentation and examples on GitHub:
http://github.com/iconara/piglet, and here are a few examples too:

If you run this Ruby code through Piglet

 a = load 'input', :schema => [:x, :y]
 b = a.group :x
 store b, 'output'

you will get the following Pig Latin code:

 relation_2 = LOAD 'input' AS (x, y);
 relation_1 = GROUP relation_2 BY x;
 STORE relation_1 INTO 'output';

More or less the same, don't you think? Unfortunately, Piglet can't
determine the names of the Ruby variables, so the generated relation
names are not fantastic. I might get that working in a future version.

I wrote Piglet when some Pig scripts I was working on started to get
very repetitive. I had a relation with a few fields that were keys and
a few that were numbers and I wanted to get the sums for each value of
each of the key fields. This meant having to repeat the same GROUP and
FOREACH operations once for each key, even though the only thing that
changed was the name of the field that I grouped by. Having to repeat
the same code again and again for every key was frustrating, and I
dreamed up a way of doing the same thing in Ruby. With Piglet I can
now do something like this:

 input = load('input', :schema => %w(country browser site pages_visited visit_duration))

 %w(country browser site).each do |dimension|
   grouped = input.group(dimension).foreach do |r|
     [
       r[0],
       r[1].pages_visited.avg,
       r[1].visit_duration.sum
     ]
   end

   store(grouped, "output-#{dimension}")
 end

which will be translated to this Pig Latin code:

 relation_1 = LOAD 'input' AS (country, browser, site, pages_visited, visit_duration);
 relation_3 = GROUP relation_1 BY country;
 relation_2 = FOREACH relation_3 GENERATE $0, AVG($1.pages_visited), SUM($1.visit_duration);
 STORE relation_2 INTO 'output-country';
 relation_5 = GROUP relation_1 BY browser;
 relation_4 = FOREACH relation_5 GENERATE $0, AVG($1.pages_visited), SUM($1.visit_duration);
 STORE relation_4 INTO 'output-browser';
 relation_7 = GROUP relation_1 BY site;
 relation_6 = FOREACH relation_7 GENERATE $0, AVG($1.pages_visited), SUM($1.visit_duration);
 STORE relation_6 INTO 'output-site';

where you can see how the loop has been unrolled and all the
operations repeated for each key.
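
To make the unrolling concrete, here is a minimal sketch of how a Ruby DSL can emit Pig Latin. It is not Piglet's implementation: ToyScript, toy_pig_latin, and the emit helper are made-up names for illustration, and only LOAD, GROUP, and STORE are covered. The Ruby loop simply runs, and every call appends one statement to the output, which is why the loop never appears in the generated script.

```ruby
# Toy illustration only. All class and method names are hypothetical
# and do not reflect Piglet's real internals or API.
class ToyScript
  attr_reader :lines

  def initialize
    @lines = []
    @counter = 0
  end

  # Each DSL method records one Pig Latin statement and returns the
  # generated relation name so it can be passed to later calls.
  def load(path, fields)
    emit("LOAD '#{path}' AS (#{fields.join(', ')})")
  end

  def group(relation, field)
    emit("GROUP #{relation} BY #{field}")
  end

  def store(relation, path)
    @lines << "STORE #{relation} INTO '#{path}';"
  end

  private

  # A Ruby object cannot see the name of the variable it is assigned
  # to, so this sketch numbers the relations as they are created.
  def emit(expression)
    @counter += 1
    name = "relation_#{@counter}"
    @lines << "#{name} = #{expression};"
    name
  end
end

# Evaluates the block with a ToyScript as self, so bare calls such as
# load(...) and store(...) resolve to the methods defined above.
def toy_pig_latin(&block)
  script = ToyScript.new
  script.instance_eval(&block)
  script.lines.join("\n")
end

output = toy_pig_latin do
  input = load('input', %w(country browser site))
  %w(country browser).each do |dimension|
    store(group(input, dimension), "output-#{dimension}")
  end
end

puts output
```

Running the sketch prints one LOAD statement followed by a GROUP and STORE pair for each dimension, since the each loop has already executed by the time the text is produced.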

I hope that Piglet will help some of you write DRYer code. It doesn't
solve all problems, and there are things which are not supported at
all yet, but with your help I think it can be a very good companion to
Pig.

If you want to know more, read the documentation on GitHub:
http://github.com/iconara/piglet, or contact me through GitHub,
by e-mail ([email protected]), or on Twitter (@iconara).

yours,
Theo

