I've started a Google Group. You can find it here:

http://groups.google.com/group/piglet-dsl

T#

On Fri, Jan 15, 2010 at 1:18 PM, Theo Hultberg <[email protected]> wrote:
> Sorry, no mailing list yet. Up until this week it's only been me, so
> the need hasn't arisen =) I should probably start a Google group or
> something.
>
> T#
>
> On Fri, Jan 15, 2010 at 11:56 AM, Mridul Muralidharan
> <[email protected]> wrote:
>>
>> This looks really promising, Theo!
>> Is there a mailing list for discussions and queries related to Piglet?
>>
>> Thanks,
>> Mridul
>>
>>
>> Theo Hultberg wrote:
>>>
>>> Hi,
>>>
>>> I've written a Ruby DSL for writing Pig scripts, which I hope might
>>> interest some of you. It makes it possible to do a lot of things you
>>> can't do in Pig Latin, like loops, reuse code through functions, and
>>> introspection on relation schemas. Basically you write some Ruby code
>>> that looks a lot like Pig Latin, and you get the equivalent Pig Latin
>>> as output. Loops are unrolled, functions are inlined, and so on.
>>>
>>> There's a lot of documentation and examples on GitHub:
>>> http://github.com/iconara/piglet, and here are a few examples too:
>>>
>>> If you run this Ruby code through Piglet
>>>
>>>  a = load 'input', :schema => [:x, :y]
>>>  b = a.group :x
>>>  store b, 'output'
>>>
>>> you will get the following Pig Latin code:
>>>
>>>  relation_2 = LOAD 'input' AS (x, y);
>>>  relation_1 = GROUP relation_2 BY x;
>>>  STORE relation_1 INTO 'output';
>>>
>>> More or less the same, don't you think? (Unfortunately, Piglet can't
>>> determine the names of the Ruby variables, so the relation names are
>>> not fantastic. I might get that working in a future version.)
>>>
>>> I wrote Piglet when some Pig scripts I was working on started to get
>>> very repetitive. I had a relation with a few fields that were keys and
>>> a few that were numbers and I wanted to get the sums for each value of
>>> each of the key fields. This meant having to repeat the same GROUP and
>>> FOREACH operations once for each key, even though the only thing that
>>> changed was the name of the field that I grouped by. Having to repeat
>>> the same code again and again for every key was frustrating, and I
>>> dreamed up a way of doing the same thing in Ruby. With Piglet I can
>>> now do something like this:
>>>
>>>  input = load('input', :schema => %w(country browser site
>>>                                      pages_visited visit_duration))
>>>
>>>  %w(country browser site).each do |dimension|
>>>    grouped = input.group(dimension).foreach do |r|
>>>      [
>>>        r[0],
>>>        r[1].pages_visited.avg,
>>>        r[1].visit_duration.sum
>>>      ]
>>>    end
>>>
>>>    store(grouped, "output-#{dimension}")
>>>  end
>>>
>>> which will be translated to this Pig Latin code:
>>>
>>>  relation_1 = LOAD 'input' AS (country, browser, site, pages_visited,
>>>    visit_duration);
>>>  relation_3 = GROUP relation_1 BY country;
>>>  relation_2 = FOREACH relation_3 GENERATE $0, AVG($1.pages_visited),
>>>    SUM($1.visit_duration);
>>>  STORE relation_2 INTO 'output-country';
>>>  relation_5 = GROUP relation_1 BY browser;
>>>  relation_4 = FOREACH relation_5 GENERATE $0, AVG($1.pages_visited),
>>>    SUM($1.visit_duration);
>>>  STORE relation_4 INTO 'output-browser';
>>>  relation_7 = GROUP relation_1 BY site;
>>>  relation_6 = FOREACH relation_7 GENERATE $0, AVG($1.pages_visited),
>>>    SUM($1.visit_duration);
>>>  STORE relation_6 INTO 'output-site';
>>>
>>> where you can see how the loop has been unrolled and all the
>>> operations repeated for each key.
>>>
>>> I hope that Piglet will help some of you write DRYer code. It doesn't
>>> solve all problems, and there are things which are not supported at
>>> all yet, but with your help I think it can be a very good companion to
>>> Pig.
>>>
>>> If you want to know more, read the documentation on GitHub:
>>> http://github.com/iconara/piglet, or contact me through GitHub, by
>>> e-mail ([email protected]), or via Twitter (@iconara).
>>>
>>> yours,
>>> Theo
>>
>>
>

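For anyone curious about what "loops are unrolled" means in the announcement quoted above, here is a rough sketch in plain Ruby. It does not use Piglet and is not how Piglet is implemented. It only builds Pig Latin strings by hand to show the idea: the Ruby loop runs when the script is generated, so the output contains one GROUP/FOREACH/STORE triple per key. The helper names (sums_for, unrolled_script) and the relation aliases (visits, by_*, summary_*) are made up for this illustration.

```ruby
# Hypothetical helper, not part of Piglet: returns the three Pig Latin
# statements needed to aggregate the source relation by one key field.
def sums_for(dimension, source: 'visits')
  grouped = "by_#{dimension}"
  summary = "summary_#{dimension}"
  [
    "#{grouped} = GROUP #{source} BY #{dimension};",
    "#{summary} = FOREACH #{grouped} GENERATE $0, " \
      "AVG($1.pages_visited), SUM($1.visit_duration);",
    "STORE #{summary} INTO 'output-#{dimension}';"
  ]
end

# Hypothetical helper: the Ruby iteration happens here, at generation time,
# so the resulting script contains no loop, only repeated statements.
def unrolled_script(dimensions)
  load_statement = "visits = LOAD 'input' AS " \
                   "(country, browser, site, pages_visited, visit_duration);"
  [load_statement] + dimensions.flat_map { |dimension| sums_for(dimension) }
end

puts unrolled_script(%w(country browser site))
```

Adding a key to the list adds three more statements to the generated script, which is the same repetition you would otherwise type by hand in Pig Latin.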