[ 
https://issues.apache.org/jira/browse/PIG-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589586#action_12589586
 ] 

Mathieu Poumeyrol commented on PIG-206:
---------------------------------------

All,

A bit of background, first. Over the last four or five years, my team and I 
have implemented several tools for our company 
[http://sk.idm.fr/opensource/software.html], one of them being a data 
processing framework (skprod) including a language which shares many goals and 
characteristics with Pig. 

The design was somewhat "pre-MapReduce", and we are now facing some 
scalability issues which, combined with the cost of maintenance, make Pig look 
like a very good candidate to replace skprod. The syntax is very different 
from Pig's, but the concepts map quite easily. You may want to have a look at 
the getting-started paper [http://sk.idm.fr/opensource/doc/skprod/index.html] 
to get an idea of it.

I am trying to "port" some of our existing data processing chains to Pig, and 
while many things look very good, I now get an overall feeling that there is a 
difference of granularity: we were designing huge skprod scripts (the language 
itself has some built-in modularity) that perform a full-featured task end to 
end, and we usually try to avoid chaining skprod scripts. But this approach 
does not map very well to Pig, because:
 - there is no way of defining "pig functions in pig". This obviously leads to 
Pig code duplication.
 - every store statement is evaluated independently of the others, so there is 
no way for a script to detect that a reusable intermediate result already 
exists.

This leads me to think that I should maybe use Pig to run very small tasks, 
and find (or build?) something on top of it to drive my overall process, 
either calling Pig as many times as needed or generating one huge Pig script...
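
To make the "something on top" idea concrete, here is a rough Python sketch of 
such a driver. It is only an illustration under assumptions of mine: the 
function name run_pipeline, the (script, output) step layout and the file names 
are hypothetical, I assume a script can be launched as "pig <script>", and the 
existence check defaults to the local filesystem (a real setup would have to 
query HDFS instead).

```python
import os
import subprocess


def run_pipeline(steps, exists=os.path.exists, run=subprocess.check_call):
    """Run small Pig scripts in order, skipping steps whose output exists.

    steps  -- list of (script_path, output_path) pairs (hypothetical layout)
    exists -- callable telling whether an output is already there
              (assumption: local filesystem by default, not HDFS)
    run    -- callable that executes a command given as an argument list

    Returns the list of scripts that were actually executed.
    """
    executed = []
    for script, output in steps:
        if exists(output):
            # The intermediate result is already available: reuse it.
            continue
        # Assumption: the script is launched as "pig <script>".
        run(["pig", script])
        executed.append(script)
    return executed
```

For instance, run_pipeline([("clean.pig", "out/clean"), ("agg.pig", "out/agg")]) 
would only launch agg.pig if out/clean is already present and out/agg is not.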

At this point I'd really like to know what you all think and where you plan 
to go... 

> Right granularity for a pig script
> ----------------------------------
>
>                 Key: PIG-206
>                 URL: https://issues.apache.org/jira/browse/PIG-206
>             Project: Pig
>          Issue Type: Wish
>            Reporter: Mathieu Poumeyrol
>
> I'd like to understand what people have in mind when they picture pig 
> scripts...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
