[
https://issues.apache.org/jira/browse/PIG-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589586#action_12589586
]
Mathieu Poumeyrol commented on PIG-206:
---------------------------------------
All,
A bit of background, first. Over the last four or five years, my team and I
have implemented several tools for our company
[http://sk.idm.fr/opensource/software.html], one of them being a data
processing framework (skprod) including a language which shares many goals and
characteristics with Pig.
The design was somewhat "pre-MapReduce", and we are now facing some
scalability issues which, combined with the cost of maintenance, make Pig look
like a very good candidate to replace skprod. The syntax is very different
from Pig's, but the concepts map quite easily. You may want to have a look at
the getting started paper [http://sk.idm.fr/opensource/doc/skprod/index.html]
to get an idea of it.
I am trying to "port" some of our existing data processing chains to Pig, and
while many things look very good, I now get an overall feeling that there is a
difference of granularity: we were designing huge skprod scripts (the language
itself has some built-in modularity) that perform a full-featured task end to
end, and we usually try to avoid chaining skprod scripts. But this approach
does not map very well to Pig, as:
- there is no way of defining "pig functions in pig". This obviously leads to
pig code duplication.
- every store statement is evaluated independently of the others, so there is
no possibility for a script to detect the existence of a reusable intermediate
result.
This leads me to think that I should maybe use Pig to run very small tasks, and
find (or build?) something on top of it to drive my general process, calling
Pig as many times as needed, or generating a huge Pig script...
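To make the "something on top" idea concrete, here is a minimal sketch of such
a driver, written in Python. Everything in it is an assumption made for
illustration: the Step type, the script and output names, and the idea that
each small Pig script stores into exactly one known path. It also assumes a
`pig` launcher on the PATH that accepts a script path, and it checks for
existing results on the local filesystem only (a cluster setup would need a
different existence check).

```python
import os
import subprocess
from typing import Callable, Iterable, List, NamedTuple


class Step(NamedTuple):
    """One small Pig script and the path its STORE statement writes to.

    Both fields are hypothetical names used only for this sketch.
    """
    script: str
    output: str


def run_pig(script: str) -> None:
    # Assumption: a `pig` launcher is on the PATH and takes a script path.
    subprocess.run(["pig", script], check=True)


def run_chain(
    steps: Iterable[Step],
    exists: Callable[[str], bool] = os.path.exists,
    run: Callable[[str], None] = run_pig,
) -> List[str]:
    """Run each step in order, skipping steps whose output already exists.

    Returns the list of scripts that were actually executed.
    """
    executed = []
    for step in steps:
        if exists(step.output):
            # Reusable intermediate result found: do not recompute it.
            continue
        run(step.script)
        executed.append(step.script)
    return executed
```

Note that this sketch only tests for existence: it does not notice that an
intermediate result is stale because an upstream step was re-run, which a real
driver would probably have to handle.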
At this point I'd really like to know what you people think and where you plan
to go...
> Right granularity for a pig script
> ----------------------------------
>
> Key: PIG-206
> URL: https://issues.apache.org/jira/browse/PIG-206
> Project: Pig
> Issue Type: Wish
> Reporter: Mathieu Poumeyrol
>
> I'd like to understand what people have in mind when they picture pig
> scripts...