I think it is a great idea to be able to plug in different back-ends.
But the way to do that, IMHO, is to make the intermediate artifacts public
(akin to making byte-code specs public).
That way, independent projects can spring up that take the translated pig
script, provide a new interpreter for its physical plan, and show off their
superiority / cool features etc.
My suggestion is this:
Pigcc -L myScript.pig -> parses the pig script, generates the logical plan,
and stores it in myScript.pig.l
Pigcc -P myScript.pig.l -> produces the physical plan from the logical plan,
and stores it in myScript.pig.p
Pigcc -M myScript.pig.p -> produces the map-reduce plan, myScript.pig.m
Pig myScript.pig.m -> interprets the MR plan. The MR plan can also be split
into multiple MR job plans, myScript.pig.m.{1,2,3..}, so that one way to
execute the pig script is to run
hadoop jar pigRT.jar myScript.pig.m.1
hadoop jar pigRT.jar myScript.pig.m.2
hadoop jar pigRT.jar myScript.pig.m.3
hadoop jar pigRT.jar myScript.pig.m.4
in sequence or as a DAG.
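
To make the naming and the "in sequence or as a DAG" part concrete, here is a minimal sketch in Python. Pigcc and pigRT.jar are the hypothetical tools from this proposal, not existing Pig commands, and the function names and the dependency map are made up purely for illustration. It only builds the command lines; it does not run anything.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def stage_artifacts(script):
    """Return the file each proposed (hypothetical) Pigcc stage would write."""
    return {
        "logical": script + ".l",     # Pigcc -L
        "physical": script + ".p",    # Pigcc -P
        "mapreduce": script + ".m",   # Pigcc -M
    }


def job_commands(script, deps):
    """Build one 'hadoop jar' command per MR job plan, in dependency order.

    deps maps a job number to the set of job numbers it must wait for
    (an assumed representation of the job DAG, not something Pig emits).
    """
    mr_plan = stage_artifacts(script)["mapreduce"]
    # static_order() yields every job after all of the jobs it depends on.
    order = TopologicalSorter(deps).static_order()
    return [["hadoop", "jar", "pigRT.jar", f"{mr_plan}.{n}"] for n in order]
```

For a purely sequential script, passing deps such as {1: set(), 2: {1}, 3: {2}, 4: {3}} gives back the four commands in job order 1 through 4. A real runner could instead launch independent jobs in parallel once their dependencies finish.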
That also makes it easy for someone to write an experimental runtime, or a
full-fledged translator to other languages, without having to wait for pig
committers to have their patches committed. This will have a beneficial impact
on the pig eco-system.
Dmitriy, you might remember that we spoke about this at CMU last October
:-)
- Milind
On 4/22/10 1:34 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
> I kind of dig the concept of being able to plug in a different backend,
> though I definitely think we should get rid of the dead localmode code. Can
> you give an example of how this will simplify the codebase? Is it more than
> just GenericClass foo = new SpecificClass(), and the associated extra files?
>
> -D
>
> On Thu, Apr 22, 2010 at 1:25 PM, Arun C Murthy <[email protected]> wrote:
>
>> +1
>>
>> Arun
>>
>>
>> On Apr 22, 2010, at 11:35 AM, Richard Ding wrote:
>>
>>> Pig has an abstraction layer (interfaces and abstract classes) to
>>> support multiple execution engines. After PIG-1053, Hadoop is the only
>>> execution engine supported by Pig. I wonder if we should remove this
>>> layer of code, and make Hadoop THE execution engine for Pig. This will
>>> simplify the backend code a lot.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> -Richard
>>>
>>>
>>>
>>>
>>
--
Milind Bhandarkar
Y!IM: GridSolutions
Tel: 408-203-5213
([email protected])