[
https://issues.apache.org/jira/browse/PIG-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551817
]
Antonio Magnaghi commented on PIG-32:
-------------------------------------
I am attaching to the bug this high-level summary that I sent to the mailing
list a few days back.
I have discussed this with Ben. One aspect we talked about was extending the API
to provide a way to collect logging and debugging information.
________________________________________
From: Antonio Magnaghi
Sent: Monday, December 10, 2007 9:29 AM
To: '[email protected]'
Subject: Abstraction layer: execution engine (PIG-32)
I'm starting to work on the execution-engine portion of the abstraction layer,
which separates the front-end from the back-end.
Based on previous discussions with various folks, including Trevor
Strohman from the Galago project, I think it is possible to identify some
requirements/changes, which I've summarized below (in addition to what is
currently posted at http://wiki.apache.org/pig/PigAbstractionLayer).
I would like to get some feedback on these points and on whether I have left
out aspects that need to be considered as well.
Thanks,
-a.
Front-End:
Change logical plan representation: the goal is to change the representation
of logical plans so that:
• details pertaining to the physical query plan execution are no longer
present in the front-end;
• a new logical plan submitted to the back-end can reference a portion
(or alias) of another logical plan.
Aspects affected by the changes above are:
1. need to remove data collectors and logic to manage data-pipes from the
eval specs and cond's of logical operators. These data structures are used in
the case of the local execution mode. We can add physical eval specs and cond's
where data pipes and data collectors are set up. This has the disadvantage of
creating extra code (similar to the code for logical eval specs and logical
cond's), but the overall separation of the logical aspects from the physical
execution should be much cleaner.
2. need to remove the table of query results, where aliases are mapped to
intermediate results. This data structure is populated when the logical plan is
compiled. The concept of intermediate results does not seem to belong in the
front-end. (Information about the generation of intermediate results will be
maintained in the back-end.)
3. extend the representation of logical operators by assigning each a scope
and a unique id within the scope. The motivation is that new logical plans
submitted to the back-end can reference previous logical plans (or parts of
them) via a (scope id, node id) pair. Having the concept of scope can provide
support in the back-end for purging information about entities that go out of
scope. For instance, the session id could be used as the scope to garbage
collect entities in the back-end that are no longer needed.
4. need to add a catalog that maps aliases to logical trees. For instance,
when a store operation is encountered, the front-end can determine the set of
dependent logical trees to serialize and send to the back-end, or the
(scope, id) pairs of previous plans to reference.
5. the serialization process from the front-end to the back-end can produce a
representation of the logical plan and its dependencies that includes the
(scope, id) of each operator to send to the back-end.
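To make points 3 to 5 more concrete, here is a rough sketch of how the (scope, id) keys and the alias catalog could fit together. This is only an illustration under my own assumptions: the names FrontEndSketch, OperatorKey, LogicalOperator, Catalog and Submission are hypothetical and are not existing Pig classes, and the "serialization" is reduced to deciding which operators to send in full and which to reference by key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class FrontEndSketch {

    /** Hypothetical (scope, id) pair identifying one logical operator. */
    static final class OperatorKey {
        final String scope; // e.g. a session id
        final long id;      // unique within the scope

        OperatorKey(String scope, long id) {
            this.scope = scope;
            this.id = id;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof OperatorKey)) return false;
            OperatorKey k = (OperatorKey) o;
            return id == k.id && scope.equals(k.scope);
        }

        @Override public int hashCode() { return Objects.hash(scope, id); }

        @Override public String toString() { return scope + "-" + id; }
    }

    /** Hypothetical logical operator: no data pipes or collectors, only structure. */
    static final class LogicalOperator {
        final OperatorKey key;
        final String description;
        final List<LogicalOperator> inputs;

        LogicalOperator(OperatorKey key, String description, List<LogicalOperator> inputs) {
            this.key = key;
            this.description = description;
            this.inputs = inputs;
        }
    }

    /** What the front-end would hand to the back-end for one store operation. */
    static final class Submission {
        final List<LogicalOperator> toSerialize = new ArrayList<LogicalOperator>();
        final List<OperatorKey> toReference = new ArrayList<OperatorKey>();
    }

    /** Catalog mapping aliases to logical trees (point 4). */
    static final class Catalog {
        private final String scope;
        private long nextId = 1;
        private final Map<String, LogicalOperator> byAlias = new HashMap<String, LogicalOperator>();
        private final Set<OperatorKey> sent = new HashSet<OperatorKey>();

        Catalog(String scope) { this.scope = scope; }

        LogicalOperator define(String alias, String description, String... inputAliases) {
            List<LogicalOperator> inputs = new ArrayList<LogicalOperator>();
            for (String in : inputAliases) {
                LogicalOperator op = byAlias.get(in);
                if (op == null) throw new IllegalArgumentException("Unknown alias: " + in);
                inputs.add(op);
            }
            LogicalOperator op =
                new LogicalOperator(new OperatorKey(scope, nextId++), description, inputs);
            byAlias.put(alias, op);
            return op;
        }

        /** Splits the dependencies of an alias into new operators and references. */
        Submission prepareStore(String alias) {
            LogicalOperator root = byAlias.get(alias);
            if (root == null) throw new IllegalArgumentException("Unknown alias: " + alias);
            Submission s = new Submission();
            collect(root, s, new HashSet<OperatorKey>());
            for (LogicalOperator op : s.toSerialize) sent.add(op.key);
            return s;
        }

        private void collect(LogicalOperator op, Submission s, Set<OperatorKey> visited) {
            if (!visited.add(op.key)) return;
            if (sent.contains(op.key)) {
                s.toReference.add(op.key); // back-end already holds this subtree
                return;
            }
            for (LogicalOperator in : op.inputs) collect(in, s, visited);
            s.toSerialize.add(op); // inputs are listed before their consumers
        }
    }

    public static void main(String[] args) {
        Catalog catalog = new Catalog("session-1");
        catalog.define("a", "LOAD 'input'");
        catalog.define("b", "FILTER a", "a");
        Submission first = catalog.prepareStore("b");
        catalog.define("c", "GROUP b", "b");
        Submission second = catalog.prepareStore("c");
        System.out.println(first.toSerialize.size() + " new, then "
            + second.toSerialize.size() + " new referencing " + second.toReference);
    }
}
```

In this sketch the second store sends only the new operator for "c" and references "b" by its key, which is the behaviour described in point 4. Whether the front-end should itself track what has already been sent, as done here, is an open design choice.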
Back-End:
1. back-end would maintain table of intermediate results
2. compilation of logical plan to physical plan would take place in the
back-end
3. a local back-end would generate physical trees using the physical eval
specs and physical cond's (as described above)
4. a Hadoop back-end would compile logical plan to map/reduce
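The back-end points above could be sketched roughly as follows. Again, this is just an illustration under my own assumptions: BackEndSketch, ExecutionEngine, LocalEngine, MapReduceEngine and BackEnd are hypothetical names rather than existing Pig or Hadoop APIs, the "compilation" merely produces a placeholder string, and that string stands in for an intermediate result.

```java
import java.util.HashMap;
import java.util.Map;

public class BackEndSketch {

    /** Hypothetical contract that each back-end would implement. */
    interface ExecutionEngine {
        /** Compiles one logical operator into this engine's physical form. */
        String compile(String logicalOp);
    }

    /** Placeholder for a local back-end building physical trees. */
    static final class LocalEngine implements ExecutionEngine {
        public String compile(String logicalOp) {
            return "physical-tree(" + logicalOp + ")";
        }
    }

    /** Placeholder for a Hadoop back-end compiling to map/reduce. */
    static final class MapReduceEngine implements ExecutionEngine {
        public String compile(String logicalOp) {
            return "map-reduce-job(" + logicalOp + ")";
        }
    }

    /** Owns compilation and the table of intermediate results, grouped by scope. */
    static final class BackEnd {
        private final ExecutionEngine engine;
        private final Map<String, Map<Long, String>> results =
            new HashMap<String, Map<Long, String>>();

        BackEnd(ExecutionEngine engine) { this.engine = engine; }

        /** Compiles the operator and records the outcome under (scope, id). */
        String submit(String scope, long id, String logicalOp) {
            String physical = engine.compile(logicalOp);
            Map<Long, String> inScope = results.get(scope);
            if (inScope == null) {
                inScope = new HashMap<Long, String>();
                results.put(scope, inScope);
            }
            inScope.put(id, physical);
            return physical;
        }

        /** Returns the recorded entry, or null if none exists. */
        String lookup(String scope, long id) {
            Map<Long, String> inScope = results.get(scope);
            return inScope == null ? null : inScope.get(id);
        }

        /** Drops everything recorded for a scope, e.g. when a session ends. */
        int purgeScope(String scope) {
            Map<Long, String> removed = results.remove(scope);
            return removed == null ? 0 : removed.size();
        }
    }

    public static void main(String[] args) {
        BackEnd local = new BackEnd(new LocalEngine());
        local.submit("session-1", 1, "LOAD 'input'");
        local.submit("session-1", 2, "FILTER a");
        System.out.println(local.lookup("session-1", 2));
        System.out.println(local.purgeScope("session-1") + " entries purged");
    }
}
```

Grouping the table by scope makes purging a single map removal, which matches the idea of using the session id as the scope for garbage collection.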
> Abstraction Layer to decouple Pig from Back-End
> -----------------------------------------------
>
> Key: PIG-32
> URL: https://issues.apache.org/jira/browse/PIG-32
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Reporter: Antonio Magnaghi
> Assignee: Antonio Magnaghi
> Attachments: DataStorage.diff, DataStorage20071212.diff
>
>
> I'm opening a new issue to track the development work to support an
> abstraction layer for Pig as defined at
> http://wiki.apache.org/pig/PigAbstractionLayer