[ https://issues.apache.org/jira/browse/PIG-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pi Song updated PIG-143:
------------------------
Attachment: ParserDrawing.png
Attaching the image.
> Proposal for refactoring of parsing logic in Pig
> ------------------------------------------------
>
> Key: PIG-143
> URL: https://issues.apache.org/jira/browse/PIG-143
> Project: Pig
> Issue Type: Improvement
> Reporter: Pi Song
> Assignee: Pi Song
> Attachments: ParserDrawing.png
>
>
> This is a placeholder until I come up with a complete proposal. In the
> meantime, I definitely need your opinions!
> The basic concept is that we currently do validation in the parsing stage
> (for example, file existence checking), which I think is not clean and makes
> it difficult to add new validation rules.
> Briefly, I propose the following:
> - Keep only parsing logic in the parser, so that the output of parsing is an
> unchecked logical plan.
> - Create a new class called LogicalPlanValidatorManager, which is responsible
> for running validation.
> - Each new validation rule is implemented by subclassing
> LogicalPlanValidator.
> - Implement chaining of LogicalPlanValidators inside
> LogicalPlanValidatorManager so that a new LogicalPlanValidator can be added
> easily. New logic is plugged in here, so a new LogicalPlanValidator can be
> implemented like a plug-in.
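The chaining idea above could be sketched roughly as follows. This is only a
minimal illustration under assumptions, not Pig's actual code: LogicalPlan is
reduced to a hypothetical stand-in holding file names, and
PlanValidationException, BlankFileNameValidator and
DuplicateFileNameValidator are made-up names. Only LogicalPlanValidator and
LogicalPlanValidatorManager come from the proposal itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a logical plan; it only carries file names. */
class LogicalPlan {
    final List<String> inputFiles = new ArrayList<>();
}

/** Hypothetical exception type raised when a plan breaks a rule. */
class PlanValidationException extends RuntimeException {
    PlanValidationException(String message) { super(message); }
}

/** One validation rule; each new rule subclasses this. */
abstract class LogicalPlanValidator {
    abstract void validate(LogicalPlan plan);
}

/** Example rule (assumed): file names must not be blank. */
class BlankFileNameValidator extends LogicalPlanValidator {
    @Override
    void validate(LogicalPlan plan) {
        for (String name : plan.inputFiles) {
            if (name.trim().isEmpty()) {
                throw new PlanValidationException("blank file name in plan");
            }
        }
    }
}

/** Example rule (assumed): a file name may appear only once per plan. */
class DuplicateFileNameValidator extends LogicalPlanValidator {
    @Override
    void validate(LogicalPlan plan) {
        Set<String> seen = new HashSet<>();
        for (String name : plan.inputFiles) {
            if (!seen.add(name)) {
                throw new PlanValidationException("duplicate file name: " + name);
            }
        }
    }
}

/** Runs the registered validators in the order they were added. */
class LogicalPlanValidatorManager {
    private final List<LogicalPlanValidator> chain = new ArrayList<>();

    /** Plugging in a new rule is a single call here. */
    LogicalPlanValidatorManager add(LogicalPlanValidator validator) {
        chain.add(validator);
        return this;
    }

    /** The first validator that fails stops the chain. */
    void validate(LogicalPlan plan) {
        for (LogicalPlanValidator validator : chain) {
            validator.validate(plan);
        }
    }
}

class ValidatorChainDemo {
    public static void main(String[] args) {
        LogicalPlan plan = new LogicalPlan();
        plan.inputFiles.add("a.txt");
        plan.inputFiles.add("a.txt");

        LogicalPlanValidatorManager manager = new LogicalPlanValidatorManager()
                .add(new BlankFileNameValidator())
                .add(new DuplicateFileNameValidator());
        try {
            manager.validate(plan);
        } catch (PlanValidationException e) {
            System.out.println("Validation failed: " + e.getMessage());
        }
    }
}
```

In this sketch the parser would hand its unchecked plan to
manager.validate(plan). Stopping at the first failure is just one possible
design; collecting all errors before reporting would work equally well.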
> Here is a list of possible LogicalPlanValidators I have in mind (please add
> what you want):
> - The first LogicalPlanValidator to implement is a FileExistence validator,
> which comes from the logic we currently have.
> - The second LogicalPlanValidator sorts out filename conflicts. (At the
> moment you can save/load the same file over and over again in the same plan,
> which is very confusing. Possibly we should not allow the same file name to
> appear more than once in any single plan?)
> - A test run of streaming scripts before going to real execution.
> - Metadata checking and type system checking, as mentioned in PIG-142.
> The common way to implement a LogicalPlanValidator is based on the Visitor
> pattern. Whether this works for all cases, I still need to think through
> further.
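A Visitor-based validator might look roughly like the sketch below, using
the filename-conflict rule as the example. All names here (PlanVisitor,
PlanOperator, LoadOp, StoreOp, FileNameConflictValidator) are hypothetical
and chosen for illustration; they are not taken from Pig's code base, and
the plan is simplified to a flat list of operators.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical visitor with one visit method per operator type. */
interface PlanVisitor {
    void visit(LoadOp load);
    void visit(StoreOp store);
}

/** Hypothetical plan operator that dispatches to the matching visit method. */
interface PlanOperator {
    void accept(PlanVisitor visitor);
}

/** Assumed operator that reads from a file. */
class LoadOp implements PlanOperator {
    final String fileName;
    LoadOp(String fileName) { this.fileName = fileName; }
    @Override
    public void accept(PlanVisitor visitor) { visitor.visit(this); }
}

/** Assumed operator that writes to a file. */
class StoreOp implements PlanOperator {
    final String fileName;
    StoreOp(String fileName) { this.fileName = fileName; }
    @Override
    public void accept(PlanVisitor visitor) { visitor.visit(this); }
}

/** Walks the plan and reports file names used more than once. */
class FileNameConflictValidator implements PlanVisitor {
    private final Set<String> seen = new HashSet<>();
    private final List<String> conflicts = new ArrayList<>();

    @Override
    public void visit(LoadOp load) { record(load.fileName); }

    @Override
    public void visit(StoreOp store) { record(store.fileName); }

    private void record(String fileName) {
        // Set.add returns false when the name was already present.
        if (!seen.add(fileName)) {
            conflicts.add(fileName);
        }
    }

    /** Visits every operator and returns the conflicting file names. */
    List<String> validate(List<PlanOperator> plan) {
        seen.clear();
        conflicts.clear();
        for (PlanOperator op : plan) {
            op.accept(this);
        }
        return new ArrayList<>(conflicts);
    }
}

class VisitorValidatorDemo {
    public static void main(String[] args) {
        List<PlanOperator> plan = new ArrayList<>();
        plan.add(new LoadOp("input.txt"));
        plan.add(new StoreOp("input.txt"));
        System.out.println(new FileNameConflictValidator().validate(plan));
    }
}
```

The validation rule lives entirely in the visitor, so the operator classes
do not change when a new rule is added. A rule that needs no per-operator
logic (a streaming test run, say) may not fit this shape as naturally.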
> Under this design, parsing errors are detected first, in the parsing
> stage. Validation errors are then detected in the priority order in which
> the LogicalPlanValidators are organized in LogicalPlanValidatorManager.
> This proposal only applies to the LogicalPlan. The same concept can be
> applied to the PhysicalPlan, where backend-specific validation logic is
> required.