[ 
https://issues.apache.org/jira/browse/PIG-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pi Song updated PIG-143:
------------------------

    Attachment: ParserDrawing.png

Attached the image.

> Proposal for refactoring of parsing logic in Pig
> ------------------------------------------------
>
>                 Key: PIG-143
>                 URL: https://issues.apache.org/jira/browse/PIG-143
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Pi Song
>            Assignee: Pi Song
>         Attachments: ParserDrawing.png
>
>
> This is a placeholder until I come up with a complete proposal. In the 
> meantime, I definitely need your opinions!
> The basic concept is that we currently do validation (for example, file 
> existence checking) in the parsing stage, which I think is not clean and 
> makes it difficult to add new validation rules.
> Briefly, what I propose:
> - Keep only parsing logic in the parser, so that its output is an 
> unchecked logical plan.
> - Create a new class called LogicalPlanValidatorManager, which is 
> responsible for validation.
> - Each new validation rule is implemented by subclassing LogicalPlanValidator.
> - Implement chaining of LogicalPlanValidators inside 
> LogicalPlanValidatorManager so that new ones can be added easily. New 
> logic is plugged in here, so a new LogicalPlanValidator can be 
> implemented like a plug-in.
> Here is a list of possible LogicalPlanValidators I have in mind (please add 
> what you want):
> - The first LogicalPlanValidator to be implemented is a FileExistence 
> validator, which comes from the logic we currently have.
> - The second LogicalPlanValidator sorts out filename conflicts. (At the 
> moment you can save/load the same file over and over again in the same plan, 
> which is very confusing. Possibly we should not allow the same file name to 
> be reused in any single plan?)
> - A test run of streaming scripts before going to real execution.
> - Metadata checking + type system checking, as mentioned in PIG-142.
> The common way to implement a LogicalPlanValidator is based on the Visitor 
> pattern. Whether this works for all cases, I need to think through 
> further.
> With this design, parsing errors are detected first, in the parsing 
> stage. Validation errors are then detected in the priority order in which 
> LogicalPlanValidators are organized in LogicalPlanValidatorManager.
> This proposal only applies to the LogicalPlan. For the PhysicalPlan, where 
> backend-specific validation logic is required, the same concept can be 
> applied.
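
The chaining idea in the quoted proposal could be sketched roughly as follows. This is only an illustration under stated assumptions, not Pig code: the class names LogicalPlanValidatorManager and LogicalPlanValidator come from the proposal, but the simplified LogicalPlan (just lists of loaded and stored file names), the method signatures, the injected existence check, and the exact conflict rule are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class ValidatorChainSketch {

    /** Hypothetical stand-in for an unchecked logical plan: only the files it loads and stores. */
    static final class LogicalPlan {
        final List<String> loads = new ArrayList<>();
        final List<String> stores = new ArrayList<>();
    }

    /** Base class for one validation rule; returns error messages, empty if the plan passes. */
    static abstract class LogicalPlanValidator {
        abstract List<String> validate(LogicalPlan plan);
    }

    /** Runs validators in registration order, so earlier validators report their errors first. */
    static final class LogicalPlanValidatorManager {
        private final List<LogicalPlanValidator> chain = new ArrayList<>();

        LogicalPlanValidatorManager add(LogicalPlanValidator validator) {
            chain.add(validator);
            return this;
        }

        List<String> validate(LogicalPlan plan) {
            List<String> errors = new ArrayList<>();
            for (LogicalPlanValidator validator : chain) {
                errors.addAll(validator.validate(plan));
            }
            return errors;
        }
    }

    /** Checks that every loaded file exists; the lookup is injected (an assumption) so no real file system is needed. */
    static final class FileExistenceValidator extends LogicalPlanValidator {
        private final Predicate<String> exists;

        FileExistenceValidator(Predicate<String> exists) {
            this.exists = exists;
        }

        @Override
        List<String> validate(LogicalPlan plan) {
            List<String> errors = new ArrayList<>();
            for (String file : plan.loads) {
                if (!exists.test(file)) {
                    errors.add("Input file does not exist: " + file);
                }
            }
            return errors;
        }
    }

    /** One possible conflict rule (assumed): a stored file may not also be loaded, or stored twice. */
    static final class FileNameConflictValidator extends LogicalPlanValidator {
        @Override
        List<String> validate(LogicalPlan plan) {
            List<String> errors = new ArrayList<>();
            Set<String> seen = new HashSet<>(plan.loads);
            for (String file : plan.stores) {
                if (!seen.add(file)) {
                    errors.add("File name used more than once: " + file);
                }
            }
            return errors;
        }
    }

    public static void main(String[] args) {
        LogicalPlan plan = new LogicalPlan();
        plan.loads.add("a.txt");
        plan.loads.add("missing.txt");
        plan.stores.add("a.txt");

        // Registration order is the priority order in which errors are reported.
        LogicalPlanValidatorManager manager = new LogicalPlanValidatorManager()
                .add(new FileExistenceValidator(name -> name.equals("a.txt")))
                .add(new FileNameConflictValidator());

        manager.validate(plan).forEach(System.out::println);
    }
}
```

In this sketch a new rule is added by writing one more subclass and registering it with add(); the parser itself would not need to change. A Visitor-based validator, as the proposal suggests, would walk the real plan operators instead of the flat file lists assumed here.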

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
