Proposal for refactoring of parsing logic in Pig
------------------------------------------------
Key: PIG-143
URL: https://issues.apache.org/jira/browse/PIG-143
Project: Pig
Issue Type: Improvement
Reporter: Pi Song
Assignee: Pi Song
This is a placeholder for me to come up with a complete proposal. In the
meantime, I definitely need your opinions!
The basic motivation is that we currently perform validation (for example, file
existence checking) during the parsing stage. I think this is not clean, and it
makes adding new validation rules difficult.
Briefly, I propose the following:
- Keep only parsing logic in the parser, so that the output of parsing is an
unchecked logical plan.
- Create a new class called LogicalPlanValidatorManager, which is responsible
for validation.
- Implement each new validation rule as a subclass of LogicalPlanValidator.
- Implement chaining of LogicalPlanValidators inside
LogicalPlanValidatorManager so that new validators can be added easily. This
is where new logic gets plugged in, so a new LogicalPlanValidator can be
implemented like a plug-in.
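The chaining idea above can be sketched in Java roughly as follows. This is only a sketch under stated assumptions. LogicalPlan here is a hypothetical stand-in that records LOAD/STORE file names; it is not Pig's real class. ValidationResult and HasStoreValidator are illustrative names that do not come from this proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Pig's LogicalPlan: only tracks LOAD/STORE file names.
class LogicalPlan {
    final List<String> loads = new ArrayList<>();
    final List<String> stores = new ArrayList<>();
}

// Hypothetical result holder: accumulates messages so every validator in the chain runs.
class ValidationResult {
    private final List<String> errors = new ArrayList<>();

    void addError(String message) { errors.add(message); }
    boolean hasErrors() { return !errors.isEmpty(); }
    List<String> getErrors() { return errors; }
}

// Base class for every validation rule; new rules are plugged in by subclassing.
abstract class LogicalPlanValidator {
    abstract void validate(LogicalPlan plan, ValidationResult result);
}

// Runs the registered validators in registration order (the "chain").
class LogicalPlanValidatorManager {
    private final List<LogicalPlanValidator> chain = new ArrayList<>();

    LogicalPlanValidatorManager register(LogicalPlanValidator validator) {
        chain.add(validator);
        return this; // allows manager.register(a).register(b)
    }

    ValidationResult validate(LogicalPlan plan) {
        ValidationResult result = new ValidationResult();
        for (LogicalPlanValidator validator : chain) {
            validator.validate(plan, result);
        }
        return result;
    }
}

// Example plug-in with a hypothetical rule: a plan must STORE its output somewhere.
class HasStoreValidator extends LogicalPlanValidator {
    @Override
    void validate(LogicalPlan plan, ValidationResult result) {
        if (plan.stores.isEmpty()) {
            result.addError("plan has no STORE");
        }
    }
}
```

In this sketch, adding a rule means writing one subclass and one register(...) call. The parser does not need to change. Errors are collected rather than thrown, so a user could see every problem in a plan at once. That is a design choice, not something the proposal specifies.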
Here is a list of possible LogicalPlanValidators I have in mind (please add any
you want):
- The first LogicalPlanValidator to implement is a FileExistence validator,
moved out of our current parsing logic.
- The second LogicalPlanValidator would sort out file name conflicts. At the
moment you can save/load the same file over and over again in the same plan,
which is very confusing. Perhaps we should not allow the same file name to
appear more than once in any single plan?
- Metadata checking and type system checking, as mentioned in PIG-142.
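The first two rules in the list could look roughly like this. This is a hedged sketch. PlanFiles and FileChecks are hypothetical names. The existence check uses the local file system through java.nio.file, whereas a real Pig implementation would need to ask the cluster file system (for example HDFS). The conflict check applies the strictest policy floated above: a name may appear only once across all LOADs and STOREs in a plan.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal plan: the file names referenced by LOAD and STORE.
class PlanFiles {
    final List<String> loads = new ArrayList<>();
    final List<String> stores = new ArrayList<>();
}

class FileChecks {
    // FileExistence rule: every LOADed file must exist.
    // Local file system only in this sketch; Pig would query its own storage.
    static List<String> missingInputs(PlanFiles plan) {
        List<String> missing = new ArrayList<>();
        for (String file : plan.loads) {
            if (!Files.exists(Paths.get(file))) {
                missing.add(file);
            }
        }
        return missing;
    }

    // Conflict rule (one possible policy): a file name may appear at most once
    // across all LOAD and STORE statements in a single plan.
    static Set<String> conflictingNames(PlanFiles plan) {
        Set<String> seen = new HashSet<>();
        Set<String> conflicts = new LinkedHashSet<>(); // keeps first-seen order
        List<String> all = new ArrayList<>(plan.loads);
        all.addAll(plan.stores);
        for (String file : all) {
            if (!seen.add(file)) {
                conflicts.add(file);
            }
        }
        return conflicts;
    }
}
```

Under this strict policy, a script that STOREs a file and LOADs it back later in the same plan would also be flagged. Whether that case should be allowed is part of the open question above.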
The usual way to implement a LogicalPlanValidator would be the Visitor pattern,
although I need to think more about whether this works for all cases.
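A visitor-based validator might look something like the following sketch. The operator classes Load, Filter and Store and the PlanVisitor base class are simplified, hypothetical stand-ins, not Pig's actual logical operators or visitor. For simplicity, the plan is treated as an ordered list of operators.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical logical operators (not Pig's actual classes).
interface LogicalOperator {
    void accept(PlanVisitor visitor);
}

class Load implements LogicalOperator {
    final String fileName;
    Load(String fileName) { this.fileName = fileName; }
    public void accept(PlanVisitor visitor) { visitor.visit(this); }
}

class Filter implements LogicalOperator {
    final String condition;
    Filter(String condition) { this.condition = condition; }
    public void accept(PlanVisitor visitor) { visitor.visit(this); }
}

class Store implements LogicalOperator {
    final String fileName;
    Store(String fileName) { this.fileName = fileName; }
    public void accept(PlanVisitor visitor) { visitor.visit(this); }
}

// Default no-op visits, so each validator overrides only the operators it cares about.
abstract class PlanVisitor {
    void visit(Load op) {}
    void visit(Filter op) {}
    void visit(Store op) {}

    // Visits operators in list order; a real plan is a DAG and needs a graph walk.
    void walk(List<LogicalOperator> plan) {
        for (LogicalOperator op : plan) {
            op.accept(this);
        }
    }
}

// A validator written as a visitor: reports any file that is STOREd more than once.
class DuplicateStoreValidator extends PlanVisitor {
    private final Set<String> stored = new HashSet<>();
    final List<String> errors = new ArrayList<>();

    @Override
    void visit(Store op) {
        if (!stored.add(op.fileName)) {
            errors.add("file stored more than once: " + op.fileName);
        }
    }
}
```

The appeal for this proposal is that each rule only overrides the visit methods it needs, while traversal lives in one place. Rules that need global information (such as type inference across the whole plan) may fit this shape less naturally, which is the open question above.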
The value of this proposal depends on the number of validation rules we
actually need. If we don't have many things to check, it will be just a nice
feature without much value. However, I believe it will at least make the
parsing logic cleaner.
This proposal applies only to the LogicalPlan. The same concept can be applied
to the PhysicalPlan, where backend-specific validation logic is required.
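One way to reuse the same machinery for both plan types, assuming a generic type parameter is acceptable, is sketched below. PlanValidator and ValidatorManager are hypothetical names. P would be LogicalPlan for front-end rules and PhysicalPlan for backend-specific rules.

```java
import java.util.ArrayList;
import java.util.List;

// One validation rule for some plan type P (logical or physical).
interface PlanValidator<P> {
    // Returns error messages; an empty list means the plan passed this rule.
    List<String> validate(P plan);
}

// Generic chain: the same manager class can hold logical-plan or physical-plan rules.
class ValidatorManager<P> {
    private final List<PlanValidator<P>> chain = new ArrayList<>();

    ValidatorManager<P> register(PlanValidator<P> validator) {
        chain.add(validator);
        return this;
    }

    List<String> validate(P plan) {
        List<String> errors = new ArrayList<>();
        for (PlanValidator<P> validator : chain) {
            errors.addAll(validator.validate(plan));
        }
        return errors;
    }
}
```

Each backend could then keep its own ValidatorManager for physical plans without duplicating the chaining code.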
--
This message is automatically generated by JIRA.