Alan Gates updated PIG-143:
Status: Resolved (was: Patch Available)
Done as part of the types work, which is now in trunk.
> Proposal for refactoring of parsing logic in Pig
> Key: PIG-143
> URL: https://issues.apache.org/jira/browse/PIG-143
> Project: Pig
> Issue Type: Sub-task
> Components: impl
> Reporter: Pi Song
> Assignee: Pi Song
> Attachments: FixSchemaCompare.patch, ParserDrawing.png,
> pigtype_cycle_check.patch, PostParseValidation_Set2.patch,
> PostParseValidation_Set2_v2.patch, PostParseValidation_Set3.patch,
> PostParseValidation_Set3_v2.patch, PostParseValidation_Set3_v3.patch,
> PostParseValidation_WorkingVersion_2.patch, SchemaMerge.patch
> h2. Pig Script Parser Refactor Proposal
> This is my initial proposal on pig script parser refactor work. Please note
> that I need your opinions for improvements.
> The basic concept is around the fact that currently we do validation logics
> in parsing stage (for example, file existence checking) which I think is not
> clean and difficult to add new validation rules. In the future, we will need
> to add more and more validation logics to improve usability.
> *My proposal:-* (see [^ParserDrawing.png])
> - Only keep parsing logic in the parser and leave output of parsing logic
> being unchecked logical plans. (Therefore the parser only does syntactic
> - Create a new class called LogicalPlanValidationManager which is responsible
> for validations of the AST from the parser.
> - A new validation logic will be subclassing LogicalPlanValidator
> - We can chain a set of LogicalPlanValidators inside
> LogicalPlanValidationManager to do validation work. This allows a new
> LogicalPlanValidator to be added easily like a plug-in.
> - This greatly promotes modularity of the validation logics which is
> +particularly good when we have a lot of people working on different things+
> (eg. streaming may require a special validation logic)
> - We can set the execution order of validators
> - There might be some backend specific validations needed when we implement
> new execution engines (For example a logical operation that one backend can
> do but others can't). We can plug-in this kind of validations on-the-fly
> based on the backend in use.
> *List of LogicalPlanValidators extracted from the current parser logic:-*
> - File existence validator
> - Alias existence validator
> *Logics possibly be added in the very near future:-*
> - Streaming script test execution
> - Type checking + casting promotion + type inference
> - Untyped plan test execution
> - Logic to prevent reading and writing from/to the same file
> The common way to implement a LogicalPlanValidator will be based on Visitor
> - By having every validation logic traversing AST from the root node every
> time, there is a performance hit. However I think this is neglectable due to
> the fact that Pig is very expressive and normally queries aren't too big (99%
> of queries contain no more than 1000 AST nodes).
> *Next Step:-*
> LogicalPlanFinalizer which is also a pipeline except that each stage can
> modify the input AST. This component will generally do a kind of global
> *Further ideas:-*
> - Composite visitor can make validations more efficient in some cases but I
> don't think we need
> - ASTs within the pipeline never change (read-only) so validations can be
> done in parallel to improve responsiveness. But again I don't think we need
> this unless we have so many I/O bound logics.
> - The same pipeline concept can also be applied in physical plan
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.