[ 
https://issues.apache.org/jira/browse/PIG-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592725#action_12592725
 ] 

Pi Song commented on PIG-143:
-----------------------------

@Alan

1) But if you declare it as a visitor and don't use visitor functionalities, 
isn't it confusing?

2) That example might not be good enough. How about this one? 
- I want to know if ( Count(OperatorX) > Count(OperatorY) )  AND all the 
operatorXs are connected to operatorZs.
So here you have multiple visitor logics (might be multiple walkers as well). 
If we just stick one visitor as validator itself this  will lead to a question 
"Why this visitor logic is associated with validators but not for others?"

3) So again, because conceptually they are not the same. At least we should not 
force them to be the same. I agree that in most cases, a validator should be 
implemented using one visitor so the most logical way to do is to create a 
special visitor+validator that has the boilerplate code in it. This way 
validators that use only one visitor can be implemented more easily but 
unfortunately there is no way to do such a thing in Java. A work around is to 
convert some classes to interfaces but that might be too intrusive.


SchemaMerge is in response to what you've said here:-

[pi] If I have tuple schemas [ int, int, int] and [long, long, long]. Are they 
compatible? This seems like parameters of type parameters problem in Java (e.g. 
List<List<Dog>> and List<ArrayList<Dog>>). I will have to think through it if 
we still don't know the solution.
[alan] On my to do list is to add a method to Schema to do schema merging.
This would handle exactly issues like this, where type promotion needs to be
done, and throwing errors in cases where types can't be promoted.



> Proposal for refactoring of parsing logic in Pig
> ------------------------------------------------
>
>                 Key: PIG-143
>                 URL: https://issues.apache.org/jira/browse/PIG-143
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>            Reporter: Pi Song
>            Assignee: Pi Song
>         Attachments: ParserDrawing.png, pigtype_cycle_check.patch, 
> PostParseValidation_WorkingVersion_1.patch, SchemaMerge.patch
>
>
> h2. Pig Script Parser Refactor Proposal 
> This is my initial proposal on pig script parser refactor work. Please note 
> that I need your opinions for improvements.
> *Problem*
> The basic concept is around the fact that currently we do validation logics 
> in parsing stage (for example, file existence checking) which I think is not 
> clean and difficult to add new validation rules. In the future, we will need 
> to add more and more validation logics to improve usability.
> *My proposal:-*  (see [^ParserDrawing.png])
> - Only keep parsing logic in the parser and leave output of parsing logic 
> being unchecked logical plans. (Therefore the parser only does syntactic 
> checking)
> - Create a new class called LogicalPlanValidationManager which is responsible 
> for validations of the AST from the parser.
> - A new validation logic will be subclassing LogicalPlanValidator 
> - We can chain a set of LogicalPlanValidators inside 
> LogicalPlanValidationManager to do validation work. This allows a new 
> LogicalPlanValidator to be added easily like a plug-in. 
> - This greatly promotes modularity of the validation logics which  is 
> +particularly good when we have a lot of people working on different things+ 
> (eg. streaming may require a special validation logic)
> - We can set the execution order of validators
> - There might be some backend specific validations needed when we implement 
> new execution engines (For example a logical operation that one backend can 
> do but others can't).  We can plug-in this kind of validations on-the-fly 
> based on the backend in use.
> *List of LogicalPlanValidators extracted from the current parser logic:-*
> - File existence validator
> - Alias existence validator
> *Logics possibly be added in the very near future:-*
> - Streaming script test execution
> - Type checking + casting promotion + type inference
> - Untyped plan test execution
> - Logic to prevent reading and writing from/to the same file
> The common way to implement a LogicalPlanValidator will be based on Visitor 
> pattern. 
> *Cons:-*
>  - By having every validation logic traversing AST from the root node every 
> time, there is a performance hit. However I think this is neglectable due to 
> the fact that Pig is very expressive and normally queries aren't too big (99% 
> of queries contain no more than 1000 AST nodes).
> *Next Step:-*
> LogicalPlanFinalizer which is also a pipeline except that each stage can 
> modify the input AST. This component will generally do a kind of global 
> optimizations.
> *Further ideas:-*
> - Composite visitor can make validations more efficient in some cases but I 
> don't think we need
> - ASTs within the pipeline never change (read-only) so validations can be 
> done in parallel to improve responsiveness. But again I don't think we need 
> this unless we have so many I/O bound logics.
> - The same pipeline concept can also be applied in physical plan 
> validation/optimization.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to