Hello, I am currently playing around with Hive's semantic analysis code to understand how DAGs or MapReduce plans are generated from abstract syntax trees (ASTs). The idea is to explore the various possible DAGs for a query and compare them by execution run time.
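To make the experiment concrete, here is a toy sketch of what I have in mind, assuming each optional optimization can be treated as a function from plan to plan. Everything in it is hypothetical and only for illustration: the class `PlanVariants`, the operator names, and the two rewrites are not Hive classes or Hive's API, and a real plan is a tree rather than a flat list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model only: a "plan" is a flat list of operator names, and every
// rewrite below is hypothetical. None of this is Hive's API.
class PlanVariants {

    // Builds one plan per on/off combination of the optional rewrites,
    // keyed by the names of the rewrites that were enabled.
    static Map<String, List<String>> enumerate(
            List<String> basePlan,
            Map<String, UnaryOperator<List<String>>> rewrites) {
        List<String> names = new ArrayList<>(rewrites.keySet());
        Map<String, List<String>> variants = new LinkedHashMap<>();
        for (int mask = 0; mask < (1 << names.size()); mask++) {
            List<String> plan = new ArrayList<>(basePlan);
            List<String> enabled = new ArrayList<>();
            for (int i = 0; i < names.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    plan = rewrites.get(names.get(i)).apply(plan);
                    enabled.add(names.get(i));
                }
            }
            variants.put(enabled.isEmpty() ? "none" : String.join("+", enabled), plan);
        }
        return variants;
    }

    // Hypothetical rewrite: move the filter so it runs right after the scan.
    static List<String> pushDownFilter(List<String> plan) {
        List<String> out = new ArrayList<>(plan);
        if (out.remove("FILTER")) {
            out.add(out.indexOf("SCAN") + 1, "FILTER");
        }
        return out;
    }

    // Hypothetical rewrite: drop the operator that stands in for a
    // constant expression that could be folded away.
    static List<String> foldConstants(List<String> plan) {
        List<String> out = new ArrayList<>(plan);
        out.remove("CONST_EXPR");
        return out;
    }

    // The optional rewrites, in the order they are applied when enabled.
    static Map<String, UnaryOperator<List<String>>> sampleRewrites() {
        Map<String, UnaryOperator<List<String>>> rewrites = new LinkedHashMap<>();
        rewrites.put("pushDownFilter", PlanVariants::pushDownFilter);
        rewrites.put("foldConstants", PlanVariants::foldConstants);
        return rewrites;
    }

    public static void main(String[] args) {
        List<String> base = List.of("SCAN", "CONST_EXPR", "JOIN", "FILTER", "SELECT");
        // Print every variant; each one would then be executed and timed.
        enumerate(base, sampleRewrites())
                .forEach((enabled, plan) -> System.out.println(enabled + " -> " + plan));
    }
}
```

With n optional rewrites this produces 2^n candidate plans, so the interesting part is deciding which steps of the real pipeline are safe to toggle.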
The function "analyzeInternal" seems to handle the entire plan generation process. The steps, at a high level, as described in its comment section are:

1. Get Resolved Parse Tree from Syntax Tree
2. Get OP tree (Operator tree?) from Resolved Parse Tree
3. Deduce Result Set schema
4. Generate Parse Context
5. Do View creation
6. Collect Table Access stats
7. Perform Logical Optimization
8. Get Column Access Stats
9. Optimize Physical OP tree
10. Translate to target execution engine

I understand that step 7 (Logical Optimization) applies multiple transforms (e.g. Join Reordering, Constant Propagation, Predicate Pushdown) to alter the AST, and thus different DAGs can be obtained by choosing whether or not to apply certain transformations.

Can changes to the code in steps 1-2 and 9 also affect the resulting DAGs? How does the AST get affected in these steps?

Any pointers or explanations will be helpful.

Thanks,
Raajay
