[jira] [Commented] (HIVE-3027) The optimizer architecture of Hive is terrible, need code refactoring

Edward Capriolo (JIRA) Fri, 25 May 2012 17:49:25 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283856#comment-13283856
 ]


Edward Capriolo commented on HIVE-3027:
---------------------------------------

Patches welcome. I am sure that if you refactor the code and make it better, no one 
will be averse.
                
> The optimizer architecture of Hive is terrible, need code refactoring
> ---------------------------------------------------------------------
>
>                 Key: HIVE-3027
>                 URL: https://issues.apache.org/jira/browse/HIVE-3027
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 0.8.1
>            Reporter: anders
>              Labels: architecture, optimizer, ysmart
>
> I want to add complete cost-based optimization to Hive, but when I 
> began the work, I found it very difficult to do with the current Hive 
> optimization framework. In the current Hive code, optimizations are all done 
> after the DAG of operators is generated. It is an awful design and makes me mad. For 
> example, the map-side optimization scans the whole operator DAG, tries 
> to find the operators that can be replaced by a map operation, and then replaces 
> them. How terrible and stupid the code is!!! The terrible code runs to 1000 
> lines and only implements the map-side optimizations!!! 
> In my opinion, optimization shouldn't be done in a separate step; different 
> optimizations should be done at the appropriate time. For example, join reordering 
> should be done when we parse the input query, and we can generate Map-Reduce 
> operators or only a Map operator for each join according to the cost 
> estimation. In the process, we can do join and aggregation merging, and 
> we should push down predicates at the proper time and generate a proper data 
> structure, to ensure that the cost-estimation module can fetch the corresponding 
> predicates of each base table when estimating JOIN cost. How concise and 
> graceful the code will be if we do the optimization this way!!! But now, in 
> order to comply with the optimizer framework of Hive, I have to write lots 
> of ugly code with amazing redundancy, and the code is very, very difficult to 
> debug!!!! There is now a patch for a cost-based JOIN reorder and merge optimizer 
> called YSMART; I glanced at it. It uses 6000+ lines of code and is difficult to read!! 
> And its optimization is incomplete.
> The optimizer architecture of Hive is terrible. What can I do now?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
