[ https://issues.apache.org/jira/browse/HIVE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283856#comment-13283856 ]
Edward Capriolo commented on HIVE-3027:
---------------------------------------

Patches welcome. I am sure that if you refactor the code and make it better, no one will be averse.

> The optimizer architecture of Hive is terrible, need code refactoring
> ---------------------------------------------------------------------
>
>                 Key: HIVE-3027
>                 URL: https://issues.apache.org/jira/browse/HIVE-3027
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 0.8.1
>            Reporter: anders
>              Labels: architecture, optimizer, ysmart
>
> I want to add complete cost-based optimization to Hive, but when I began
> the work, I found it very difficult to do with the current Hive
> optimization framework. In the current Hive code, optimizations are all
> done after the DAG of operators has been generated. It is an awful design
> and makes me mad. For example, the map-side optimization scans the whole
> operator DAG, tries to find the operators that can be replaced by a map
> operation, and then replaces them. How terrible and stupid the code is!!!
> The terrible code expands to 1000 lines, and only implements the map-side
> optimizations!!!
> In my opinion, optimization shouldn't be done in a separate step;
> different optimizations should be done at the appropriate time. For
> example, join reordering should be done when we parse the input query, and
> we can generate Map-Reduce operators or only a Map operator for each join
> according to the cost estimation. In the same process, we can merge joins
> and aggregations, and we should push down predicates at the proper time
> and generate a proper data structure, to ensure the cost-estimation module
> can fetch the corresponding predicates of each base table for estimating
> JOIN cost. How concise and graceful the code will be if we do the
> optimization this way!!! But now, in order to comply with the optimizer
> framework of Hive, I have to write lots of ugly code with amazing
> redundancy, and the code is very, very difficult to debug!!!! There is now
> a patch for a cost-based JOIN reorder and merge optimizer called YSMART. I
> glanced at it. It uses 6000+ lines of code and is difficult to read!! And
> its optimization is incomplete.
> The optimizer architecture of Hive is terrible. What can I do now?
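
For illustration only, the proposal in the issue description (reorder joins by estimated cost, use pushed-down predicates to estimate each base table's size, and pick a map-only or Map-Reduce join per step) can be sketched roughly as below. This is not code from Hive or from the YSMART patch. The names `Table`, `plan_joins`, `join_selectivity`, and `map_join_max_rows` are hypothetical, and the uniform join selectivity and the greedy smallest-first ordering are simplifying assumptions, not what a complete cost-based optimizer would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Hypothetical per-table statistics used for cost estimation."""
    name: str
    rows: int                  # estimated base-table row count
    selectivity: float = 1.0   # fraction of rows kept by pushed-down predicates

    @property
    def filtered_rows(self) -> float:
        # Estimated rows that survive the predicates pushed down to this table.
        return self.rows * self.selectivity


def plan_joins(tables, join_selectivity=0.01, map_join_max_rows=10_000):
    """Greedy sketch: join the smallest filtered inputs first.

    Returns (first_table_name, [(joined_table_name, strategy), ...]).
    A step is planned as a map-only join when its smaller input is
    estimated to fit under map_join_max_rows; otherwise it is planned
    as a reduce-side (Map-Reduce) join.
    """
    if not tables:
        raise ValueError("need at least one table")

    # Order candidates by estimated size after predicate push-down.
    ordered = sorted(tables, key=lambda t: t.filtered_rows)
    first, rest = ordered[0], ordered[1:]

    current_rows = first.filtered_rows  # estimated size of the running result
    plan = []
    for nxt in rest:
        smaller_side = min(current_rows, nxt.filtered_rows)
        strategy = "map-join" if smaller_side <= map_join_max_rows else "reduce-join"
        plan.append((nxt.name, strategy))
        # Assumed estimate: cross product scaled by a uniform join selectivity.
        current_rows = current_rows * nxt.filtered_rows * join_selectivity
    return first.name, plan


if __name__ == "__main__":
    example = [
        Table("orders", rows=1_000_000),
        Table("customers", rows=50_000, selectivity=0.1),
        Table("items", rows=10_000_000, selectivity=0.5),
    ]
    print(plan_joins(example))
```

The point of the sketch is only that the join order and the per-join strategy are both derived from one cost estimate, which in turn depends on predicates having already been attached to their base tables.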