Hi all, I am trying to improve the performance of some Hive queries. Most of them contain left outer joins, GROUP BY, conditional checks, and UNION ALL. I have overridden some properties in the Hive shell:

set io.sort.mb=512
set io.sort.factor=100
set mapred.child.jvm.opts=-Xmx2048mb
set hive.map.aggr=true
set hive.exec.parallel=true
set mapred.tasks.reuse.num.tasks=-1
set hive.mapred.map.speculative.execution=false
set hive.mapred.reduce.speculative.execution=false

I got some performance gain, but I still want to improve the performance of these queries. Which of the following would give me the best performance?

- RCFile
- Indexing
- Bucketing
- SequenceFile
- A combination of the above
- Some configuration parameter tuning

Thanks in advance.

Regards,
Abhi