Thanks Bejoy. I did it, thank you.

Sent from my iPhone

On Sep 28, 2012, at 2:52 PM, "Bejoy KS" <bejoy...@yahoo.com> wrote:

> Hi Abhishek
>
> I don't think Partitioned By and Clustered By are supported in CTAS.
>
> You need to create the bucketed table separately, then enable
> hive.enforce.bucketing, and after that use a Select statement on the parent
> table to load data into the bucketed one.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> From: Abhishek <abhishek.dod...@gmail.com>
> Date: Fri, 28 Sep 2012 11:14:56 -0400
> To: Bejoy Ks<bejoy...@yahoo.com>
> ReplyTo: user@hive.apache.org
> Cc: user@hive.apache.org<user@hive.apache.org>
> Subject: Re: Performance tuning in hive
>
> Hi Bejoy,
>
> How do I use CTAS with Clustered By?
>
> I am getting the following error when doing
>
> Create table as select
>
> CTAS does not support partitioning in the target table.
>
> Regards
> Abhi
>
> Sent from my iPhone
>
> On Sep 28, 2012, at 5:32 AM, Bejoy KS <bejoy...@yahoo.com> wrote:
>
>> Hi Abhishek
>>
>> Which optimization to choose depends entirely on your queries, or the
>> kind of queries fired on those tables. Based on that, you need to bucket and
>> index the tables to get better performance. From a bird's-eye view,
>> bucketing + indexing + map joins would be a good combination if those suit
>> your data set.
>>
>> Regards,
>> Bejoy KS
>>
>> From: Abhishek <abhishek.dod...@gmail.com>
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Cc: "user@hive.apache.org" <user@hive.apache.org>
>> Sent: Friday, September 28, 2012 5:16 AM
>> Subject: Re: Performance tuning in hive
>>
>> Hi Bejoy,
>>
>> Thanks for the reply. Can I know whether a combination of
>> 1) Indexing and Bucketing
>> Or
>> 2) bucketing with Rc file
>> Or
>> 3) sequence file with bucketing and indexing
>> Or
>> 4) map join with indexes
>> Or
>>
>> any other combination of the above, mentioned or not, would fetch
>> better performance?
>>
>> Regards
>> Abhi
>>
>> Sent from my iPhone
>>
>> On Sep 27, 2012, at 2:44 PM, Bejoy KS <bejoy...@yahoo.com> wrote:
>>
>>> Hi Abhishek
>>>
>>> You can have a look at join optimizations as well as group by optimizations.
>>>
>>> Join optimization - Based on your data sets you can go with a map-side
>>> join, a bucketed map join, or a sort-merge join.
>>> To enable map join -> set hive.auto.convert.join = true;
>>>
>>> To enable bucketed map join -> set hive.optimize.bucketmapjoin = true;
>>> (The prerequisite here is that both tables are bucketed on the join
>>> column.)
>>> If the data in the buckets is sorted, then you can use a sort-merge join
>>> as well; you need to enable the following properties:
>>> set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
>>> set hive.optimize.bucketmapjoin = true;
>>> set hive.optimize.bucketmapjoin.sortedmerge = true;
>>>
>>> For details you can refer to the following URL:
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
>>>
>>> Group By optimization - You can go ahead with a few group by optimizations
>>> as well. A few pointers are here:
>>> http://mail-archives.apache.org/mod_mbox/hive-user/201209.mbox/%3cb55ff166-239e-4e39-bf92-3ae59eb78...@gmail.com%3E
>>>
>>> Buckets and Hive indexes - Joins and group-bys are optimized better with
>>> buckets. Based on your queries, you need to determine in advance how your
>>> tables should be bucketed. Indexing also gives a performance advantage for
>>> queries that involve group by and where clauses. Join optimization using
>>> indexes is in progress:
>>> https://issues.apache.org/jira/browse/HIVE-2845
>>>
>>> RC file or Sequence File is a choice to be made based on the query
>>> patterns. If you are querying only a few columns, then RC files give you a
>>> performance edge, but if the queries span pretty much all columns, then
>>> use the more general Sequence Files.
>>>
>>> Regards,
>>> Bejoy KS
>>>
>>> From: Abhishek <abhishek.dod...@gmail.com>
>>> To: Hive <user@hive.apache.org>
>>> Sent: Thursday, September 27, 2012 7:03 PM
>>> Subject: Performance tuning in hive
>>>
>>> Hi all,
>>>
>>> I am trying to improve the performance of some queries in Hive. The
>>> queries mostly contain left outer joins, group by, conditional checks, and
>>> union all. I have overridden some properties in the Hive shell:
>>>
>>> Set io.sort.mb=512
>>> Set io.sort.factor=100
>>> Set mapred.child.jvm.opts=-Xmx2048mb
>>> Set hive.map.aggr=true
>>> Set hive.exec.parallel=true
>>> Set mapred.tasks.reuse.num.tasks=-1
>>> Set hive.mapred.map.speculative.execution=false
>>> Set hive.mapred.reduce.speculative.execution=false
>>>
>>> I got some performance gain.
>>>
>>> I still want to improve the performance of these queries.
>>>
>>> Which of the following gives me better performance?
>>>
>>> Rcfile
>>> Indexing
>>> Bucketing
>>> Sequence file
>>> Combination of above
>>>
>>> Or
>>>
>>> Some configuration parameter tuning
>>>
>>> Which one of the above yields good performance?
>>>
>>> Thanks in advance.
>>>
>>> Regards
>>> Abhi
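
For readers following the thread, the workaround Bejoy describes (create the bucketed table on its own, enable hive.enforce.bucketing, then load it with a Select from the parent table) might look roughly like the HiveQL sketch below. The table and column names (orders_raw, orders_bucketed, customers_bucketed, order_id, customer_id, amount, name) and the bucket count of 32 are hypothetical placeholders, not taken from the thread. The final query also assumes that customers_bucketed was created and loaded the same way, bucketed on the same join column, and it uses the MAPJOIN hint as one possible way to request a map-side join.

```sql
-- All table names, column names and the bucket count are hypothetical.

-- 1. Create the bucketed table separately, since the thread reports that
--    CTAS rejects CLUSTERED BY / PARTITIONED BY on the target table.
CREATE TABLE orders_bucketed (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS RCFILE;

-- 2. Ask Hive to honor the bucket definition when writing into the table.
SET hive.enforce.bucketing = true;

-- 3. Load the bucketed table with a plain SELECT from the parent table.
INSERT OVERWRITE TABLE orders_bucketed
SELECT order_id, customer_id, amount
FROM orders_raw;

-- 4. With both sides bucketed on the join column, enable the bucketed
--    map join mentioned earlier in the thread and join on that column.
--    Assumption: customers_bucketed is the smaller table and is
--    bucketed on customer_id as well.
SET hive.optimize.bucketmapjoin = true;

SELECT /*+ MAPJOIN(c) */
       o.order_id, o.amount, c.name
FROM orders_bucketed o
LEFT OUTER JOIN customers_bucketed c
  ON (o.customer_id = c.customer_id);
```

Prefixing the final query with EXPLAIN is one way to check whether Hive actually planned a map-side join for it. Whether this combination helps will still depend on the data and the query patterns, as noted in the replies above.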