Hi Sudeep,

You can also look at join optimizations like map join, bucketed map join, sort-merge join, etc., and choose the one that fits your requirement.
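For example, a rough sketch of how these join strategies are enabled (the table names `small_table`/`big_table` are placeholders, and the exact property names can vary by Hive version, so please check your release's docs):

```sql
-- Map join: stream the small table into memory on each mapper,
-- skipping the shuffle/reduce phase entirely.
SELECT /*+ MAPJOIN(s) */ b.key, b.value, s.value
FROM big_table b JOIN small_table s ON (b.key = s.key);

-- Or let Hive convert joins to map joins automatically when one
-- side is small enough (see hive.mapjoin.smalltable.filesize):
SET hive.auto.convert.join = true;

-- Bucketed map join and sort-merge join require both tables to be
-- bucketed (and, for SMB, sorted) on the join key, e.g.:
--   CREATE TABLE t (key INT, value STRING)
--     CLUSTERED BY (key) SORTED BY (key) INTO 32 BUCKETS;
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
```

The MAPJOIN hint and the bucketing DDL above are illustrative only; the wiki page below covers the precise requirements.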
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

Regards,
Bejoy KS

________________________________
From: sudeep tokala <sudeeptok...@gmail.com>
To: user@hive.apache.org
Sent: Tuesday, August 14, 2012 11:00 PM
Subject: Re: OPTIMIZING A HIVE QUERY

Hi Bertrand,

Thanks for the reply. My question was that every join in a Hive query results in a MapReduce job, and a MapReduce job goes through serialization and deserialization of objects. Isn't that an overhead? "Store data in a smarter way" - can you please elaborate on this?

Regards,
Sudeep

On Tue, Aug 14, 2012 at 11:39 AM, Bertrand Dechoux <decho...@gmail.com> wrote:

You may want to be clearer. Is your question: how can I change the serialization strategy of Hive? (If so, I will let other users answer, and I am also interested in the answer.)
>
>Otherwise the answer is simple. If you want to join data which cannot be stored
>in memory, you need to serialize it. The only solution is to store the
>data in a smarter way which does not require you to do the join. By the way,
>how do you know that serialization is the bottleneck?
>
>Bertrand
>
>
>
>On Tue, Aug 14, 2012 at 5:11 PM, sudeep tokala <sudeeptok...@gmail.com> wrote:
>
>
>>
>>
>>On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala <sudeeptok...@gmail.com>
>>wrote:
>>
>>>Hi all,
>>>
>>>How do I avoid serialization and deserialization overhead in a Hive join query?
>>>Will this optimize my query performance?
>>>
>>>Regards,
>>>Sudeep
>>
>
>
>--
>Bertrand Dechoux
>