Hi Sudeep

You can also look at join optimizations like map join, bucketed map join, sort-
merge join, etc., and choose the one that best fits your requirement.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
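
For example, a minimal sketch of the relevant switches (setting names as documented in the Hive wiki; the `fact` and `dim` table names are hypothetical):

```sql
-- Let Hive convert a join to a map join automatically when one side is
-- small enough to be loaded into memory (the size threshold is configurable).
SET hive.auto.convert.join = true;

-- Older explicit style: hint which table to load into memory.
SELECT /*+ MAPJOIN(d) */ f.id, d.name
FROM fact f JOIN dim d ON f.dim_id = d.id;

-- Bucketed map join / sort-merge join require the tables to be bucketed
-- (and, for sort-merge, sorted) on the join key:
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
```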



 
Regards,
Bejoy KS


________________________________
 From: sudeep tokala <sudeeptok...@gmail.com>
To: user@hive.apache.org 
Sent: Tuesday, August 14, 2012 11:00 PM
Subject: Re: OPTIMIZING A HIVE QUERY
 

hi Bertrand,
 
Thanks for the reply.
 
My question was that every join in a Hive query constitutes a MapReduce job, and 
a MapReduce job goes through serialization and deserialization of objects. Isn't 
that an overhead?
 
Store data in a smarter way? Can you please elaborate on this?
 
Regards
Sudeep


On Tue, Aug 14, 2012 at 11:39 AM, Bertrand Dechoux <decho...@gmail.com> wrote:

You may want to be clearer. Is your question: how can I change the 
serialization strategy of Hive? (If so, I'll let other users answer; I am also 
interested in the answer.)
>
>Else the answer is simple. If you want to join data which cannot be stored 
>in memory, you need to serialize it. The only alternative is to store the 
>data in a smarter way that would not require you to do the join at all. By the 
>way, how do you know the serialization is the bottleneck?
>
>Bertrand 
>
>
>
>On Tue, Aug 14, 2012 at 5:11 PM, sudeep tokala <sudeeptok...@gmail.com> wrote:
>
>
>>
>>
>>On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala <sudeeptok...@gmail.com> 
>>wrote:
>>
>>Hi all,
>>> 
>>>How do I avoid serialization and deserialization overhead in a Hive join 
>>>query? Will this optimize my query performance?
>>> 
>>>Regards,
>>>Sudeep
>>
>
>
>-- 
>Bertrand Dechoux
>
