Re: Need urgent help on hive query performance

kulkarni.swar...@gmail.com Fri, 30 May 2014 16:00:25 -0700

> It has a very large number of joins. Since it's a client-specific query, you'll
> understand that I cannot share it. Sorry about that.


Like I said, joins are slow and, if not done correctly, can have terrible
performance. Which technique helps depends on exactly how you are
performing the join. For instance, if you are joining a smaller table to a
larger one, a map join could work well for you: the smaller table is kept
in memory while the join is performed. Also, if you are able to break your
tables down into buckets on the join key, you may be able to use a
bucketed map join. The following links should be helpful [1][2].
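
To make the two techniques concrete, here is a minimal HiveQL sketch. The
table and column names (orders, customers, customer_id, order_id, name,
amount) are hypothetical, since the actual query was not shared. The
settings are standard Hive properties, but their defaults differ between
releases, so please check them against your Hive 0.12 / EMR build before
relying on them.

```sql
-- Hypothetical tables: orders (large) and customers (small).

-- 1. Map join: let Hive convert a common join into a map-side join
--    when one side is small enough to be held in memory.
SET hive.auto.convert.join=true;
-- Size threshold in bytes below which a table is treated as "small"
-- (the value here is illustrative; tune it to your memory budget).
SET hive.mapjoin.smalltable.filesize=25000000;

SELECT o.order_id, c.name, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- 2. Bucketed map join: assumes both tables were created with
--    CLUSTERED BY (customer_id) INTO n BUCKETS, and that one table's
--    bucket count is a multiple of the other's.
SET hive.optimize.bucketmapjoin=true;
-- On releases where MAPJOIN hints are ignored by default, this may
-- also be needed for the hint below to take effect.
SET hive.ignore.mapjoin.hint=false;

SELECT /*+ MAPJOIN(c) */ o.order_id, c.name, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```

Running EXPLAIN on each query should show whether a map join operator was
actually chosen; if it still shows a regular reduce-side join, the small
table is probably above the size threshold.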

Hope this helps.

[1]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization
[2]
http://stackoverflow.com/questions/20199077/hive-efficient-join-of-two-tables


On Fri, May 30, 2014 at 5:38 PM, <shouvanik.hal...@accenture.com> wrote:

> Please find the answers inline below.
>
> *From:* kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
> *Sent:* Friday, May 30, 2014 3:34 PM
>
> *To:* user@hive.apache.org
> *Subject:* Re: Need urgent help on hive query performance
>
>
>
> I feel it's pretty hard to answer this without understanding the following:
>
>
>
> 1.      What exactly are you trying to query? CSV? Avro? ....
>
> HIVE table
>
> 2.      Where is your data? HDFS? HBase? Local filesystem?
>
> Data is in s3
>
> 3.      What version of hive are you using?
>
> Hive 0.12
>
> 4.      What is an example of a query that is slow? Some queries, such as
> joins, are inherently slower than simpler ones (though they can be
> optimized).
>
> It has a very large number of joins. Since it's a client-specific query, you'll
> understand that I cannot share it. Sorry about that.
>
>
>
> Thanks,
>
>
>
> --
> Swarnim
>
>
>
> On Fri, May 30, 2014 at 5:32 PM, <shouvanik.hal...@accenture.com> wrote:
>
> Can you please give a specific example or a blog post to refer to? I did
> not understand.
>
>
>
> *From:* Ashish Garg [mailto:gargcreation1...@gmail.com]
> *Sent:* Friday, May 30, 2014 3:31 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Need urgent help on hive query performance
>
>
>
> Try partitioning the table and running queries that are partition
> specific. Hope this helps.
>
> Thanks and Regards,
>
> Ashish Garg.
>
>
>
> On Fri, May 30, 2014 at 6:05 PM, <shouvanik.hal...@accenture.com> wrote:
>
> Hi,
>
>
>
> Can anybody help urgently with optimizing Hive query performance? I am
> looking at this more from a Hadoop tuning point of view. Currently, even a
> small table takes a long time to query.
>
>
>
> We are running an EMR cluster with 1 master node, 2 core nodes and task
> nodes.
>
>
>
> Quick help is much appreciated.
>
>
>
> Thanks,
>
> Shouvanik
>
>
> --
> Swarnim
>



-- 
Swarnim
