Re: Impact of partitioning on certain queries

Jörn Franke Fri, 08 Jan 2016 01:54:58 -0800

Try EXPLAIN DEPENDENCY.
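
As a minimal sketch of that suggestion, assuming a hypothetical table `sales` partitioned by a string column `sale_date` (neither name comes from this thread): EXPLAIN DEPENDENCY prints JSON describing the tables and partitions a query would read, which plain EXPLAIN does not spell out.

```sql
-- Hypothetical table for illustration only; the names are not from this thread.
CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (sale_date STRING);

-- Reports the input tables and input partitions the query depends on.
EXPLAIN DEPENDENCY
SELECT count(*) FROM sales WHERE sale_date = '2016-01-08';
```

If partition elimination is working, the input_partitions entry in the output should name only the partitions that match the predicate rather than every partition of the table. The partitions have to exist already for anything to be listed.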


> On 08 Jan 2016, at 10:47, Mich Talebzadeh <m...@peridale.co.uk> wrote:
> 
> Thanks Gopal.
>  
> Basically the following is true:
>  
> 1.    The storage layer is HDFS
> 2.    The execution engine is MR, Tez, Spark etc
> 3.    The access layer is Hive
>  
> When we say the access layer is Hive, is the assumption correct that we are 
> referring to the optimiser (loosely related to the optimiser in an RDBMS)? For 
> example, is the Hive optimiser aware of the number of underlying partitions? The 
> reason I am asking is that with EXPLAIN I only see a table scan, 
> and it does not refer to any partition or partition elimination.
>  
>  
> Cheers
>  
>  
>  
>  
> -----Original Message-----
> From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal 
> Vijayaraghavan
> Sent: 08 January 2016 09:34
> To: user@hive.apache.org
> Subject: Re: Impact of partitioning on certain queries
>  
>  
> > Ok we hope that partitioning improves performance where the predicate
> > is on partitioned columns
>  
> Nope.
>  
> Partitioning *only* improves performance if your queries run with
>  
> set hive.mapred.mode=strict;
>  
> That's the "use strict" easy way to make sure you're writing good queries.
>  
> Even then, schema design in Hive is something you need to learn with the 
> assumption that neither the storage layer nor the compute layer is part of 
> "Hive".
>  
> It floats in an "access" layer above both. I'm not sure there's any legacy 
> tech to draw parallels with.
>  
> If you haven't seen this before, here's an example of the problem
>  
> http://www.slideshare.net/Hadoop_Summit/hive-at-yahoo-letters-from-the-trenches/24
>  
>  
> Cheers,
> Gopal
