we did empirical test on a partitioned table with ~50 million rows, and 
filtering from a view (which includes an explode() on an array column) vs 
filtering the underlining table on a specific partition yielding similar 
results.i also noticed that an "explain dependency" statement on both the view 
and the table shows the analizer does indeed pick up the right partition on 
both cases (view and table).
      From: Mich Talebzadeh <m...@peridale.co.uk>
 To: user@hive.apache.org; 'cobby cohen' <ququr...@yahoo.com> 
 Sent: Monday, March 16, 2015 9:36 PM
 Subject: RE: view over partitioned table
   
#yiv3664821214 #yiv3664821214 -- _filtered #yiv3664821214 
{font-family:Helvetica;panose-1:2 11 6 4 2 2 2 2 2 4;} _filtered #yiv3664821214 
{panose-1:2 4 5 3 5 4 6 3 2 4;} _filtered #yiv3664821214 
{font-family:Calibri;panose-1:2 15 5 2 2 2 4 3 2 4;} _filtered #yiv3664821214 
{font-family:Tahoma;panose-1:2 11 6 4 3 5 4 4 2 4;}#yiv3664821214 
#yiv3664821214 p.yiv3664821214MsoNormal, #yiv3664821214 
li.yiv3664821214MsoNormal, #yiv3664821214 div.yiv3664821214MsoNormal 
{margin:0cm;margin-bottom:.0001pt;font-size:12.0pt;}#yiv3664821214 a:link, 
#yiv3664821214 span.yiv3664821214MsoHyperlink 
{color:blue;text-decoration:underline;}#yiv3664821214 a:visited, #yiv3664821214 
span.yiv3664821214MsoHyperlinkFollowed 
{color:purple;text-decoration:underline;}#yiv3664821214 
p.yiv3664821214MsoAcetate, #yiv3664821214 li.yiv3664821214MsoAcetate, 
#yiv3664821214 div.yiv3664821214MsoAcetate 
{margin:0cm;margin-bottom:.0001pt;font-size:8.0pt;}#yiv3664821214 
p.yiv3664821214msonormal, #yiv3664821214 li.yiv3664821214msonormal, 
#yiv3664821214 div.yiv3664821214msonormal 
{margin-right:0cm;margin-left:0cm;font-size:12.0pt;}#yiv3664821214 
p.yiv3664821214msochpdefault, #yiv3664821214 li.yiv3664821214msochpdefault, 
#yiv3664821214 div.yiv3664821214msochpdefault 
{margin-right:0cm;margin-left:0cm;font-size:12.0pt;}#yiv3664821214 
span.yiv3664821214msohyperlink {}#yiv3664821214 
span.yiv3664821214msohyperlinkfollowed {}#yiv3664821214 
span.yiv3664821214emailstyle17 {}#yiv3664821214 p.yiv3664821214msonormal1, 
#yiv3664821214 li.yiv3664821214msonormal1, #yiv3664821214 
div.yiv3664821214msonormal1 
{margin:0cm;margin-bottom:.0001pt;font-size:12.0pt;}#yiv3664821214 
span.yiv3664821214msohyperlink1 
{color:blue;text-decoration:underline;}#yiv3664821214 
span.yiv3664821214msohyperlinkfollowed1 
{color:purple;text-decoration:underline;}#yiv3664821214 
span.yiv3664821214emailstyle171 {color:windowtext;}#yiv3664821214 
p.yiv3664821214msochpdefault1, #yiv3664821214 li.yiv3664821214msochpdefault1, 
#yiv3664821214 div.yiv3664821214msochpdefault1 
{margin-right:0cm;margin-left:0cm;font-size:10.0pt;}#yiv3664821214 
span.yiv3664821214BalloonTextChar {}#yiv3664821214 
span.yiv3664821214EmailStyle29 {color:windowtext;}#yiv3664821214 
.yiv3664821214MsoChpDefault {font-size:10.0pt;} _filtered #yiv3664821214 
{margin:72.0pt 72.0pt 72.0pt 72.0pt;}#yiv3664821214 
div.yiv3664821214WordSection1 {}#yiv3664821214 Hi Cobby,  In your statement and 
I quote:  “Empirical tests, by the show, suggest that indeed, partition 
optimization does take place. “  Do you have an example of this please?   
Thanks,  Mich  Creating in-memory Data Grid for Trading Systems with Oracle 
TimesTen and Coherence Cache  NOTE: The information in this email is 
proprietary and confidential. This message is for the designated recipient 
only, if you are not the intended recipient, you should destroy it immediately. 
Any information in this message shall not be understood as given or endorsed by 
Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. 
It is the responsibility of the recipient to ensure that this email is virus 
free, therefore neither Peridale Ltd, its subsidiaries nor their employees 
accept any responsibility.  

From: cobby cohen [mailto:ququr...@yahoo.com] 
Sent: 16 March 2015 16:14
To: user@hive.apache.org
Subject: Re: view over partitioned table  hi,i am using CDH 5.2 (hive 13).as 
for partitioned views, to my understanding they are more about abstracting a 
non partitioned back end. My questing is related to how and if the query 
optimizer can detect possible uses of partitioned column, even if it might be 
"masked" by a view (or multiple view).Empirical tests, by the show, suggest 
that indeed, partition optimization does take place.  regards, cobby.  From: 
"Moore, Douglas" <douglas.mo...@thinkbiganalytics.com>
To: "user@hive.apache.org" <user@hive.apache.org>; 'cobby cohen' 
<ququr...@yahoo.com> 
Sent: Monday, March 16, 2015 4:19 PM
Subject: Re: view over partitioned table  Mich,  What version of Hive are you 
running?Have you seen 
this?https://cwiki.apache.org/confluence/display/Hive/PartitionedViews  - 
Douglas  From: Mich Talebzadeh <m...@peridale.co.uk>
Reply-To: <user@hive.apache.org>
Date: Sun, 15 Mar 2015 19:01:57 +0000
To: <user@hive.apache.org>, 'cobby cohen' <ququr...@yahoo.com>
Subject: RE: view over partitioned table  OK, This is the way I read it. 
Crearte table t1 .. partitioned by date will use horizontal partitioning as per 
common with any RDBMS say Oracle. The view I will create it as follows: hive> 
create view v1 as select * from t1;OKTime taken: 0.073 seconds hive> analyze 
table t1 partition (date) compute statistics;Query ID = 
hduser_20150315184646_af3890b2-4079-4f81-b38b-da9af67b2751Total jobs = 
1Launching Job 1 out of 1-----Stage-Stage-0: Map: 1   Cumulative CPU: 0.6 sec   
HDFS Read: 299 HDFS Write: 0 SUCCESSTotal MapReduce CPU Time Spent: 600 msecOK 
hive> insert into table t1 partition (date)  values (1,'2015-04-04');hive> 
insert into table t1 partition (date)  values (1,'2015-05-04'); Help to update 
stats hive> analyze table t1 partition (date) compute statistics; select from 
the view after exiting hive and back  hive> explain select id from v1 where 
date = '2015-04-04';OKSTAGE DEPENDENCIES:  Stage-0 is a root stage STAGE PLANS: 
 Stage: Stage-0    Fetch Operator      limit: -1      Processor Tree:        
TableScan          alias: t1          Statistics: Num rows: 1 Data size: 1 
Basic stats: COMPLETE Column stats: NONE          Select Operator            
expressions: id (type: string)            outputColumnNames: _col0            
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE   
         ListSink Time taken: 0.627 seconds, Fetched: 17 row(s)  Now select the 
same from base table hive> explain select id from t1 where date = 
'2015-04-04';OKSTAGE DEPENDENCIES:  Stage-0 is a root stage STAGE PLANS:  
Stage: Stage-0    Fetch Operator      limit: -1      Processor Tree:        
TableScan          alias: t1          Statistics: Num rows: 1 Data size: 1 
Basic stats: COMPLETE Column stats: NONE          Select Operator            
expressions: id (type: string)            outputColumnNames: _col0            
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE   
         ListSink Time taken: 0.597 seconds, Fetched: 17 row(s) The plan looks 
the same but it is slower on the view HTH Mich  NOTE: The information in this 
email is proprietary and confidential. This message is for the designated 
recipient only, if you are not the intended recipient, you should destroy it 
immediately. Any information in this message shall not be understood as given 
or endorsed by Peridale Ltd, its subsidiaries or their employees, unless 
expressly so stated. It is the responsibility of the recipient to ensure that 
this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor 
their employees accept any responsibility. From: cobby cohen 
[mailto:ququr...@yahoo.com] 
Sent: 15 March 2015 17:52
To: user@hive.apache.org
Subject: view over partitioned table in the following case: create table T1(id 
string)partitioned by (date date); create view V1(id, date)select id, date from 
T1 if i do the followingselect id from V1 where date = '2015-04-04'will the 
query optimize on T1's partitioning? and what if V1 is a bit more complex, 
further depending on other views (all of which expose the partitioned column 
as-is)? thx!  

  

Reply via email to