[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-06-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048517#comment-14048517
 ] 

Lefty Leverenz commented on HIVE-6492:
--

*hive.limit.query.max.table.partition* is documented in the wiki here:

* [Configuration Properties:  hive.limit.query.max.table.partition | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.limit.query.max.table.partition]

I also added a comment to HIVE-6586 so *hive.limit.query.max.table.partition* 
won't get lost in the shuffle when HIVE-6037 changes HiveConf.java.

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt, HIVE-6492.7.parch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configuration variable 
 hive.limit.query.max.table.partition is added to the Hive configuration to
 limit the number of table partitions involved in a table scan. 
 The default value will be set to -1, which means there is no limit by default. 
 This variable will not affect metadata-only queries.
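
As a rough illustration of the semantics in the description above (not Hive's actual code), the check can be sketched in Python. The function name and the metadata_only flag are assumptions made for this example; only the -1 default and the metadata-only exemption come from the description.

```python
# Hypothetical sketch of the limit semantics; not Hive's implementation.
NO_LIMIT = -1  # default of hive.limit.query.max.table.partition per the description


def scan_allowed(partitions_scanned, limit=NO_LIMIT, metadata_only=False):
    """Return True if a table scan over `partitions_scanned` partitions may run."""
    if metadata_only:
        # Metadata-only queries are not affected by the variable.
        return True
    if limit == NO_LIMIT:
        # -1 means there is no limit.
        return True
    return partitions_scanned <= limit
```

With the default, scan_allowed(1000) holds; with limit=1, a scan of 4 partitions is rejected unless it is metadata only.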



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951150#comment-13951150
 ] 

Selina Zhang commented on HIVE-6492:


[~leftylev] Thanks for the reminder! We can put: 
"This controls how many partitions can be scanned for each partitioned table. 
The default value -1 means no limit."  
What do you think? 



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951695#comment-13951695
 ] 

Lefty Leverenz commented on HIVE-6492:
--

That's good, if that's enough information for users.  Though I'm curious what 
happens when a query exceeds the limit ... oh ... you explained that in your 
March 4th comment, the query fails with an error message.



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-27 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948934#comment-13948934
 ] 

Lefty Leverenz commented on HIVE-6492:
--

This adds *hive.limit.query.max.table.partition* to HiveConf.java but it needs 
a description.  There's plenty of description in the comments, but a release 
note would be helpful.  Then I could put it in the wiki, and make sure the 
description goes into the new HiveConf.java (via HIVE-6586) after HIVE-6037 
gets committed.



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948609#comment-13948609
 ] 

Alan Gates commented on HIVE-6492:
--

Ran the tests locally, all looks good.

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Ashutosh Chauhan
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt, HIVE-6492.7.parch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.





[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947223#comment-13947223
 ] 

Ashutosh Chauhan commented on HIVE-6492:


I left the following comment:
bq. I thought you want this limit to be applied on cumulative partitions count 
or limit is meant for per TSOperator? 

I see you haven't updated that part of the code. Was it intentional? Currently, 
the limit would be considered per TSOperator (table), not across all tables 
referred to in the query. Either way is fine with me; I just want to confirm you 
intend the limit per table, not across all tables in the query.
Other than that, looks good.
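
The difference between the two readings of the limit can be sketched as follows. This is only an illustration in Python, not Hive code: the function names and the mapping from table scan to partition count are assumptions, and the limit is assumed to be non-negative (the -1 "no limit" case is left out).

```python
# Hypothetical sketch contrasting the two interpretations of the limit.
def exceeds_per_table(partitions_by_scan, limit):
    """Per-table check: each table scan is compared to the limit on its own."""
    return any(count > limit for count in partitions_by_scan.values())


def exceeds_cumulative(partitions_by_scan, limit):
    """Cumulative check: the partition counts of all table scans are summed first."""
    return sum(partitions_by_scan.values()) > limit


# Two table scans touching 3 partitions each, with a limit of 4.
scans = {"t1": 3, "t2": 3}
print(exceeds_per_table(scans, 4))   # False: each scan is within the limit
print(exceeds_cumulative(scans, 4))  # True: 6 partitions in total
```

Under the per-table reading the query above runs; under the cumulative reading it would be rejected.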




[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-25 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947410#comment-13947410
 ] 

Selina Zhang commented on HIVE-6492:


Thanks, Ashutosh! Yes, it intentionally limits partitions per table scan. 
It is based on the assumption that most queries involve only one table 
instance. And it is more like a supplement to strict mode. 



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947433#comment-13947433
 ] 

Ashutosh Chauhan commented on HIVE-6492:


Ok. +1



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-20 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942548#comment-13942548
 ] 

Selina Zhang commented on HIVE-6492:


Review request is here:
https://reviews.apache.org/r/19373/

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.





[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-07 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924459#comment-13924459
 ] 

Selina Zhang commented on HIVE-6492:


[~hagleitn] Thank you for the suggestions!

I will work on suggestion 2 and move the code to the driver. Because I am 
currently working on a patch to shorten the execution time of metadata-only 
queries (which is important for BI tools), I prefer to leave metadata-only 
queries out of this limitation.  What do you think?

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.





[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-06 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923257#comment-13923257
 ] 

Gunther Hagleitner commented on HIVE-6492:
--

[~selinazh] limit_partition_3.q:

The query (select count(*) from part) will succeed if you turn ON 
compute.query.using.stats and will fail if you turn it off. That's because in 
the first case it doesn't do a table scan, while in the second it does. (the 
limit_partition_3.q.out is hard to read, but you can see it there).

The (select hr from srcpart) ... yeah you're right. I missed that. Let me take 
another look.



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-06 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923657#comment-13923657
 ] 

Gunther Hagleitner commented on HIVE-6492:
--

Looked at it some more. Finally get what you were saying about metadata only. I 
think we can go two ways:

a) use patch as is. since metadata only still launches a job with potentially a 
lot of tasks (one split per file it seems).
b) fix it like you were, but change the variable to count files not partitions 
(you don't have access to partitions anymore in the lower layers.) and move the 
code to driver so it works for both mr and tez.

[~selinazh] - what works better for you? since i sent you on this wild goose 
chase, i can take another shot at updating it...



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-04 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920146#comment-13920146
 ] 

Gunther Hagleitner commented on HIVE-6492:
--

[~selinazh] - sorry if i messed up the metadata only part. Can you give me an 
example where the patch doesn't work?



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-04 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920237#comment-13920237
 ] 

Selina Zhang commented on HIVE-6492:


In the new test case limit_partition_2.q:
select distinct hr from srcpart;
should be allowed to pass because hr is the partition key. With the new patch, it is 
blocked:
FAILED: SemanticException Number of partitions scanned (=4) on table srcpart 
exceeds limit (=1). This is controlled by hive.limit.query.max.table.partition.
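
For reference, the failure quoted above can be reproduced with a small Python sketch. It is not Hive's code: the exception class is a stand-in defined here and the function name is hypothetical. Only the message text is taken from the output above.

```python
# Hypothetical sketch; the exception class and function name are made up for illustration.
class SemanticException(Exception):
    pass


def check_partition_limit(table, partitions_scanned, limit):
    """Raise if the scan touches more partitions than the limit allows (-1 = no limit)."""
    if limit != -1 and partitions_scanned > limit:
        raise SemanticException(
            f"Number of partitions scanned (={partitions_scanned}) on table {table} "
            f"exceeds limit (={limit}). "
            "This is controlled by hive.limit.query.max.table.partition."
        )
```

Calling check_partition_limit("srcpart", 4, 1) raises with the same message as above, while a limit of -1 lets the scan through.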



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-04 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920358#comment-13920358
 ] 

Selina Zhang commented on HIVE-6492:


The following test case in limit_partition_3.q should also pass, because it 
does not need a table scan: 

set hive.compute.query.using.stats=true;
set hive.limit.query.max.table.partition=1;
select count(*) from part;



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920579#comment-13920579
 ] 

Hive QA commented on HIVE-6492:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12632689/HIVE-6492.5.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5358 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1623/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1623/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12632689



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-03 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918793#comment-13918793
 ] 

Gunther Hagleitner commented on HIVE-6492:
--

[~selinazh] can you open a reviewboard request for this? I have a few more 
comments:

- Can you add a test for stats optimizer? I think since you're checking for 
explicit limit on fetch operator that would still bail (i.e.: select count(*) 
from foo with stats available and hive.compute.query.using.stats = true)
- Your patch only works in MR (since you're computing access at the physical 
level)
- We already have the pruned list of partitions available at the logical level

If you move your code to right after we call Optimizer.optimize in the 
SemanticAnalyzer you can make both cases work.

Logic should be:
- If there is a fetch operator at this level let it pass (no mapreduce job will 
be launched)
- Otherwise go through parse context's top ops and use opToPartPruner to find 
out how many partitions are going to be accessed.

Does that make sense?
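
The logic proposed above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the patch itself: ParseContext here is a simplified stand-in, its fields are hypothetical, and the pruned partition lists are assumed to be already available per top-level table scan.

```python
# Hypothetical model of the proposed check; the names do not mirror Hive's classes.
from dataclasses import dataclass, field


@dataclass
class ParseContext:
    has_fetch_operator: bool = False
    # Maps each top-level table scan (by table name here) to its pruned partition list.
    pruned_partitions: dict = field(default_factory=dict)


def tables_over_limit(ctx, limit):
    """Return the tables whose pruned partition count exceeds `limit`."""
    if ctx.has_fetch_operator or limit == -1:
        # A fetch-only plan launches no MapReduce job, so it is let through.
        return []
    return [
        table
        for table, partitions in ctx.pruned_partitions.items()
        if len(partitions) > limit
    ]
```

Because the check only looks at the logical plan, the same function would apply regardless of the execution engine, which is the point of moving it out of the physical level.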

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.





[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918886#comment-13918886
 ] 

Hive QA commented on HIVE-6492:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12632353/HIVE-6492.3.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5239 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1608/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1608/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12632353



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917363#comment-13917363
 ] 

Hive QA commented on HIVE-6492:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12631911/HIVE-6492.2.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5202 tests executed
*Failed tests:*
{noformat}
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testExecuteStatementAsync
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1584/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1584/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12631911

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.





[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-28 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915648#comment-13915648
 ] 

Gunther Hagleitner commented on HIVE-6492:
--

[~selinazh]: There's a similar safety variable already present in hive: 
HiveConf.ConfVars.HIVEMAPREDMODE / hive.mapred.mode

When turned on it enforces that every query has a condition that prunes 
partitions from the table it's running against. It's not the same but very 
similar and might satisfy your requirements. The assumption is that if a user 
has added a pruning condition they have thought about properly limiting the 
amount of data to be scanned. Does that work for you?



 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.





[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916310#comment-13916310
 ] 

Selina Zhang commented on HIVE-6492:


Strict mode disables types of queries that we cannot disable. We need to:
1. enable queries on small tables without partition filters;
2. allow "select * from table" queries issued from Tableau, because Tableau 
must be able to connect to Hive Server directly through the ODBC driver;
3. enable aggregation on partition keys without partition limits. 
Thanks for reviewing the changes!



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-28 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916358#comment-13916358
 ] 

Gunther Hagleitner commented on HIVE-6492:
--

Thanks, Selina. Just trying to understand the requirements to see what's the 
best way to get this in.

One question is whether you can deploy different configs in these scenarios. 
E.g., use a different site file if someone is starting Hive on the console vs. 
through tools. Or use an alias to add a --hiveconf on the node where users 
start Hive. You're trying to protect the cluster from large jobs. In your case 
you seem to want to turn this on for certain interfaces and off for others, but 
for other deployments that might not make much sense (the interface 
(ODBC/JDBC/CLI) doesn't say whether it's a human, a tool, etc.).

But specifically:

1) What's small? Sounds like if a query doesn't submit a job, you want to 
let it go through? Or only if there's an explicit limit clause?
2) That's the same as 1, if you just check for no job started.
3) Aggregation on a partition key right now will scan the entire table in a 
massive map-red job. Definitely something that should be fixed, but there's no 
optimization for that yet afaik. Allowing this query seems to defeat the 
purpose of this flag, doesn't it? Seems like again you just want to check 
for no job started.

With that, it would make sense to update/extend the hive.mapred.mode variable 
to allow queries that don't actually start a job (and allow jobs only with 
explicit partition pruning). With that change plus different configs for 
different interfaces, you should get all that you want, and it would be 
simpler. Correct?
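
The proposed extension of hive.mapred.mode can be sketched as a single rule. 
This is a hypothetical Python model of the suggestion above, not existing Hive 
behavior; the function and parameter names are made up for illustration.

```python
def extended_strict_mode_allows(starts_job: bool,
                                has_partition_pruning: bool) -> bool:
    """Hypothetical model of the extended strict mode suggested above."""
    # Queries that never launch a job (for example a plain fetch) pass.
    if not starts_job:
        return True
    # Queries that launch a job must prune partitions explicitly.
    return has_partition_pruning
```

Under this rule, an aggregation on a partition key that still starts a full 
map-red job without a pruning condition would be rejected.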



[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916765#comment-13916765
 ] 

Selina Zhang commented on HIVE-6492:


The original patch actually included two tasks:
1. limit the number of partitions when a table scan happens;
2. a hack to identify queries from Tableau and handle them specially.
As we discussed, the second task is just a hack, and committing it to trunk is 
probably not helpful. So I created a new patch which contains only the first 
task. 
The reason for introducing this configuration variable is that we want to limit 
the number of partitions involved in a table scan. As for metadata-only 
queries, since HIVE-1003 optimizes this type of query, the table scan is no 
longer a problem. 





[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-26 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13913795#comment-13913795
 ] 

Selina Zhang commented on HIVE-6492:


It is not rare for a table to have 1000+ partitions. To keep people from 
issuing a query without knowing how many partitions will be scanned, one more 
configuration variable, hive.limit.query.max.table.partition, will enable 
system admins to protect the grid. 

The default value is set to -1, which means no limit. 

This variable will be ignored in the following cases:
1. Simple fetch query with limit: 
select * from table limit n;
2. Metadata-only query: 
select distinct partition_key from partition_table;

There is one special case: sometimes BI tools such as Tableau (connected 
through the ODBC driver) will issue 
   select * from table
at the initial stage to figure out table metadata. It will not hurt the grid 
because Tableau will cancel the query after it receives one or two rows. So 
that Tableau can still work, code is added to mark the query client type, such 
as CLIDriver or JDBC, and only ODBC-sourced queries are allowed through. 
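
The rules above can be sketched roughly as follows. This is a minimal Python 
model under stated assumptions, not the code in the patch: TableScan, 
PartitionLimitExceeded and check_partition_limit are hypothetical names, the 
limit is assumed to be inclusive, and the ODBC special case is left out.

```python
from dataclasses import dataclass

NO_LIMIT = -1  # default of hive.limit.query.max.table.partition


class PartitionLimitExceeded(Exception):
    """Hypothetical error for a scan that touches too many partitions."""


@dataclass
class TableScan:
    """Hypothetical summary of one table scan (not a Hive class)."""
    partitions_scanned: int
    simple_fetch_with_limit: bool = False  # select * from table limit n
    metadata_only: bool = False            # select distinct partition_key ...


def check_partition_limit(scan: TableScan, max_partitions: int = NO_LIMIT) -> None:
    # -1 disables the check entirely.
    if max_partitions == NO_LIMIT:
        return
    # The two exempt query types listed above are never rejected.
    if scan.simple_fetch_with_limit or scan.metadata_only:
        return
    # Assumption: scanning exactly max_partitions partitions is still allowed.
    if scan.partitions_scanned > max_partitions:
        raise PartitionLimitExceeded(
            f"scan touches {scan.partitions_scanned} partitions, "
            f"limit is {max_partitions}"
        )
```

For example, check_partition_limit(TableScan(5000), 1000) raises, while the 
same scan marked as metadata_only=True passes.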



