[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2021-04-26 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332473#comment-17332473
 ] 

Denys Kuzmenko commented on HIVE-21354:
---

Closing as duplicate.

> Lock The Entire Table If Majority Of Partitions Are Locked
> ----------------------------------------------------------
>
>                 Key: HIVE-21354
>                 URL: https://issues.apache.org/jira/browse/HIVE-21354
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.2.0, 4.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.
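
The three steps in the description can be sketched roughly as follows. This is only an illustration of the requested heuristic, not Hive code: the function name `locks_for_query` and the dotted lock-path strings are made up for the example.

```python
def locks_for_query(table, accessed_partitions, total_partitions):
    """Return the lock paths one query would create (hypothetical helper).

    table: table name, e.g. "default.web_logs"
    accessed_partitions: partition specs the query is required to lock
    total_partitions: total number of partitions in the table
    """
    needed = len(accessed_partitions)          # step 1: count required locks
    # Step 3: at or above half of all partitions (step 2 supplies the total),
    # create a single table-level lock instead of per-partition locks.
    if 2 * needed >= total_partitions:
        return [table]
    return [f"{table}.{p}" for p in accessed_partitions]
```

Under this reading, a full scan such as {{select count(1) from table}} always collapses to one table-level lock, however many partitions the table has.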



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-14 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083243#comment-17083243
 ] 

David Mollitor commented on HIVE-21354:
---

[~pvary]

{code:none}
 _  _
| )/ )
 \\ |//,' __
 (")(_)-"()))=-
(\\
 _   _
  HEELP ( | / )
  \\ \|/,' __
\_o_/ (")(_)-"()))=-
   ) <\\
  /\__
_ \ 
{code} 



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-14 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083169#comment-17083169
 ] 

Peter Vary commented on HIVE-21354:
---

[~belugabehr]: I fear what we found here is a hornet's nest. Consider:
{code}
0: jdbc:hive2://localhost:10003> explain locks insert into acid_part select * 
from acid_part where j=1;

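+---------------------------------------+
|                Explain                |
+---------------------------------------+
| LOCK INFORMATION:                     |
| default.acid_part -> SHARED_READ      |
| default.acid_part.j=1 -> SHARED_READ  |
| default.acid_part -> SHARED_READ      |
+---------------------------------------+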
{code}
The "first" table-level lock is not needed (its source is the read, which reads 
only the {{j=1}} partition), but the "second" table-level lock is needed (its 
source is the dynamic partitioning write) :) So I would create another jira to 
rationalize which locks are needed and which are not.

In this jira we should concentrate on the final filtering / escalation of the 
locks, with the steps you already suggested:
* Remove partition locks if we already have a table-level lock of the same type
* Remove partition locks and replace them with a table-level lock of the same 
type if the number of locks of this type is higher than the configured 
(hive.lock.escalation.num) value.
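
A rough sketch of those two filtering steps, for illustration only. The tuple representation of a lock, the function name `filter_and_escalate`, and the de-duplication of repeated entries are assumptions made for the example; the strict comparison follows the "higher than" wording above, and -1 is taken to mean "no escalation".

```python
from collections import defaultdict


def filter_and_escalate(locks, escalation_num):
    """locks: list of (table, partition_or_None, lock_type) tuples.

    escalation_num: stand-in for the proposed hive.lock.escalation.num;
    a negative value disables escalation (assumption).
    """
    # Table-level locks that were requested explicitly, keyed by type.
    table_level = {(t, typ) for t, p, typ in locks if p is None}

    # Partition locks grouped per table and lock type.
    per_group = defaultdict(list)
    for t, p, typ in locks:
        if p is not None:
            per_group[(t, typ)].append(p)

    result, seen = [], set()
    for t, p, typ in locks:
        if p is None:
            key = (t, None, typ)
        elif (t, typ) in table_level:
            continue                      # step 1: covered by a table lock
        elif escalation_num >= 0 and len(per_group[(t, typ)]) > escalation_num:
            key = (t, None, typ)          # step 2: escalate to table level
        else:
            key = (t, p, typ)
        if key not in seen:               # drop repeated entries (assumption)
            seen.add(key)
            result.append(key)
    return result
```

Applied to the three locks in the {{explain locks}} output above, this leaves a single table-level SHARED_READ entry.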

Your thoughts [~belugabehr], [~dkuzmenko]?

Thanks,
Peter



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-14 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082966#comment-17082966
 ] 

Denys Kuzmenko commented on HIVE-21354:
---

[~belugabehr], [~pvary], checkLock searches stuff hierarchically, so if you 
have conflicting locks on any level - it's gonna back off and try again later. 



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082586#comment-17082586
 ] 

Peter Vary commented on HIVE-21354:
---

HIVE-22888



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082578#comment-17082578
 ] 

David Mollitor commented on HIVE-21354:
---

bq. So it all comes down to whether the lock check does exact matches or checks 
stuff hierarchically.

Yes. Exactly :)

I think we are both just guessing which one is employed.  I will need to dig 
in to figure it out, unless you can point me at the code that does this 
implicit locking check.



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082571#comment-17082571
 ] 

Peter Vary commented on HIVE-21354:
---

[~belugabehr]: AFAIK checkLock checks locks hierarchically. So it prevents 
acquiring a table-level exclusive lock if a partition-level exclusive lock is 
already held by another query.

Explicitly requesting a table-level shared lock, as you suggested in your 
comment for drop partition, is a philosophical question IMHO if the handling of 
the partition-level lock already prevents an exclusive lock on the table anyway. 
With the current implementation, performance is better with fewer locks.
Having an extra table-level lock when querying a single partition is a bug if 
the table-level lock prevents dropping other partitions, which is not the 
desired behaviour.

So it all comes down to whether the lock check does exact matches or checks 
stuff hierarchically.
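
To make the two possibilities concrete, here is a small sketch of both kinds of conflict check. It is not the checkLock implementation: the lock representation (a path tuple plus a type), the function names, and the two-mode compatibility rule (anything involving EXCLUSIVE conflicts) are simplifying assumptions.

```python
def _incompatible(type_a, type_b):
    # Simplified rule: only SHARED_READ with SHARED_READ can coexist.
    return "EXCLUSIVE" in (type_a, type_b)


def _is_prefix(short, long):
    # True when `short` names `long` itself or one of its ancestors.
    return long[:len(short)] == short


def conflicts_exact(held, requested):
    """Conflict only when both locks name exactly the same object."""
    return held[0] == requested[0] and _incompatible(held[1], requested[1])


def conflicts_hierarchical(held, requested):
    """Conflict when one object contains the other (table vs. partition)."""
    held_path, req_path = held[0], requested[0]
    related = _is_prefix(held_path, req_path) or _is_prefix(req_path, held_path)
    return related and _incompatible(held[1], requested[1])
```

With exact matching, an EXCLUSIVE lock on a partition would not block an EXCLUSIVE lock on its table, which is why an explicit table-level lock would then be required; with the hierarchical check the partition lock alone is enough.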



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082505#comment-17082505
 ] 

David Mollitor commented on HIVE-21354:
---

... something like:

{code:none}
explain locks alter table web_logs drop partition(`date`='2015-11-18')

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> EXCLUSIVE
{code}
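
The rule behind this suggested output (an explicit SHARED_READ on the table first, then one lock per touched partition) can be sketched as below. The function name `explain_locks` and the output format are illustrative only and simply mirror the listing above; they are not Hive's API.

```python
def explain_locks(table, partitions, partition_lock_type):
    """Build EXPLAIN LOCKS-style lines for one statement (hypothetical)."""
    # The table-level lock always comes first and is always shared.
    lines = [f"{table} -> SHARED_READ"]
    # Each touched partition gets the lock type the operation needs.
    lines += [f"{table}.{p} -> {partition_lock_type}" for p in partitions]
    return lines
```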



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082501#comment-17082501
 ] 

David Mollitor commented on HIVE-21354:
---

[~pvary] I do not think that Hive has any logic that says "if a partition of a 
table is locked, then the table is locked."  I think it does this in a simple 
way: it comes up with a list of all the required locks, and the first one is 
always the table lock; the rest are the required partitions.  That is to say, 
it takes an explicit lock on the table; there is no logic for an implicit 
table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082440#comment-17082440
 ] 

Peter Vary commented on HIVE-21354:
---

[~belugabehr]: My understanding is that having a lock on a partition 
automatically locks the table (even without further, table specific lock), and 
having a lock on a table prevents further conflicting locks on any partition of 
the given table by a different query.
This means that the extra table level lock is not only unnecessary, but it also 
prevents parallelism which should be allowed. (DROP PARTITION for p1, INSERT 
INTO p2)
The question is whether this code is used for legacy locks as well which might 
have different logic. (Most probably not)



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-13 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082362#comment-17082362
 ] 

David Mollitor commented on HIVE-21354:
---

[~pvary] I'm not sure of the exact relationship between table and partition.  I 
believe they overlap in some metadata, but maybe not all?  There might be an 
issue of:

* Client 1: Read partition 'a'
* Client 2: Change the table-level metadata
* Client 1: Read partition 'b'

... but I don't know.


For a 'DROP', it makes sense to lock just the partition: whatever the metadata 
change might be is irrelevant because, well, it's going to be dropped 
anyway.



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-10 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080592#comment-17080592
 ] 

Peter Vary commented on HIVE-21354:
---

[~belugabehr]: I do not get it. Even stranger:
{code}
explain locks select * from web_logs where `date`='2015-11-18'

Explain
LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
{code}

It seems the assumption is that we only check for exact matches on locks.
We should double-check that we really prevent getting a shared lock on a 
partition if some other query has an exclusive lock on the table. [~dkuzmenko] 
can help us here :)

Just a fun fact:
{code}
explain locks alter table web_logs drop partition(`date`='2015-11-18')

LOCK INFORMATION:
default.web_logs.date=2015-11-18 -> EXCLUSIVE
{code}

This might merit another Jira (or do it here?): do not request unnecessary 
locks (why do we request a full table lock with a select?). In the current state 
we would prevent dropping a partition even if it is not used in the query.



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-10 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080512#comment-17080512
 ] 

David Mollitor commented on HIVE-21354:
---

[~pvary] Since the queries are always taking the table lock... why do they also 
take the partition locks?



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-10 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080510#comment-17080510
 ] 

David Mollitor commented on HIVE-21354:
---

Hey [~pvary], I'll take a crack at it.


[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-09 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079445#comment-17079445
 ] 

Peter Vary commented on HIVE-21354:
---

Same for 4.0:
{code:java}
0: jdbc:hive2://localhost:10003> explain locks select * from acid_part;
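+----------------------------------------+
|                Explain                 |
+----------------------------------------+
| LOCK INFORMATION:                      |
| default.acid_part -> SHARED_READ       |
| default.acid_part.j=1 -> SHARED_READ   |
| default.acid_part.j=10 -> SHARED_READ  |
| default.acid_part.j=2 -> SHARED_READ   |
+----------------------------------------+
{code}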
I think it would be worth adding a new configuration value for the maximum 
number of partition-level locks (hive.lock.escalation.num?). If the number 
of locks is above this level, we should request a table-level lock instead 
of partition-level locks. For example:
* -1 to turn off lock escalation (default, as this is the backward-compatible 
solution)
* 1 to prevent using partition-level locks

The user should be able to change this configuration at the session level, so 
for a long query where it is important to allow as much concurrency as possible 
the user can set it to -1, and for a session used for fast queries 
where latency is more important, the user can set it to 1 instead.
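
A tiny sketch of how such a threshold might be interpreted. The comment says "above this level" while also expecting a value of 1 to rule out partition-level locks entirely, so whether the boundary is inclusive is left open here and exposed as a parameter. The function name `use_table_lock` and the `inclusive` flag are made up for the example.

```python
def use_table_lock(num_partition_locks, escalation_num, inclusive=True):
    """Decide whether to request one table-level lock (hypothetical helper).

    escalation_num: stand-in for the proposed hive.lock.escalation.num;
    -1 (any negative value here) turns escalation off.
    inclusive: whether reaching the threshold is enough to escalate.
    """
    if escalation_num < 0:
        return False                       # escalation disabled
    if inclusive:
        return num_partition_locks >= escalation_num
    return num_partition_locks > escalation_num
```

With the inclusive reading, a setting of 1 escalates even a single partition lock; with the strict reading, a query touching exactly one partition would still take a partition-level lock.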

The easiest place to implement it would be {{AcidUtils.makeLockComponents}}.

[~belugabehr]: Do you plan to work on this?

Thanks,
Peter
 


[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2020-04-09 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079388#comment-17079388
 ] 

Peter Vary commented on HIVE-21354:
---

I have found this in Hive 3.1:
{code:java}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}
[~gopalv], [~belugabehr]: Are you aware of any change in 4.0 that changes 
this but was not backported to 3.1?

Thanks,

Peter


[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2019-02-28 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781165#comment-16781165
 ] 

Gopal V commented on HIVE-21354:


ACID is locked hierarchically - you can probably run "EXPLAIN LOCKS" on the 
latest build?


[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2019-02-28 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781159#comment-16781159
 ] 

BELUGA BEHR commented on HIVE-21354:


Thanks for the input [~gopalv].

What about just a simple {{SELECT * FROM TABLE WHERE (non-partitioned-value)=?}}



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2019-02-28 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781140#comment-16781140
 ] 

Gopal V commented on HIVE-21354:


bq.  Does ACIDv2 apply to Parquet, Avro, JSON, etc?

Yes, it does - you can't UPDATE rows in those formats, but you still get atomic 
"insert overwrite" across multiple partitions.



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2019-02-28 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780821#comment-16780821
 ] 

BELUGA BEHR commented on HIVE-21354:


bq. This is only true when you disable ACID

So does this only apply to ORC tables?  Does ACIDv2 apply to Parquet, Avro, 
JSON, etc?



[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2019-02-28 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780731#comment-16780731
 ] 

Gopal V commented on HIVE-21354:


bq. One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.

This is only true when you disable ACID - enabling ACIDv2 solved a number of 
those issues (i.e. the heartbeating mechanism is for the txn-id, etc.).

The ACID MERGE will lock the entire table, but one partition at a time - 
this is easier to fix in ACID than in ZK.
