[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332473#comment-17332473 ]

Denys Kuzmenko commented on HIVE-21354:
---------------------------------------

Closing as duplicate.

> Lock The Entire Table If Majority Of Partitions Are Locked
> ----------------------------------------------------------
>
>                 Key: HIVE-21354
>                 URL: https://issues.apache.org/jira/browse/HIVE-21354
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.2.0, 4.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.
> When a Hive query interacts with a table which has a lot of partitions, this
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal
> to half the total number of partitions, simply create one ZNode lock at the
> table level.
> This would improve performance of many queries, but in particular, a {{select
> count(1) from table}} ... or ... {{select * from table limit 5}} where the
> table has many partitions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
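The three-step heuristic in the issue description can be sketched roughly as follows. This is only an illustration of the proposal, not Hive code: the function name, its parameters, and the `table.partition` naming (borrowed from the EXPLAIN LOCKS output quoted elsewhere in this thread) are assumptions.

```python
def choose_lock_targets(table, partitions_to_lock, total_partitions):
    """Return the resources a query would lock under the proposed heuristic.

    Hypothetical helper for illustration only; none of these names are Hive APIs.
    """
    # Step 1: count the partitions the query is required to lock.
    needed = len(partitions_to_lock)

    # Steps 2 and 3: compare against the table's total partition count and
    # fall back to a single table-level lock at "half or more".
    if 2 * needed >= total_partitions:
        return [table]

    # Otherwise keep one lock per partition.
    return [f"{table}.{partition}" for partition in partitions_to_lock]
```

With a four-partition table, a query touching two partitions would take one table-level lock under this reading, while a query touching one partition would still take a single partition lock.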
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083243#comment-17083243 ]

David Mollitor commented on HIVE-21354:
---------------------------------------

[~pvary]

[ASCII art, layout lost in extraction: two insect drawings and a stick figure shouting "HEELP"]
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083169#comment-17083169 ]

Peter Vary commented on HIVE-21354:
-----------------------------------

[~belugabehr]: I fear what we found here is a hornets' nest.

Consider:
{code}
0: jdbc:hive2://localhost:10003> explain locks insert into acid_part select * from acid_part where j=1;
+---------------------------------------+
|                Explain                |
+---------------------------------------+
| LOCK INFORMATION:                     |
| default.acid_part -> SHARED_READ      |
| default.acid_part.j=1 -> SHARED_READ  |
| default.acid_part -> SHARED_READ      |
+---------------------------------------+
{code}

The "first" table level lock is not needed (its source is the read, which only reads the {{j=1}} partition), but the "second" table level lock is needed (its source is the dynamic partitioning write) :)

So I would create another jira to rationalize which locks are needed and which are not.

In this jira we should concentrate on the final filtering / escalation of the locks, with the steps you already suggested:
* Remove partition locks if we already have a table level lock of the same type
* Remove partition locks and replace them with a table level lock of the same type if the number of locks of this type is higher than the configured (hive.lock.escalation.num) value

Your thoughts [~belugabehr], [~dkuzmenko]?

Thanks,
Peter
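The two filtering / escalation steps listed in the comment above could look something like the sketch below. It is a guess at the shape of the logic, not the Hive implementation: locks are modelled as hypothetical `(table, partition, lock_type)` tuples with `partition=None` for a table level lock, "higher than" is read as a strict comparison, a negative value disables escalation, and collapsing exact duplicates at the end is an extra assumption of this sketch.

```python
from collections import Counter


def filter_locks(locks, escalation_num=-1):
    """Filter and escalate a list of (table, partition, lock_type) tuples.

    Hypothetical model for illustration; these are not Hive classes or APIs.
    """
    # Step 1: drop partition locks already covered by a same-type table lock.
    table_level = {(t, lt) for t, p, lt in locks if p is None}
    kept = [l for l in locks if l[1] is None or (l[0], l[2]) not in table_level]

    # Step 2: when a table has more partition locks of one type than the
    # configured value, replace them with a single table level lock.
    if escalation_num >= 0:
        counts = Counter((t, lt) for t, p, lt in kept if p is not None)
        escalated = {key for key, n in counts.items() if n > escalation_num}
        kept = [l for l in kept if l[1] is None or (l[0], l[2]) not in escalated]
        kept.extend((t, None, lt) for t, lt in sorted(escalated))

    # Assumption: identical entries are collapsed, keeping first-seen order.
    return list(dict.fromkeys(kept))
```

Applied to the three locks in the EXPLAIN LOCKS output above, this leaves a single `default.acid_part -> SHARED_READ` table level lock.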
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082966#comment-17082966 ]

Denys Kuzmenko commented on HIVE-21354:
---------------------------------------

[~belugabehr], [~pvary], checkLock searches hierarchically, so if there is a conflicting lock on any level, it's going to back off and try again later.
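As a rough illustration of what "hierarchical" means here (this is not the actual checkLock code), the toy model below treats a table level lock as overlapping every partition of that table. The tuple layout, the function names, and the simplified compatibility rule (two locks conflict only if at least one is EXCLUSIVE) are all assumptions made for the sketch.

```python
def overlaps(a, b):
    """Two (table, partition, lock_type) locks overlap if they are on the same
    table and either one is table level (partition None) or both name the
    same partition."""
    return a[0] == b[0] and (a[1] is None or b[1] is None or a[1] == b[1])


def check_lock(requested, held_locks):
    """Return "ACQUIRED" or "WAITING" for a requested lock.

    Simplified assumption: locks conflict only when at least one of the two
    is EXCLUSIVE. A waiting caller would back off and retry later.
    """
    for held in held_locks:
        if overlaps(held, requested) and "EXCLUSIVE" in (held[2], requested[2]):
            return "WAITING"
    return "ACQUIRED"
```

Under this model a table level SHARED_READ held by one query makes a partition level EXCLUSIVE request from another query wait, which is the situation discussed in this thread for a SELECT running alongside a DROP PARTITION.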
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082586#comment-17082586 ]

Peter Vary commented on HIVE-21354:
-----------------------------------

HIVE-22888
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082578#comment-17082578 ]

David Mollitor commented on HIVE-21354:
---------------------------------------

bq. So it all comes down to whether the lock check does exact matches, or checks hierarchically.

Yes. Exactly :) I think we are both just guessing which one is employed. I will need to dig in to figure it out, unless you can point me at the code that does this implicit locking check.
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082571#comment-17082571 ]

Peter Vary commented on HIVE-21354:
-----------------------------------

[~belugabehr]: AFAIK checkLock checks locks hierarchically. So it prevents acquiring a table level exclusive lock if a partition level exclusive lock is already held by another query.

Explicitly requesting a table level shared lock, as you suggested in your comment for drop partition, is a philosophical question IMHO if the handling of the partition level lock already prevents an exclusive lock on the table anyway. With the current implementation, performance is better with fewer locks.

Having an extra table level lock when querying a single partition is a bug if that table level lock prevents dropping other partitions, which is not the desired behaviour.

So it all comes down to whether the lock check does exact matches, or checks hierarchically.
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082505#comment-17082505 ]

David Mollitor commented on HIVE-21354:
---------------------------------------

... something like:

{code:none}
explain locks alter table web_logs drop partition(`date`='2015-11-18')

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> EXCLUSIVE
{code}
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082501#comment-17082501 ]

David Mollitor commented on HIVE-21354:
---------------------------------------

[~pvary] I do not think that Hive has any logic that says "if a partition of a table is locked, then the table is locked." I think it does this in a simple way: it comes up with a list of all the required locks, where the first one is always the table lock and the rest are the required partitions. That is to say, it takes an explicit lock on the table; there is no logic for an implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082440#comment-17082440 ]

Peter Vary commented on HIVE-21354:
-----------------------------------

[~belugabehr]: My understanding is that having a lock on a partition automatically locks the table (even without a further, table-specific lock), and having a lock on a table prevents further conflicting locks on any partition of that table by a different query.

This means that the extra table level lock is not only unnecessary, but it also prevents parallelism which should be allowed (DROP PARTITION for p1, INSERT INTO p2).

The question is whether this code is also used for legacy locks, which might have different logic. (Most probably not.)
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082362#comment-17082362 ]

David Mollitor commented on HIVE-21354:
---------------------------------------

[~pvary] I'm not sure about the exact relationship between table and partition. I believe they overlap in some metadata, but maybe not all?

There might be an issue of:
* Client 1: Read partition 'a'
* Client 2: Change the table-level metadata
* Client 1: Read partition 'b'

... but I don't know.

For a 'DROP' it makes sense to lock just the partition: whatever the metadata change might be is irrelevant because the partition is going to be dropped anyway.
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080592#comment-17080592 ]

Peter Vary commented on HIVE-21354:
-----------------------------------

[~belugabehr]: I do not get it.

Even stranger:
{code}
explain locks select * from web_logs where `date`='2015-11-18'

Explain
LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
{code}

Seems like the assumption is that we only check for exact matches on locks. We should double check that we really prevent getting a shared lock on a partition if some other query has an exclusive lock on the table. [~dkuzmenko] can help us here :)

Just a fun fact:
{code}
explain locks alter table web_logs drop partition(`date`='2015-11-18')

LOCK INFORMATION:
default.web_logs.date=2015-11-18 -> EXCLUSIVE
{code}

This might merit another Jira (or do it here?): do not request unnecessary locks (why do we request a full table lock with a select?). In the current state we would prevent dropping a partition even if it is not used in the query.
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080512#comment-17080512 ]

David Mollitor commented on HIVE-21354:
---------------------------------------

[~pvary] Since the queries are always taking the table lock... why do they also take the partition locks?
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080510#comment-17080510 ]

David Mollitor commented on HIVE-21354:
---------------------------------------

Hey [~pvary], I'll take a crack at it.
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079445#comment-17079445 ]

Peter Vary commented on HIVE-21354:
-----------------------------------

Same for 4.0:
{code:java}
0: jdbc:hive2://localhost:10003> explain locks select * from acid_part;
+---------------------------------------+
|                Explain                |
+---------------------------------------+
| LOCK INFORMATION:                     |
| default.acid_part -> SHARED_READ      |
| default.acid_part.j=1 -> SHARED_READ  |
| default.acid_part.j=10 -> SHARED_READ |
| default.acid_part.j=2 -> SHARED_READ  |
+---------------------------------------+
{code}

I think it would be worth adding a new configuration value for the maximum number of partition level locks (hive.lock.escalation.num?). If the number of locks is above this level, we should request a table level lock instead of partition level locks.

Like:
* -1 to turn off lock escalation (default, as this is the backward compatible solution)
* 1 to prevent using partition level locks

This configuration should be changeable by the user at session level. If there is a long query where it is important to allow as much concurrency as possible, the user can set it to -1; if the session is used for fast queries where latency is more important, use 1 instead.

The easiest place to implement it would be {{AcidUtils.makeLockComponents}}.

[~belugabehr]: Do you plan to work on this?

Thanks,
Peter
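A minimal sketch of how the proposed setting might be interpreted is shown below. Note that hive.lock.escalation.num is only a suggested name in the comment above, and the function and parameter names are invented for illustration. The comment says both "above this level" and that 1 should prevent partition level locks; the sketch reads the threshold inclusively (count >= value) so that 1 behaves as described, and that reading is an assumption.

```python
def use_table_lock(num_partition_locks, escalation_num=-1):
    """Decide whether to request one table level lock instead of
    num_partition_locks partition level locks.

    escalation_num models the proposed hive.lock.escalation.num value.
    Hypothetical helper; not part of Hive.
    """
    # -1 (the proposed default) turns lock escalation off entirely.
    if escalation_num < 0:
        return False
    # Inclusive comparison, so a value of 1 escalates even a single
    # partition lock (an assumption; see the note above).
    return num_partition_locks >= escalation_num
```

With the default of -1 the four locks in the output above would stay as they are; with a session value of 1 the three partition level locks would be replaced by a table level lock.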
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079388#comment-17079388 ]

Peter Vary commented on HIVE-21354:
-----------------------------------

I have found this in Hive 3.1:
{code:java}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}

[~gopalv], [~belugabehr]: Are you aware of any change in 4.0 which changes this, but was not backported to 3.1?

Thanks,
Peter
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781165#comment-16781165 ]

Gopal V commented on HIVE-21354:
--------------------------------

ACID is locked hierarchically - you can probably run "EXPLAIN LOCKS" on the latest build?
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781159#comment-16781159 ]

BELUGA BEHR commented on HIVE-21354:
------------------------------------

Thanks for the input [~gopalv]. What about just a simple {{SELECT * FROM TABLE WHERE (non-partitioned-value)=?}}
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781140#comment-16781140 ]
Gopal V commented on HIVE-21354:
> Does ACIDv2 apply to Parquet, Avro, JSON, etc.?
Yes, it does - you can't UPDATE rows in those formats, but you still get atomic "insert overwrite" across multiple partitions.
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780821#comment-16780821 ]
BELUGA BEHR commented on HIVE-21354:
> This is only true when you disable ACID
So does this only apply to ORC tables? Does ACIDv2 apply to Parquet, Avro, JSON, etc.?
[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780731#comment-16780731 ]
Gopal V commented on HIVE-21354:
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.
This is only true when you disable ACID - enabling ACIDv2 solved a number of those issues (i.e., the heartbeating mechanism is for the txn-id, etc.). The ACID MERGE will lock the entire table, but one partition at a time - this is easier to fix in ACID than in ZK.
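For reference, the heuristic requested in this issue (take a single table-level lock once a query touches at least half of the table's partitions) could be sketched roughly as follows. This is only an illustration of the decision rule, not Hive code: the function name `lock_targets`, the `table/partition` path format, and the handling of unpartitioned tables are all assumptions made for the example.

```python
from typing import List, Sequence


def lock_targets(table: str, accessed: Sequence[str], total_partitions: int) -> List[str]:
    """Return the lock paths a query would take under the proposed heuristic.

    `table`, `accessed`, and the "table/partition" path format are
    hypothetical; they only stand in for whatever the lock manager uses.
    """
    # Assumption: an unpartitioned table is always locked at the table level.
    if total_partitions == 0:
        return [table]
    # "Greater than or equal to half", compared in integers to avoid rounding.
    if 2 * len(accessed) >= total_partitions:
        return [table]
    # Otherwise keep one lock per accessed partition.
    return [f"{table}/{p}" for p in accessed]


# A query touching 2 of 4 partitions collapses to one table-level lock,
# while a query touching 1 of 4 keeps its single partition lock.
print(lock_targets("sales", ["ds=1", "ds=2"], 4))
print(lock_targets("sales", ["ds=1"], 4))
```

Note that the sketch needs the total partition count up front, which is step 2 of the proposal; whether that lookup is cheap enough in practice is not something this example addresses.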