Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-19 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
---

(Updated Feb. 19, 2020, 2:10 p.m.)


Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Repository: hive-git


Description
---

- Main change is in TxnHandler::checkLock. 
- When all incoming requests are SHARED_READ, we can add a condition in the 
query to retrieve only relevant rows. This avoids significant number of rows 
fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of 
"SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the 
jumpttable. This condition can be optimised later.
- Also, removed the "HL_PARTITION IN" clause which could potentially 
overflow for oracle. Partition details can be filtered out, if the earlier 
query actually returned any rows.
- Rest of the changes, are related to refactoring 
"TxnHandler::enqueueLockWithRetry" to reduce lock scope.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java a8b9653411 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/4/

Changes: https://reviews.apache.org/r/72129/diff/3-4/


Testing
---


File Attachments


HIVE-22850.5.patch
  
https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch


Thanks,

Rajesh Balamohan



Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-14 Thread Denys Kuzmenko via Review Board


> On Feb. 13, 2020, 3:08 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 4546 (patched)
> > 
> >
> > Hi Rajesh, could you please explain, what is the reason of doing 
> > partition filtering on HMS side, not backend db?
> 
> Rajesh Balamohan wrote:
> By adding all the partition details, the query can become large and has 
> the issue of overflowing in the case of oracle (i,e have to batch with 1000 
> entries). Also, its incurs parsing in sql server side, as it is executed as 
> Statement. Given that we have additional filter now, it would make it a lot 
> simpler to do this in client side.  This was pointed out in the JIRA by Gopal.

Can we rewrite the query with JOIN operator? Somethinkg like:
https://issues.apache.org/jira/browse/HIVE-22888


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219575
---


On Feb. 14, 2020, 1:27 a.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72129/
> ---
> 
> (Updated Feb. 14, 2020, 1:27 a.m.)
> 
> 
> Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> - Main change is in TxnHandler::checkLock. 
> - When all incoming requests are SHARED_READ, we can add a condition in 
> the query to retrieve only relevant rows. This avoids significant number of 
> rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner 
> condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in 
> the jumpttable. This condition can be optimised later.
> - Also, removed the "HL_PARTITION IN" clause which could potentially 
> overflow for oracle. Partition details can be filtered out, if the earlier 
> query actually returned any rows.
> - Rest of the changes, are related to refactoring 
> "TxnHandler::enqueueLockWithRetry" to reduce lock scope.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java a8b9653411 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  f53aebe4ad 
> 
> 
> Diff: https://reviews.apache.org/r/72129/diff/3/
> 
> 
> Testing
> ---
> 
> 
> File Attachments
> 
> 
> HIVE-22850.5.patch
>   
> https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
---

(Updated Feb. 14, 2020, 1:27 a.m.)


Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Changes
---

- Removed refactoring changes to simplify changes in this patch.
- Building filter conditions in buildJumpTable(). SQL filters are built off 
these.


Repository: hive-git


Description
---

- Main change is in TxnHandler::checkLock. 
- When all incoming requests are SHARED_READ, we can add a condition in the 
query to retrieve only relevant rows. This avoids significant number of rows 
fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of 
"SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the 
jumpttable. This condition can be optimised later.
- Also, removed the "HL_PARTITION IN" clause which could potentially 
overflow for oracle. Partition details can be filtered out, if the earlier 
query actually returned any rows.
- Rest of the changes, are related to refactoring 
"TxnHandler::enqueueLockWithRetry" to reduce lock scope.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java a8b9653411 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/3/

Changes: https://reviews.apache.org/r/72129/diff/2-3/


Testing
---


File Attachments


HIVE-22850.5.patch
  
https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch


Thanks,

Rajesh Balamohan



Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Rajesh Balamohan


> On Feb. 13, 2020, 3:08 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 4546 (patched)
> > 
> >
> > Hi Rajesh, could you please explain, what is the reason of doing 
> > partition filtering on HMS side, not backend db?

By adding all the partition details, the query can become large and has the 
issue of overflowing in the case of oracle (i,e have to batch with 1000 
entries). Also, its incurs parsing in sql server side, as it is executed as 
Statement. Given that we have additional filter now, it would make it a lot 
simpler to do this in client side.  This was pointed out in the JIRA by Gopal.


- Rajesh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219575
---


On Feb. 13, 2020, 1:22 p.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72129/
> ---
> 
> (Updated Feb. 13, 2020, 1:22 p.m.)
> 
> 
> Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> - Main change is in TxnHandler::checkLock. 
> - When all incoming requests are SHARED_READ, we can add a condition in 
> the query to retrieve only relevant rows. This avoids significant number of 
> rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner 
> condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in 
> the jumpttable. This condition can be optimised later.
> - Also, removed the "HL_PARTITION IN" clause which could potentially 
> overflow for oracle. Partition details can be filtered out, if the earlier 
> query actually returned any rows.
> - Rest of the changes, are related to refactoring 
> "TxnHandler::enqueueLockWithRetry" to reduce lock scope.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  f53aebe4ad 
> 
> 
> Diff: https://reviews.apache.org/r/72129/diff/2/
> 
> 
> Testing
> ---
> 
> 
> File Attachments
> 
> 
> HIVE-22850.5.patch
>   
> https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219575
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 4546 (patched)


Hi Rajesh, could you please explain, what is the reason of doing partition 
filtering on HMS side, not backend db?


- Denys Kuzmenko


On Feb. 13, 2020, 1:22 p.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72129/
> ---
> 
> (Updated Feb. 13, 2020, 1:22 p.m.)
> 
> 
> Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> - Main change is in TxnHandler::checkLock. 
> - When all incoming requests are SHARED_READ, we can add a condition in 
> the query to retrieve only relevant rows. This avoids significant number of 
> rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner 
> condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in 
> the jumpttable. This condition can be optimised later.
> - Also, removed the "HL_PARTITION IN" clause which could potentially 
> overflow for oracle. Partition details can be filtered out, if the earlier 
> query actually returned any rows.
> - Rest of the changes, are related to refactoring 
> "TxnHandler::enqueueLockWithRetry" to reduce lock scope.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  f53aebe4ad 
> 
> 
> Diff: https://reviews.apache.org/r/72129/diff/2/
> 
> 
> Testing
> ---
> 
> 
> File Attachments
> 
> 
> HIVE-22850.5.patch
>   
> https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
---

(Updated Feb. 13, 2020, 1:22 p.m.)


Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Repository: hive-git


Description
---

- Main change is in TxnHandler::checkLock. 
- When all incoming requests are SHARED_READ, we can add a condition in the 
query to retrieve only relevant rows. This avoids significant number of rows 
fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of 
"SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the 
jumpttable. This condition can be optimised later.
- Also, removed the "HL_PARTITION IN" clause which could potentially 
overflow for oracle. Partition details can be filtered out, if the earlier 
query actually returned any rows.
- Rest of the changes, are related to refactoring 
"TxnHandler::enqueueLockWithRetry" to reduce lock scope.


Diffs (updated)
-

  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/2/

Changes: https://reviews.apache.org/r/72129/diff/1-2/


Testing
---


File Attachments (updated)


HIVE-22850.5.patch
  
https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch


Thanks,

Rajesh Balamohan



Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219572
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 4546 (patched)


We shouldn't remove NULL partitions (lock on db/table level).


- Denys Kuzmenko


On Feb. 13, 2020, 11:40 a.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72129/
> ---
> 
> (Updated Feb. 13, 2020, 11:40 a.m.)
> 
> 
> Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> - Main change is in TxnHandler::checkLock. 
> - When all incoming requests are SHARED_READ, we can add a condition in 
> the query to retrieve only relevant rows. This avoids significant number of 
> rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner 
> condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in 
> the jumpttable. This condition can be optimised later.
> - Also, removed the "HL_PARTITION IN" clause which could potentially 
> overflow for oracle. Partition details can be filtered out, if the earlier 
> query actually returned any rows.
> - Rest of the changes, are related to refactoring 
> "TxnHandler::enqueueLockWithRetry" to reduce lock scope.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  f53aebe4ad 
> 
> 
> Diff: https://reviews.apache.org/r/72129/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
---

Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Repository: hive-git


Description
---

- Main change is in TxnHandler::checkLock. 
- When all incoming requests are SHARED_READ, we can add a condition in the 
query to retrieve only relevant rows. This avoids significant number of rows 
fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of 
"SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the 
jumpttable. This condition can be optimised later.
- Also, removed the "HL_PARTITION IN" clause which could potentially 
overflow for oracle. Partition details can be filtered out, if the earlier 
query actually returned any rows.
- Rest of the changes, are related to refactoring 
"TxnHandler::enqueueLockWithRetry" to reduce lock scope.


Diffs
-

  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/1/


Testing
---


Thanks,

Rajesh Balamohan