Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
---

(Updated Feb. 14, 2020, 1:27 a.m.)


Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Changes
---

- Removed the refactoring changes to simplify this patch.
- Filter conditions are now built in buildJumpTable(). The SQL filters are 
derived from these.


Repository: hive-git


Description
---

- The main change is in TxnHandler::checkLock.
- When all incoming requests are SHARED_READ, we can add a condition to the 
query to retrieve only the relevant rows. This avoids fetching a significant 
number of rows of the form "SHARED_READ + ACQUIRED". There is a corner case, 
"SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jump 
table. This case can be optimised later.
- Also removed the "HL_PARTITION IN" clause, which could potentially overflow 
on Oracle. Partition details can be filtered out afterwards, if the earlier 
query actually returned any rows.
- The rest of the changes are related to refactoring 
"TxnHandler::enqueueLockWithRetry" to reduce lock scope.
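
As a rough illustration of the SHARED_READ case described above, the sketch 
below adds an extra WHERE condition only when every requested lock is a shared 
read. The class and method names, the column names HL_LOCK_TYPE and 
HL_LOCK_STATE, and the single-character codes ('r' for SHARED_READ, 'a' for 
ACQUIRED) are assumptions made for illustration. This is not the code from the 
patch.

```java
import java.util.List;

/** Hypothetical sketch; not the actual TxnHandler code. */
final class ReadLockFilterSketch {

  /** Lock types used for illustration. */
  enum LockType { SHARED_READ, SHARED_WRITE, EXCLUSIVE }

  // Assumed single-character encodings stored in the lock table.
  static final char TYPE_SHARED_READ = 'r';
  static final char STATE_ACQUIRED = 'a';

  /**
   * Returns an extra SQL condition that skips rows of the form
   * "SHARED_READ + ACQUIRED" when every requested lock is SHARED_READ.
   * Returns an empty string when the request list is empty or mixed,
   * so the query is left unchanged in those cases.
   */
  static String extraCondition(List<LockType> requested) {
    boolean allSharedRead = !requested.isEmpty()
        && requested.stream().allMatch(t -> t == LockType.SHARED_READ);
    if (!allSharedRead) {
      return "";
    }
    return " AND NOT (HL_LOCK_TYPE = '" + TYPE_SHARED_READ
        + "' AND HL_LOCK_STATE = '" + STATE_ACQUIRED + "')";
  }
}
```

The returned fragment would be appended to the existing lock query. For a 
request containing any non-read lock the fragment is empty, so all rows are 
still fetched and checked as before.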


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java a8b9653411 
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/3/

Changes: https://reviews.apache.org/r/72129/diff/2-3/


Testing
---


File Attachments


HIVE-22850.5.patch
  
https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch


Thanks,

Rajesh Balamohan



Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Rajesh Balamohan


> On Feb. 13, 2020, 3:08 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 4546 (patched)
> > 
> >
> > Hi Rajesh, could you please explain the reason for doing partition 
> > filtering on the HMS side rather than in the backend DB?

Adding all the partition details can make the query very large, and it can 
overflow on Oracle (i.e. we would have to batch it in groups of 1000 entries). 
It also incurs parsing cost on the database server side, as the query is 
executed as a Statement. Given that we now have the additional filter, it is a 
lot simpler to do this on the client side. This was pointed out in the JIRA by 
Gopal.
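
To make the batching concern concrete: Oracle rejects an IN list with more 
than 1000 expressions (ORA-01795), so a server-side "HL_PARTITION IN (...)" 
filter would need to be split into chunks like this. The class and method 
names below are hypothetical and only illustrate the chunking. They are not 
taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of splitting a long IN list into batches. */
final class InClauseBatchSketch {

  /** Oracle's limit on the number of expressions in one IN list. */
  static final int MAX_IN_LIST = 1000;

  /** Splits the partition names into consecutive batches of at most batchSize. */
  static List<List<String>> batches(List<String> partitions, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<String>> out = new ArrayList<>();
    for (int i = 0; i < partitions.size(); i += batchSize) {
      int end = Math.min(i + batchSize, partitions.size());
      // Copy the view so each batch is independent of the input list.
      out.add(new ArrayList<>(partitions.subList(i, end)));
    }
    return out;
  }
}
```

A request touching 2500 partitions would need three such batches, each adding 
to the size of the statement text that the database has to parse.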


- Rajesh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219575
---





[DISCUSS] Moving Hive site from Apache CMS/SVN to Git

2020-02-13 Thread Roy Lenferink
Hello Hive community,

I noticed Hive is still serving its site from SVN with the help of the Apache 
CMS. At the Apache Celix project we decided to move away from the CMS and 
towards git for serving our website [1]. This is because the CMS was 
introduced in 2010 and is currently not accepting any new projects. With the 
move to Hugo we're sure that we can still update the site when the ASF CMS is 
decommissioned.

The master branch contains the actual website sources, whereas the generated 
website is served from the asf-site branch. Hugo is used as the static website 
generator. Content is written in Markdown.

I had a look at Hive's site and it was quite easy to migrate it to use Hugo as 
well. An example of Hugo's output is (for demo purposes only) available at [2]. 
These are the existing Markdown files with a slightly changed header.

Moving to git can increase the visibility of how projects are functioning. In 
addition, new contributors can simply create a pull request against the 
website repository if they find anything they want to improve. An 'Edit on 
GitHub' button can be added to the site as well; see, for example, a page on 
the Celix website [3].

If the community is interested in this move, this is what I propose:
- Someone from the PMC requests a new git repository for the website (e.g. 
hive-site) via [4].
- I create a pull request from the repository I am temporarily using for the 
website contents [5] to the official hive-site repository.
- A Jenkins job is created to automatically build the site after changes 
happen on the master branch.
- When the pull request is reviewed and merged, we ask INFRA to move over from 
the current svnpubsub to the gitpubsub approach and to remove Hive from the 
Apache CMS.

In addition, Hive also has javadocs which are only part of the production site 
(they won't be included when people clone the Hive website SVN repo). These 
can still be served from e.g. a release-docs branch. In the README we can then 
mention that the hive-site repo should be cloned using the --single-branch 
option. An alternative would be to serve the javadocs from a separate 
repository (e.g. hive-release-docs).

I'd like to hear everyone's opinion on this :)

Best regards,
Roy

[1] https://github.com/apache/celix-site
[2] http://hive.roylenferink.nl/
[3] http://celix.apache.org/contributing/releasing.html
[4] https://gitbox.apache.org/setup/newrepo.html
[5] https://github.com/rlenferink/hive-site



Re: Review Request 72113: DML execution on TEZ always outputs the message 'No rows affected'

2020-02-13 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72113/
---

(Updated Feb. 13, 2020, 3:40 p.m.)


Review request for hive, Laszlo Bodor, Mustafa Iman, Panos Garefalakis, and 
Ramesh Kumar Thangarajan.


Bugs: HIVE-22870
https://issues.apache.org/jira/browse/HIVE-22870


Repository: hive-git


Description
---

Executing an update or insert statement in Beeline doesn't show the actual 
number of rows inserted/updated.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 25dd970a9b1 
  ql/src/test/results/clientpositive/llap/orc_llap_counters.q.out 9c5695ae603 
  ql/src/test/results/clientpositive/llap/orc_llap_counters1.q.out f9b5f8f0d4d 
  ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 9ad0a9b7faf 
  ql/src/test/results/clientpositive/llap/orc_ppd_schema_evol_3a.q.out 3e99e0ee627 
  ql/src/test/results/clientpositive/llap/retry_failure_reorder.q.out baeac434d79 
  ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 885cb0a9cba 


Diff: https://reviews.apache.org/r/72113/diff/2/

Changes: https://reviews.apache.org/r/72113/diff/1-2/


Testing
---

Tested with insert and update statements.


Thanks,

Attila Magyar



Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219575
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 4546 (patched)


Hi Rajesh, could you please explain the reason for doing partition filtering 
on the HMS side rather than in the backend DB?


- Denys Kuzmenko





[jira] [Created] (HIVE-22888) Rewrite checkLock inner select with JOIN operator

2020-02-13 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-22888:
-

 Summary: Rewrite checkLock inner select with JOIN operator
 Key: HIVE-22888
 URL: https://issues.apache.org/jira/browse/HIVE-22888
 Project: Hive
  Issue Type: Improvement
Reporter: Denys Kuzmenko






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
---

(Updated Feb. 13, 2020, 1:22 p.m.)


Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Repository: hive-git


Description
---

- The main change is in TxnHandler::checkLock.
- When all incoming requests are SHARED_READ, we can add a condition to the 
query to retrieve only the relevant rows. This avoids fetching a significant 
number of rows of the form "SHARED_READ + ACQUIRED". There is a corner case, 
"SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jump 
table. This case can be optimised later.
- Also removed the "HL_PARTITION IN" clause, which could potentially overflow 
on Oracle. Partition details can be filtered out afterwards, if the earlier 
query actually returned any rows.
- The rest of the changes are related to refactoring 
"TxnHandler::enqueueLockWithRetry" to reduce lock scope.


Diffs (updated)
-

  
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/2/

Changes: https://reviews.apache.org/r/72129/diff/1-2/


Testing
---


File Attachments (updated)


HIVE-22850.5.patch
  
https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch


Thanks,

Rajesh Balamohan



Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219572
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 4546 (patched)


We shouldn't remove NULL partitions (lock on db/table level).
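
A minimal sketch of this point, assuming the partition filtering is done on 
the HMS side after the rows are fetched: rows whose partition is NULL 
represent db/table-level locks, so they are kept regardless of which 
partitions were requested. LockRow and relevant() are hypothetical names used 
only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of client-side partition filtering. */
final class PartitionFilterSketch {

  /** Stand-in for one fetched lock row; partition is null for db/table-level locks. */
  static final class LockRow {
    final long lockId;
    final String partition;

    LockRow(long lockId, String partition) {
      this.lockId = lockId;
      this.partition = partition;
    }
  }

  /** Keeps rows for the requested partitions and every row with a NULL partition. */
  static List<LockRow> relevant(List<LockRow> fetched, Set<String> requestedPartitions) {
    List<LockRow> kept = new ArrayList<>();
    for (LockRow row : fetched) {
      if (row.partition == null || requestedPartitions.contains(row.partition)) {
        kept.add(row);
      }
    }
    return kept;
  }
}
```

Dropping the null check would silently ignore locks held on the whole table or 
database, which can still conflict with a partition-level request.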


- Denys Kuzmenko





[jira] [Created] (HIVE-22887) MR Job cannot set custom OutputCommitter during job execution.

2020-02-13 Thread Renukaprasad C (Jira)
Renukaprasad C created HIVE-22887:
-

 Summary: MR Job cannot set custom OutputCommitter during job execution.
 Key: HIVE-22887
 URL: https://issues.apache.org/jira/browse/HIVE-22887
 Project: Hive
  Issue Type: Bug
Reporter: Renukaprasad C
Assignee: Renukaprasad C


MapRedTask sets the job's OutputCommitter to NullOutputCommitter.
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.prepareJobOutput(JobConf) 
always sets the OutputCommitter to NullOutputCommitter:
conf.setOutputCommitter(NullOutputCommitter.class);

There are some cases where we need to customize the OutputCommitter. For 
example, on SUCCESS users may need to write some file or do some custom 
operation in their own OutputCommitter. If someone wants to provide their own 
OutputCommitter, it is not possible in the current implementation.

Related issues: MR - MAPREDUCE-1802, Hive - HIVE-1355.
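
The behaviour can be illustrated with a small self-contained sketch. A plain 
Map stands in for JobConf here, the key "mapred.output.committer.class" is 
assumed to be the property behind setOutputCommitter, and both method names 
are hypothetical. The second method shows one possible direction only and is 
not a proposed patch.

```java
import java.util.Map;

/** Hypothetical sketch; a Map stands in for JobConf. */
final class CommitterChoiceSketch {

  // Assumed property name behind JobConf.setOutputCommitter(...).
  static final String COMMITTER_KEY = "mapred.output.committer.class";
  // Placeholder value standing in for NullOutputCommitter.class.
  static final String NULL_COMMITTER = "NullOutputCommitter";

  /** Mirrors the reported behaviour: any user-supplied committer is overwritten. */
  static void prepareJobOutputAlwaysNull(Map<String, String> conf) {
    conf.put(COMMITTER_KEY, NULL_COMMITTER);
  }

  /** One possible alternative: only fall back to the null committer when none is set. */
  static void prepareJobOutputRespectingUser(Map<String, String> conf) {
    conf.putIfAbsent(COMMITTER_KEY, NULL_COMMITTER);
  }
}
```

With a custom committer already present in the configuration, the first 
variant replaces it while the second leaves it in place.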





Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-13 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
---

Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Repository: hive-git


Description
---

- The main change is in TxnHandler::checkLock.
- When all incoming requests are SHARED_READ, we can add a condition to the 
query to retrieve only the relevant rows. This avoids fetching a significant 
number of rows of the form "SHARED_READ + ACQUIRED". There is a corner case, 
"SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jump 
table. This case can be optimised later.
- Also removed the "HL_PARTITION IN" clause, which could potentially overflow 
on Oracle. Partition details can be filtered out afterwards, if the earlier 
query actually returned any rows.
- The rest of the changes are related to refactoring 
"TxnHandler::enqueueLockWithRetry" to reduce lock scope.


Diffs
-

  
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/1/


Testing
---


Thanks,

Rajesh Balamohan