[jira] [Work logged] (HIVE-24491) setting custom job name is ineffective if the tez session pool is configured or in case of session reuse.
[ https://issues.apache.org/jira/browse/HIVE-24491?focusedWorklogId=520434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520434 ] ASF GitHub Bot logged work on HIVE-24491: - Author: ASF GitHub Bot Created on: 05/Dec/20 05:58 Start Date: 05/Dec/20 05:58 Worklog Time Spent: 10m Work Description: rajkrrsingh opened a new pull request #1746: URL: https://github.com/apache/hive/pull/1746 ### What changes were proposed in this pull request? HIVE-23026 added the capability to set tez.job.name, but it is not effective if the Tez session pool manager is configured or a Tez session is reused. With this change, whenever the user sets tez.job.name, Hive will force a new Tez session instead of using one from the pool. ### Why are the changes needed? Without this change, setting tez.job.name is not effective if a Tez session is reused. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? On a local reproduction cluster. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520434) Remaining Estimate: 0h Time Spent: 10m > setting custom job name is ineffective if the tez session pool is configured > or in case of session reuse. > - > > Key: HIVE-24491 > URL: https://issues.apache.org/jira/browse/HIVE-24491 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-23026 added the capability to set tez.job.name, but it is not effective if the tez > session pool manager is configured or a Tez session is reused. -- This message was sent by Atlassian Jira (v8.3.4#803005)
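The fix described in the pull request, bypassing the pool whenever a custom job name is set, can be sketched as follows. This is an illustrative Python model with hypothetical names, not Hive's actual TezSessionPoolManager code:

```python
# Illustrative sketch (hypothetical names, not actual Hive code): a session
# pool that is bypassed when the user sets a custom job name, so the name is
# applied to a fresh session rather than being lost on a pre-created pooled one.
from dataclasses import dataclass, field


@dataclass
class TezSession:
    job_name: str = "HIVE-default"


@dataclass
class SessionPool:
    sessions: list = field(default_factory=lambda: [TezSession()])

    def get_session(self, conf: dict) -> TezSession:
        custom_name = conf.get("tez.job.name")
        if custom_name:
            # Pooled sessions were created before the user's setting existed,
            # so force a new session that picks up the custom name.
            return TezSession(job_name=custom_name)
        return self.sessions[0]  # otherwise reuse a session from the pool


pool = SessionPool()
assert pool.get_session({}).job_name == "HIVE-default"
assert pool.get_session({"tez.job.name": "my-etl"}).job_name == "my-etl"
```

The trade-off, as the PR notes, is that a query setting tez.job.name pays the cost of starting a fresh Tez session instead of getting a warm one from the pool.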
[jira] [Updated] (HIVE-24491) setting custom job name is ineffective if the tez session pool is configured or in case of session reuse.
[ https://issues.apache.org/jira/browse/HIVE-24491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24491: -- Labels: pull-request-available (was: ) > setting custom job name is ineffective if the tez session pool is configured > or in case of session reuse. > - > > Key: HIVE-24491 > URL: https://issues.apache.org/jira/browse/HIVE-24491 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-23026 add capability to set tez.job.name but it's not effective if tez > session pool manager is configured or tez session reuse. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24491) setting custom job name is ineffective if the tez session pool is configured or in case of session reuse.
[ https://issues.apache.org/jira/browse/HIVE-24491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajkumar Singh reassigned HIVE-24491: - > setting custom job name is ineffective if the tez session pool is configured > or in case of session reuse. > - > > Key: HIVE-24491 > URL: https://issues.apache.org/jira/browse/HIVE-24491 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > > HIVE-23026 add capability to set tez.job.name but it's not effective if tez > session pool manager is configured or tez session reuse. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24228) Support complex types in LLAP
[ https://issues.apache.org/jira/browse/HIVE-24228?focusedWorklogId=520389=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520389 ] ASF GitHub Bot logged work on HIVE-24228: - Author: ASF GitHub Bot Created on: 05/Dec/20 00:48 Start Date: 05/Dec/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1551: URL: https://github.com/apache/hive/pull/1551#issuecomment-739093971 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520389) Time Spent: 20m (was: 10m) > Support complex types in LLAP > - > > Key: HIVE-24228 > URL: https://issues.apache.org/jira/browse/HIVE-24228 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Yuriy Baltovskyy >Assignee: Yuriy Baltovskyy >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The idea of this improvement is to support complex types (arrays, maps, > structs) returned from LLAP data reader. This is useful when consuming LLAP > data later in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24204) LLAP: Invalid TEZ Job token in multi fragment query
[ https://issues.apache.org/jira/browse/HIVE-24204?focusedWorklogId=520390=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520390 ] ASF GitHub Bot logged work on HIVE-24204: - Author: ASF GitHub Bot Created on: 05/Dec/20 00:48 Start Date: 05/Dec/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1530: URL: https://github.com/apache/hive/pull/1530 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520390) Time Spent: 0.5h (was: 20m) > LLAP: Invalid TEZ Job token in multi fragment query > --- > > Key: HIVE-24204 > URL: https://issues.apache.org/jira/browse/HIVE-24204 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.3.0 >Reporter: Yuriy Baltovskyy >Assignee: Yuriy Baltovskyy >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When using LLAP server in the Kerberized environment and submitting the query > via LLAP client that is planned as multi fragment (multiple splits), the > following error occurs and the query fails: > org.apache.hadoop.ipc.Server: javax.security.sasl.SaslException: DIGEST-MD5: > digest response format violation. Mismatched response. > This occurs because each split uses its own connection to LLAP server and its > own TEZ job token while LLAP server stores only one token binding it to the > whole query and not the separate fragment. When LLAP server communicates to > the clients and uses the stored token, this causes Sasl exception due to > using invalid token. -- This message was sent by Atlassian Jira (v8.3.4#803005)
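The token mismatch described in HIVE-24204 (one stored token per query versus one token per fragment connection) can be sketched with a toy registry. This is a simplified model with hypothetical names, not LLAP's actual token/SASL machinery:

```python
# Illustrative sketch: if the server keeps only one job token per query, the
# second fragment's registration overwrites the first, and the first
# fragment's connection then fails validation (the DIGEST-MD5 mismatch).
# Keying the store by (query_id, fragment_id) validates each connection
# against the token it was actually issued.
per_query = {}     # buggy scheme: one token per whole query
per_fragment = {}  # fixed scheme: one token per (query, fragment)


def register(query_id, fragment_id, token):
    per_query[query_id] = token                    # overwrites earlier fragments
    per_fragment[(query_id, fragment_id)] = token


def validate_per_query(query_id, fragment_id, token):
    return per_query[query_id] == token


def validate_per_fragment(query_id, fragment_id, token):
    return per_fragment[(query_id, fragment_id)] == token


register("q1", 0, "tok-a")
register("q1", 1, "tok-b")  # second split connects with its own token
# Fragment 0's token no longer matches the single stored token ...
assert not validate_per_query("q1", 0, "tok-a")
# ... but per-fragment storage validates both connections.
assert validate_per_fragment("q1", 0, "tok-a")
assert validate_per_fragment("q1", 1, "tok-b")
```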
[jira] [Work logged] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?focusedWorklogId=520391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520391 ] ASF GitHub Bot logged work on HIVE-23737: - Author: ASF GitHub Bot Created on: 05/Dec/20 00:48 Start Date: 05/Dec/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1195: URL: https://github.com/apache/hive/pull/1195 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520391) Time Spent: 1h 40m (was: 1.5h) > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > LLAP have a dagDelete feature added as part of HIVE-9911, But now that Tez > have added support for dagDelete in custom shuffle handler (TEZ-3362) we > could re-use that feature in LLAP. > There are some added advantages of using Tez's dagDelete feature rather than > the current LLAP's dagDelete feature. > 1) We can easily extend this feature to accommodate the upcoming features > such as vertex and failed task attempt shuffle data clean up. Refer TEZ-3363 > and TEZ-4129 > 2) It will be more easier to maintain this feature by separating it out from > the Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-12930) Support SSL Shuffle for LLAP
[ https://issues.apache.org/jira/browse/HIVE-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244365#comment-17244365 ] Krishnadas commented on HIVE-12930: --- Any updates on this one? > Support SSL Shuffle for LLAP > > > Key: HIVE-12930 > URL: https://issues.apache.org/jira/browse/HIVE-12930 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Siddharth Seth >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-24489. Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master, thanks [~zabetak]! > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?focusedWorklogId=520377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520377 ] ASF GitHub Bot logged work on HIVE-24489: - Author: ASF GitHub Bot Created on: 04/Dec/20 23:52 Start Date: 04/Dec/20 23:52 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1745: URL: https://github.com/apache/hive/pull/1745 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520377) Time Spent: 20m (was: 10m) > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?focusedWorklogId=520340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520340 ] ASF GitHub Bot logged work on HIVE-24489: - Author: ASF GitHub Bot Created on: 04/Dec/20 20:59 Start Date: 04/Dec/20 20:59 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request #1745: URL: https://github.com/apache/hive/pull/1745 ### What changes were proposed in this pull request? Update docker metastore version to zabetak/postgres-tpcds-metastore:1.3 on which some cleanup was performed to remove stale entries in various tables including MIN_HISTORY_LEVEL. See https://github.com/zabetak/hive-postgres-metastore/commit/04be8bd2e9400d7ae604a4112aaa4531dd36 ### Why are the changes needed? To fix test failures while running `TestTezTPCDS30TBPerfCliDriver` tests. More details in the JIRA. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520340) Remaining Estimate: 0h Time Spent: 10m > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. 
> {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
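The duplicate-key failure is easy to reproduce in miniature. The sketch below uses Python's sqlite3 as a stand-in for the PostgreSQL metastore: a stale row for txn id 7858 makes the next insert of the same id violate the primary key, and removing the stale entries (a cleanup along the lines of what the refreshed docker image bakes in) clears the error:

```python
# Minimal reproduction of the MIN_HISTORY_LEVEL failure mode, with sqlite3
# standing in for the PostgreSQL metastore backend.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MIN_HISTORY_LEVEL ("
    " MHL_TXNID INTEGER PRIMARY KEY, MHL_MIN_OPEN_TXNID INTEGER)"
)
conn.execute("INSERT INTO MIN_HISTORY_LEVEL VALUES (7858, 7858)")  # stale row

try:
    # A new run of the test suite reuses the txn id and hits the stale row.
    conn.execute("INSERT INTO MIN_HISTORY_LEVEL VALUES (7858, 7858)")
    failed = False
except sqlite3.IntegrityError:  # sqlite's analog of the duplicate-key error
    failed = True
assert failed

# After deleting the stale entries, the same insert succeeds.
conn.execute("DELETE FROM MIN_HISTORY_LEVEL")
conn.execute("INSERT INTO MIN_HISTORY_LEVEL VALUES (7858, 7858)")
```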
[jira] [Updated] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24489: -- Labels: pull-request-available (was: ) > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24397) Add the projection specification to the table request object and add placeholders in ObjectStore.java
[ https://issues.apache.org/jira/browse/HIVE-24397?focusedWorklogId=520314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520314 ] ASF GitHub Bot logged work on HIVE-24397: - Author: ASF GitHub Bot Created on: 04/Dec/20 19:31 Start Date: 04/Dec/20 19:31 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #1681: URL: https://github.com/apache/hive/pull/1681#issuecomment-738974480 Fix has been merged to master. Thank you @vnhive This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520314) Time Spent: 2h (was: 1h 50m) > Add the projection specification to the table request object and add > placeholders in ObjectStore.java > - > > Key: HIVE-24397 > URL: https://issues.apache.org/jira/browse/HIVE-24397 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24397) Add the projection specification to the table request object and add placeholders in ObjectStore.java
[ https://issues.apache.org/jira/browse/HIVE-24397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-24397. -- Fix Version/s: 4.0.0 Resolution: Fixed Fix has been committed to master. Thank you for the contribution, [~vnhive] > Add the projection specification to the table request object and add > placeholders in ObjectStore.java > - > > Key: HIVE-24397 > URL: https://issues.apache.org/jira/browse/HIVE-24397 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24490) Implement projections for tables in CachedStore.
[ https://issues.apache.org/jira/browse/HIVE-24490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Narayanan Venkateswaran reassigned HIVE-24490: -- > Implement projections for tables in CachedStore. > > > Key: HIVE-24490 > URL: https://issues.apache.org/jira/browse/HIVE-24490 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-24489: --- Description: The failures can be seen here: [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. {noformat} Caused by: MetaException(message:Unable to select from transaction database org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "min_history_level_pkey" Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} The content of the respective table inside the docker image is shown below. {noformat} SELECT * FROM "MIN_HISTORY_LEVEL" ; MHL_TXNID | MHL_MIN_OPEN_TXNID --+--- 6853 | 6687 7480 | 6947 7481 | 6947 6870 | 6687 7858 | 7858 6646 | 5946 7397 | 6947 7399 | 6947 5946 | 5946 6947 | 6947 7769 | 6947{noformat} was: The failures can be seen here: [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. Caused by: MetaException(message:Unable to select from transaction database org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "min_history_level_pkey" Detail: Key ("MHL_TXNID")=(7858) already exists. The content of the respective table inside the docker image is shown below. 
SELECT * FROM "MIN_HISTORY_LEVEL" ; MHL_TXNID | MHL_MIN_OPEN_TXNID ---+ 6853 | 6687 7480 | 6947 7481 | 6947 6870 | 6687 7858 | 7858 6646 | 5946 7397 | 6947 7399 | 6947 5946 | 5946 6947 | 6947 7769 | 6947 > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-24489: -- > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists. > The content of the respective table inside the docker image is shown below. > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > ---+ > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24489 started by Stamatis Zampetakis. -- > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24485) Make the slow-start behavior tunable
[ https://issues.apache.org/jira/browse/HIVE-24485?focusedWorklogId=520248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520248 ] ASF GitHub Bot logged work on HIVE-24485: - Author: ASF GitHub Bot Created on: 04/Dec/20 16:18 Start Date: 04/Dec/20 16:18 Worklog Time Spent: 10m Work Description: okumin opened a new pull request #1744: URL: https://github.com/apache/hive/pull/1744 ### What changes were proposed in this pull request? Make it possible to apply `tez.shuffle-vertex-manager.min-src-fraction` and `tez.shuffle-vertex-manager.max-src-fraction` even if auto reducer parallelism is enabled. https://issues.apache.org/jira/browse/HIVE-24485 ### Why are the changes needed? With this PR, we can tweak the trade-off between the timing to start and the accuracy of estimation. Tez can gather more samples with higher fractions, while this delays the start of the next vertex. ### Does this PR introduce _any_ user-facing change? Users who configure `tez.shuffle-vertex-manager.{min,max}-src-fraction`, or `mapreduce.job.reduce.slowstart.completedmaps` (which Tez maps to `tez.shuffle-vertex-manager.min-src-fraction`), may see a change of behavior when they enable auto reducer parallelism. https://github.com/apache/tez/blob/dadc09f5a44c1cb61af00efecb3d27b92c92aa8f/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/DeprecatedKeys.java#L123 ### How was this patch tested? No test cases will be added because `Vertex#VertexManagerPluginDescriptor` is not visible outside the Tez package. We can check the change by running a job and then inspecting the Tez log.
``` beeline -e ' SET hive.tez.auto.reducer.parallelism=true; SET hive.tez.min.partition.factor=1.0; -- enforce auto-parallelism SET tez.shuffle-vertex-manager.min-src-fraction=0.55; SET tez.shuffle-vertex-manager.max-src-fraction=0.95; CREATE TABLE mofu (name string); INSERT INTO mofu (name) VALUES ('12345'); SELECT name, count(*) FROM mofu GROUP BY name; ' ``` ``` 2020-12-04 16:10:47,170 [INFO] [Dispatcher thread {Central}] |vertexmanager.ShuffleVertexManagerBase|: Settings minFrac: 0.55 maxFrac: 0.95 auto: true desiredTaskIput: 25600 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520248) Remaining Estimate: 0h Time Spent: 10m > Make the slow-start behavior tunable > > > Key: HIVE-24485 > URL: https://issues.apache.org/jira/browse/HIVE-24485 > Project: Hive > Issue Type: Improvement > Components: Hive, Tez >Affects Versions: 3.1.2, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This ticket would enable users to configure the timing of slow-start with > `tez.shuffle-vertex-manager.min-src-fraction` and > `tez.shuffle-vertex-manager.max-src-fraction`. > Hive on Tez currently doesn't honor these parameters and ShuffleVertexManager > always uses the default value. > We can control the timing to start vertexes and the accuracy of the estimated input > size if we can tweak these. This is useful when a vertex has tasks that > process a different amount of data. > > We can reproduce the issue with this query.
> {code:java} > SET hive.tez.auto.reducer.parallelism=true; > SET hive.tez.min.partition.factor=1.0; -- enforce auto-parallelism > SET tez.shuffle-vertex-manager.min-src-fraction=0.55; > SET tez.shuffle-vertex-manager.max-src-fraction=0.95; > CREATE TABLE mofu (name string); > INSERT INTO mofu (name) VALUES ('12345'); > SELECT name, count(*) FROM mofu GROUP BY name;{code} > The fractions are ignored. > {code:java} > 2020-12-04 11:41:42,484 [INFO] [Dispatcher thread {Central}] > |vertexmanager.ShuffleVertexManagerBase|: Settings minFrac: 0.25 maxFrac: > 0.75 auto: true desiredTaskIput: 25600 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
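The slow-start behavior these fractions control can be modeled as a linear ramp: no reducers are launched before min-src-fraction of the source tasks complete, all reducers are launched once max-src-fraction complete, and the count scales linearly in between. The sketch below is a simplified Python model of that ramp, not Tez's actual ShuffleVertexManager code:

```python
# Simplified model of Tez slow-start: how many reducers are scheduled as a
# function of the fraction of completed source (map-side) tasks.
def tasks_to_schedule(completed_frac, total_reducers,
                      min_frac=0.25, max_frac=0.75):
    if completed_frac < min_frac:
        return 0                      # too few samples: launch nothing yet
    if completed_frac >= max_frac:
        return total_reducers         # launch everything
    # Linear ramp between the two fractions.
    span = (completed_frac - min_frac) / (max_frac - min_frac)
    return int(total_reducers * span)


# With the default fractions, reducers start at 25% source completion ...
assert tasks_to_schedule(0.20, 100) == 0
assert tasks_to_schedule(0.50, 100) == 50
assert tasks_to_schedule(0.80, 100) == 100
# ... while raising min_frac to 0.55 (as in the repro) delays the start,
# letting the vertex manager gather more samples before committing.
assert tasks_to_schedule(0.50, 100, min_frac=0.55, max_frac=0.95) == 0
```

This makes the trade-off in the PR concrete: higher fractions mean later reducer start but a better estimate of input size for auto reducer parallelism.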
[jira] [Updated] (HIVE-24485) Make the slow-start behavior tunable
[ https://issues.apache.org/jira/browse/HIVE-24485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24485: -- Labels: pull-request-available (was: ) > Make the slow-start behavior tunable > > > Key: HIVE-24485 > URL: https://issues.apache.org/jira/browse/HIVE-24485 > Project: Hive > Issue Type: Improvement > Components: Hive, Tez >Affects Versions: 3.1.2, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This ticket would enable users to configure the timing of slow-start with > `tez.shuffle-vertex-manager.min-src-fraction` and > `tez.shuffle-vertex-manager.max-src-fraction`. > Hive on Tez currently doesn't honor these parameters and ShuffleVertexManager > always uses the default value. > We can control the timing to start vertexes and the accuracy of the estimated input > size if we can tweak these. This is useful when a vertex has tasks that > process a different amount of data. > > We can reproduce the issue with this query. > {code:java} > SET hive.tez.auto.reducer.parallelism=true; > SET hive.tez.min.partition.factor=1.0; -- enforce auto-parallelism > SET tez.shuffle-vertex-manager.min-src-fraction=0.55; > SET tez.shuffle-vertex-manager.max-src-fraction=0.95; > CREATE TABLE mofu (name string); > INSERT INTO mofu (name) VALUES ('12345'); > SELECT name, count(*) FROM mofu GROUP BY name;{code} > The fractions are ignored. > {code:java} > 2020-12-04 11:41:42,484 [INFO] [Dispatcher thread {Central}] > |vertexmanager.ShuffleVertexManagerBase|: Settings minFrac: 0.25 maxFrac: > 0.75 auto: true desiredTaskIput: 25600 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24419) Refactor junit database rules to exploit testcontainers
[ https://issues.apache.org/jira/browse/HIVE-24419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244070#comment-17244070 ] Stamatis Zampetakis commented on HIVE-24419: Yes, they also take care of cleaning up in case you force shutdown the VM. They create another container managing the container :P > Refactor junit database rules to exploit testcontainers > --- > > Key: HIVE-24419 > URL: https://issues.apache.org/jira/browse/HIVE-24419 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Minor > > The > [DatabaseRule|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java] > and its subclasses allow tests to run over dockerized metastores using > different database backends. Essentially some part of the code manages > containers by invoking explicitly docker commands. > The [testcontainers|https://www.testcontainers.org/modules/databases/] > project provides the necessary modules for managing dockerized databases in > an easy and intuitive way. > The goal of this issue is to refactor the {{DatabaseRule}} hierarchy to take > advantage of testcontainers in order to delegate the burden of managing > containers outside of Hive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24487) Use alternate ports for dockerized databases during testing
[ https://issues.apache.org/jira/browse/HIVE-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244069#comment-17244069 ] Stamatis Zampetakis commented on HIVE-24487: Yes, I use it in other projects, it's quite cool what they provide. > Use alternate ports for dockerized databases during testing > --- > > Key: HIVE-24487 > URL: https://issues.apache.org/jira/browse/HIVE-24487 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Priority: Major > > like 5432 for postgres and 3306 for mysql > https://github.com/apache/hive/blob/52cf467836df71485e95b08c9e91e197e9898b79/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/Postgres.java#L35 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24419) Refactor junit database rules to exploit testcontainers
[ https://issues.apache.org/jira/browse/HIVE-24419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244063#comment-17244063 ] Zoltan Haindrich commented on HIVE-24419: - I also had to alter the existing code to clean up before it starts a new one ...and the next one on my list would have been to randomize the container name because I may run the same tests from different containers on the same machine. I guess testcontainers might help a lot in stuff like that! > Refactor junit database rules to exploit testcontainers > --- > > Key: HIVE-24419 > URL: https://issues.apache.org/jira/browse/HIVE-24419 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Minor > > The > [DatabaseRule|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java] > and its subclasses allow tests to run over dockerized metastores using > different database backends. Essentially some part of the code manages > containers by explicitly invoking docker commands. > The [testcontainers|https://www.testcontainers.org/modules/databases/] > project provides the necessary modules for managing dockerized databases in > an easy and intuitive way. > The goal of this issue is to refactor the {{DatabaseRule}} hierarchy to take > advantage of testcontainers in order to delegate the burden of managing > containers outside of Hive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24487) Use alternate ports for dockerized databases during testing
[ https://issues.apache.org/jira/browse/HIVE-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244062#comment-17244062 ] Zoltan Haindrich commented on HIVE-24487: - yeah - that could be the thing we need! it has this at https://www.testcontainers.org/quickstart/junit_4_quickstart/ {code} String address = redis.getHost(); Integer port = redis.getFirstMappedPort(); {code} ..it might already be prepared to understand how things change when docker host is set ! > Use alternate ports for dockerized databases during testing > --- > > Key: HIVE-24487 > URL: https://issues.apache.org/jira/browse/HIVE-24487 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Priority: Major > > like 5432 for postgres and 3306 for mysql > https://github.com/apache/hive/blob/52cf467836df71485e95b08c9e91e197e9898b79/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/Postgres.java#L35 -- This message was sent by Atlassian Jira (v8.3.4#803005)
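The quickstart pattern quoted above carries over directly to the database modules that the metastore tests would use. A minimal JUnit 4 sketch (assumes the `org.testcontainers:postgresql` dependency and a reachable Docker daemon; the database/user names are illustrative, not Hive's):

{code:java}
import org.junit.Rule;
import org.junit.Test;
import org.testcontainers.containers.PostgreSQLContainer;

public class MetastorePostgresSketchTest {
    // Testcontainers maps the container's 5432 to a free host port, so the
    // fixed-port collision from HIVE-24487 goes away automatically.
    @Rule
    public PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:13")
            .withDatabaseName("metastore")   // illustrative names
            .withUsername("hive")
            .withPassword("hive");

    @Test
    public void connectionDetailsAreMapped() {
        // getJdbcUrl() already contains the resolved host and mapped port,
        // which is also what makes a remote DOCKER_HOST (HIVE-24488) work
        // without manual port forwarding.
        String url = postgres.getJdbcUrl();
        Integer port = postgres.getFirstMappedPort();
        System.out.println(url + " -> mapped port " + port);
    }
}
{code}

This is only a sketch of the library's documented rule-based usage; it is not runnable without Docker, and the actual DatabaseRule refactoring may look quite different.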
[jira] [Work logged] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?focusedWorklogId=520227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520227 ] ASF GitHub Bot logged work on HIVE-17709: - Author: ASF GitHub Bot Created on: 04/Dec/20 15:22 Start Date: 04/Dec/20 15:22 Worklog Time Spent: 10m Work Description: abstractdog commented on pull request #1739: URL: https://github.com/apache/hive/pull/1739#issuecomment-738842995 could you please take a look @belugabehr? I confirmed this patch and it's working on JDK11 LLAP (Cloudera Data Warehouse) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520227) Time Spent: 20m (was: 10m) > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24487) Use alternate ports for dockerized databases during testing
[ https://issues.apache.org/jira/browse/HIVE-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244054#comment-17244054 ] Stamatis Zampetakis commented on HIVE-24487: It totally makes sense. Actually, I logged HIVE-24419 as a more ambitious way to take care of such problems. > Use alternate ports for dockerized databases during testing > --- > > Key: HIVE-24487 > URL: https://issues.apache.org/jira/browse/HIVE-24487 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Priority: Major > > like 5432 for postgres and 3306 for mysql > https://github.com/apache/hive/blob/52cf467836df71485e95b08c9e91e197e9898b79/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/Postgres.java#L35 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24488) Make docker host configurable for metastoredb/perf tests
[ https://issues.apache.org/jira/browse/HIVE-24488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-24488: --- > Make docker host configurable for metastoredb/perf tests > > > Key: HIVE-24488 > URL: https://issues.apache.org/jira/browse/HIVE-24488 > Project: Hive > Issue Type: Improvement > Components: Test >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > I tend to develop patches inside containers (hive-dev-box) to be able to work > on multiple patches in parallel. > Running tests which do use docker was always a bit problematic for me - when > I wanted to do it before: I manually exposed /var/lib/docker and added a > rinetd forward by hand (which is not nice) > ...the current move to run Perf tests as well against a dockerized > metastore exposes this problem a bit more for me. > I'm also considering adding the ability to use minikube with hive-dev-box ; > but that still needs exploring > it would be much easier to expose the address of the docker host I'm using... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244018#comment-17244018 ] Ashish Sharma commented on HIVE-24482: -- [~kishendas] are you implementing the above change? If not, can I take over the ticket? > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244018#comment-17244018 ] Ashish Sharma edited comment on HIVE-24482 at 12/4/20, 2:02 PM: [~kishendas] are you implementing the above change? If you are not, can I take over the ticket? was (Author: ashish-kumar-sharma): [~kishendas] are you implementing the above change. If you can it take over the ticket? > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23605) 'Wrong FS' error during _external_tables_info creation when staging location is remote
[ https://issues.apache.org/jira/browse/HIVE-23605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha updated HIVE-23605: Resolution: Fixed Status: Resolved (was: Patch Available) > 'Wrong FS' error during _external_tables_info creation when staging location > is remote > -- > > Key: HIVE-23605 > URL: https://issues.apache.org/jira/browse/HIVE-23605 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23605.01.patch, HIVE-23605.02.patch, > HIVE-23605.03.patch, HIVE-23605.04.patch > > Time Spent: 40m > Remaining Estimate: 0h > > When staging location is on target cluster, Repl Dump fails to create > _external_tables_info file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-23965. - Resolution: Fixed merged the new changes - resolving again :) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 7h 40m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24486) Enhance operator merge logic to also consider going thru RS operators
[ https://issues.apache.org/jira/browse/HIVE-24486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-24486: --- > Enhance operator merge logic to also consider going thru RS operators > - > > Key: HIVE-24486 > URL: https://issues.apache.org/jira/browse/HIVE-24486 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > the targeted situation looks like this: > {code} > OP1 -> RS1.1 -> JOIN1.1 > OP1 -> RS1.2 -> JOIN1.2 > OP2 -> RS2.1 -> JOIN1.1 -> RS3.1 > OP2 -> RS2.2 -> JOIN1.2 -> RS3.2 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)
[ https://issues.apache.org/jira/browse/HIVE-24474?focusedWorklogId=520180=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520180 ] ASF GitHub Bot logged work on HIVE-24474: - Author: ASF GitHub Bot Created on: 04/Dec/20 12:54 Start Date: 04/Dec/20 12:54 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1735: URL: https://github.com/apache/hive/pull/1735#discussion_r536079340 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -561,12 +561,12 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte } catch (Throwable e) { LOG.error("Caught exception while trying to compact " + ci + ". Marking failed to avoid repeated failures", e); -abortCompactionAndMarkFailed(ci, compactorTxnId, e); +compactorTxnId = abortCompactionAndMarkFailed(ci, compactorTxnId, e); } } catch (TException | IOException t) { LOG.error("Caught an exception in the main loop of compactor worker " + workerName, t); try { -abortCompactionAndMarkFailed(ci, compactorTxnId, t); +compactorTxnId = abortCompactionAndMarkFailed(ci, compactorTxnId, t); Review comment: I thought that this way it would be more obvious to future developers that the compactorTxnId needs to be unset every time it's aborted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520180) Time Spent: 0.5h (was: 20m) > Failed compaction always logs TxnAbortedException (again) > - > > Key: HIVE-24474 > URL: https://issues.apache.org/jira/browse/HIVE-24474 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Re-introduced with HIVE-24096. > If there is an error during compaction, the compaction's txn is aborted but > in the finally clause, we try to commit it (commitTxnIfSet), so Worker throws > a TxnAbortedException. > We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is > aborted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
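The review thread above is about making sure the finally-clause commit (commitTxnIfSet) can never touch an already-aborted transaction. A reduced sketch of that control flow — all names except compactorTxnId and TXN_ID_NOT_SET are hypothetical stand-ins, not Hive's actual Worker code:

```java
public class CompactorTxnSketch {
    static final long TXN_ID_NOT_SET = -1L;

    interface TxnStore {
        void commit(long txnId);
        void abort(long txnId);
    }

    /**
     * Runs one compaction attempt. On failure the txn is aborted and the id
     * is reset to TXN_ID_NOT_SET, so the finally block cannot commit an
     * aborted txn -- the TxnAbortedException scenario from HIVE-24474.
     */
    static long runCompaction(TxnStore store, long txnId, Runnable work) {
        long compactorTxnId = txnId;
        try {
            work.run();
        } catch (RuntimeException e) {
            store.abort(compactorTxnId);
            compactorTxnId = TXN_ID_NOT_SET; // unset after every abort
        } finally {
            if (compactorTxnId != TXN_ID_NOT_SET) {
                store.commit(compactorTxnId); // commitTxnIfSet equivalent
            }
        }
        return compactorTxnId;
    }
}
```

Whether the reset happens inside the abort helper (as in the PR) or right after the call (as the reviewer suggests) is a style choice; the invariant sketched here is the same either way.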
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=520179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520179 ] ASF GitHub Bot logged work on HIVE-23965: - Author: ASF GitHub Bot Created on: 04/Dec/20 12:47 Start Date: 04/Dec/20 12:47 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1714: URL: https://github.com/apache/hive/pull/1714 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520179) Time Spent: 7h 40m (was: 7.5h) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 7h 40m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. 
> This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24485) Make the slow-start behavior tunable
[ https://issues.apache.org/jira/browse/HIVE-24485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] okumin reassigned HIVE-24485: - > Make the slow-start behavior tunable > > > Key: HIVE-24485 > URL: https://issues.apache.org/jira/browse/HIVE-24485 > Project: Hive > Issue Type: Improvement > Components: Hive, Tez >Affects Versions: 3.1.2, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > > This ticket would enable users to configure the timing of slow-start with > `tez.shuffle-vertex-manager.min-src-fraction` and > `tez.shuffle-vertex-manager.max-src-fraction`. > Hive on Tez currently doesn't honor these parameters and ShuffleVertexManager > always uses the default values. > We can trade off the timing to start vertices against the accuracy of the estimated input > size if we can tweak these parameters. This is useful when a vertex has tasks that > process different amounts of data. > > We can reproduce the issue with this query. > {code:java} > SET hive.tez.auto.reducer.parallelism=true; > SET hive.tez.min.partition.factor=1.0; -- enforce auto-parallelism > SET tez.shuffle-vertex-manager.min-src-fraction=0.55; > SET tez.shuffle-vertex-manager.max-src-fraction=0.95; > CREATE TABLE mofu (name string); > INSERT INTO mofu (name) VALUES ('12345'); > SELECT name, count(*) FROM mofu GROUP BY name;{code} > The fractions are ignored. > {code:java} > 2020-12-04 11:41:42,484 [INFO] [Dispatcher thread {Central}] > |vertexmanager.ShuffleVertexManagerBase|: Settings minFrac: 0.25 maxFrac: > 0.75 auto: true desiredTaskIput: 25600 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)
[ https://issues.apache.org/jira/browse/HIVE-24474?focusedWorklogId=520152=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520152 ] ASF GitHub Bot logged work on HIVE-24474: - Author: ASF GitHub Bot Created on: 04/Dec/20 11:08 Start Date: 04/Dec/20 11:08 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1735: URL: https://github.com/apache/hive/pull/1735#discussion_r536020769 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -561,12 +561,12 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte } catch (Throwable e) { LOG.error("Caught exception while trying to compact " + ci + ". Marking failed to avoid repeated failures", e); -abortCompactionAndMarkFailed(ci, compactorTxnId, e); +compactorTxnId = abortCompactionAndMarkFailed(ci, compactorTxnId, e); } } catch (TException | IOException t) { LOG.error("Caught an exception in the main loop of compactor worker " + workerName, t); try { -abortCompactionAndMarkFailed(ci, compactorTxnId, t); +compactorTxnId = abortCompactionAndMarkFailed(ci, compactorTxnId, t); Review comment: Why not just set it to TXN_ID_NOT_SET after calling abortCompactionAndMarkFailed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520152) Time Spent: 20m (was: 10m) > Failed compaction always logs TxnAbortedException (again) > - > > Key: HIVE-24474 > URL: https://issues.apache.org/jira/browse/HIVE-24474 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Re-introduced with HIVE-24096. 
> If there is an error during compaction, the compaction's txn is aborted but > in the finally clause, we try to commit it (commitTxnIfSet), so Worker throws > a TxnAbortedException. > We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is > aborted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24475) Generalize fixacidkeyindex utility
[ https://issues.apache.org/jira/browse/HIVE-24475?focusedWorklogId=520150=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520150 ] ASF GitHub Bot logged work on HIVE-24475: - Author: ASF GitHub Bot Created on: 04/Dec/20 11:07 Start Date: 04/Dec/20 11:07 Worklog Time Spent: 10m Work Description: pvargacl commented on pull request #1730: URL: https://github.com/apache/hive/pull/1730#issuecomment-738722498 LGTM +1 (non-binding) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520150) Time Spent: 50m (was: 40m) > Generalize fixacidkeyindex utility > -- > > Key: HIVE-24475 > URL: https://issues.apache.org/jira/browse/HIVE-24475 > Project: Hive > Issue Type: Improvement > Components: ORC, Transactions >Affects Versions: 3.0.0 >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > There is a utility in hive which can validate/fix corrupted > hive.acid.key.index. > hive --service fixacidkeyindex > Unfortunately it is only tailored for a specific problem > (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally > validating and recovering the hive.acid.key.index from the stripe data itself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=520147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520147 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 04/Dec/20 11:00 Start Date: 04/Dec/20 11:00 Worklog Time Spent: 10m Work Description: pvargacl commented on pull request #1710: URL: https://github.com/apache/hive/pull/1710#issuecomment-738719330 LGTM +1 (non-binding) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520147) Time Spent: 1.5h (was: 1h 20m) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen under a single transaction and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
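The batching idea in HIVE-24432 — delete the same rows, but spread over several smaller transactions — can be sketched independently of JDBC. The partition helper below is hypothetical illustration code, not Hive's implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedDelete {
    /** Splits ids into consecutive chunks of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            batches.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return batches;
    }

    // Each chunk would then be deleted in its own transaction, e.g. a
    // parameterized DELETE ... WHERE EVENT_ID IN (?, ?, ...) committed after
    // every batch, so the backend DB holds fewer row locks at a time --
    // the pressure the ticket describes.
}
```

The trade-off is that a crash mid-way leaves some events deleted and some not, which is acceptable here because the cleaner is idempotent and simply continues on the next run.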
[jira] [Resolved] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage resolved HIVE-24444. -- Resolution: Won't Fix > compactor.Cleaner should not set state "mark cleaned" if there are obsolete > files in the FS > --- > > Key: HIVE-24444 > URL: https://issues.apache.org/jira/browse/HIVE-24444 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > > This is an improvement on HIVE-24314, in which markCleaned() is called only > if +any+ files are deleted by the cleaner. This could cause a problem in the > following case: > Say for table_1 compaction1 cleaning was blocked by an open txn, and > compaction is run again on the same table (compaction2). Both compaction1 and > compaction2 could be in "ready for cleaning" at the same time. By this time > the blocking open txn could be committed. When the cleaner runs, one of > compaction1 and compaction2 will remain in the "ready for cleaning" state: > Say compaction2 is picked up by the cleaner first. The Cleaner deletes all > obsolete files. Then compaction1 is picked up by the cleaner; the cleaner > doesn't remove any files and compaction1 will stay in the queue in a "ready > for cleaning" state. > HIVE-24291 already solves this issue but if it isn't usable (for example if > HMS schema changes are out of the question) then HIVE-24314 + this change will > fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=520139=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520139 ] ASF GitHub Bot logged work on HIVE-24444: - Author: ASF GitHub Bot Created on: 04/Dec/20 10:38 Start Date: 04/Dec/20 10:38 Worklog Time Spent: 10m Work Description: klcopp commented on pull request #1716: URL: https://github.com/apache/hive/pull/1716#issuecomment-738709447 I will close this because HIVE-24403 is making HIVE-23107 etc. backwards compatible so this change will not be needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520139) Time Spent: 7h 20m (was: 7h 10m) > compactor.Cleaner should not set state "mark cleaned" if there are obsolete > files in the FS > --- > > Key: HIVE-24444 > URL: https://issues.apache.org/jira/browse/HIVE-24444 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 7h 20m > Remaining Estimate: 0h > > This is an improvement on HIVE-24314, in which markCleaned() is called only > if +any+ files are deleted by the cleaner. This could cause a problem in the > following case: > Say for table_1 compaction1 cleaning was blocked by an open txn, and > compaction is run again on the same table (compaction2). Both compaction1 and > compaction2 could be in "ready for cleaning" at the same time. By this time > the blocking open txn could be committed. When the cleaner runs, one of > compaction1 and compaction2 will remain in the "ready for cleaning" state: > Say compaction2 is picked up by the cleaner first. The Cleaner deletes all > obsolete files. 
Then compaction1 is picked up by the cleaner; the cleaner > doesn't remove any files and compaction1 will stay in the queue in a "ready > for cleaning" state. > HIVE-24291 already solves this issue but if it isn't usable (for example if > HMS schema changes are out of the question) then HIVE-24314 + this change will > fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=520140=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520140 ] ASF GitHub Bot logged work on HIVE-24444: - Author: ASF GitHub Bot Created on: 04/Dec/20 10:38 Start Date: 04/Dec/20 10:38 Worklog Time Spent: 10m Work Description: klcopp closed pull request #1716: URL: https://github.com/apache/hive/pull/1716 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520140) Time Spent: 7.5h (was: 7h 20m) > compactor.Cleaner should not set state "mark cleaned" if there are obsolete > files in the FS > --- > > Key: HIVE-24444 > URL: https://issues.apache.org/jira/browse/HIVE-24444 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > > This is an improvement on HIVE-24314, in which markCleaned() is called only > if +any+ files are deleted by the cleaner. This could cause a problem in the > following case: > Say for table_1 compaction1 cleaning was blocked by an open txn, and > compaction is run again on the same table (compaction2). Both compaction1 and > compaction2 could be in "ready for cleaning" at the same time. By this time > the blocking open txn could be committed. When the cleaner runs, one of > compaction1 and compaction2 will remain in the "ready for cleaning" state: > Say compaction2 is picked up by the cleaner first. The Cleaner deletes all > obsolete files. Then compaction1 is picked up by the cleaner; the cleaner > doesn't remove any files and compaction1 will stay in the queue in a "ready > for cleaning" state. 
> HIVE-24291 already solves this issue but if it isn't usable (for example if > HMS schema changes are out of the question) then HIVE-24314 + this change will > fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24481) Skipped compaction can cause data corruption with streaming
[ https://issues.apache.org/jira/browse/HIVE-24481?focusedWorklogId=520118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520118 ] ASF GitHub Bot logged work on HIVE-24481: - Author: ASF GitHub Bot Created on: 04/Dec/20 09:40 Start Date: 04/Dec/20 09:40 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1738: URL: https://github.com/apache/hive/pull/1738#discussion_r535965368 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -566,20 +567,20 @@ else if (filename.startsWith(BUCKET_PREFIX)) { public static final class DirectoryImpl implements Directory { private final List abortedDirectories; private final Set abortedWriteIds; +private final boolean uncompactedAborts; private final boolean isBaseInRawFormat; private final List original; private final List obsolete; private final List deltas; private final Path base; private List baseFiles; -public DirectoryImpl(List abortedDirectories, Set abortedWriteIds, -boolean isBaseInRawFormat, List original, -List obsolete, List deltas, Path base) { - this.abortedDirectories = abortedDirectories == null ? - Collections.emptyList() : abortedDirectories; - this.abortedWriteIds = abortedWriteIds == null ? -Collections.emptySet() : abortedWriteIds; +public DirectoryImpl(List abortedDirectories, Set abortedWriteIds, boolean uncompactedAborts, Review comment: It's starting to affect readability, maybe refactor in the following patches. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520118) Time Spent: 40m (was: 0.5h)
> Skipped compaction can cause data corruption with streaming
> -----------------------------------------------------------
>
> Key: HIVE-24481
> URL: https://issues.apache.org/jira/browse/HIVE-24481
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Varga
> Assignee: Peter Varga
> Priority: Major
> Labels: Compaction, pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1, and aborts
> 3. create streaming connection, with batch 3, withStaticPartitionValues with the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close connection, count of the table is 2
> 8. run manual minor compaction on the partition. it will skip compaction, because deltacount = 1, but clean, because there is aborted txn1
> 9. cleaner will remove both aborted records from txn_components
> 10. wait for acidhousekeeper to remove empty aborted txns
> 11. select * from table returns *3* records, reading the aborted record
-- This message was sent by Atlassian Jira (v8.3.4#803005)
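The flaw in the timeline above can be modeled as two independent decisions. The following is a toy sketch, not Hive's actual Worker/Cleaner code (all names and signatures here are illustrative): compaction is skipped because only one delta exists, yet cleaning still runs because an aborted txn exists, so the aborted bookkeeping is dropped while the aborted delta's data remains readable.

```java
// Illustrative model only -- not Hive source. It mimics the interaction the
// timeline describes: "skip compaction" and "run the cleaner" are decided
// independently, which lets cleaning erase aborted-txn metadata that
// compaction never physically removed.
public class CompactionDecision {
    // Minor compaction is worth running only above a delta-count threshold.
    static boolean shouldCompact(int deltaCount, int threshold) {
        return deltaCount > threshold;
    }

    // Pre-fix behaviour: clean whenever aborted transactions exist.
    static boolean shouldCleanBuggy(boolean hasAbortedTxns) {
        return hasAbortedTxns;
    }

    // Post-fix idea: only drop aborted bookkeeping once compaction has
    // actually removed the aborted deltas.
    static boolean shouldCleanFixed(boolean hasAbortedTxns, boolean compactionRan) {
        return hasAbortedTxns && compactionRan;
    }

    public static void main(String[] args) {
        boolean compacted = shouldCompact(1, 1);               // deltacount = 1 -> skipped
        System.out.println(shouldCleanBuggy(true));            // cleans anyway: corruption path
        System.out.println(shouldCleanFixed(true, compacted)); // waits for a real compaction
    }
}
```

With deltacount = 1 the buggy variant still cleans, while the fixed variant defers cleaning until a compaction has actually run.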
[jira] [Resolved] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Varga resolved HIVE-24403. Fix Version/s: 4.0.0 Resolution: Fixed
> change min_history_level schema change to be compatible with previous version
> -----------------------------------------------------------------------------
>
> Key: HIVE-24403
> URL: https://issues.apache.org/jira/browse/HIVE-24403
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Peter Varga
> Assignee: Peter Varga
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> In some configurations the HMS backend DB is used by HMS services with different versions.
> HIVE-23107 dropped the min_history_level table from the backend DB, making the new schema version incompatible with the older HMS services.
> It is possible to modify that change to keep compatibility:
> * Keep the min_history_level table
> * Add the new fields for the compaction_queue the same way
> * Create a feature flag for min_history_level, and if it is on:
> * Keep the logic inserting to the table during openTxn
> * Keep the logic removing the records at commitTxn and abortTxn
> * Change the logic in the cleaner to get the highwatermark the old way
> * But still change it to not start the cleaning before that
> * The txn_to_write_id table cleaning can work the new way in the new version and the old way in the old version
> * This feature flag can be automatically set up based on the existence of the min_history_level table; this way, if the table is dropped, all HMSs can switch to the new functionality without restart
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24481) Skipped compaction can cause data corruption with streaming
[ https://issues.apache.org/jira/browse/HIVE-24481?focusedWorklogId=520102=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520102 ] ASF GitHub Bot logged work on HIVE-24481: - Author: ASF GitHub Bot Created on: 04/Dec/20 08:59 Start Date: 04/Dec/20 08:59 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1738: URL: https://github.com/apache/hive/pull/1738#discussion_r535939329

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -1369,14 +1383,14 @@ private static Directory getAcidState(FileSystem fileSystem, Path candidateDirec
 if (childrenWithId != null) {
   for (HdfsFileStatusWithId child : childrenWithId) {
     getChildState(child, writeIdList, working, originalDirectories, original, obsolete,
-        bestBase, ignoreEmptyFiles, abortedDirectories, abortedWriteIds, fs, validTxnList);
+        bestBase, ignoreEmptyFiles, abortedDirectories, abortedWriteIds, uncompactedAborts, fs, validTxnList);

Review comment: In a follow-up Jira it might be worth changing this whole AcidUtils approach and starting to put everything in a DirectoryImpl from the beginning, so the argument count could be decreased to a sane amount.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520102) Time Spent: 0.5h (was: 20m)
> Skipped compaction can cause data corruption with streaming
> -----------------------------------------------------------
>
> Key: HIVE-24481
> URL: https://issues.apache.org/jira/browse/HIVE-24481
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Varga
> Assignee: Peter Varga
> Priority: Major
> Labels: Compaction, pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1, and aborts
> 3. create streaming connection, with batch 3, withStaticPartitionValues with the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close connection, count of the table is 2
> 8. run manual minor compaction on the partition. it will skip compaction, because deltacount = 1, but clean, because there is aborted txn1
> 9. cleaner will remove both aborted records from txn_components
> 10. wait for acidhousekeeper to remove empty aborted txns
> 11. select * from table returns *3* records, reading the aborted record
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=520103=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520103 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 04/Dec/20 09:00 Start Date: 04/Dec/20 09:00 Worklog Time Spent: 10m Work Description: deniskuzZ merged pull request #1688: URL: https://github.com/apache/hive/pull/1688 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520103) Time Spent: 4h 20m (was: 4h 10m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep compatibility:
> * Keep the min_history_level table
> * Add the new fields for the compaction_queue the same way
> * Create a feature flag for min_history_level, and if it is on:
> * Keep the logic inserting to the table during openTxn
> * Keep the logic removing the records at commitTxn and abortTxn
> * Change the logic in the cleaner to get the highwatermark the old way
> * But still change it to not start the cleaning before that
> * The txn_to_write_id table cleaning can work the new way in the new version and the old way in the old version
> * This feature flag can be automatically set up based on the existence of the min_history_level table; this way, if the table is dropped, all HMSs can switch to the new functionality without restart
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24481) Skipped compaction can cause data corruption with streaming
[ https://issues.apache.org/jira/browse/HIVE-24481?focusedWorklogId=520097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520097 ] ASF GitHub Bot logged work on HIVE-24481: - Author: ASF GitHub Bot Created on: 04/Dec/20 08:57 Start Date: 04/Dec/20 08:57 Worklog Time Spent: 10m Work Description: pvargacl commented on pull request #1738: URL: https://github.com/apache/hive/pull/1738#issuecomment-738658011 @deniskuzZ @klcopp can I ask for a review This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520097) Time Spent: 20m (was: 10m)
> Skipped compaction can cause data corruption with streaming
> -----------------------------------------------------------
>
> Key: HIVE-24481
> URL: https://issues.apache.org/jira/browse/HIVE-24481
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Varga
> Assignee: Peter Varga
> Priority: Major
> Labels: Compaction, pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1, and aborts
> 3. create streaming connection, with batch 3, withStaticPartitionValues with the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close connection, count of the table is 2
> 8. run manual minor compaction on the partition. it will skip compaction, because deltacount = 1, but clean, because there is aborted txn1
> 9. cleaner will remove both aborted records from txn_components
> 10. wait for acidhousekeeper to remove empty aborted txns
> 11. select * from table returns *3* records, reading the aborted record
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=520074=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520074 ] ASF GitHub Bot logged work on HIVE-24433: - Author: ASF GitHub Bot Created on: 04/Dec/20 08:21 Start Date: 04/Dec/20 08:21 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1712: URL: https://github.com/apache/hive/pull/1712#discussion_r535910850

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
## @@ -2877,6 +2877,20 @@ private static String normalizeCase(String s) { return s == null ? null : s.toLowerCase(); }
+ private static String normalizePartitionCase(String s) {

Review comment: @nareshpr, LGTM, however could you please try to reuse FileUtils.makePartName(List partCols, List vals):

Map map = Splitter.on( "=" ).withKeyValueSeparator( Path.SEPARATOR ).split(lc.getPartitionname());
return FileUtils.makePartName(new ArrayList<>(map.keySet()), new ArrayList<>(map.values()));

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520074) Time Spent: 2.5h (was: 2h 20m)
> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> PartitionKeyValue is getting converted into lowerCase in below 2 places. 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not have entries with the proper partition values.
> When the query completes, the entry moves from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. Hive AutoCompaction will not recognize the partition & considers it an invalid partition
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | 2         | default      | abc       | city=bangalore | 2020-11-25 09:26:59 | 1           | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
> AutoCompaction fails to get triggered with below error
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(98)) - Checking to see if we should compact default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(155)) - Can't find partition default.compaction_test.city=bangalore, assuming it has been dropped and moving on{code}
> I verified below 4 SQL's with my PR; all produced the correct PartitionKeyValue,
> i.e., COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update table abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}
-- This message 
was sent by Atlassian Jira (v8.3.4#803005)
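The reviewer's suggestion above leans on Guava's Splitter plus Hive's FileUtils.makePartName. As a rough standard-library illustration of the same split-and-rebuild round trip (class and method names here are ours, not Hive's, and the real makePartName additionally escapes special characters in names and values):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PartNameRoundTrip {
    // Split "city=Bangalore/state=KA" into an ordered key/value map.
    static Map<String, String> splitPartName(String partName) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String component : partName.split("/")) {
            int eq = component.indexOf('=');
            map.put(component.substring(0, eq), component.substring(eq + 1));
        }
        return map;
    }

    // Rebuild the partition name from the map, preserving order and case.
    static String makePartName(Map<String, String> parts) {
        return parts.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("/"));
    }

    public static void main(String[] args) {
        System.out.println(makePartName(splitPartName("city=Bangalore/state=KA")));
    }
}
```

The LinkedHashMap matters: partition column order must survive the round trip, or the rebuilt name would no longer match the on-disk partition path.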
[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=520070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520070 ] ASF GitHub Bot logged work on HIVE-24433: - Author: ASF GitHub Bot Created on: 04/Dec/20 08:11 Start Date: 04/Dec/20 08:11 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1712: URL: https://github.com/apache/hive/pull/1712#discussion_r535910850

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
## @@ -2877,6 +2877,20 @@ private static String normalizeCase(String s) { return s == null ? null : s.toLowerCase(); }
+ private static String normalizePartitionCase(String s) {

Review comment: @nareshpr, LGTM, however could you please try to reuse FileUtils.makePartName(List partCols, List vals)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520070) Time Spent: 2h 20m (was: 2h 10m)
> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> PartitionKeyValue is getting converted into lowerCase in below 2 places. 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not have entries with the proper partition values.
> When the query completes, the entry moves from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. Hive AutoCompaction will not recognize the partition & considers it an invalid partition
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | 2         | default      | abc       | city=bangalore | 2020-11-25 09:26:59 | 1           | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
> AutoCompaction fails to get triggered with below error
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(98)) - Checking to see if we should compact default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(155)) - Can't find partition default.compaction_test.city=bangalore, assuming it has been dropped and moving on{code}
> I verified below 4 SQL's with my PR; all produced the correct PartitionKeyValue,
> i.e., COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update table abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}
-- This message 
was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=520066=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520066 ] ASF GitHub Bot logged work on HIVE-24433: - Author: ASF GitHub Bot Created on: 04/Dec/20 08:06 Start Date: 04/Dec/20 08:06 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1712: URL: https://github.com/apache/hive/pull/1712#discussion_r535906390

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
## @@ -2877,6 +2877,20 @@ private static String normalizeCase(String s) { return s == null ? null : s.toLowerCase(); }
+ private static String normalizePartitionCase(String s) {
+   if (s == null) {
+     return null;
+   } else {

Review comment: No need for the else clause

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520066) Time Spent: 2h 10m (was: 2h)
> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> PartitionKeyValue is getting converted into lowerCase in below 2 places.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not have entries with the proper partition values. 
> When the query completes, the entry moves from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. Hive AutoCompaction will not recognize the partition & considers it an invalid partition
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | 2         | default      | abc       | city=bangalore | 2020-11-25 09:26:59 | 1           | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
> AutoCompaction fails to get triggered with below error
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(98)) - Checking to see if we should compact default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(155)) - Can't find partition default.compaction_test.city=bangalore, assuming it has been dropped and moving on{code}
> I verified below 4 SQL's with my PR; all produced the correct PartitionKeyValue,
> i.e., COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update table abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
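The fix discussed in the review above lowercases only the partition column names while preserving the user-supplied value case. A minimal standalone sketch of that idea (illustrative only, not the patch's actual normalizePartitionCase code):

```java
public class PartitionNameNormalizer {
    // Lowercase only the column name in each "key=value" component of a
    // partition name, leaving the value untouched, so e.g.
    // "CitY=Bangalore" becomes "city=Bangalore" (not "city=bangalore").
    static String normalizePartitionCase(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (String component : s.split("/")) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            int eq = component.indexOf('=');
            if (eq < 0) {
                // No key=value structure; fall back to plain lowercasing.
                sb.append(component.toLowerCase());
            } else {
                sb.append(component.substring(0, eq).toLowerCase())
                  .append(component.substring(eq)); // keeps '=' and the value
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizePartitionCase("CitY=Bangalore"));
    }
}
```

With this shape of normalization, CTC_PARTITION ends up as "city=Bangalore", which matches the partition the Initiator looks up, instead of the all-lowercase "city=bangalore" that made it appear dropped.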