[jira] [Work logged] (HIVE-24491) setting custom job name is ineffective if the tez session pool is configured or in case of session reuse.
[ https://issues.apache.org/jira/browse/HIVE-24491?focusedWorklogId=520434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520434 ] ASF GitHub Bot logged work on HIVE-24491: - Author: ASF GitHub Bot Created on: 05/Dec/20 05:58 Start Date: 05/Dec/20 05:58 Worklog Time Spent: 10m Work Description: rajkrrsingh opened a new pull request #1746: URL: https://github.com/apache/hive/pull/1746 ### What changes were proposed in this pull request? HIVE-23026 added the capability to set tez.job.name, but it is not effective if the Tez session pool manager is configured or a Tez session is reused. With this change, whenever the user sets tez.job.name, Hive will force a new Tez session instead of using one from the pool. ### Why are the changes needed? Without this change, setting tez.job.name is not effective if a Tez session is reused. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? On a local reproduction cluster. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520434) Remaining Estimate: 0h Time Spent: 10m > setting custom job name is ineffective if the tez session pool is configured > or in case of session reuse. > - > > Key: HIVE-24491 > URL: https://issues.apache.org/jira/browse/HIVE-24491 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-23026 added the capability to set tez.job.name, but it is not effective if the tez > session pool manager is configured or a Tez session is reused. -- This message was sent by Atlassian Jira (v8.3.4#803005)
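The fix described in the pull request, bypassing the pool whenever a custom job name is set, can be sketched as follows. This is an illustrative Python model with hypothetical names, not Hive's actual TezSessionPoolManager code:

```python
# Illustrative sketch (hypothetical names, not actual Hive code): a session
# pool that is bypassed when the user sets a custom job name, so the name is
# applied to a fresh session rather than being lost on a pre-created pooled one.
from dataclasses import dataclass, field


@dataclass
class TezSession:
    job_name: str = "HIVE-default"


@dataclass
class SessionPool:
    sessions: list = field(default_factory=lambda: [TezSession()])

    def get_session(self, conf: dict) -> TezSession:
        custom_name = conf.get("tez.job.name")
        if custom_name:
            # Pooled sessions were created before the user's setting existed,
            # so force a new session that picks up the custom name.
            return TezSession(job_name=custom_name)
        return self.sessions[0]  # otherwise reuse a session from the pool


pool = SessionPool()
assert pool.get_session({}).job_name == "HIVE-default"
assert pool.get_session({"tez.job.name": "my-etl"}).job_name == "my-etl"
```

The trade-off, as the PR notes, is that a query setting tez.job.name pays the cost of starting a fresh Tez session instead of getting a warm one from the pool.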
[jira] [Updated] (HIVE-24491) setting custom job name is ineffective if the tez session pool is configured or in case of session reuse.
[ https://issues.apache.org/jira/browse/HIVE-24491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24491: -- Labels: pull-request-available (was: ) > setting custom job name is ineffective if the tez session pool is configured > or in case of session reuse. > - > > Key: HIVE-24491 > URL: https://issues.apache.org/jira/browse/HIVE-24491 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-23026 add capability to set tez.job.name but it's not effective if tez > session pool manager is configured or tez session reuse. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24491) setting custom job name is ineffective if the tez session pool is configured or in case of session reuse.
[ https://issues.apache.org/jira/browse/HIVE-24491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajkumar Singh reassigned HIVE-24491: - > setting custom job name is ineffective if the tez session pool is configured > or in case of session reuse. > - > > Key: HIVE-24491 > URL: https://issues.apache.org/jira/browse/HIVE-24491 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > > HIVE-23026 add capability to set tez.job.name but it's not effective if tez > session pool manager is configured or tez session reuse. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24228) Support complex types in LLAP
[ https://issues.apache.org/jira/browse/HIVE-24228?focusedWorklogId=520389=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520389 ] ASF GitHub Bot logged work on HIVE-24228: - Author: ASF GitHub Bot Created on: 05/Dec/20 00:48 Start Date: 05/Dec/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1551: URL: https://github.com/apache/hive/pull/1551#issuecomment-739093971 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520389) Time Spent: 20m (was: 10m) > Support complex types in LLAP > - > > Key: HIVE-24228 > URL: https://issues.apache.org/jira/browse/HIVE-24228 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Yuriy Baltovskyy >Assignee: Yuriy Baltovskyy >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The idea of this improvement is to support complex types (arrays, maps, > structs) returned from LLAP data reader. This is useful when consuming LLAP > data later in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24204) LLAP: Invalid TEZ Job token in multi fragment query
[ https://issues.apache.org/jira/browse/HIVE-24204?focusedWorklogId=520390=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520390 ] ASF GitHub Bot logged work on HIVE-24204: - Author: ASF GitHub Bot Created on: 05/Dec/20 00:48 Start Date: 05/Dec/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1530: URL: https://github.com/apache/hive/pull/1530 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520390) Time Spent: 0.5h (was: 20m) > LLAP: Invalid TEZ Job token in multi fragment query > --- > > Key: HIVE-24204 > URL: https://issues.apache.org/jira/browse/HIVE-24204 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.3.0 >Reporter: Yuriy Baltovskyy >Assignee: Yuriy Baltovskyy >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When using LLAP server in the Kerberized environment and submitting the query > via LLAP client that is planned as multi fragment (multiple splits), the > following error occurs and the query fails: > org.apache.hadoop.ipc.Server: javax.security.sasl.SaslException: DIGEST-MD5: > digest response format violation. Mismatched response. > This occurs because each split uses its own connection to LLAP server and its > own TEZ job token while LLAP server stores only one token binding it to the > whole query and not the separate fragment. When LLAP server communicates to > the clients and uses the stored token, this causes Sasl exception due to > using invalid token. -- This message was sent by Atlassian Jira (v8.3.4#803005)
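The token mismatch described in HIVE-24204 (one stored token per query versus one token per fragment connection) can be sketched with a toy registry. This is a simplified model with hypothetical names, not LLAP's actual token/SASL machinery:

```python
# Illustrative sketch: if the server keeps only one job token per query, the
# second fragment's registration overwrites the first, and the first
# fragment's connection then fails validation (the DIGEST-MD5 mismatch).
# Keying the store by (query_id, fragment_id) validates each connection
# against the token it was actually issued.
per_query = {}     # buggy scheme: one token per whole query
per_fragment = {}  # fixed scheme: one token per (query, fragment)


def register(query_id, fragment_id, token):
    per_query[query_id] = token                    # overwrites earlier fragments
    per_fragment[(query_id, fragment_id)] = token


def validate_per_query(query_id, fragment_id, token):
    return per_query[query_id] == token


def validate_per_fragment(query_id, fragment_id, token):
    return per_fragment[(query_id, fragment_id)] == token


register("q1", 0, "tok-a")
register("q1", 1, "tok-b")  # second split connects with its own token
# Fragment 0's token no longer matches the single stored token ...
assert not validate_per_query("q1", 0, "tok-a")
# ... but per-fragment storage validates both connections.
assert validate_per_fragment("q1", 0, "tok-a")
assert validate_per_fragment("q1", 1, "tok-b")
```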
[jira] [Work logged] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?focusedWorklogId=520391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520391 ] ASF GitHub Bot logged work on HIVE-23737: - Author: ASF GitHub Bot Created on: 05/Dec/20 00:48 Start Date: 05/Dec/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1195: URL: https://github.com/apache/hive/pull/1195 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520391) Time Spent: 1h 40m (was: 1.5h) > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > LLAP have a dagDelete feature added as part of HIVE-9911, But now that Tez > have added support for dagDelete in custom shuffle handler (TEZ-3362) we > could re-use that feature in LLAP. > There are some added advantages of using Tez's dagDelete feature rather than > the current LLAP's dagDelete feature. > 1) We can easily extend this feature to accommodate the upcoming features > such as vertex and failed task attempt shuffle data clean up. Refer TEZ-3363 > and TEZ-4129 > 2) It will be more easier to maintain this feature by separating it out from > the Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-12930) Support SSL Shuffle for LLAP
[ https://issues.apache.org/jira/browse/HIVE-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244365#comment-17244365 ] Krishnadas commented on HIVE-12930: --- Any updates on this one? > Support SSL Shuffle for LLAP > > > Key: HIVE-12930 > URL: https://issues.apache.org/jira/browse/HIVE-12930 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Siddharth Seth >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-24489. Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master, thanks [~zabetak]! > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?focusedWorklogId=520377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520377 ] ASF GitHub Bot logged work on HIVE-24489: - Author: ASF GitHub Bot Created on: 04/Dec/20 23:52 Start Date: 04/Dec/20 23:52 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1745: URL: https://github.com/apache/hive/pull/1745 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520377) Time Spent: 20m (was: 10m) > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?focusedWorklogId=520340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520340 ] ASF GitHub Bot logged work on HIVE-24489: - Author: ASF GitHub Bot Created on: 04/Dec/20 20:59 Start Date: 04/Dec/20 20:59 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request #1745: URL: https://github.com/apache/hive/pull/1745 ### What changes were proposed in this pull request? Update docker metastore version to zabetak/postgres-tpcds-metastore:1.3 on which some cleanup was performed to remove stale entries in various tables including MIN_HISTORY_LEVEL. See https://github.com/zabetak/hive-postgres-metastore/commit/04be8bd2e9400d7ae604a4112aaa4531dd36 ### Why are the changes needed? To fix test failures while running `TestTezTPCDS30TBPerfCliDriver` tests. More details in the JIRA. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520340) Remaining Estimate: 0h Time Spent: 10m > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. 
> {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
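The duplicate-key failure is easy to reproduce in miniature. The sketch below uses Python's sqlite3 as a stand-in for the PostgreSQL metastore: a stale row for txn id 7858 makes the next insert of the same id violate the primary key, and removing the stale entries (a cleanup along the lines of what the refreshed docker image bakes in) clears the error:

```python
# Minimal reproduction of the MIN_HISTORY_LEVEL failure mode, with sqlite3
# standing in for the PostgreSQL metastore backend.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MIN_HISTORY_LEVEL ("
    " MHL_TXNID INTEGER PRIMARY KEY, MHL_MIN_OPEN_TXNID INTEGER)"
)
conn.execute("INSERT INTO MIN_HISTORY_LEVEL VALUES (7858, 7858)")  # stale row

try:
    # A new run of the test suite reuses the txn id and hits the stale row.
    conn.execute("INSERT INTO MIN_HISTORY_LEVEL VALUES (7858, 7858)")
    failed = False
except sqlite3.IntegrityError:  # sqlite's analog of the duplicate-key error
    failed = True
assert failed

# After deleting the stale entries, the same insert succeeds.
conn.execute("DELETE FROM MIN_HISTORY_LEVEL")
conn.execute("INSERT INTO MIN_HISTORY_LEVEL VALUES (7858, 7858)")
```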
[jira] [Updated] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24489: -- Labels: pull-request-available (was: ) > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24397) Add the projection specification to the table request object and add placeholders in ObjectStore.java
[ https://issues.apache.org/jira/browse/HIVE-24397?focusedWorklogId=520314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520314 ] ASF GitHub Bot logged work on HIVE-24397: - Author: ASF GitHub Bot Created on: 04/Dec/20 19:31 Start Date: 04/Dec/20 19:31 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #1681: URL: https://github.com/apache/hive/pull/1681#issuecomment-738974480 Fix has been merged to master. Thank you @vnhive This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520314) Time Spent: 2h (was: 1h 50m) > Add the projection specification to the table request object and add > placeholders in ObjectStore.java > - > > Key: HIVE-24397 > URL: https://issues.apache.org/jira/browse/HIVE-24397 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24397) Add the projection specification to the table request object and add placeholders in ObjectStore.java
[ https://issues.apache.org/jira/browse/HIVE-24397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-24397. -- Fix Version/s: 4.0.0 Resolution: Fixed Fix has been committed to master. Thank you for the contribution, [~vnhive] > Add the projection specification to the table request object and add > placeholders in ObjectStore.java > - > > Key: HIVE-24397 > URL: https://issues.apache.org/jira/browse/HIVE-24397 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24490) Implement projections for tables in CachedStore.
[ https://issues.apache.org/jira/browse/HIVE-24490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Narayanan Venkateswaran reassigned HIVE-24490: -- > Implement projections for tables in CachedStore. > > > Key: HIVE-24490 > URL: https://issues.apache.org/jira/browse/HIVE-24490 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-24489: --- Description: The failures can be seen here: [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. {noformat} Caused by: MetaException(message:Unable to select from transaction database org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "min_history_level_pkey" Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} The content of the respective table inside the docker image is shown below. {noformat} SELECT * FROM "MIN_HISTORY_LEVEL" ; MHL_TXNID | MHL_MIN_OPEN_TXNID --+--- 6853 | 6687 7480 | 6947 7481 | 6947 6870 | 6687 7858 | 7858 6646 | 5946 7397 | 6947 7399 | 6947 5946 | 5946 6947 | 6947 7769 | 6947{noformat} was: The failures can be seen here: [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. Caused by: MetaException(message:Unable to select from transaction database org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "min_history_level_pkey" Detail: Key ("MHL_TXNID")=(7858) already exists. The content of the respective table inside the docker image is shown below. 
SELECT * FROM "MIN_HISTORY_LEVEL" ; MHL_TXNID | MHL_MIN_OPEN_TXNID ---+ 6853 | 6687 7480 | 6947 7481 | 6947 6870 | 6687 7858 | 7858 6646 | 5946 7397 | 6947 7399 | 6947 5946 | 5946 6947 | 6947 7769 | 6947 > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-24489: -- > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists. > The content of the respective table inside the docker image is shown below. > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > ---+ > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table
[ https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24489 started by Stamatis Zampetakis. -- > TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL > metastore table > -- > > Key: HIVE-24489 > URL: https://issues.apache.org/jira/browse/HIVE-24489 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The failures can be seen here: > [http://ci.hive.apache.org/job/hive-precommit/job/master/373/] > The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table. > {noformat} > Caused by: MetaException(message:Unable to select from transaction database > org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique > constraint "min_history_level_pkey" > Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat} > The content of the respective table inside the docker image is shown below. > {noformat} > SELECT * FROM "MIN_HISTORY_LEVEL" ; > MHL_TXNID | MHL_MIN_OPEN_TXNID > --+--- > 6853 | 6687 > 7480 | 6947 > 7481 | 6947 > 6870 | 6687 > 7858 | 7858 > 6646 | 5946 > 7397 | 6947 > 7399 | 6947 > 5946 | 5946 > 6947 | 6947 > 7769 | 6947{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24485) Make the slow-start behavior tunable
[ https://issues.apache.org/jira/browse/HIVE-24485?focusedWorklogId=520248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520248 ] ASF GitHub Bot logged work on HIVE-24485: - Author: ASF GitHub Bot Created on: 04/Dec/20 16:18 Start Date: 04/Dec/20 16:18 Worklog Time Spent: 10m Work Description: okumin opened a new pull request #1744: URL: https://github.com/apache/hive/pull/1744 ### What changes were proposed in this pull request? Make it possible to apply `tez.shuffle-vertex-manager.min-src-fraction` and `tez.shuffle-vertex-manager.max-src-fraction` even if auto reducer parallelism is enabled. https://issues.apache.org/jira/browse/HIVE-24485 ### Why are the changes needed? With this PR, we can tweak the trade-off between the timing to start and the accuracy of estimation. Tez can gather more samples with higher fractions, while this delays the start of the next vertex. ### Does this PR introduce _any_ user-facing change? Users who configure `tez.shuffle-vertex-manager.{min,max}-src-fraction`, or `mapreduce.job.reduce.slowstart.completedmaps` (which Tez maps to `tez.shuffle-vertex-manager.min-src-fraction`), may see a change of behavior when they enable auto reducer parallelism. https://github.com/apache/tez/blob/dadc09f5a44c1cb61af00efecb3d27b92c92aa8f/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/DeprecatedKeys.java#L123 ### How was this patch tested? No test cases will be added because `Vertex#VertexManagerPluginDescriptor` is not visible outside the Tez package. We can check the change by running a job and then inspecting the Tez log.
``` beeline -e ' SET hive.tez.auto.reducer.parallelism=true; SET hive.tez.min.partition.factor=1.0; -- enforce auto-parallelism SET tez.shuffle-vertex-manager.min-src-fraction=0.55; SET tez.shuffle-vertex-manager.max-src-fraction=0.95; CREATE TABLE mofu (name string); INSERT INTO mofu (name) VALUES ('12345'); SELECT name, count(*) FROM mofu GROUP BY name; ' ``` ``` 2020-12-04 16:10:47,170 [INFO] [Dispatcher thread {Central}] |vertexmanager.ShuffleVertexManagerBase|: Settings minFrac: 0.55 maxFrac: 0.95 auto: true desiredTaskIput: 25600 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520248) Remaining Estimate: 0h Time Spent: 10m > Make the slow-start behavior tunable > > > Key: HIVE-24485 > URL: https://issues.apache.org/jira/browse/HIVE-24485 > Project: Hive > Issue Type: Improvement > Components: Hive, Tez >Affects Versions: 3.1.2, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This ticket would enable users to configure the timing of slow-start with > `tez.shuffle-vertex-manager.min-src-fraction` and > `tez.shuffle-vertex-manager.max-src-fraction`. > Hive on Tez currently doesn't honor these parameters and ShuffleVertexManager > always uses the default value. > We can control the timing to start vertexes and the accuracy of the estimated input > size if we can tweak these. This is useful when a vertex has tasks that > process a different amount of data. > > We can reproduce the issue with this query.
> {code:java} > SET hive.tez.auto.reducer.parallelism=true; > SET hive.tez.min.partition.factor=1.0; -- enforce auto-parallelism > SET tez.shuffle-vertex-manager.min-src-fraction=0.55; > SET tez.shuffle-vertex-manager.max-src-fraction=0.95; > CREATE TABLE mofu (name string); > INSERT INTO mofu (name) VALUES ('12345'); > SELECT name, count(*) FROM mofu GROUP BY name;{code} > The fractions are ignored. > {code:java} > 2020-12-04 11:41:42,484 [INFO] [Dispatcher thread {Central}] > |vertexmanager.ShuffleVertexManagerBase|: Settings minFrac: 0.25 maxFrac: > 0.75 auto: true desiredTaskIput: 25600 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
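The slow-start behavior these fractions control can be modeled as a linear ramp: no reducers are launched before min-src-fraction of the source tasks complete, all reducers are launched once max-src-fraction complete, and the count scales linearly in between. The sketch below is a simplified Python model of that ramp, not Tez's actual ShuffleVertexManager code:

```python
# Simplified model of Tez slow-start: how many reducers are scheduled as a
# function of the fraction of completed source (map-side) tasks.
def tasks_to_schedule(completed_frac, total_reducers,
                      min_frac=0.25, max_frac=0.75):
    if completed_frac < min_frac:
        return 0                      # too few samples: launch nothing yet
    if completed_frac >= max_frac:
        return total_reducers         # launch everything
    # Linear ramp between the two fractions.
    span = (completed_frac - min_frac) / (max_frac - min_frac)
    return int(total_reducers * span)


# With the default fractions, reducers start at 25% source completion ...
assert tasks_to_schedule(0.20, 100) == 0
assert tasks_to_schedule(0.50, 100) == 50
assert tasks_to_schedule(0.80, 100) == 100
# ... while raising min_frac to 0.55 (as in the repro) delays the start,
# letting the vertex manager gather more samples before committing.
assert tasks_to_schedule(0.50, 100, min_frac=0.55, max_frac=0.95) == 0
```

This makes the trade-off in the PR concrete: higher fractions mean later reducer start but a better estimate of input size for auto reducer parallelism.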
[jira] [Updated] (HIVE-24485) Make the slow-start behavior tunable
[ https://issues.apache.org/jira/browse/HIVE-24485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24485: -- Labels: pull-request-available (was: ) > Make the slow-start behavior tunable > > > Key: HIVE-24485 > URL: https://issues.apache.org/jira/browse/HIVE-24485 > Project: Hive > Issue Type: Improvement > Components: Hive, Tez >Affects Versions: 3.1.2, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This ticket would enable users to configure the timing of slow-start with > `tez.shuffle-vertex-manager.min-src-fraction` and > `tez.shuffle-vertex-manager.max-src-fraction`. > Hive on Tez currently doesn't honor these parameters and ShuffleVertexManager > always uses the default value. > We can control the timing to start vertexes and the accuracy of the estimated input > size if we can tweak these. This is useful when a vertex has tasks that > process a different amount of data. > > We can reproduce the issue with this query. > {code:java} > SET hive.tez.auto.reducer.parallelism=true; > SET hive.tez.min.partition.factor=1.0; -- enforce auto-parallelism > SET tez.shuffle-vertex-manager.min-src-fraction=0.55; > SET tez.shuffle-vertex-manager.max-src-fraction=0.95; > CREATE TABLE mofu (name string); > INSERT INTO mofu (name) VALUES ('12345'); > SELECT name, count(*) FROM mofu GROUP BY name;{code} > The fractions are ignored. > {code:java} > 2020-12-04 11:41:42,484 [INFO] [Dispatcher thread {Central}] > |vertexmanager.ShuffleVertexManagerBase|: Settings minFrac: 0.25 maxFrac: > 0.75 auto: true desiredTaskIput: 25600 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24419) Refactor junit database rules to exploit testcontainers
[ https://issues.apache.org/jira/browse/HIVE-24419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244070#comment-17244070 ] Stamatis Zampetakis commented on HIVE-24419: Yes, they also take care of cleaning up in case you force shutdown the VM. They create another container managing the container :P > Refactor junit database rules to exploit testcontainers > --- > > Key: HIVE-24419 > URL: https://issues.apache.org/jira/browse/HIVE-24419 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Minor > > The > [DatabaseRule|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java] > and its subclasses allow tests to run over dockerized metastores using > different database backends. Essentially some part of the code manages > containers by invoking explicitly docker commands. > The [testcontainers|https://www.testcontainers.org/modules/databases/] > project provides the necessary modules for managing dockerized databases in > an easy and intuitive way. > The goal of this issue is to refactor the {{DatabaseRule}} hierarchy to take > advantage of testcontainers in order to delegate the burden of managing > containers outside of Hive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24487) Use alternate ports for dockerized databases during testing
[ https://issues.apache.org/jira/browse/HIVE-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244069#comment-17244069 ] Stamatis Zampetakis commented on HIVE-24487: Yes, I use it in other projects, it's quite cool what they provide. > Use alternate ports for dockerized databases during testing > --- > > Key: HIVE-24487 > URL: https://issues.apache.org/jira/browse/HIVE-24487 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Priority: Major > > like 5432 for postgres and 3306 for mysql > https://github.com/apache/hive/blob/52cf467836df71485e95b08c9e91e197e9898b79/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/Postgres.java#L35 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24419) Refactor junit database rules to exploit testcontainers
[ https://issues.apache.org/jira/browse/HIVE-24419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244063#comment-17244063 ] Zoltan Haindrich commented on HIVE-24419: - I also had to alter the existing code to clean up before it starts a new one ...and the next one on my list would have been to randomize the container name because I may run the same tests from different containers on the same machine. I guess testcontainers might help a lot in stuff like that! > Refactor junit database rules to exploit testcontainers > --- > > Key: HIVE-24419 > URL: https://issues.apache.org/jira/browse/HIVE-24419 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Minor > > The > [DatabaseRule|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java] > and its subclasses allow tests to run over dockerized metastores using > different database backends. Essentially some part of the code manages > containers by explicitly invoking docker commands. > The [testcontainers|https://www.testcontainers.org/modules/databases/] > project provides the necessary modules for managing dockerized databases in > an easy and intuitive way. > The goal of this issue is to refactor the {{DatabaseRule}} hierarchy to take > advantage of testcontainers in order to delegate the burden of managing > containers outside of Hive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24487) Use alternate ports for dockerized databases during testing
[ https://issues.apache.org/jira/browse/HIVE-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244062#comment-17244062 ] Zoltan Haindrich commented on HIVE-24487: - yeah - that could be the thing we need! it has this at https://www.testcontainers.org/quickstart/junit_4_quickstart/ {code} String address = redis.getHost(); Integer port = redis.getFirstMappedPort(); {code} ..it might already be prepared to understand how things change when docker host is set ! > Use alternate ports for dockerized databases during testing > --- > > Key: HIVE-24487 > URL: https://issues.apache.org/jira/browse/HIVE-24487 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Priority: Major > > like 5432 for postgres and 3306 for mysql > https://github.com/apache/hive/blob/52cf467836df71485e95b08c9e91e197e9898b79/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/Postgres.java#L35 -- This message was sent by Atlassian Jira (v8.3.4#803005)
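The quickstart pattern quoted above carries over directly to the database modules that the metastore tests would use. A minimal JUnit 4 sketch (assumes the `org.testcontainers:postgresql` dependency and a reachable Docker daemon; the database/user names are illustrative, not Hive's):

{code:java}
import org.junit.Rule;
import org.junit.Test;
import org.testcontainers.containers.PostgreSQLContainer;

public class MetastorePostgresSketchTest {
    // Testcontainers maps the container's 5432 to a free host port, so the
    // fixed-port collision from HIVE-24487 goes away automatically.
    @Rule
    public PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:13")
            .withDatabaseName("metastore")   // illustrative names
            .withUsername("hive")
            .withPassword("hive");

    @Test
    public void connectionDetailsAreMapped() {
        // getJdbcUrl() already contains the resolved host and mapped port,
        // which is also what makes a remote DOCKER_HOST (HIVE-24488) work
        // without manual port forwarding.
        String url = postgres.getJdbcUrl();
        Integer port = postgres.getFirstMappedPort();
        System.out.println(url + " -> mapped port " + port);
    }
}
{code}

This is only a sketch of the library's documented rule-based usage; it is not runnable without Docker, and the actual DatabaseRule refactoring may look quite different.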
[jira] [Work logged] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?focusedWorklogId=520227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520227 ] ASF GitHub Bot logged work on HIVE-17709: - Author: ASF GitHub Bot Created on: 04/Dec/20 15:22 Start Date: 04/Dec/20 15:22 Worklog Time Spent: 10m Work Description: abstractdog commented on pull request #1739: URL: https://github.com/apache/hive/pull/1739#issuecomment-738842995 could you please take a look @belugabehr? I confirmed this patch and it's working on JDK11 LLAP (Cloudera Data Warehouse) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520227) Time Spent: 20m (was: 10m) > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24487) Use alternate ports for dockerized databases during testing
[ https://issues.apache.org/jira/browse/HIVE-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244054#comment-17244054 ] Stamatis Zampetakis commented on HIVE-24487: It totally makes sense. Actually, I logged HIVE-24419 as a more ambitious way to take care of such problems. > Use alternate ports for dockerized databases during testing > --- > > Key: HIVE-24487 > URL: https://issues.apache.org/jira/browse/HIVE-24487 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Priority: Major > > like 5432 for postgres and 3306 for mysql > https://github.com/apache/hive/blob/52cf467836df71485e95b08c9e91e197e9898b79/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/Postgres.java#L35 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24488) Make docker host configurable for metastoredb/perf tests
[ https://issues.apache.org/jira/browse/HIVE-24488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-24488: --- > Make docker host configurable for metastoredb/perf tests > > > Key: HIVE-24488 > URL: https://issues.apache.org/jira/browse/HIVE-24488 > Project: Hive > Issue Type: Improvement > Components: Test >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > I tend to develop patches inside containers (hive-dev-box) to be able to work > on multiple patches in parallel. > Running tests which do use docker was always a bit problematic for me - when > I wanted to do it before: I manually exposed /var/lib/docker and added a > rinetd forward by hand (which is not nice) > ...the current move to run Perf tests as well against a dockerized > metastore exposes this problem a bit more for me. > I'm also considering adding the ability to use minikube with hive-dev-box ; > but that still needs exploring > it would be much easier to expose the address of the docker host I'm using... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244018#comment-17244018 ] Ashish Sharma commented on HIVE-24482: -- [~kishendas] are you implementing the above change? If not, can I take over the ticket? > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244018#comment-17244018 ] Ashish Sharma edited comment on HIVE-24482 at 12/4/20, 2:02 PM: [~kishendas] are you implementing the above change? If you are not, can I take over the ticket? was (Author: ashish-kumar-sharma): [~kishendas] are you implementing the above change. If you can it take over the ticket? > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23605) 'Wrong FS' error during _external_tables_info creation when staging location is remote
[ https://issues.apache.org/jira/browse/HIVE-23605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha updated HIVE-23605: Resolution: Fixed Status: Resolved (was: Patch Available) > 'Wrong FS' error during _external_tables_info creation when staging location > is remote > -- > > Key: HIVE-23605 > URL: https://issues.apache.org/jira/browse/HIVE-23605 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23605.01.patch, HIVE-23605.02.patch, > HIVE-23605.03.patch, HIVE-23605.04.patch > > Time Spent: 40m > Remaining Estimate: 0h > > When staging location is on target cluster, Repl Dump fails to create > _external_tables_info file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-23965. - Resolution: Fixed merged the new changes - resolving again :) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 7h 40m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24486) Enhance operator merge logic to also consider going thru RS operators
[ https://issues.apache.org/jira/browse/HIVE-24486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-24486: --- > Enhance operator merge logic to also consider going thru RS operators > - > > Key: HIVE-24486 > URL: https://issues.apache.org/jira/browse/HIVE-24486 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > the targeted situation looks like this: > {code} > OP1 -> RS1.1 -> JOIN1.1 > OP1 -> RS1.2 -> JOIN1.2 > OP2 -> RS2.1 -> JOIN1.1 -> RS3.1 > OP2 -> RS2.2 -> JOIN1.2 -> RS3.2 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)
[ https://issues.apache.org/jira/browse/HIVE-24474?focusedWorklogId=520180=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520180 ] ASF GitHub Bot logged work on HIVE-24474: - Author: ASF GitHub Bot Created on: 04/Dec/20 12:54 Start Date: 04/Dec/20 12:54 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1735: URL: https://github.com/apache/hive/pull/1735#discussion_r536079340 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -561,12 +561,12 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte } catch (Throwable e) { LOG.error("Caught exception while trying to compact " + ci + ". Marking failed to avoid repeated failures", e); -abortCompactionAndMarkFailed(ci, compactorTxnId, e); +compactorTxnId = abortCompactionAndMarkFailed(ci, compactorTxnId, e); } } catch (TException | IOException t) { LOG.error("Caught an exception in the main loop of compactor worker " + workerName, t); try { -abortCompactionAndMarkFailed(ci, compactorTxnId, t); +compactorTxnId = abortCompactionAndMarkFailed(ci, compactorTxnId, t); Review comment: I thought that this way it would be more obvious to future developers that the compactorTxnId needs to be unset every time it's aborted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520180) Time Spent: 0.5h (was: 20m) > Failed compaction always logs TxnAbortedException (again) > - > > Key: HIVE-24474 > URL: https://issues.apache.org/jira/browse/HIVE-24474 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Re-introduced with HIVE-24096. > If there is an error during compaction, the compaction's txn is aborted but > in the finally clause, we try to commit it (commitTxnIfSet), so Worker throws > a TxnAbortedException. > We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is > aborted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
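The review thread above is about making sure the finally-clause commit (commitTxnIfSet) can never touch an already-aborted transaction. A reduced sketch of that control flow — all names except compactorTxnId and TXN_ID_NOT_SET are hypothetical stand-ins, not Hive's actual Worker code:

```java
public class CompactorTxnSketch {
    static final long TXN_ID_NOT_SET = -1L;

    interface TxnStore {
        void commit(long txnId);
        void abort(long txnId);
    }

    /**
     * Runs one compaction attempt. On failure the txn is aborted and the id
     * is reset to TXN_ID_NOT_SET, so the finally block cannot commit an
     * aborted txn -- the TxnAbortedException scenario from HIVE-24474.
     */
    static long runCompaction(TxnStore store, long txnId, Runnable work) {
        long compactorTxnId = txnId;
        try {
            work.run();
        } catch (RuntimeException e) {
            store.abort(compactorTxnId);
            compactorTxnId = TXN_ID_NOT_SET; // unset after every abort
        } finally {
            if (compactorTxnId != TXN_ID_NOT_SET) {
                store.commit(compactorTxnId); // commitTxnIfSet equivalent
            }
        }
        return compactorTxnId;
    }
}
```

Whether the reset happens inside the abort helper (as in the PR) or right after the call (as the reviewer suggests) is a style choice; the invariant sketched here is the same either way.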
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=520179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520179 ] ASF GitHub Bot logged work on HIVE-23965: - Author: ASF GitHub Bot Created on: 04/Dec/20 12:47 Start Date: 04/Dec/20 12:47 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1714: URL: https://github.com/apache/hive/pull/1714 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520179) Time Spent: 7h 40m (was: 7.5h) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 7h 40m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. 
> This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24485) Make the slow-start behavior tunable
[ https://issues.apache.org/jira/browse/HIVE-24485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] okumin reassigned HIVE-24485: - > Make the slow-start behavior tunable > > > Key: HIVE-24485 > URL: https://issues.apache.org/jira/browse/HIVE-24485 > Project: Hive > Issue Type: Improvement > Components: Hive, Tez >Affects Versions: 3.1.2, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > > This ticket would enable users to configure the timing of slow-start with > `tez.shuffle-vertex-manager.min-src-fraction` and > `tez.shuffle-vertex-manager.max-src-fraction`. > Hive on Tez currently doesn't honor these parameters and ShuffleVertexManager > always uses the default values. > We can trade off the timing to start vertices against the accuracy of the estimated input > size if we can tweak these parameters. This is useful when a vertex has tasks that > process different amounts of data. > > We can reproduce the issue with this query. > {code:java} > SET hive.tez.auto.reducer.parallelism=true; > SET hive.tez.min.partition.factor=1.0; -- enforce auto-parallelism > SET tez.shuffle-vertex-manager.min-src-fraction=0.55; > SET tez.shuffle-vertex-manager.max-src-fraction=0.95; > CREATE TABLE mofu (name string); > INSERT INTO mofu (name) VALUES ('12345'); > SELECT name, count(*) FROM mofu GROUP BY name;{code} > The fractions are ignored. > {code:java} > 2020-12-04 11:41:42,484 [INFO] [Dispatcher thread {Central}] > |vertexmanager.ShuffleVertexManagerBase|: Settings minFrac: 0.25 maxFrac: > 0.75 auto: true desiredTaskIput: 25600 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)
[ https://issues.apache.org/jira/browse/HIVE-24474?focusedWorklogId=520152=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520152 ] ASF GitHub Bot logged work on HIVE-24474: - Author: ASF GitHub Bot Created on: 04/Dec/20 11:08 Start Date: 04/Dec/20 11:08 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1735: URL: https://github.com/apache/hive/pull/1735#discussion_r536020769 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -561,12 +561,12 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte } catch (Throwable e) { LOG.error("Caught exception while trying to compact " + ci + ". Marking failed to avoid repeated failures", e); -abortCompactionAndMarkFailed(ci, compactorTxnId, e); +compactorTxnId = abortCompactionAndMarkFailed(ci, compactorTxnId, e); } } catch (TException | IOException t) { LOG.error("Caught an exception in the main loop of compactor worker " + workerName, t); try { -abortCompactionAndMarkFailed(ci, compactorTxnId, t); +compactorTxnId = abortCompactionAndMarkFailed(ci, compactorTxnId, t); Review comment: Why not just set it to TXN_ID_NOT_SET after calling abortCompactionAndMarkFailed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520152) Time Spent: 20m (was: 10m) > Failed compaction always logs TxnAbortedException (again) > - > > Key: HIVE-24474 > URL: https://issues.apache.org/jira/browse/HIVE-24474 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Re-introduced with HIVE-24096. 
> If there is an error during compaction, the compaction's txn is aborted but > in the finally clause, we try to commit it (commitTxnIfSet), so Worker throws > a TxnAbortedException. > We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is > aborted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24475) Generalize fixacidkeyindex utility
[ https://issues.apache.org/jira/browse/HIVE-24475?focusedWorklogId=520150=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520150 ] ASF GitHub Bot logged work on HIVE-24475: - Author: ASF GitHub Bot Created on: 04/Dec/20 11:07 Start Date: 04/Dec/20 11:07 Worklog Time Spent: 10m Work Description: pvargacl commented on pull request #1730: URL: https://github.com/apache/hive/pull/1730#issuecomment-738722498 LGTM +1 (non-binding) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520150) Time Spent: 50m (was: 40m) > Generalize fixacidkeyindex utility > -- > > Key: HIVE-24475 > URL: https://issues.apache.org/jira/browse/HIVE-24475 > Project: Hive > Issue Type: Improvement > Components: ORC, Transactions >Affects Versions: 3.0.0 >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > There is a utility in hive which can validate/fix corrupted > hive.acid.key.index. > hive --service fixacidkeyindex > Unfortunately it is only tailored for a specific problem > (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally > validating and recovering the hive.acid.key.index from the stripe data itself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=520147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520147 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 04/Dec/20 11:00 Start Date: 04/Dec/20 11:00 Worklog Time Spent: 10m Work Description: pvargacl commented on pull request #1710: URL: https://github.com/apache/hive/pull/1710#issuecomment-738719330 LGTM +1 (non-binding) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520147) Time Spent: 1.5h (was: 1h 20m) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen under a single transaction and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
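The batching idea in HIVE-24432 — delete the same rows, but spread over several smaller transactions — can be sketched independently of JDBC. The partition helper below is hypothetical illustration code, not Hive's implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedDelete {
    /** Splits ids into consecutive chunks of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            batches.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return batches;
    }

    // Each chunk would then be deleted in its own transaction, e.g. a
    // parameterized DELETE ... WHERE EVENT_ID IN (?, ?, ...) committed after
    // every batch, so the backend DB holds fewer row locks at a time --
    // the pressure the ticket describes.
}
```

The trade-off is that a crash mid-way leaves some events deleted and some not, which is acceptable here because the cleaner is idempotent and simply continues on the next run.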
[jira] [Resolved] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage resolved HIVE-24444. -- Resolution: Won't Fix > compactor.Cleaner should not set state "mark cleaned" if there are obsolete > files in the FS > --- > > Key: HIVE-24444 > URL: https://issues.apache.org/jira/browse/HIVE-24444 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > > This is an improvement on HIVE-24314, in which markCleaned() is called only > if +any+ files are deleted by the cleaner. This could cause a problem in the > following case: > Say for table_1 compaction1 cleaning was blocked by an open txn, and > compaction is run again on the same table (compaction2). Both compaction1 and > compaction2 could be in "ready for cleaning" at the same time. By this time > the blocking open txn could be committed. When the cleaner runs, one of > compaction1 and compaction2 will remain in the "ready for cleaning" state: > Say compaction2 is picked up by the cleaner first. The Cleaner deletes all > obsolete files. Then compaction1 is picked up by the cleaner; the cleaner > doesn't remove any files and compaction1 will stay in the queue in a "ready > for cleaning" state. > HIVE-24291 already solves this issue but if it isn't usable (for example if > HMS schema changes are out of the question) then HIVE-24314 + this change will > fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=520139=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520139 ] ASF GitHub Bot logged work on HIVE-24444: - Author: ASF GitHub Bot Created on: 04/Dec/20 10:38 Start Date: 04/Dec/20 10:38 Worklog Time Spent: 10m Work Description: klcopp commented on pull request #1716: URL: https://github.com/apache/hive/pull/1716#issuecomment-738709447 I will close this because HIVE-24403 is making HIVE-23107 etc. backwards compatible so this change will not be needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520139) Time Spent: 7h 20m (was: 7h 10m) > compactor.Cleaner should not set state "mark cleaned" if there are obsolete > files in the FS > --- > > Key: HIVE-24444 > URL: https://issues.apache.org/jira/browse/HIVE-24444 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 7h 20m > Remaining Estimate: 0h > > This is an improvement on HIVE-24314, in which markCleaned() is called only > if +any+ files are deleted by the cleaner. This could cause a problem in the > following case: > Say for table_1 compaction1 cleaning was blocked by an open txn, and > compaction is run again on the same table (compaction2). Both compaction1 and > compaction2 could be in "ready for cleaning" at the same time. By this time > the blocking open txn could be committed. When the cleaner runs, one of > compaction1 and compaction2 will remain in the "ready for cleaning" state: > Say compaction2 is picked up by the cleaner first. The Cleaner deletes all > obsolete files. 
Then compaction1 is picked up by the cleaner; the cleaner > doesn't remove any files and compaction1 will stay in the queue in a "ready > for cleaning" state. > HIVE-24291 already solves this issue but if it isn't usable (for example if > HMS schema changes are out of the question) then HIVE-24314 + this change will > fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=520140=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520140 ] ASF GitHub Bot logged work on HIVE-24444: - Author: ASF GitHub Bot Created on: 04/Dec/20 10:38 Start Date: 04/Dec/20 10:38 Worklog Time Spent: 10m Work Description: klcopp closed pull request #1716: URL: https://github.com/apache/hive/pull/1716 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520140) Time Spent: 7.5h (was: 7h 20m) > compactor.Cleaner should not set state "mark cleaned" if there are obsolete > files in the FS > --- > > Key: HIVE-24444 > URL: https://issues.apache.org/jira/browse/HIVE-24444 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > > This is an improvement on HIVE-24314, in which markCleaned() is called only > if +any+ files are deleted by the cleaner. This could cause a problem in the > following case: > Say for table_1 compaction1 cleaning was blocked by an open txn, and > compaction is run again on the same table (compaction2). Both compaction1 and > compaction2 could be in "ready for cleaning" at the same time. By this time > the blocking open txn could be committed. When the cleaner runs, one of > compaction1 and compaction2 will remain in the "ready for cleaning" state: > Say compaction2 is picked up by the cleaner first. The Cleaner deletes all > obsolete files. Then compaction1 is picked up by the cleaner; the cleaner > doesn't remove any files and compaction1 will stay in the queue in a "ready > for cleaning" state. 
> HIVE-24291 already solves this issue but if it isn't usable (for example if > HMS schema changes are out of the question) then HIVE-24314 + this change will > fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24481) Skipped compaction can cause data corruption with streaming
[ https://issues.apache.org/jira/browse/HIVE-24481?focusedWorklogId=520118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520118 ] ASF GitHub Bot logged work on HIVE-24481: - Author: ASF GitHub Bot Created on: 04/Dec/20 09:40 Start Date: 04/Dec/20 09:40 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1738: URL: https://github.com/apache/hive/pull/1738#discussion_r535965368 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -566,20 +567,20 @@ else if (filename.startsWith(BUCKET_PREFIX)) { public static final class DirectoryImpl implements Directory { private final List abortedDirectories; private final Set abortedWriteIds; +private final boolean uncompactedAborts; private final boolean isBaseInRawFormat; private final List original; private final List obsolete; private final List deltas; private final Path base; private List baseFiles; -public DirectoryImpl(List abortedDirectories, Set abortedWriteIds, -boolean isBaseInRawFormat, List original, -List obsolete, List deltas, Path base) { - this.abortedDirectories = abortedDirectories == null ? - Collections.emptyList() : abortedDirectories; - this.abortedWriteIds = abortedWriteIds == null ? -Collections.emptySet() : abortedWriteIds; +public DirectoryImpl(List abortedDirectories, Set abortedWriteIds, boolean uncompactedAborts, Review comment: It's starting to affect readability, maybe refactor in the following patches. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520118) Time Spent: 40m (was: 0.5h)
> Skipped compaction can cause data corruption with streaming
> -----------------------------------------------------------
>
> Key: HIVE-24481
> URL: https://issues.apache.org/jira/browse/HIVE-24481
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Varga
> Assignee: Peter Varga
> Priority: Major
> Labels: Compaction, pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1, and aborts
> 3. create streaming connection, with batch 3, withStaticPartitionValues with the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close connection, count of the table is 2
> 8. run manual minor compaction on the partition. it will skip compaction, because deltacount = 1, but clean, because there is aborted txn1
> 9. cleaner will remove both aborted records from txn_components
> 10. wait for acidhousekeeper to remove empty aborted txns
> 11. select * from table returns *3* records, reading the aborted record
-- This message was sent by Atlassian Jira (v8.3.4#803005)
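The flaw in the timeline above can be modeled as two independent decisions. The following is a toy sketch, not Hive's actual Worker/Cleaner code (all names and signatures here are illustrative): compaction is skipped because only one delta exists, yet cleaning still runs because an aborted txn exists, so the aborted bookkeeping is dropped while the aborted delta's data remains readable.

```java
// Illustrative model only -- not Hive source. It mimics the interaction the
// timeline describes: "skip compaction" and "run the cleaner" are decided
// independently, which lets cleaning erase aborted-txn metadata that
// compaction never physically removed.
public class CompactionDecision {
    // Minor compaction is worth running only above a delta-count threshold.
    static boolean shouldCompact(int deltaCount, int threshold) {
        return deltaCount > threshold;
    }

    // Pre-fix behaviour: clean whenever aborted transactions exist.
    static boolean shouldCleanBuggy(boolean hasAbortedTxns) {
        return hasAbortedTxns;
    }

    // Post-fix idea: only drop aborted bookkeeping once compaction has
    // actually removed the aborted deltas.
    static boolean shouldCleanFixed(boolean hasAbortedTxns, boolean compactionRan) {
        return hasAbortedTxns && compactionRan;
    }

    public static void main(String[] args) {
        boolean compacted = shouldCompact(1, 1);               // deltacount = 1 -> skipped
        System.out.println(shouldCleanBuggy(true));            // cleans anyway: corruption path
        System.out.println(shouldCleanFixed(true, compacted)); // waits for a real compaction
    }
}
```

With deltacount = 1 the buggy variant still cleans, while the fixed variant defers cleaning until a compaction has actually run.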
[jira] [Resolved] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Varga resolved HIVE-24403. Fix Version/s: 4.0.0 Resolution: Fixed
> change min_history_level schema change to be compatible with previous version
> -----------------------------------------------------------------------------
>
> Key: HIVE-24403
> URL: https://issues.apache.org/jira/browse/HIVE-24403
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Peter Varga
> Assignee: Peter Varga
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> In some configurations the HMS backend DB is used by HMS services with different versions.
> HIVE-23107 dropped the min_history_level table from the backend DB, making the new schema version incompatible with the older HMS services.
> It is possible to modify that change to keep compatibility:
> * Keep the min_history_level table
> * Add the new fields for the compaction_queue the same way
> * Create a feature flag for min_history_level, and if it is on:
> * Keep the logic inserting to the table during openTxn
> * Keep the logic removing the records at commitTxn and abortTxn
> * Change the logic in the cleaner to get the highwatermark the old way
> * But still change it to not start the cleaning before that
> * The txn_to_write_id table cleaning can work the new way in the new version and the old way in the old version
> * This feature flag can be automatically set up based on the existence of the min_history_level table; this way, if the table is dropped, all HMSs can switch to the new functionality without restart
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24481) Skipped compaction can cause data corruption with streaming
[ https://issues.apache.org/jira/browse/HIVE-24481?focusedWorklogId=520102=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520102 ] ASF GitHub Bot logged work on HIVE-24481: - Author: ASF GitHub Bot Created on: 04/Dec/20 08:59 Start Date: 04/Dec/20 08:59 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1738: URL: https://github.com/apache/hive/pull/1738#discussion_r535939329

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
## @@ -1369,14 +1383,14 @@ private static Directory getAcidState(FileSystem fileSystem, Path candidateDirec
 if (childrenWithId != null) {
   for (HdfsFileStatusWithId child : childrenWithId) {
     getChildState(child, writeIdList, working, originalDirectories, original, obsolete,
-        bestBase, ignoreEmptyFiles, abortedDirectories, abortedWriteIds, fs, validTxnList);
+        bestBase, ignoreEmptyFiles, abortedDirectories, abortedWriteIds, uncompactedAborts, fs, validTxnList);

Review comment: In a follow-up Jira it might be worth changing this whole AcidUtils approach and starting to put everything in a DirectoryImpl from the beginning, so the argument count could be decreased to a sane amount.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520102) Time Spent: 0.5h (was: 20m)
> Skipped compaction can cause data corruption with streaming
> -----------------------------------------------------------
>
> Key: HIVE-24481
> URL: https://issues.apache.org/jira/browse/HIVE-24481
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Varga
> Assignee: Peter Varga
> Priority: Major
> Labels: Compaction, pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1, and aborts
> 3. create streaming connection, with batch 3, withStaticPartitionValues with the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close connection, count of the table is 2
> 8. run manual minor compaction on the partition. it will skip compaction, because deltacount = 1, but clean, because there is aborted txn1
> 9. cleaner will remove both aborted records from txn_components
> 10. wait for acidhousekeeper to remove empty aborted txns
> 11. select * from table returns *3* records, reading the aborted record
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=520103=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520103 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 04/Dec/20 09:00 Start Date: 04/Dec/20 09:00 Worklog Time Spent: 10m Work Description: deniskuzZ merged pull request #1688: URL: https://github.com/apache/hive/pull/1688 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520103) Time Spent: 4h 20m (was: 4h 10m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep compatibility:
> * Keep the min_history_level table
> * Add the new fields for the compaction_queue the same way
> * Create a feature flag for min_history_level, and if it is on:
> * Keep the logic inserting to the table during openTxn
> * Keep the logic removing the records at commitTxn and abortTxn
> * Change the logic in the cleaner to get the highwatermark the old way
> * But still change it to not start the cleaning before that
> * The txn_to_write_id table cleaning can work the new way in the new version and the old way in the old version
> * This feature flag can be automatically set up based on the existence of the min_history_level table; this way, if the table is dropped, all HMSs can switch to the new functionality without restart
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24481) Skipped compaction can cause data corruption with streaming
[ https://issues.apache.org/jira/browse/HIVE-24481?focusedWorklogId=520097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520097 ] ASF GitHub Bot logged work on HIVE-24481: - Author: ASF GitHub Bot Created on: 04/Dec/20 08:57 Start Date: 04/Dec/20 08:57 Worklog Time Spent: 10m Work Description: pvargacl commented on pull request #1738: URL: https://github.com/apache/hive/pull/1738#issuecomment-738658011 @deniskuzZ @klcopp can I ask for a review This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520097) Time Spent: 20m (was: 10m)
> Skipped compaction can cause data corruption with streaming
> -----------------------------------------------------------
>
> Key: HIVE-24481
> URL: https://issues.apache.org/jira/browse/HIVE-24481
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Varga
> Assignee: Peter Varga
> Priority: Major
> Labels: Compaction, pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1, and aborts
> 3. create streaming connection, with batch 3, withStaticPartitionValues with the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close connection, count of the table is 2
> 8. run manual minor compaction on the partition. it will skip compaction, because deltacount = 1, but clean, because there is aborted txn1
> 9. cleaner will remove both aborted records from txn_components
> 10. wait for acidhousekeeper to remove empty aborted txns
> 11. select * from table returns *3* records, reading the aborted record
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=520074=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520074 ] ASF GitHub Bot logged work on HIVE-24433: - Author: ASF GitHub Bot Created on: 04/Dec/20 08:21 Start Date: 04/Dec/20 08:21 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1712: URL: https://github.com/apache/hive/pull/1712#discussion_r535910850

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
## @@ -2877,6 +2877,20 @@ private static String normalizeCase(String s) { return s == null ? null : s.toLowerCase(); }
+ private static String normalizePartitionCase(String s) {

Review comment: @nareshpr, LGTM, however could you please try to reuse FileUtils.makePartName(List partCols, List vals):

Map map = Splitter.on( "=" ).withKeyValueSeparator( Path.SEPARATOR ).split(lc.getPartitionname());
return FileUtils.makePartName(new ArrayList<>(map.keySet()), new ArrayList<>(map.values()));

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520074) Time Spent: 2.5h (was: 2h 20m)
> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> PartitionKeyValue is getting converted into lowerCase in below 2 places. 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not have entries with the proper partition values.
> When the query completes, the entry moves from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. Hive AutoCompaction will not recognize the partition & considers it an invalid partition
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | 2         | default      | abc       | city=bangalore | 2020-11-25 09:26:59 | 1           | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
> AutoCompaction fails to get triggered with below error
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(98)) - Checking to see if we should compact default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(155)) - Can't find partition default.compaction_test.city=bangalore, assuming it has been dropped and moving on{code}
> I verified below 4 SQL's with my PR; all produced the correct PartitionKeyValue,
> i.e., COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update table abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}
-- This message 
was sent by Atlassian Jira (v8.3.4#803005)
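The reviewer's suggestion above leans on Guava's Splitter plus Hive's FileUtils.makePartName. As a rough standard-library illustration of the same split-and-rebuild round trip (class and method names here are ours, not Hive's, and the real makePartName additionally escapes special characters in names and values):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PartNameRoundTrip {
    // Split "city=Bangalore/state=KA" into an ordered key/value map.
    static Map<String, String> splitPartName(String partName) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String component : partName.split("/")) {
            int eq = component.indexOf('=');
            map.put(component.substring(0, eq), component.substring(eq + 1));
        }
        return map;
    }

    // Rebuild the partition name from the map, preserving order and case.
    static String makePartName(Map<String, String> parts) {
        return parts.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("/"));
    }

    public static void main(String[] args) {
        System.out.println(makePartName(splitPartName("city=Bangalore/state=KA")));
    }
}
```

The LinkedHashMap matters: partition column order must survive the round trip, or the rebuilt name would no longer match the on-disk partition path.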
[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=520070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520070 ] ASF GitHub Bot logged work on HIVE-24433: - Author: ASF GitHub Bot Created on: 04/Dec/20 08:11 Start Date: 04/Dec/20 08:11 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1712: URL: https://github.com/apache/hive/pull/1712#discussion_r535910850

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
## @@ -2877,6 +2877,20 @@ private static String normalizeCase(String s) { return s == null ? null : s.toLowerCase(); }
+ private static String normalizePartitionCase(String s) {

Review comment: @nareshpr, LGTM, however could you please try to reuse FileUtils.makePartName(List partCols, List vals)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520070) Time Spent: 2h 20m (was: 2h 10m)
> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> PartitionKeyValue is getting converted into lowerCase in below 2 places. 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not have entries with the proper partition values.
> When the query completes, the entry moves from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. Hive AutoCompaction will not recognize the partition & considers it an invalid partition
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | 2         | default      | abc       | city=bangalore | 2020-11-25 09:26:59 | 1           | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
> AutoCompaction fails to get triggered with below error
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(98)) - Checking to see if we should compact default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(155)) - Can't find partition default.compaction_test.city=bangalore, assuming it has been dropped and moving on{code}
> I verified below 4 SQL's with my PR; all produced the correct PartitionKeyValue,
> i.e., COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update table abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}
-- This message 
was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=520066=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520066 ] ASF GitHub Bot logged work on HIVE-24433: - Author: ASF GitHub Bot Created on: 04/Dec/20 08:06 Start Date: 04/Dec/20 08:06 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1712: URL: https://github.com/apache/hive/pull/1712#discussion_r535906390

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
## @@ -2877,6 +2877,20 @@ private static String normalizeCase(String s) { return s == null ? null : s.toLowerCase(); }
+ private static String normalizePartitionCase(String s) {
+   if (s == null) {
+     return null;
+   } else {

Review comment: No need for the else clause

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 520066) Time Spent: 2h 10m (was: 2h)
> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> PartitionKeyValue is getting converted into lowerCase in below 2 places.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not have entries with the proper partition values. 
> When the query completes, the entry moves from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. Hive AutoCompaction will not recognize the partition & considers it an invalid partition
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | 2         | default      | abc       | city=bangalore | 2020-11-25 09:26:59 | 1           | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
> AutoCompaction fails to get triggered with below error
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(98)) - Checking to see if we should compact default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(155)) - Can't find partition default.compaction_test.city=bangalore, assuming it has been dropped and moving on{code}
> I verified below 4 SQL's with my PR; all produced the correct PartitionKeyValue,
> i.e., COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update table abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
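The fix discussed in the review above lowercases only the partition column names while preserving the user-supplied value case. A minimal standalone sketch of that idea (illustrative only, not the patch's actual normalizePartitionCase code):

```java
public class PartitionNameNormalizer {
    // Lowercase only the column name in each "key=value" component of a
    // partition name, leaving the value untouched, so e.g.
    // "CitY=Bangalore" becomes "city=Bangalore" (not "city=bangalore").
    static String normalizePartitionCase(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (String component : s.split("/")) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            int eq = component.indexOf('=');
            if (eq < 0) {
                // No key=value structure; fall back to plain lowercasing.
                sb.append(component.toLowerCase());
            } else {
                sb.append(component.substring(0, eq).toLowerCase())
                  .append(component.substring(eq)); // keeps '=' and the value
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizePartitionCase("CitY=Bangalore"));
    }
}
```

With this shape of normalization, CTC_PARTITION ends up as "city=Bangalore", which matches the partition the Initiator looks up, instead of the all-lowercase "city=bangalore" that made it appear dropped.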