[jira] [Work logged] (HIVE-25535) Control cleaning obsolete directories/files of a table via property
[ https://issues.apache.org/jira/browse/HIVE-25535?focusedWorklogId=652538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652538 ]

ASF GitHub Bot logged work on HIVE-25535:
- Author: ASF GitHub Bot
- Created on: 18/Sep/21 04:58
- Start Date: 18/Sep/21 04:58
- Worklog Time Spent: 10m

Work Description: ashish-kumar-sharma commented on pull request #2651:
URL: https://github.com/apache/hive/pull/2651#issuecomment-922184779

@deniskuzZ @mattmccline-microsoft @sankarh Could you please review the PR?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 652538)
Time Spent: 20m (was: 10m)

> Control cleaning obsolete directories/files of a table via property
> -------------------------------------------------------------------
>
> Key: HIVE-25535
> URL: https://issues.apache.org/jira/browse/HIVE-25535
> Project: Hive
> Issue Type: Improvement
> Reporter: Ashish Sharma
> Assignee: Ashish Sharma
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> *Use Case* -
> External tools such as [SPARK_ACID|https://github.com/qubole/spark-acid] access the Hive metastore directly instead of going through LLAP or HS2, and therefore cannot acquire locks on metastore artifacts. Because of this, if a Spark ACID job starts while compaction is running in Hive, it can fail with exceptions such as *FileNotFound* for a delta directory: the delta files are present during the Spark ACID compilation phase, but by the time execution starts they have been deleted by the compactor.
> In order to tackle problems like this, I am proposing to add a "NO_CLEANUP" config to table properties and partition properties, which provides finer-grained control over the table and partition compaction process.
> We already have "[HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED|https://github.com/apache/hive/blob/71583e322fe14a0cfcde639629b509b252b0ed2c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3243]", which allows us to delay the deletion of obsolete directories/files, but it applies to every table in the metastore, whereas this config provides table- and partition-level control.
> *Solution* -
> Add "NO_CLEANUP" to the table properties to enable/disable table-level and partition-level cleanup, preventing the cleaner process from automatically removing obsolete directories/files.
> Example -
> ALTER TABLE <table_name> SET TBLPROPERTIES('NO_CLEANUP'='TRUE'/'FALSE');

-- This message was sent by Atlassian Jira (v8.3.4#803005)
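The proposed usage can be sketched in HiveQL roughly as follows (the table name is illustrative, and the exact property value casing is an assumption based on the description's example):

{code}
-- Stop the cleaner from removing obsolete delta directories for this table,
-- e.g. while external Spark ACID readers are active (table name is illustrative):
ALTER TABLE acid_tbl SET TBLPROPERTIES ('NO_CLEANUP' = 'TRUE');

-- Re-enable automatic cleanup once the external readers are done:
ALTER TABLE acid_tbl SET TBLPROPERTIES ('NO_CLEANUP' = 'FALSE');
{code}

The description also mentions partition properties, so the same toggle is presumably intended to work per partition as well.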
[jira] [Commented] (HIVE-25535) Control cleaning obsolete directories/files of a table via property
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417017#comment-17417017 ]

Ashish Sharma commented on HIVE-25535:
--
[~dkuzmenko] I agree with you. Now that the compactor runs in a transaction, problems like FileNotFound will no longer occur. This config is intended mainly for users on HDP-3.1 and lower versions, where the lock-based Cleaner is still running. Backporting the transactional compactor is not straightforward, as it requires a metastore schema change.
[jira] [Work logged] (HIVE-25343) Create or replace view should clean the old table properties
[ https://issues.apache.org/jira/browse/HIVE-25343?focusedWorklogId=652509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652509 ]

ASF GitHub Bot logged work on HIVE-25343:
- Author: ASF GitHub Bot
- Created on: 18/Sep/21 00:09
- Start Date: 18/Sep/21 00:09
- Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #2492:
URL: https://github.com/apache/hive/pull/2492#issuecomment-922142334

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
---
Worklog Id: (was: 652509)
Time Spent: 0.5h (was: 20m)

> Create or replace view should clean the old table properties
> -------------------------------------------------------------
>
> Key: HIVE-25343
> URL: https://issues.apache.org/jira/browse/HIVE-25343
> Project: Hive
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.3, 3.2.0
> Reporter: Lantao Jin
> Assignee: Lantao Jin
> Priority: Major
> Labels: pull-request-available
> Attachments: Screen Shot 2021-07-19 at 15.36.29.png
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> In many cases, users use Spark and Hive together. When a user creates a view via Spark, the view's output columns are stored in table properties, as shown in !Screen Shot 2021-07-19 at 15.36.29.png|width=80%!
> After that, if the user runs "create or replace view" via Hive to change the schema, the old table properties added by Spark are not cleaned up by Hive. When users then read the view via Spark, the schema appears unchanged, which is very confusing.
> How to reproduce:
> {code}
> spark-sql> create table lajin_table (a int, b int) stored as parquet;
> spark-sql> create view lajin_view as select * from lajin_table;
> spark-sql> desc lajin_view;
> a   int   NULL   NULL
> b   int   NULL   NULL
> hive> desc lajin_view;
> a   int
> b   int
> hive> create or replace view lajin_view as select a, b, 3 as c from lajin_table;
> hive> desc lajin_view;
> a   int
> b   int
> c   int
> spark-sql> desc lajin_view;  -- not changed
> a   int   NULL   NULL
> b   int   NULL   NULL
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
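For context, Spark records a view's schema in table properties under its `spark.sql.sources.schema.*` convention (the property values shown below are illustrative, not copied from the attached screenshot). Until Hive's CREATE OR REPLACE VIEW cleans these up, one workaround is to recreate the view from Spark so the stale properties get rewritten:

{code}
-- Stale properties left behind by Spark look roughly like (illustrative):
--   spark.sql.sources.schema.numParts = 1
--   spark.sql.sources.schema.part.0   = {"type":"struct","fields":[...]}
-- Workaround: drop and recreate the view from spark-sql so Spark
-- re-derives and re-stores the schema with the new column list.
DROP VIEW lajin_view;
CREATE VIEW lajin_view AS SELECT a, b, 3 AS c FROM lajin_table;
{code}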
[jira] [Work logged] (HIVE-25270) To create external table without schema should use db schema instead of the metastore default fs
[ https://issues.apache.org/jira/browse/HIVE-25270?focusedWorklogId=652508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652508 ]

ASF GitHub Bot logged work on HIVE-25270:
- Author: ASF GitHub Bot
- Created on: 18/Sep/21 00:09
- Start Date: 18/Sep/21 00:09
- Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #2468:
URL: https://github.com/apache/hive/pull/2468

Issue Time Tracking
---
Worklog Id: (was: 652508)
Time Spent: 40m (was: 0.5h)

> To create external table without schema should use db schema instead of the metastore default fs
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-25270
> URL: https://issues.apache.org/jira/browse/HIVE-25270
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: shezm
> Assignee: shezm
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Hi,
> When Hive creates an external table without specifying the schema of the location, as in the following SQL:
> {code:java}
> CREATE EXTERNAL TABLE `user.test_tbl` (
>   id string,
>   name string
> )
> LOCATION '/user/data/test_tbl'
> {code}
> the default schema will be the default fs from the metastore conf. But in some cases there are multiple Hadoop namenodes, for example when using Hadoop federation or Hadoop RBF. I think that when creating an external table without specifying a schema, the schema of the database should be used instead.
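A sketch of the intended behavior (the namenode address and database name are illustrative assumptions, not taken from the issue):

{code}
-- The database lives under a specific namenode:
CREATE DATABASE user_db LOCATION 'hdfs://nn-cluster2:8020/user';

-- Today an unqualified LOCATION resolves against the metastore's default fs;
-- under this proposal it would resolve against the database's filesystem,
-- i.e. hdfs://nn-cluster2:8020/user/data/test_tbl:
CREATE EXTERNAL TABLE user_db.test_tbl (
  id string,
  name string
)
LOCATION '/user/data/test_tbl';
{code}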
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=652507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652507 ]

ASF GitHub Bot logged work on HIVE-25335:
- Author: ASF GitHub Bot
- Created on: 18/Sep/21 00:09
- Start Date: 18/Sep/21 00:09
- Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-922142340

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
---
Worklog Id: (was: 652507)
Time Spent: 20m (was: 10m)

> Unreasonable setting reduce number, when join big size table (but small row count) and small size table
> --------------------------------------------------------------------------------------------------------
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
> Issue Type: Improvement
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-25335.001.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> I found a slow application in our cluster: one reducer processed a huge number of bytes, but only two reducers were used. When I debugged, I found the reason. In this SQL, one table is big in size (about 30G) but has a small row count (about 3.5M), while another table is small in size (about 100M) but has a larger row count (about 3.6M). So JoinStatsRule.process uses only the 100M to estimate the number of reducers, but in fact we need to process 30G.
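The mismatch can be illustrated with a rough bytes-based reducer estimate (a simplified sketch, not Hive's actual JoinStatsRule code; the 256 MB bytes-per-reducer figure is an assumed setting):

```python
import math

def estimate_reducers(total_bytes: int, bytes_per_reducer: int = 256 * 1024 * 1024) -> int:
    # Classic estimate: one reducer per bytes_per_reducer of input, at least one.
    return max(1, math.ceil(total_bytes / bytes_per_reducer))

MB, GB = 1024 ** 2, 1024 ** 3

# Estimating from the small side's size (~100 MB) yields almost no parallelism:
print(estimate_reducers(100 * MB))  # -> 1
# Sizing from the ~30 GB that actually flows through the join gives far more:
print(estimate_reducers(30 * GB))   # -> 120
```

The point of the issue is that the stats rule picks the first number when the workload actually behaves like the second.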
[jira] [Commented] (HIVE-25535) Control cleaning obsolete directories/files of a table via property
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416899#comment-17416899 ]

Denys Kuzmenko commented on HIVE-25535:
---
The lock-based Cleaner implementation was required when compaction was not running in a transaction. That's not the case anymore; however, HDP-3.1 still relies on the locks.
[jira] [Work logged] (HIVE-25532) Missing authorization info for KILL QUERY command
[ https://issues.apache.org/jira/browse/HIVE-25532?focusedWorklogId=652411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652411 ]

ASF GitHub Bot logged work on HIVE-25532:
- Author: ASF GitHub Bot
- Created on: 17/Sep/21 17:45
- Start Date: 17/Sep/21 17:45
- Worklog Time Spent: 10m

Work Description: saihemanth-cloudera commented on a change in pull request #2649:
URL: https://github.com/apache/hive/pull/2649#discussion_r711241978

## File path: service/src/java/org/apache/hive/service/server/KillQueryImpl.java
## @@ -116,6 +119,8 @@ public static void killChildYarnJobs(Configuration conf, String tag, String doAs
 private static boolean isAdmin() {
 boolean isAdmin = false;
+// RANGER-1851
+HivePrivilegeObject serviceNameObj = new HivePrivilegeObject(HivePrivilegeObject.HivePrivilegeObjectType.SERVICE_NAME, null, "hiveservice");

Review comment: Instead of hard-coding the "hiveservice" value, have you thought about making this configurable?

Issue Time Tracking
---
Worklog Id: (was: 652411)
Time Spent: 20m (was: 10m)

> Missing authorization info for KILL QUERY command
> -------------------------------------------------
>
> Key: HIVE-25532
> URL: https://issues.apache.org/jira/browse/HIVE-25532
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Abhay
> Assignee: Abhay
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We added authorization for the KILL QUERY command some time back with the help of Ranger; see https://issues.apache.org/jira/browse/RANGER-1851. However, we have observed that this hasn't been working as expected.
> The Ranger service expects Hive to send a privilege object of type SERVICE_NAME, but we can see at
> https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/server/KillQueryImpl.java#L131
> that it is sending an empty array list. The Ranger service never throws an exception for this, and as a result any user can kill any query even without the necessary permissions.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25535) Control cleaning obsolete directories/files of a table via property
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416713#comment-17416713 ]

Ashish Sharma commented on HIVE-25535:
--
[~dkuzmenko] update the use case in description.
[jira] [Comment Edited] (HIVE-25535) Control cleaning obsolete directories/files of a table via property
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416713#comment-17416713 ]

Ashish Sharma edited comment on HIVE-25535 at 9/17/21, 2:01 PM:
[~dkuzmenko] updated use case in description.

was (Author: ashish-kumar-sharma):
[~dkuzmenko] update the use case in description.
[jira] [Updated] (HIVE-25378) Enable removal of old builds on hive ci
[ https://issues.apache.org/jira/browse/HIVE-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25378:
--
Labels: pull-request-available (was: )

> Enable removal of old builds on hive ci
> ---------------------------------------
>
> Key: HIVE-25378
> URL: https://issues.apache.org/jira/browse/HIVE-25378
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We are using the GitHub plugin to run builds on PRs. However, to remove old builds, that plugin needs periodic branch scanning enabled; and since we also use the plugin's merge mechanism, that will cause it to rediscover all open PRs whenever there is a new commit on the target branch.
[jira] [Work logged] (HIVE-25378) Enable removal of old builds on hive ci
[ https://issues.apache.org/jira/browse/HIVE-25378?focusedWorklogId=652322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652322 ]

ASF GitHub Bot logged work on HIVE-25378:
- Author: ASF GitHub Bot
- Created on: 17/Sep/21 14:00
- Start Date: 17/Sep/21 14:00
- Worklog Time Spent: 10m

Work Description: kgyrtkirk opened a new pull request #2652:
URL: https://github.com/apache/hive/pull/2652

### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?

Issue Time Tracking
---
Worklog Id: (was: 652322)
Remaining Estimate: 0h
Time Spent: 10m
[jira] [Updated] (HIVE-25535) Control cleaning obsolete directories/files of a table via property
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma updated HIVE-25535:
--
Description updated.
[jira] [Updated] (HIVE-25535) Control cleaning obsolete directories/files of a table via property
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Sharma updated HIVE-25535: - Description:
Use Case - When an external tool like [SPARK_ACID|https://github.com/qubole/spark-acid] accesses the Hive metastore directly instead of going through LLAP or HS2, it lacks the ability to acquire locks on metastore artifacts. As a result, if a Spark ACID job starts while a compaction is running in Hive, it can fail with exceptions like *FileNotFound* for a delta directory: the delta files are present during the Spark ACID compilation phase, but by the time execution starts they have been deleted by the compactor. In order to tackle problems like this, I propose adding a config "NO_CLEANUP" to table properties and partition properties, which provides finer control over the table and partition compaction process. We already have "[HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED|https://github.com/apache/hive/blob/71583e322fe14a0cfcde639629b509b252b0ed2c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3243]", which allows us to delay the deletion of obsolete directories/files, but it applies to every table in the metastore, whereas this config provides table- and partition-level control.
Solution - Add "NO_CLEANUP" to the table properties to enable/disable table-level and partition-level cleanup and prevent the cleaner process from automatically cleaning obsolete directories/files.
Example - ALTER TABLE SET TBLPROPERTIES('NO_CLEANUP'=FALSE/TRUE);
> Control cleaning obsolete directories/files of a table via property
> ---
>
> Key: HIVE-25535
> URL: https://issues.apache.org/jira/browse/HIVE-25535
> Project: Hive
> Issue Type: Improvement
> Reporter: Ashish Sharma
> Assignee: Ashish Sharma
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Use Case - When an external tool like [SPARK_ACID|https://github.com/qubole/spark-acid] accesses the Hive metastore directly instead of going through LLAP or HS2, it lacks the ability to acquire locks on metastore artifacts. As a result, if a Spark ACID job starts while a compaction is running in Hive, it can fail with exceptions like *FileNotFound* for a delta directory: delta files present during the Spark ACID compilation phase are deleted by the compactor before execution starts.
> In order to tackle problems like this, I propose adding a config "NO_CLEANUP" to table properties and partition properties, which provides finer control over the table and partition compaction process.
> We already have "[HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED|https://github.com/apache/hive/blob/71583e322fe14a0cfcde639629b509b252b0ed2c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3243]", which allows us to delay the deletion of obsolete directories/files, but it applies to every table in the metastore, whereas this config provides table- and partition-level control.
> Solution - Add "NO_CLEANUP" to the table properties to enable/disable table-level and partition-level cleanup and prevent the cleaner process from automatically cleaning obsolete directories/files.
> Example -
> ALTER TABLE SET TBLPROPERTIES('NO_CLEANUP'=FALSE/TRUE);
-- This message was sent by Atlassian Jira
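The proposed per-table switch can be sketched as follows. This is a minimal illustrative model of the idea, not Hive's actual Cleaner code; the `no_cleanup` property key, the dictionary-based property lookup, and the function names are assumptions made for the sketch (Hive stores table/partition properties as string key/value pairs).

```python
# Hypothetical sketch of a compaction cleaner honoring a per-table
# NO_CLEANUP property, as proposed in HIVE-25535. Not Hive's real code.

def no_cleanup_enabled(properties: dict) -> bool:
    """Treat the 'no_cleanup' key as a boolean string; an absent key
    means cleanup is allowed (today's default behavior)."""
    return properties.get("no_cleanup", "false").lower() == "true"

def clean_obsolete(tables: dict) -> list:
    """Return the names of tables whose obsolete delta dirs may be removed."""
    cleaned = []
    for name, props in tables.items():
        if no_cleanup_enabled(props):
            continue  # skip: cleanup disabled at the table level
        cleaned.append(name)
    return cleaned

tables = {
    "orders": {"no_cleanup": "true"},  # read externally (e.g. by spark-acid)
    "clicks": {},                      # normal table, cleaner may proceed
}
print(clean_obsolete(tables))  # ['clicks']
```

The point of the sketch is the granularity: unlike the global `HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED` delay, the decision is taken per table (and, in the proposal, per partition) from its own properties.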
[jira] [Updated] (HIVE-25535) Control cleaning obsolete directories/files of a table via property
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25535: -- Labels: pull-request-available (was: ) > Control cleaning obsolete directories/files of a table via property > --- > > Key: HIVE-25535 > URL: https://issues.apache.org/jira/browse/HIVE-25535 > Project: Hive > Issue Type: Improvement > Reporter: Ashish Sharma > Assignee: Ashish Sharma > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add "NO_CLEANUP" to the table properties to enable/disable table-level cleanup and prevent the cleaner process from automatically cleaning obsolete directories/files. > Example - > ALTER TABLE SET TBLPROPERTIES('NO_CLEANUP'=FALSE/TRUE); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25535) Control cleaning obsolete directories/files of a table via property
[ https://issues.apache.org/jira/browse/HIVE-25535?focusedWorklogId=652315=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652315 ] ASF GitHub Bot logged work on HIVE-25535: - Author: ASF GitHub Bot Created on: 17/Sep/21 13:40 Start Date: 17/Sep/21 13:40 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma opened a new pull request #2651: URL: https://github.com/apache/hive/pull/2651 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 652315) Remaining Estimate: 0h Time Spent: 10m > Control cleaning obsolete directories/files of a table via property > --- > > Key: HIVE-25535 > URL: https://issues.apache.org/jira/browse/HIVE-25535 > Project: Hive > Issue Type: Improvement >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Add "NO_CLEANUP" in the table properties enable/disable the table-level > cleanup and prevent the cleaner process from automatically cleaning obsolete > directories/files. > Example - > ALTER TABLE SET TBLPROPERTIES('NO_CLEANUP'=FALSE/TRUE); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25535) Control cleaning obsolete directories/files of a table via property
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-25535: --- Summary: Control cleaning obsolete directories/files of a table via property (was: Adding table property "NO_CLEANUP") > Control cleaning obsolete directories/files of a table via property > --- > > Key: HIVE-25535 > URL: https://issues.apache.org/jira/browse/HIVE-25535 > Project: Hive > Issue Type: Improvement > Reporter: Ashish Sharma > Assignee: Ashish Sharma > Priority: Major > > Add "NO_CLEANUP" to the table properties to enable/disable table-level cleanup and prevent the cleaner process from automatically cleaning obsolete directories/files. > Example - > ALTER TABLE SET TBLPROPERTIES('NO_CLEANUP'=FALSE/TRUE); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25534) Error when executing DistCp on file system not supporting XAttrs
[ https://issues.apache.org/jira/browse/HIVE-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-25534: --- Summary: Error when executing DistCp on file system not supporting XAttrs (was: Don't preserve FileAttribute.XATTR to initialise distcp.) > Error when executing DistCp on file system not supporting XAttrs > > > Key: HIVE-25534 > URL: https://issues.apache.org/jira/browse/HIVE-25534 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Remove the preserve xattr while calling distcp. > {code:java} > 2021-08-23 10:06:18,485 ERROR org.apache.hadoop.tools.DistCp: > [HiveServer2-Background-Pool: Thread-73]: XAttrs not supported on at least > one file system: > org.apache.hadoop.tools.CopyListing$XAttrsNotSupportedException: XAttrs not > supported for file system: s3a://hmangla1-dev > at > org.apache.hadoop.tools.util.DistCpUtils.checkFileSystemXAttrSupport(DistCpUtils.java:513) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:337) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:304) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:214) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:193) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?]{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
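The direction of the fix can be illustrated with a small sketch: before building the copy options, drop the XAttr preserve flag when the destination file system (such as s3a) does not support extended attributes, instead of failing up front as in the stack trace above. This is a hypothetical model, not the real Hive/DistCp API; the flag names and the boolean capability check are assumptions made for the illustration.

```python
# Hypothetical sketch: filter the attribute-preserve flags passed to a
# DistCp-like copy when the target FS lacks XAttr support (cf. the
# XAttrsNotSupportedException above). All names here are illustrative.

PRESERVE_FLAGS = ["REPLICATION", "BLOCKSIZE", "PERMISSION", "XATTR"]

def effective_preserve_flags(flags, target_supports_xattrs: bool):
    """Drop XATTR when the destination (e.g. s3a://) cannot store
    extended attributes, so the copy itself can still proceed."""
    if target_supports_xattrs:
        return list(flags)
    return [f for f in flags if f != "XATTR"]

# An s3a destination does not support XAttrs, so the flag is dropped:
print(effective_preserve_flags(PRESERVE_FLAGS, target_supports_xattrs=False))
# ['REPLICATION', 'BLOCKSIZE', 'PERMISSION']
```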
[jira] [Updated] (HIVE-25533) Incorrect results when filtering data from UNION ALL sub-query
[ https://issues.apache.org/jira/browse/HIVE-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-25533: --- Description:
With CBO enabled, querying from a view or CTE with a {{UNION ALL}} clause produces wrong results, as the following script shows.
{code:java} CREATE TABLE n1 (c1 STRING); INSERT OVERWRITE TABLE n1 VALUES('needn'); CREATE VIEW v1 AS SELECT 'maggie' AS c1 FROM n1 UNION ALL SELECT c1 FROM n1; {code}
Incorrect results are returned when using "=" or "IN" with a single element. For example, the following 2 queries return nothing.
{code:java} SELECT * FROM v1 WHERE c1 = 'maggie'; SELECT * FROM v1 WHERE c1 IN ('maggie'); {code}
However, correct results are returned when using "LIKE" or "IN" with multiple elements. For example, the following 2 queries return the expected result.
{code:java} SELECT * FROM v1 WHERE c1 IN ('maggie','This is a bug'); SELECT * FROM v1 WHERE c1 LIKE 'maggie%'; {code}
> Incorrect results when filtering data from UNION ALL sub-query
> --
>
> Key: HIVE-25533
> URL: https://issues.apache.org/jira/browse/HIVE-25533
> Project: Hive
> Issue Type: Bug
> Components: Database/Schema
> Affects Versions: 3.1.0
> Environment: Azure HDInsight 4.1.7.5
> Hive 3.1.0
> Reporter: Needn Yu
> Priority: Critical
> Attachments: hive.png
>
> With CBO enabled, querying from a view or CTE with a {{UNION ALL}} clause produces wrong results, as the following script shows.
> {code:java}
> CREATE TABLE n1 (c1 STRING);
> INSERT OVERWRITE TABLE n1 VALUES('needn');
> CREATE VIEW v1
> AS
> SELECT 'maggie' AS c1 FROM n1
> UNION ALL
> SELECT c1 FROM n1;
> {code}
> Incorrect results are returned when using "=" or "IN" with a single element. For example, the following 2 queries return nothing.
> {code:java}
> SELECT * FROM v1 WHERE c1 = 'maggie';
> SELECT * FROM v1 WHERE c1 IN ('maggie');
> {code}
> However, correct results are returned when using "LIKE" or "IN" with multiple elements. For example, the following 2 queries return the expected result.
> {code:java}
> SELECT * FROM v1 WHERE c1 IN ('maggie','This is a bug');
> SELECT * FROM v1 WHERE c1 LIKE 'maggie%';
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
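For reference, the correct semantics can be checked against a plain in-memory model of the view: every row of v1 whose c1 equals 'maggie' should be returned regardless of which UNION ALL branch produced it. A minimal sketch, with Python standing in for the SQL since the wrong results are Hive-specific:

```python
# In-memory model of the v1 view from the repro above: the first branch
# projects the constant 'maggie' for each row of n1, the second projects c1.
n1 = ["needn"]
v1 = ["maggie" for _ in n1] + [c1 for c1 in n1]  # UNION ALL keeps duplicates

# The results Hive should also produce for the filters in the report:
print([c1 for c1 in v1 if c1 == "maggie"])           # ['maggie']  (c1 = 'maggie')
print([c1 for c1 in v1 if c1 in ("maggie",)])        # ['maggie']  (c1 IN ('maggie'))
print([c1 for c1 in v1 if c1.startswith("maggie")])  # ['maggie']  (c1 LIKE 'maggie%')
```

All three filters select the same row, which is why returning nothing for the first two while the LIKE variant succeeds points at the predicate handling rather than the data.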
[jira] [Updated] (HIVE-25533) Incorrect results when filtering data from UNION ALL sub-query
[ https://issues.apache.org/jira/browse/HIVE-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-25533: --- Summary: Incorrect results when filtering data from UNION ALL sub-query (was: With CBO enabled, Incorrect query result when using where CLAUSE to query data from 2 "UNION ALL" parts)
> Incorrect results when filtering data from UNION ALL sub-query
> --
>
> Key: HIVE-25533
> URL: https://issues.apache.org/jira/browse/HIVE-25533
> Project: Hive
> Issue Type: Bug
> Components: Database/Schema
> Affects Versions: 3.1.0
> Environment: Azure HDInsight 4.1.7.5
> Hive 3.1.0
> Reporter: Needn Yu
> Priority: Critical
> Attachments: hive.png
>
> When querying from a view or CTE which "union all"s 2 tables, as the following script shows
> {code:java}
> CREATE TABLE n1 (c1 STRING);
> INSERT OVERWRITE TABLE n1 VALUES('needn');
> CREATE VIEW v1
> AS
> SELECT 'maggie' AS c1 FROM n1
> UNION ALL
> SELECT c1 FROM n1;
> {code}
> Incorrect results are returned when using "=" or "IN" with a single element. For example, the following 2 queries return nothing.
> {code:java}
> SELECT * FROM v1 WHERE c1 = 'maggie';
> SELECT * FROM v1 WHERE c1 IN ('maggie');
> {code}
> However, correct results are returned when using "LIKE" or "IN" with multiple elements. For example, the following 2 queries return the expected result.
> {code:java}
> SELECT * FROM v1 WHERE c1 IN ('maggie','This is a bug');
> SELECT * FROM v1 WHERE c1 LIKE 'maggie%';
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25532) Missing authorization info for KILL QUERY command
[ https://issues.apache.org/jira/browse/HIVE-25532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-25532: --- Summary: Missing authorization info for KILL QUERY command (was: Fix authorization support for Kill Query Command) > Missing authorization info for KILL QUERY command > - > > Key: HIVE-25532 > URL: https://issues.apache.org/jira/browse/HIVE-25532 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Abhay >Assignee: Abhay >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We added authorization for Kill Query command some time back with the help of > Ranger. Below is the ticket https://issues.apache.org/jira/browse/RANGER-1851 > However, we have observed that this hasn't been working as expected. The > Ranger service expects Hive to send in a privilege object of the type > SERVICE_NAME but we can see below > > [https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/server/KillQueryImpl.java#L131] > that it is sending an empty array list. > The Ranger service never throws an exception to this and this results in any > user being able to kill any query even though they don't have necessary > permissions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
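The failure mode described above (an empty privilege-object list never triggering a denial) can be sketched generically. This is an illustrative model, not Ranger's or Hive's actual authorizer API; the function name and the set-based privilege check are assumptions for the sketch.

```python
# Hypothetical authorizer model: a per-object check loop over an empty
# list of privilege objects has nothing to deny, so the request is
# vacuously "allowed" -- mirroring why KILL QUERY passed for every user.

def is_authorized(user_privileges: set, requested_objects: list) -> bool:
    for obj in requested_objects:
        if obj not in user_privileges:
            return False  # denied on the first missing privilege
    return True  # vacuously true when requested_objects is empty

# A user holding no privileges still "passes" when the list is empty:
print(is_authorized(set(), []))                  # True  (the bug pattern)
# Sending a concrete service-level object makes the check meaningful:
print(is_authorized(set(), ["SERVICE_ADMIN"]))   # False (the intended check)
```

This is why the fix direction is to send a populated privilege object (of the type the authorizer expects) rather than an empty list.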
[jira] [Commented] (HIVE-25496) hadoop 3.3.1 / hive 3.2.1 / OpenJDK11 compatible?
[ https://issues.apache.org/jira/browse/HIVE-25496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416639#comment-17416639 ] Jerome Le Ray commented on HIVE-25496: -- Hello [~belugabehr] Finally, I'm using OpenJDK8 and everything works fine on both the on-prem and Azure configurations. Here is the link to the OpenJDK8 build used: [https://github.com/adoptium/temurin8-binaries/releases/download/jdk8u302-b08/OpenJDK8U-jdk_x64_linux_hotspot_8u302b08.tar.gz] Thank you
> hadoop 3.3.1 / hive 3.2.1 / OpenJDK11 compatible?
> -
>
> Key: HIVE-25496
> URL: https://issues.apache.org/jira/browse/HIVE-25496
> Project: Hive
> Issue Type: Bug
> Environment: Linux VM
> Reporter: Jerome Le Ray
> Assignee: Jerome Le Ray
> Priority: Major
>
> We used the following configuration:
> hadoop 3.2.1
> hive 3.1.2
> PostGres 12
> Java - OracleJDK 8
> For internal reasons, we have to migrate to OpenJDK11, so I migrated hadoop 3.2.1 to the new version hadoop 3.3.1.
> When I start the hiveserver2 service, I get the error:
> which: no hbase in (/usr/local/bin:/bin:/usr/pgsql-12/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/jdk-11.0.10+9/bin:/opt/hivemetastore/hadoop-3.3.1/bin:/opt/hivemetastore/apache-hive-3.1.2-bin/bin)
> 2021-09-02 16:48:05: Starting HiveServer2
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/hivemetastore/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hivemetastore/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > 2021-09-02 16:48:06,744 INFO conf.HiveConf: Found configuration file > file:/opt/hivemetastore/apache-hive-3.1.2-bin/conf/hive-site.xml > 2021-09-02 16:48:07,169 WARN conf.HiveConf: HiveConf of name > hive.metastore.local does not exist > 2021-09-02 16:48:07,169 WARN conf.HiveConf: HiveConf of name > hive.metastore.thrift.bind.host does not exist > 2021-09-02 16:48:07,170 WARN conf.HiveConf: HiveConf of name > hive.enforce.bucketing does not exist > 2021-09-02 16:48:08,414 INFO server.HiveServer2: STARTUP_MSG: > / > STARTUP_MSG: Starting HiveServer2 > STARTUP_MSG: host = lhroelcspt1001.enterprisenet.org/10.90.122.159 > STARTUP_MSG: args = [-hiveconf, mapred.job.tracker=local, -hiveconf, > fs.default.name=file:///cip-data, -hiveconf, > hive.metastore.warehouse.dir=file:cip-data, --hiveconf, hive.server2.thrif > t.port=1, --hiveconf, hive.root.logger=INFO,console] > STARTUP_MSG: version = 3.1.2 > (...) > STARTUP_MSG: build = git://HW13934/Users/gates/tmp/hive-branch-3.1/hive -r > 8190d2be7b7165effa62bd21b7d60ef81fb0e4af; compiled by 'gates' on Thu Aug 22 > 15:01:18 PDT 2019 > / > 2021-09-02 16:48:08,436 INFO server.HiveServer2: Starting HiveServer2 > 2021-09-02 16:48:08,462 WARN conf.HiveConf: HiveConf of name > hive.metastore.local does not exist > 2021-09-02 16:48:08,463 WARN conf.HiveConf: HiveConf of name > hive.metastore.thrift.bind.host does not exist > 2021-09-02 16:48:08,463 WARN conf.HiveConf: HiveConf of name > hive.enforce.bucketing does not exist > Hive Session ID = 440449ff-99b7-429c-82d9-e20bdcc9b46f > 2021-09-02 16:48:08,566 INFO SessionState: Hive Session ID = > 440449ff-99b7-429c-82d9-e20bdcc9b46f > 2021-09-02 16:48:08,566 INFO server.HiveServer2: Shutting down HiveServer2 > 2021-09-02 16:48:08,584 INFO server.HiveServer2: Stopping/Disconnecting tez > sessions. 
> 2021-09-02 16:48:08,585 WARN server.HiveServer2: Error starting HiveServer2 > on attempt 1, will retry in 6ms > java.lang.RuntimeException: Error applying authorization policy on hive > configuration: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot > be cast to class java.net.URLClassLoader (jdk. > internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are > in module java.base of loader 'bootstrap') > at org.apache.hive.service.cli.CLIService.init(CLIService.java:118) > at org.apache.hive.service.CompositeService.init(CompositeService.java:59) > at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:230) > at > org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1036) > at > org.apache.hive.service.server.HiveServer2.access$1600(HiveServer2.java:140) > at > org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1305) > at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1149) > at
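The root cause of the cast error in the trace above can be reproduced in a few lines, independent of Hive: on JDK 8 the system class loader is a `java.net.URLClassLoader`, while on JDK 9 and later it is `jdk.internal.loader.ClassLoaders$AppClassLoader`, which no longer extends it. A minimal demonstration:

```java
import java.net.URLClassLoader;

public class ClassLoaderCastDemo {
    public static void main(String[] args) {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        try {
            // Succeeds on JDK 8; throws ClassCastException on JDK 9+ -- the same
            // failure Hive 3.1.2 hits while applying its authorization policy
            URLClassLoader urls = (URLClassLoader) system;
            System.out.println("cast succeeded: " + urls.getClass().getName());
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + system.getClass().getName());
        }
    }
}
```

This is why staying on OpenJDK 8 (as the reporter ultimately did) avoids the error: Hive 3.1.2 assumes the JDK 8 class-loader hierarchy.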
[jira] [Work started] (HIVE-25536) Upgrade to Kafka 2.8
[ https://issues.apache.org/jira/browse/HIVE-25536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25536 started by Viktor Somogyi-Vass. -- > Upgrade to Kafka 2.8 > > > Key: HIVE-25536 > URL: https://issues.apache.org/jira/browse/HIVE-25536 > Project: Hive > Issue Type: Improvement > Components: kafka integration >Reporter: Viktor Somogyi-Vass >Assignee: Viktor Somogyi-Vass >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25536) Upgrade to Kafka 2.8
[ https://issues.apache.org/jira/browse/HIVE-25536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viktor Somogyi-Vass reassigned HIVE-25536: -- Assignee: Viktor Somogyi-Vass > Upgrade to Kafka 2.8 > > > Key: HIVE-25536 > URL: https://issues.apache.org/jira/browse/HIVE-25536 > Project: Hive > Issue Type: Improvement > Components: kafka integration >Reporter: Viktor Somogyi-Vass >Assignee: Viktor Somogyi-Vass >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25503) Add cleanup for the duplicate COMPLETED_TXN_COMPONENTS entries
[ https://issues.apache.org/jira/browse/HIVE-25503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko reassigned HIVE-25503: - Assignee: Denys Kuzmenko > Add cleanup for the duplicate COMPLETED_TXN_COMPONENTS entries > -- > > Key: HIVE-25503 > URL: https://issues.apache.org/jira/browse/HIVE-25503 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Performance improvement. Accumulated entries in COMPLETED_TXN_COMPONENTS can > lead to query performance degradation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25503) Add cleanup for the duplicate COMPLETED_TXN_COMPONENTS entries
[ https://issues.apache.org/jira/browse/HIVE-25503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko resolved HIVE-25503. --- Resolution: Fixed > Add cleanup for the duplicate COMPLETED_TXN_COMPONENTS entries > -- > > Key: HIVE-25503 > URL: https://issues.apache.org/jira/browse/HIVE-25503 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Performance improvement. Accumulated entries in COMPLETED_TXN_COMPONENTS can > lead to query performance degradation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25503) Add cleanup for the duplicate COMPLETED_TXN_COMPONENTS entries
[ https://issues.apache.org/jira/browse/HIVE-25503?focusedWorklogId=652266=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652266 ] ASF GitHub Bot logged work on HIVE-25503: - Author: ASF GitHub Bot Created on: 17/Sep/21 11:00 Start Date: 17/Sep/21 11:00 Worklog Time Spent: 10m Work Description: deniskuzZ merged pull request #2612: URL: https://github.com/apache/hive/pull/2612 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 652266) Time Spent: 50m (was: 40m) > Add cleanup for the duplicate COMPLETED_TXN_COMPONENTS entries > -- > > Key: HIVE-25503 > URL: https://issues.apache.org/jira/browse/HIVE-25503 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Performance improvement. Accumulated entries in COMPLETED_TXN_COMPONENTS can > lead to query performance degradation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25535) Adding table property "NO_CLEANUP"
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416607#comment-17416607 ] Denys Kuzmenko commented on HIVE-25535: --- hi [~ashish-kumar-sharma], could you please elaborate on the use case where this feature would be useful? cc [~klcopp] > Adding table property "NO_CLEANUP" > -- > > Key: HIVE-25535 > URL: https://issues.apache.org/jira/browse/HIVE-25535 > Project: Hive > Issue Type: Improvement >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > > Add "NO_CLEANUP" to the table properties to enable/disable table-level > cleanup and prevent the cleaner process from automatically removing obsolete > directories/files. > Example - > ALTER TABLE <table_name> SET TBLPROPERTIES('NO_CLEANUP'='TRUE'/'FALSE'); -- This message was sent by Atlassian Jira (v8.3.4#803005)
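The proposed behavior above (a table property that tells the Cleaner to skip a table) can be sketched as a simple property check. This is an illustration only: HIVE-25535's actual implementation lives in Hive's compaction Cleaner, and the class and method names below are assumptions, not Hive code.

```java
import java.util.HashMap;
import java.util.Map;

public class CleanerSkipDemo {
    // Hypothetical sketch of the proposed check: a cleaner would skip any table
    // whose "no_cleanup" table property is set to true; cleanup remains the
    // default when the property is absent.
    static boolean shouldClean(Map<String, String> tblProperties) {
        return !Boolean.parseBoolean(tblProperties.getOrDefault("no_cleanup", "false"));
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        System.out.println(shouldClean(props));   // cleanup runs by default
        props.put("no_cleanup", "true");
        System.out.println(shouldClean(props));   // cleaner skips this table
    }
}
```

This matches the use case from the ticket: an external reader without metastore locks (e.g. spark-acid) can disable cleanup for the tables it reads so delta directories are not deleted out from under an in-flight job.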
[jira] [Work logged] (HIVE-25534) Don't preserve FileAttribute.XATTR to initialise distcp.
[ https://issues.apache.org/jira/browse/HIVE-25534?focusedWorklogId=652251=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652251 ] ASF GitHub Bot logged work on HIVE-25534: - Author: ASF GitHub Bot Created on: 17/Sep/21 09:59 Start Date: 17/Sep/21 09:59 Worklog Time Spent: 10m Work Description: hmangla98 commented on a change in pull request #2650: URL: https://github.com/apache/hive/pull/2650#discussion_r710922922 ## File path: shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java ## @@ -1131,7 +1131,7 @@ public void setStoragePolicy(Path path, StoragePolicyValue policy) } } if (needToAddPreserveOption) { - params.add("-pbx"); + params.add("-pb"); //Only Block Size will be preserved. Review comment: Change done. ## File path: shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java ## @@ -1273,8 +1272,6 @@ public boolean runDistCpWithSnapshots(String oldSnapshot, String newSnapshot, Li } } catch (Exception e) { throw new IOException("Cannot execute DistCp process: ", e); -} finally { - conf.setBoolean("mapred.mapper.new-api", false); Review comment: Added -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 652251) Time Spent: 0.5h (was: 20m) > Don't preserve FileAttribute.XATTR to initialise distcp. > > > Key: HIVE-25534 > URL: https://issues.apache.org/jira/browse/HIVE-25534 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Remove the preserve xattr while calling distcp. 
> {code:java} > 2021-08-23 10:06:18,485 ERROR org.apache.hadoop.tools.DistCp: > [HiveServer2-Background-Pool: Thread-73]: XAttrs not supported on at least > one file system: > org.apache.hadoop.tools.CopyListing$XAttrsNotSupportedException: XAttrs not > supported for file system: s3a://hmangla1-dev > at > org.apache.hadoop.tools.util.DistCpUtils.checkFileSystemXAttrSupport(DistCpUtils.java:513) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:337) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:304) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:214) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:193) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?]{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
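The change discussed above replaces distcp's `-pbx` (preserve block size and xattrs) with `-pb`, because object stores such as s3a reject xattr preservation with the `XAttrsNotSupportedException` shown in the stack trace. The review later notes that HDFS targets should still preserve xattrs, which suggests a scheme-dependent flag choice. A hedged sketch of that idea (illustrative only; the actual logic in Hadoop23Shims may differ):

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class DistCpPreserveFlags {
    // Hypothetical sketch: preserve block size everywhere, but preserve xattrs
    // only when the target is HDFS, since filesystems like s3a do not support
    // xattrs and distcp fails at job setup when asked to preserve them.
    static List<String> preserveParams(URI target) {
        List<String> params = new ArrayList<>();
        params.add("hdfs".equals(target.getScheme()) ? "-pbx" : "-pb");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(preserveParams(URI.create("hdfs://nn:8020/warehouse")));
        System.out.println(preserveParams(URI.create("s3a://bucket/warehouse")));
    }
}
```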
[jira] [Work logged] (HIVE-25346) cleanTxnToWriteIdTable breaks SNAPSHOT isolation
[ https://issues.apache.org/jira/browse/HIVE-25346?focusedWorklogId=652245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652245 ] ASF GitHub Bot logged work on HIVE-25346: - Author: ASF GitHub Bot Created on: 17/Sep/21 09:54 Start Date: 17/Sep/21 09:54 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2547: URL: https://github.com/apache/hive/pull/2547#discussion_r710919584 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -1446,70 +1446,75 @@ public void commitTxn(CommitTxnRequest rqst) OperationType.UPDATE + "," + OperationType.DELETE + ")"; long tempCommitId = generateTemporaryId(); -if (txnType.get() != TxnType.READ_ONLY -&& !isReplayedReplTxn -&& isUpdateOrDelete(stmt, conflictSQLSuffix)) { - - isUpdateDelete = 'Y'; - //if here it means currently committing txn performed update/delete and we should check WW conflict - /** - * "select distinct" is used below because - * 1. once we get to multi-statement txns, we only care to record that something was updated once - * 2. if {@link #addDynamicPartitions(AddDynamicPartitions)} is retried by caller it may create - * duplicate entries in TXN_COMPONENTS - * but we want to add a PK on WRITE_SET which won't have unique rows w/o this distinct - * even if it includes all of its columns - * - * First insert into write_set using a temporary commitID, which will be updated in a separate call, - * see: {@link #updateWSCommitIdAndCleanUpMetadata(Statement, long, TxnType, Long, long)}}. - * This should decrease the scope of the S4U lock on the next_txn_id table. 
- */ - Savepoint undoWriteSetForCurrentTxn = dbConn.setSavepoint(); - stmt.executeUpdate("INSERT INTO \"WRITE_SET\" (\"WS_DATABASE\", \"WS_TABLE\", \"WS_PARTITION\", \"WS_TXNID\", \"WS_COMMIT_ID\", \"WS_OPERATION_TYPE\")" + - " SELECT DISTINCT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", \"TC_TXNID\", " + tempCommitId + ", \"TC_OPERATION_TYPE\" " + conflictSQLSuffix); - - /** - * This S4U will mutex with other commitTxn() and openTxns(). - * -1 below makes txn intervals look like [3,3] [4,4] if all txns are serial - * Note: it's possible to have several txns have the same commit id. Suppose 3 txns start - * at the same time and no new txns start until all 3 commit. - * We could've incremented the sequence for commitId as well but it doesn't add anything functionally. - */ - acquireTxnLock(stmt, false); - commitId = getHighWaterMark(stmt); +if (txnType.get() != TxnType.READ_ONLY && !isReplayedReplTxn && txnType.get() != TxnType.COMPACTION) { + if (isUpdateOrDelete(stmt, conflictSQLSuffix)) { +isUpdateDelete = 'Y'; +//if here it means currently committing txn performed update/delete and we should check WW conflict +/** + * "select distinct" is used below because + * 1. once we get to multi-statement txns, we only care to record that something was updated once + * 2. if {@link #addDynamicPartitions(AddDynamicPartitions)} is retried by caller it may create + * duplicate entries in TXN_COMPONENTS + * but we want to add a PK on WRITE_SET which won't have unique rows w/o this distinct + * even if it includes all of its columns + * + * First insert into write_set using a temporary commitID, which will be updated in a separate call, + * see: {@link #updateWSCommitIdAndCleanUpMetadata(Statement, long, TxnType, Long, long)}}. + * This should decrease the scope of the S4U lock on the next_txn_id table. 
+ */ +Savepoint undoWriteSetForCurrentTxn = dbConn.setSavepoint(); +stmt.executeUpdate("INSERT INTO \"WRITE_SET\" (\"WS_DATABASE\", \"WS_TABLE\", \"WS_PARTITION\", \"WS_TXNID\", \"WS_COMMIT_ID\", \"WS_OPERATION_TYPE\")" + +" SELECT DISTINCT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", \"TC_TXNID\", " + tempCommitId + ", \"TC_OPERATION_TYPE\" " + conflictSQLSuffix); - if (!rqst.isExclWriteEnabled()) { /** - * see if there are any overlapping txns that wrote the same element, i.e. have a conflict - * Since entire commit operation is mutexed wrt other start/commit ops, - * committed.ws_commit_id <= current.ws_commit_id for all txns - * thus if committed.ws_commit_id < current.ws_txnid,
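The conflict check being refactored in the diff above can be summarized as: a committing transaction has a write-write conflict if some other transaction wrote the same element and did not commit before the current transaction started (i.e. it is *not* the case that committed.ws_commit_id < current.ws_txnid). A simplified, self-contained illustration of that rule; this is a sketch of the idea, not Hive's TxnHandler code:

```java
import java.util.List;

public class WriteConflictDemo {
    static final class WriteEntry {
        final String entity; final long txnId; final long commitId;
        WriteEntry(String entity, long txnId, long commitId) {
            this.entity = entity; this.txnId = txnId; this.commitId = commitId;
        }
    }

    // Simplified rule from the discussion above: a committed write to the same
    // (db, table, partition) conflicts with the committing txn unless it
    // committed strictly before that txn started (commitId < currentTxnId).
    static boolean hasWriteConflict(long currentTxnId, String entity, List<WriteEntry> writeSet) {
        for (WriteEntry w : writeSet) {
            if (w.entity.equals(entity) && w.txnId != currentTxnId && w.commitId >= currentTxnId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<WriteEntry> ws = List.of(new WriteEntry("db.t1/p=1", 5, 7));
        // txn 6 overlapped txn 5 (which committed at 7 >= 6): conflict
        System.out.println(hasWriteConflict(6, "db.t1/p=1", ws));
        // txn 8 started after commit 7: no overlap, no conflict
        System.out.println(hasWriteConflict(8, "db.t1/p=1", ws));
    }
}
```

The diff itself additionally exempts READ_ONLY, replayed-replication, and COMPACTION transactions from this check entirely, which is the HIVE-25346 fix: compactor commits should not register write-set entries that break snapshot isolation for concurrent writers.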
[jira] [Updated] (HIVE-23760) Upgrading to Kafka 2.5 Clients
[ https://issues.apache.org/jira/browse/HIVE-23760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-23760: - Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Upgrading to Kafka 2.5 Clients > -- > > Key: HIVE-23760 > URL: https://issues.apache.org/jira/browse/HIVE-23760 > Project: Hive > Issue Type: Improvement > Components: kafka integration >Reporter: Andras Katona >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25513) Delta metrics collection may cause NPE
[ https://issues.apache.org/jira/browse/HIVE-25513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-25513: - Parent: HIVE-24824 Issue Type: Sub-task (was: Bug) > Delta metrics collection may cause NPE > -- > > Key: HIVE-25513 > URL: https://issues.apache.org/jira/browse/HIVE-25513 > Project: Hive > Issue Type: Sub-task >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When collecting metrics about the number of deltas under specific > partitions/tables, information about which partitions/tables are being read > is stored in the Configuration object under key delta.files.metrics.metadata. > This information is retrieved in > DeltaFilesMetricsReporter#mergeDeltaFilesStats when collecting the actual > information about the number of deltas. But if the information was never > stored for some reason, an NPE will be thrown from > DeltaFilesMetricsReporter#mergeDeltaFilesStats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
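The NPE scenario described above (metadata never stored under `delta.files.metrics.metadata`, then dereferenced during merge) is avoidable with a defensive null check. A minimal sketch using `java.util.Properties` as a stand-in for the Hadoop `Configuration` object; the key name comes from the description above, while the method name and return values are hypothetical:

```java
import java.util.Properties;

public class DeltaMetricsGuardDemo {
    static final String METADATA_KEY = "delta.files.metrics.metadata";

    // Guarded version of the merge step: if the read metadata was never
    // recorded, skip metrics collection instead of dereferencing null --
    // the unguarded path is what throws the NPE described above.
    static String mergeDeltaFilesStats(Properties conf) {
        String metadata = conf.getProperty(METADATA_KEY);
        if (metadata == null) {
            return "skipped: no metadata recorded";
        }
        return "merged stats for: " + metadata;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(mergeDeltaFilesStats(conf));
        conf.setProperty(METADATA_KEY, "db.tbl/part=1");
        System.out.println(mergeDeltaFilesStats(conf));
    }
}
```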
[jira] [Work started] (HIVE-25535) Adding table property "NO_CLEANUP"
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25535 started by Ashish Sharma. > Adding table property "NO_CLEANUP" > -- > > Key: HIVE-25535 > URL: https://issues.apache.org/jira/browse/HIVE-25535 > Project: Hive > Issue Type: Improvement >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > > Add "NO_CLEANUP" to the table properties to enable/disable table-level > cleanup and prevent the cleaner process from automatically removing obsolete > directories/files. > Example - > ALTER TABLE <table_name> SET TBLPROPERTIES('NO_CLEANUP'='TRUE'/'FALSE'); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25535) Adding table property "NO_CLEANUP"
[ https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Sharma reassigned HIVE-25535: > Adding table property "NO_CLEANUP" > -- > > Key: HIVE-25535 > URL: https://issues.apache.org/jira/browse/HIVE-25535 > Project: Hive > Issue Type: Improvement >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > > Add "NO_CLEANUP" to the table properties to enable/disable table-level > cleanup and prevent the cleaner process from automatically removing obsolete > directories/files. > Example - > ALTER TABLE <table_name> SET TBLPROPERTIES('NO_CLEANUP'='TRUE'/'FALSE'); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25529) Add tests for reading/writing Iceberg V2 tables with delete files
[ https://issues.apache.org/jira/browse/HIVE-25529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod resolved HIVE-25529. --- Resolution: Fixed > Add tests for reading/writing Iceberg V2 tables with delete files > - > > Key: HIVE-25529 > URL: https://issues.apache.org/jira/browse/HIVE-25529 > Project: Hive > Issue Type: Task >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Since Iceberg V2 tables are now official, we can start testing out whether V2 > tables can be created/read/written by Hive. While Hive has no delete > statement yet on Iceberg tables, we can nonetheless use the Iceberg API to > create delete files manually and then check if Hive honors those deletes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25529) Add tests for reading/writing Iceberg V2 tables with delete files
[ https://issues.apache.org/jira/browse/HIVE-25529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416516#comment-17416516 ] Marton Bod commented on HIVE-25529: --- Pushed to master, thanks for the review [~pvary]! > Add tests for reading/writing Iceberg V2 tables with delete files > - > > Key: HIVE-25529 > URL: https://issues.apache.org/jira/browse/HIVE-25529 > Project: Hive > Issue Type: Task >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Since Iceberg V2 tables are now official, we can start testing out whether V2 > tables can be created/read/written by Hive. While Hive has no delete > statement yet on Iceberg tables, we can nonetheless use the Iceberg API to > create delete files manually and then check if Hive honors those deletes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25534) Don't preserve FileAttribute.XATTR to initialise distcp.
[ https://issues.apache.org/jira/browse/HIVE-25534?focusedWorklogId=652122=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652122 ] ASF GitHub Bot logged work on HIVE-25534: - Author: ASF GitHub Bot Created on: 17/Sep/21 06:12 Start Date: 17/Sep/21 06:12 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2650: URL: https://github.com/apache/hive/pull/2650#discussion_r710778425 ## File path: shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java ## @@ -1273,8 +1272,6 @@ public boolean runDistCpWithSnapshots(String oldSnapshot, String newSnapshot, Li } } catch (Exception e) { throw new IOException("Cannot execute DistCp process: ", e); -} finally { - conf.setBoolean("mapred.mapper.new-api", false); Review comment: why is this change a part of this JIRA ## File path: shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java ## @@ -1273,8 +1272,6 @@ public boolean runDistCpWithSnapshots(String oldSnapshot, String newSnapshot, Li } } catch (Exception e) { throw new IOException("Cannot execute DistCp process: ", e); -} finally { - conf.setBoolean("mapred.mapper.new-api", false); Review comment: add test -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 652122) Time Spent: 20m (was: 10m) > Don't preserve FileAttribute.XATTR to initialise distcp. > > > Key: HIVE-25534 > URL: https://issues.apache.org/jira/browse/HIVE-25534 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Remove the preserve xattr while calling distcp. 
> {code:java} > 2021-08-23 10:06:18,485 ERROR org.apache.hadoop.tools.DistCp: > [HiveServer2-Background-Pool: Thread-73]: XAttrs not supported on at least > one file system: > org.apache.hadoop.tools.CopyListing$XAttrsNotSupportedException: XAttrs not > supported for file system: s3a://hmangla1-dev > at > org.apache.hadoop.tools.util.DistCpUtils.checkFileSystemXAttrSupport(DistCpUtils.java:513) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:337) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:304) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:214) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:193) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?]{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25534) Don't preserve FileAttribute.XATTR to initialise distcp.
[ https://issues.apache.org/jira/browse/HIVE-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25534: -- Labels: pull-request-available (was: ) > Don't preserve FileAttribute.XATTR to initialise distcp. > > > Key: HIVE-25534 > URL: https://issues.apache.org/jira/browse/HIVE-25534 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Remove the preserve xattr while calling distcp. > {code:java} > 2021-08-23 10:06:18,485 ERROR org.apache.hadoop.tools.DistCp: > [HiveServer2-Background-Pool: Thread-73]: XAttrs not supported on at least > one file system: > org.apache.hadoop.tools.CopyListing$XAttrsNotSupportedException: XAttrs not > supported for file system: s3a://hmangla1-dev > at > org.apache.hadoop.tools.util.DistCpUtils.checkFileSystemXAttrSupport(DistCpUtils.java:513) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:337) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:304) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:214) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:193) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?]{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25534) Don't preserve FileAttribute.XATTR to initialise distcp.
[ https://issues.apache.org/jira/browse/HIVE-25534?focusedWorklogId=652121=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652121 ] ASF GitHub Bot logged work on HIVE-25534: - Author: ASF GitHub Bot Created on: 17/Sep/21 06:11 Start Date: 17/Sep/21 06:11 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2650: URL: https://github.com/apache/hive/pull/2650#discussion_r710778268 ## File path: shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java ## @@ -1131,7 +1131,7 @@ public void setStoragePolicy(Path path, StoragePolicyValue policy) } } if (needToAddPreserveOption) { - params.add("-pbx"); + params.add("-pb"); //Only Block Size will be preserved. Review comment: We can't remove it in all the cases. for hdfs we should preserve -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 652121) Remaining Estimate: 0h Time Spent: 10m > Don't preserve FileAttribute.XATTR to initialise distcp. > > > Key: HIVE-25534 > URL: https://issues.apache.org/jira/browse/HIVE-25534 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Remove the preserve xattr while calling distcp. > {code:java} > 2021-08-23 10:06:18,485 ERROR org.apache.hadoop.tools.DistCp: > [HiveServer2-Background-Pool: Thread-73]: XAttrs not supported on at least > one file system: > org.apache.hadoop.tools.CopyListing$XAttrsNotSupportedException: XAttrs not > supported for file system: s3a://hmangla1-dev > at > org.apache.hadoop.tools.util.DistCpUtils.checkFileSystemXAttrSupport(DistCpUtils.java:513) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] 
> at org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:337) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:304) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:214) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?] > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:193) > ~[hadoop-distcp-3.1.1.7.1.6.0-297.jar:?]{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)