[jira] [Work logged] (HIVE-25935) Cleanup IMetaStoreClient#getPartitionsByNames APIs
[ https://issues.apache.org/jira/browse/HIVE-25935?focusedWorklogId=737010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-737010 ]

ASF GitHub Bot logged work on HIVE-25935:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Mar/22 03:34
Start Date: 05/Mar/22 03:34
Worklog Time Spent: 10m

Work Description: dengzhhu653 commented on pull request #3072:
URL: https://github.com/apache/hive/pull/3072#issuecomment-1059674601

Overall looks good to me, +1

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 737010)
Time Spent: 1h 20m (was: 1h 10m)

> Cleanup IMetaStoreClient#getPartitionsByNames APIs
> --------------------------------------------------
>
> Key: HIVE-25935
> URL: https://issues.apache.org/jira/browse/HIVE-25935
> Project: Hive
> Issue Type: Task
> Components: Metastore
> Reporter: Stamatis Zampetakis
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Currently the [IMetastoreClient|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java] interface has 8 variants of the {{getPartitionsByNames}} method. Going quickly over the concrete implementations, it appears that not all of them are useful or necessary, so a bit of cleanup is needed.
> Below are a few potential problems I observed:
> * Some of the APIs are not used anywhere in the project (neither by production nor by test code).
> * Some of the APIs are deprecated in some concrete implementations but not globally at the interface level, without an explanation why.
> * Some of the implementations simply throw without doing anything.
> * Many of the APIs are partially tested or not tested at all.
> HIVE-24743 and HIVE-25281 are related since they introduce/deprecate some of the aforementioned APIs.
> It would be good to review the aforementioned APIs and decide what needs to stay and what needs to go, as well as complete what is necessary when relevant.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
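The overload explosion described in the issue is the usual symptom of optional parameters being baked into method signatures; the common consolidation is a single request object with defaults. A minimal sketch of that pattern, with hypothetical names (this is not the actual IMetaStoreClient or Thrift API):

```java
import java.util.List;

// Hypothetical sketch of the request-object pattern that replaces N overloads with
// one method: optional knobs default on the request instead of multiplying signatures.
class PartitionsByNamesRequestSketch {
  final String dbName;
  final String tableName;
  final List<String> partitionNames;
  boolean getColStats = false;     // optional; defaults here, no extra overload needed
  String engine = null;            // optional
  String validWriteIdList = null;  // optional

  PartitionsByNamesRequestSketch(String dbName, String tableName, List<String> partitionNames) {
    this.dbName = dbName;
    this.tableName = tableName;
    this.partitionNames = partitionNames;
  }
}
```

With this shape, adding a new optional parameter extends the request instead of doubling the number of interface methods.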
[jira] [Work logged] (HIVE-25935) Cleanup IMetaStoreClient#getPartitionsByNames APIs
[ https://issues.apache.org/jira/browse/HIVE-25935?focusedWorklogId=737004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-737004 ]

ASF GitHub Bot logged work on HIVE-25935:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Mar/22 02:28
Start Date: 05/Mar/22 02:28
Worklog Time Spent: 10m

Work Description: dengzhhu653 commented on a change in pull request #3072:
URL: https://github.com/apache/hive/pull/3072#discussion_r820021805

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
@@ -4151,16 +4152,18 @@ public boolean dropPartition(String dbName, String tableName, List parti
       }
       if (nParts > nBatches * batchSize) {
-        String validWriteIdList = null;
-        Long tableId = null;
-        if (AcidUtils.isTransactionalTable(tbl)) {
-          ValidWriteIdList vWriteIdList = getValidWriteIdList(tbl.getDbName(), tbl.getTableName());
-          validWriteIdList = vWriteIdList != null ? vWriteIdList.toString() : null;
-          tableId = tbl.getTTable().getId();
-        }
+        String validWriteIdList = null;

Review comment:
nit: I wonder if we can use the following:
```java
GetPartitionsByNamesRequest req = MetaStoreUtils.convertToGetPartitionsByNamesRequest(tbl.getDbName(), tbl.getTableName(),
    partNames.subList(nBatches * batchSize, nParts), getColStats, Constants.HIVE_ENGINE, null, null);
List<Partition> tParts = getPartitionsByNames(req, tbl);
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 737004)
Time Spent: 1h 10m (was: 1h)

> Cleanup IMetaStoreClient#getPartitionsByNames APIs
> --------------------------------------------------
>
> Key: HIVE-25935
> URL: https://issues.apache.org/jira/browse/HIVE-25935
> Project: Hive
> Issue Type: Task
> Components: Metastore
> Reporter: Stamatis Zampetakis
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Currently the [IMetastoreClient|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java] interface has 8 variants of the {{getPartitionsByNames}} method. Going quickly over the concrete implementations, it appears that not all of them are useful or necessary, so a bit of cleanup is needed.
> Below are a few potential problems I observed:
> * Some of the APIs are not used anywhere in the project (neither by production nor by test code).
> * Some of the APIs are deprecated in some concrete implementations but not globally at the interface level, without an explanation why.
> * Some of the implementations simply throw without doing anything.
> * Many of the APIs are partially tested or not tested at all.
> HIVE-24743 and HIVE-25281 are related since they introduce/deprecate some of the aforementioned APIs.
> It would be good to review the aforementioned APIs and decide what needs to stay and what needs to go, as well as complete what is necessary when relevant.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25989) CTLT HBaseStorageHandler is dropping underlying HBase table when failed
[ https://issues.apache.org/jira/browse/HIVE-25989?focusedWorklogId=736953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736953 ]

ASF GitHub Bot logged work on HIVE-25989:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Mar/22 23:00
Start Date: 04/Mar/22 23:00
Worklog Time Spent: 10m

Work Description: nareshpr commented on pull request #3076:
URL: https://github.com/apache/hive/pull/3076#issuecomment-1059591262

Please update versionmap.txt as thrift is getting updated.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 736953)
Time Spent: 20m (was: 10m)

> CTLT HBaseStorageHandler is dropping underlying HBase table when failed
> -----------------------------------------------------------------------
>
> Key: HIVE-25989
> URL: https://issues.apache.org/jira/browse/HIVE-25989
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> With hive.strict.managed.tables & hive.create.as.acid, the Hive-HBase rollback code assumes a createTable failure instead of CTLT and removes the underlying HBase table while rolling back, here:
> [https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseMetaHook.java#L187-L195]
>
> Repro:
> {code:java}
> hbase
> =====
> hbase shell
> create 'hbase_hive_table', 'cf'
>
> beeline
> =======
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.managed.tables=true;
> set hive.create.as.acid=true;
> set hive.create.as.insert.only=true;
> set hive.default.fileformat.managed=ORC;
>
> CREATE EXTERNAL TABLE `hbase_hive_table`(
>   `key` int COMMENT '',
>   `value` string COMMENT '')
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.hbase.HBaseSerDe'
> STORED BY
>   'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
>   'hbase.columns.mapping'=':key,cf:cf')
> TBLPROPERTIES ('hbase.table.name'='hbase_hive_table');
>
> select * from hbase_hive_table;
> +-----------------------+-------------------------+
> | hbase_hive_table.key  | hbase_hive_table.value  |
> +-----------------------+-------------------------+
> +-----------------------+-------------------------+
>
> create table new_hbase_hive_table like hbase_hive_table;
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: The table must be stored using an ACID compliant format (such as ORC): default.new_hbase_hive_table
>
> select * from hbase_hive_table;
> Error: java.io.IOException: org.apache.hadoop.hbase.TableNotFoundException: hbase_hive_table
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
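The bug boils down to rollback dropping an HBase table the failed DDL never created. One plausible guard (a hypothetical sketch, not the actual HBaseMetaHook code) is to record at pre-create time whether the underlying HBase table already existed, and only drop it on rollback when this call created it:

```java
// Hypothetical rollback guard for the CTLT case described above: remember whether
// this DDL actually created the underlying HBase table, and only drop it on a
// create failure when we created it. Not the actual HBaseMetaHook implementation.
class CreateFailureRollback {
  private boolean tableCreatedByThisCall;

  void preCreateTable(boolean hbaseTableAlreadyExists) {
    // CTLT over an existing HBase table: nothing was created, nothing to roll back.
    tableCreatedByThisCall = !hbaseTableAlreadyExists;
  }

  /** @return true iff rollback should drop the underlying HBase table */
  boolean shouldDropOnRollback() {
    return tableCreatedByThisCall;
  }
}
```

In the repro, the HBase table pre-exists, so `shouldDropOnRollback()` would be false and the `TableNotFoundException` on the follow-up select would not occur.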
[jira] [Updated] (HIVE-26006) TopNKey and PTF with more than one column is failing with IOBE
[ https://issues.apache.org/jira/browse/HIVE-26006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-26006:
------------------------------
Description:
{code:java}
java.lang.IndexOutOfBoundsException: toIndex = 2
  at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
  at java.util.ArrayList.subList(ArrayList.java:1006)
  at org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
  at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
  at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
  at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.process(TopNKeyPushdownProcessor.java:57)
  at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
  at org.apache.hadoop.hive.ql.parse.TezCompiler.runTopNKeyOptimization(TezCompiler.java:1305)
  at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:173)
  at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12646)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:283)
  at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
  at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:215){code}

was:
java.lang.IndexOutOfBoundsException: toIndex = 2
  at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
  at java.util.ArrayList.subList(ArrayList.java:1006)
  at org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
  at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
  at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
  at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.process(TopNKeyPushdownProcessor.java:57)
  at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
  at org.apache.hadoop.hive.ql.parse.TezCompiler.runTopNKeyOptimization(TezCompiler.java:1305)
  at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:173)
  at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12646)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:283)
  at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
  at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:215)

> TopNKey and PTF with more than one column is failing with IOBE
> --------------------------------------------------------------
>
> Key: HIVE-26006
> URL: https://issues.apache.org/jira/browse/HIVE-26006
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Priority: Major
>
> {code:java}
> java.lang.IndexOutOfBoundsException: toIndex = 2
> at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
> at java.util.ArrayList.subList(ArrayList.java:1006)
> at org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
> at
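The trace shows `TopNKeyDesc.combine` calling `List.subList` with `toIndex = 2` on a list that is evidently shorter. A minimal reproduction of that failure mode (an illustrative sketch, not the TopNKeyDesc code; the assumption is that the key-column list can be shorter than the requested prefix length):

```java
import java.util.List;

// Minimal reproduction of the failure mode in the trace above: subList(0, toIndex)
// throws IndexOutOfBoundsException whenever toIndex exceeds the list size.
class SubListIobe {
  static boolean throwsIobe(List<String> keyColumns, int commonPrefixLength) {
    try {
      keyColumns.subList(0, commonPrefixLength);  // e.g. "toIndex = 2" on a 1-element list
      return false;
    } catch (IndexOutOfBoundsException e) {
      return true;
    }
  }
}
```

Any fix presumably has to clamp the prefix length to `Math.min(commonPrefixLength, keyColumns.size())` or bail out before calling `subList`.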
[jira] [Work logged] (HIVE-26000) DirectSQL to prune partitions fails with postgres backend for Skewed-Partition tables
[ https://issues.apache.org/jira/browse/HIVE-26000?focusedWorklogId=736839&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736839 ]

ASF GitHub Bot logged work on HIVE-26000:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Mar/22 18:27
Start Date: 04/Mar/22 18:27
Worklog Time Spent: 10m

Work Description: nareshpr commented on pull request #3073:
URL: https://github.com/apache/hive/pull/3073#issuecomment-1059412819

Thanks for the review @zabetak. I verified the fix locally.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 736839)
Time Spent: 40m (was: 0.5h)

> DirectSQL to prune partitions fails with postgres backend for Skewed-Partition tables
> -------------------------------------------------------------------------------------
>
> Key: HIVE-26000
> URL: https://issues.apache.org/jira/browse/HIVE-26000
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> {code:java}
> 2022-03-02 20:37:56,421 INFO org.apache.hadoop.hive.metastore.PartFilterExprUtil: [pool-6-thread-200]: Unable to make the expression tree from expression string [((ds = '2008-04-08') and (UDFToDouble(hr) = 11.0D))]Error parsing partition filter; lexer error: null; exception NoViableAltException(24@[])
> 2022-03-02 20:37:56,593 WARN org.apache.hadoop.hive.metastore.ObjectStore: [pool-6-thread-200]: Falling back to ORM path due to direct SQL failure (this is not an error): Error executing SQL query "select "SKEWED_COL_VALUE_LOC_MAP"."SD_ID", "SKEWED_STRING_LIST_VALUES".STRING_LIST_ID, "SKEWED_COL_VALUE_LOC_MAP"."LOCATION", "SKEWED_STRING_LIST_VALUES"."STRING_LIST_VALUE" from "SKEWED_COL_VALUE_LOC_MAP" left outer join "SKEWED_STRING_LIST_VALUES" on "SKEWED_COL_VALUE_LOC_MAP"."STRING_LIST_ID_KID" = "SKEWED_STRING_LIST_VALUES"."STRING_LIST_ID" where "SKEWED_COL_VALUE_LOC_MAP"."SD_ID" in (51010) and "SKEWED_COL_VALUE_LOC_MAP"."STRING_LIST_ID_KID" is not null order by "SKEWED_COL_VALUE_LOC_MAP"."SD_ID" asc, "SKEWED_STRING_LIST_VALUES"."STRING_LIST_ID" asc, "SKEWED_STRING_LIST_VALUES"."INTEGER_IDX" asc".
> at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
> at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391)
> at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
> at org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.loopJoinOrderedResult(MetastoreDirectSqlUtils.java:131)
> at org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.loopJoinOrderedResult(MetastoreDirectSqlUtils.java:109)
> at org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.setSkewedColLocationMaps(MetastoreDirectSqlUtils.java:414)
> at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:967)
> at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:788)
> at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$300(MetaStoreDirectSql.java:117)
> at org.apache.hadoop.hive.metastore.MetaStoreDirectSql$1.run(MetaStoreDirectSql.java:530)
> at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73)
> at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:521)
> at org.apache.hadoop.hive.metastore.ObjectStore$10.getSqlResult(ObjectStore.java:3722);
> Caused by: ERROR: column SKEWED_STRING_LIST_VALUES.string_list_id does not exist
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
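The root cause is visible in the log: every column reference in the failing query is double-quoted except `"SKEWED_STRING_LIST_VALUES".STRING_LIST_ID` in the select list, and PostgreSQL folds unquoted identifiers to lower case, so it looks for `string_list_id` and fails. A direct-SQL builder therefore has to quote every identifier it emits. A tiny illustrative helper (hypothetical, not the actual MetaStoreDirectSql code):

```java
// PostgreSQL folds unquoted identifiers to lower case, so STRING_LIST_ID (the one
// unquoted column in the failing query) resolves to string_list_id and is not found.
// Quoting every identifier preserves the upper-case metastore schema names.
// Hypothetical helper, not the actual MetaStoreDirectSql code.
class Ident {
  static String quoted(String table, String column) {
    return "\"" + table + "\".\"" + column + "\"";
  }
}
```

Databases like Derby or MySQL tolerate the mixed form, which is why the query only breaks on a PostgreSQL backend.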
[jira] [Resolved] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.
[ https://issues.apache.org/jira/browse/HIVE-25988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala resolved HIVE-25988.
------------------------------------------
Resolution: Fixed

Fix committed to master. Closing the jira.

> CreateTableEvent should have database object as one of the hive privilege object.
> ---------------------------------------------------------------------------------
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
> Issue Type: Bug
> Components: Hive, Standalone Metastore
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the HivePrivilege Objects so that it is consistent with HS2's CreateTable Event. Also, we need to move the DFS_URI object into the InputList so that this is also consistent with HS2's behavior.
> Having database objects in the create table event's hive privilege objects helps to determine if a user has the right permissions to create a table in a particular database via ranger/sentry.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.
[ https://issues.apache.org/jira/browse/HIVE-25988?focusedWorklogId=736798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736798 ]

ASF GitHub Bot logged work on HIVE-25988:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Mar/22 17:06
Start Date: 04/Mar/22 17:06
Worklog Time Spent: 10m

Work Description: saihemanth-cloudera closed pull request #3057:
URL: https://github.com/apache/hive/pull/3057

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 736798)
Time Spent: 40m (was: 0.5h)

> CreateTableEvent should have database object as one of the hive privilege object.
> ---------------------------------------------------------------------------------
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
> Issue Type: Bug
> Components: Hive, Standalone Metastore
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the HivePrivilege Objects so that it is consistent with HS2's CreateTable Event. Also, we need to move the DFS_URI object into the InputList so that this is also consistent with HS2's behavior.
> Having database objects in the create table event's hive privilege objects helps to determine if a user has the right permissions to create a table in a particular database via ranger/sentry.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.
[ https://issues.apache.org/jira/browse/HIVE-25988?focusedWorklogId=736793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736793 ]

ASF GitHub Bot logged work on HIVE-25988:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Mar/22 17:02
Start Date: 04/Mar/22 17:02
Worklog Time Spent: 10m

Work Description: nrg4878 commented on pull request #3057:
URL: https://github.com/apache/hive/pull/3057#issuecomment-1059342769

Fix has been merged to master. Please close the PR. Thank you

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 736793)
Time Spent: 0.5h (was: 20m)

> CreateTableEvent should have database object as one of the hive privilege object.
> ---------------------------------------------------------------------------------
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
> Issue Type: Bug
> Components: Hive, Standalone Metastore
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the HivePrivilege Objects so that it is consistent with HS2's CreateTable Event. Also, we need to move the DFS_URI object into the InputList so that this is also consistent with HS2's behavior.
> Having database objects in the create table event's hive privilege objects helps to determine if a user has the right permissions to create a table in a particular database via ranger/sentry.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25943) Introduce compaction cleaner failed attempts threshold
[ https://issues.apache.org/jira/browse/HIVE-25943?focusedWorklogId=736761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736761 ]

ASF GitHub Bot logged work on HIVE-25943:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Mar/22 16:25
Start Date: 04/Mar/22 16:25
Worklog Time Spent: 10m

Work Description: klcopp commented on a change in pull request #3034:
URL: https://github.com/apache/hive/pull/3034#discussion_r819678060

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ##
@@ -288,14 +285,30 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, boolean metricsEnabled
       if (metricsEnabled) {
         Metrics.getOrCreateCounter(MetricsConstants.COMPACTION_CLEANER_FAILURE_COUNTER).inc();
       }
-      txnHandler.markFailed(ci);
-    } finally {
+      handleCleanerAttemptFailure(ci);
+    } finally {
       if (metricsEnabled) {
         perfLogger.perfLogEnd(CLASS_NAME, cleanerMetric);
       }
     }
   }

+  private void handleCleanerAttemptFailure(CompactionInfo ci) throws MetaException {
+    long defaultRetention = getTimeVar(conf, HIVE_COMPACTOR_CLEANER_RETRY_RETENTION_TIME, TimeUnit.MILLISECONDS);
+    int cleanAttempts = 0;

Review comment:
Shouldn't cleanAttempts be initialized to 1? Because HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS >= 1 because of its RangeValidator?
(Speaking of, it might be good to increase the range to (0, 10) as a kind of feature flag, but I'll leave that up to you)

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ##
@@ -288,14 +285,30 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, boolean metricsEnabled
+  private void handleCleanerAttemptFailure(CompactionInfo ci) throws MetaException {
+    long defaultRetention = getTimeVar(conf, HIVE_COMPACTOR_CLEANER_RETRY_RETENTION_TIME, TimeUnit.MILLISECONDS);
+    int cleanAttempts = 0;
+    if (ci.retryRetention > 0) {
+      cleanAttempts = (int)(Math.log(ci.retryRetention / defaultRetention) / Math.log(2)) + 1;
+    }
+    if (cleanAttempts >= getIntVar(conf, HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS)) {
+      //Mark it as failed if the max attempt threshold is reached.
+      txnHandler.markFailed(ci);
+    } else {
+      //Calculate retry retention time and update record.
+      ci.retryRetention = (long)Math.pow(2, cleanAttempts) * defaultRetention;

Review comment:
So assuming HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS==5m, we try at CQ_COMMIT_TIME + 5m then CQ_COMMIT_TIME + 5^2 minutes then CQ_COMMIT_TIME + 5^3 minutes?

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ##
@@ -353,11 +353,9 @@ public void markCompacted(CompactionInfo info) throws MetaException {
     if (minOpenTxnWaterMark > 0) {
       whereClause += " AND (\"CQ_NEXT_TXN_ID\" <= " + minOpenTxnWaterMark + " OR \"CQ_NEXT_TXN_ID\" IS NULL)";
     }
-    if (retentionTime > 0) {
-      whereClause += " AND \"CQ_COMMIT_TIME\" < (" + getEpochFn(dbProduct) + " - " + retentionTime + ")";
-    }
+    whereClause += " AND (\"CQ_COMMIT_TIME\" < (" + getEpochFn(dbProduct) + " - CQ_RETRY_RETENTION - " + retentionTime + ") OR \"CQ_COMMIT_TIME\" IS NULL)";

Review comment:
It would probably be best to fix the `if (retentionTime > 0)` removal in a separate ticket since it fixes an unrelated bug

## File path: standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql ##
@@ -629,7 +629,8 @@ CREATE TABLE COMPACTION_QUEUE (
   CQ_INITIATOR_ID varchar(128),
   CQ_INITIATOR_VERSION varchar(128),
   CQ_WORKER_VERSION varchar(128),
-  CQ_CLEANER_START bigint
+  CQ_CLEANER_START bigint,
+  CQ_RETRY_RETENTION integer NOT NULL DEFAULT 0

Review comment:
I'm a little iffy about declaring this as an integer vs. a bigint, since we store milliseconds and this value could be 2^10 * hive.compactor.cleaner.retry.retentionTime which has no upper limit

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ##
@@ -288,14 +285,30 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, boolean metricsEnabled
       if (metricsEnabled) {
         Metrics.getOrCreateCounter(MetricsConstants.COMPACTION_CLEANER_FAILURE_COUNTER).inc();
       }
-      txnHandler.markFailed(ci);
-    } finally {
+      handleCleanerAttemptFailure(ci);
+    } finally
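The retry arithmetic in the patch doubles the stored retention on each failed clean and recovers the attempt count from it. A standalone sketch of that math (assuming, as in the diff, that `CQ_RETRY_RETENTION` starts at 0 and the default retention is `d` milliseconds; here the ratio is computed in floating point, whereas the diff divides two longs):

```java
// Sketch of the cleaner retry math from the diff above: attempt k stores
// 2^k * defaultRetention in CQ_RETRY_RETENTION, and the attempt count is
// recovered as floor(log2(stored / default)) + 1.
class CleanerRetry {
  static int attemptsSoFar(long storedRetention, long defaultRetention) {
    if (storedRetention <= 0) {
      return 0;  // never failed before
    }
    return (int) (Math.log((double) storedRetention / defaultRetention) / Math.log(2)) + 1;
  }

  static long nextRetention(long storedRetention, long defaultRetention) {
    // doubles each failure: d, 2d, 4d, 8d, ...
    return (long) Math.pow(2, attemptsSoFar(storedRetention, defaultRetention)) * defaultRetention;
  }
}
```

With a 5-minute default this yields waits of 5m, 10m, 20m, ... rather than the 5m, 25m, 125m floated in the review question, which is exactly the ambiguity the reviewer is asking about.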
[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean
[ https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736692 ]

ASF GitHub Bot logged work on HIVE-25645:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Mar/22 14:36
Start Date: 04/Mar/22 14:36
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819622903

## File path: ql/src/test/org/apache/hadoop/hive/metastore/TestMetastoreExpr.java ##
@@ -183,12 +183,12 @@ public void checkExpr(int numParts, String dbName, String tblName, ExprNodeGenericFuncDesc expr, Table t) throws Exception {
     List<Partition> parts = new ArrayList<>();
     client.listPartitionsByExpr(dbName, tblName,
-        SerializationUtilities.serializeExpressionToKryo(expr), null, (short)-1, parts);
+        SerializationUtilities.serializeObjectWithTypeInformation(expr), null, (short)-1, parts);
     assertEquals("Partition check failed: " + expr.getExprString(), numParts, parts.size());
     // check with partition spec as well
     PartitionsByExprRequest req = new PartitionsByExprRequest(dbName, tblName,
-        ByteBuffer.wrap(SerializationUtilities.serializeExpressionToKryo(expr)));
+      ByteBuffer.wrap(SerializationUtilities.serializeObjectWithTypeInformation(expr)));

Review comment:
nit: funny that the indentation is different than the above one. If we touch the line fix the indentation please

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 736692)
Time Spent: 1h 10m (was: 1h)

> Query-based compaction doesn't work when partition column type is boolean
> -------------------------------------------------------------------------
>
> Key: HIVE-25645
> URL: https://issues.apache.org/jira/browse/HIVE-25645
> Project: Hive
> Issue Type: Task
> Reporter: Denys Kuzmenko
> Assignee: László Végh
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean
[ https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736690 ]

ASF GitHub Bot logged work on HIVE-25645:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Mar/22 14:36
Start Date: 04/Mar/22 14:36
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819622269

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilder.java ##
@@ -347,8 +347,16 @@ private void buildWhereClauseForInsert(StringBuilder query) {
     query.append(" where ");
     for (int i = 0; i < keys.size(); ++i) {
-      query.append(i == 0 ? "`" : " and `").append(keys.get(i).getName()).append("`='")
-          .append(vals.get(i)).append("'");
+      FieldSchema keySchema = keys.get(i);
+      boolean isBooleanKey = keySchema.getType().equalsIgnoreCase("boolean");
+      query.append(i == 0 ? "`" : " and `").append(keySchema.getName()).append("`=");
+      if (!isBooleanKey) {

Review comment:
nit: Maybe easier to read:
```
if (isBooleanKey) {
  query.append("'").append(vals.get(i)).append("'");
} else {
  query.append(vals.get(i));
}
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 736690)
Time Spent: 1h (was: 50m)

> Query-based compaction doesn't work when partition column type is boolean
> -------------------------------------------------------------------------
>
> Key: HIVE-25645
> URL: https://issues.apache.org/jira/browse/HIVE-25645
> Project: Hive
> Issue Type: Task
> Reporter: Denys Kuzmenko
> Assignee: László Végh
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
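The quoting rule under discussion can be isolated into a few lines. This is an illustrative sketch of my reading of the patch, not the actual CompactionQueryBuilder: the old code quoted every partition value, and the change emits boolean values bare so Hive compares them as booleans rather than as the string `'true'`:

```java
// Sketch of the partition-predicate quoting rule from the diff above (my reading:
// boolean values are emitted unquoted, everything else keeps single quotes).
// Hypothetical helper, not the actual CompactionQueryBuilder code.
class PartitionPredicate {
  static String term(String name, String type, String value) {
    boolean isBoolean = "boolean".equalsIgnoreCase(type);
    return "`" + name + "`=" + (isBoolean ? value : "'" + value + "'");
  }
}
```

So a boolean partition `p=true` becomes the predicate `` `p`=true `` while a string partition still produces `` `ds`='2008-04-08' ``.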
[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean
[ https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736683 ]

ASF GitHub Bot logged work on HIVE-25645:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Mar/22 14:33
Start Date: 04/Mar/22 14:33
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819620506

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilder.java ##
@@ -347,8 +347,16 @@ private void buildWhereClauseForInsert(StringBuilder query) {
     query.append(" where ");
     for (int i = 0; i < keys.size(); ++i) {
-      query.append(i == 0 ? "`" : " and `").append(keys.get(i).getName()).append("`='")
-          .append(vals.get(i)).append("'");
+      FieldSchema keySchema = keys.get(i);
+      boolean isBooleanKey = keySchema.getType().equalsIgnoreCase("boolean");

Review comment:
ColumnType.BOOLEAN_TYPE_NAME

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 736683)
Time Spent: 50m (was: 40m)

> Query-based compaction doesn't work when partition column type is boolean
> -------------------------------------------------------------------------
>
> Key: HIVE-25645
> URL: https://issues.apache.org/jira/browse/HIVE-25645
> Project: Hive
> Issue Type: Task
> Reporter: Denys Kuzmenko
> Assignee: László Végh
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean
[ https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736682 ]

ASF GitHub Bot logged work on HIVE-25645:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 04/Mar/22 14:32
            Start Date: 04/Mar/22 14:32
    Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819619418

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
##
@@ -498,7 +498,7 @@ private static PrunedPartitionList getPartitionsFromServer(Table tab, final Stri
    * @return true iff the partition pruning expression contains non-partition columns.
    */
   static private boolean pruneBySequentialScan(Table tab, List<Partition> partitions,
-      ExprNodeGenericFuncDesc prunerExpr, HiveConf conf) throws HiveException, MetaException {
+      ExprNodeDesc prunerExpr, HiveConf conf) throws HiveException, MetaException {

Review comment:
nit: line continuation is 4 spaces

Issue Time Tracking
-------------------
Worklog Id: (was: 736682)
Time Spent: 40m  (was: 0.5h)

> Query-based compaction doesn't work when partition column type is boolean
> -------------------------------------------------------------------------
>
>                 Key: HIVE-25645
>                 URL: https://issues.apache.org/jira/browse/HIVE-25645
>             Project: Hive
>          Issue Type: Task
>            Reporter: Denys Kuzmenko
>            Assignee: László Végh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean
[ https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736679 ]

ASF GitHub Bot logged work on HIVE-25645:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 04/Mar/22 14:31
            Start Date: 04/Mar/22 14:31
    Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819618344

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java
##
@@ -810,31 +811,44 @@ private static void serializeObjectByKryo(Kryo kryo, Object plan, OutputStream o
   }

   /**
-   * Serializes expression via Kryo.
-   * @param expr Expression.
+   * Serializes any object via Kryo. Type information will be serialized as well, allowing dynamic deserialization
+   * without the need to pass the class.
+   * @param object The object to serialize.
    * @return Bytes.
    */
-  public static byte[] serializeExpressionToKryo(ExprNodeGenericFuncDesc expr) {
-    return serializeObjectToKryo(expr);
+  public static byte[] serializeObjectWithTypeInformation(Serializable object) {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+    Kryo kryo = borrowKryo();
+    try (Output output = new Output(baos)) {
+      kryo.writeClassAndObject(output, object);
+    } finally {
+      releaseKryo(kryo);
+    }
+    return baos.toByteArray();
   }

   /**
    * Deserializes expression from Kryo.
    * @param bytes Bytes containing the expression.
    * @return Expression; null if deserialization succeeded, but the result type is incorrect.
    */
-  public static ExprNodeGenericFuncDesc deserializeExpressionFromKryo(byte[] bytes) {
-    return deserializeObjectFromKryo(bytes, ExprNodeGenericFuncDesc.class);
+  public static <T> T deserializeObjectWithTypeInformation(byte[] bytes) {
+    Kryo kryo = borrowKryo();
+    try (Input inp = new Input(new ByteArrayInputStream(bytes))) {
+      return (T) kryo.readClassAndObject(inp);
+    } finally {
+      releaseKryo(kryo);
+    }
   }

   public static String serializeExpression(ExprNodeGenericFuncDesc expr) {
-    return new String(Base64.encodeBase64(serializeExpressionToKryo(expr)),
-        StandardCharsets.UTF_8);
+    return new String(Base64.encodeBase64(serializeObjectToKryo(expr)),
+    StandardCharsets.UTF_8);

Review comment:
nit: keep 4 spaces

Issue Time Tracking
-------------------
Worklog Id: (was: 736679)
Time Spent: 20m  (was: 10m)

> Query-based compaction doesn't work when partition column type is boolean
> -------------------------------------------------------------------------
>
>                 Key: HIVE-25645
>                 URL: https://issues.apache.org/jira/browse/HIVE-25645
>             Project: Hive
>          Issue Type: Task
>            Reporter: Denys Kuzmenko
>            Assignee: László Végh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h

-- This message was sent by Atlassian Jira (v8.20.1#820001)
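The key difference in the patch above is that Kryo's `writeClassAndObject`/`readClassAndObject` record a class descriptor next to the payload, so the reader no longer has to know the expected class up front (the old `deserializeExpressionFromKryo` had to pass `ExprNodeGenericFuncDesc.class`). The JDK's built-in serialization always embeds type information, so the same round-trip idea can be sketched without the Kryo dependency — the `TypeInfoSerde` class and its method names below are made up for illustration, not Hive or Kryo API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TypeInfoSerde {
  // Serialize: the stream carries the class descriptor alongside the data,
  // analogous to Kryo's writeClassAndObject.
  static byte[] serializeWithType(Serializable object) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
      out.writeObject(object);
    }
    return baos.toByteArray();
  }

  // Deserialize without being told the class up front, analogous to Kryo's
  // readClassAndObject; the unchecked cast mirrors the Hive patch.
  @SuppressWarnings("unchecked")
  static <T> T deserializeWithType(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (T) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    int[] expr = {1, 2, 3};  // stand-in for an expression tree
    int[] back = deserializeWithType(serializeWithType(expr));
    System.out.println(back.length);  // prints 3
  }
}
```

The trade-off is the same in both worlds: embedding type information makes the bytes self-describing and the API generic, at the cost of a slightly larger payload and a runtime `ClassCastException` instead of a compile-time check if the caller guesses the type wrong.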
[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean
[ https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736680 ]

ASF GitHub Bot logged work on HIVE-25645:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 04/Mar/22 14:31
            Start Date: 04/Mar/22 14:31
    Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819618843

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java
##
@@ -865,13 +879,13 @@ public static ExprNodeGenericFuncDesc deserializeExpression(String s) {

   public static String serializeObject(Serializable expr) {
     return new String(Base64.encodeBase64(serializeObjectToKryo(expr)),
-StandardCharsets.UTF_8);
+StandardCharsets.UTF_8);

Review comment:
nit: remove formatting only changes. They kill you during backports

Issue Time Tracking
-------------------
Worklog Id: (was: 736680)
Time Spent: 0.5h  (was: 20m)

> Query-based compaction doesn't work when partition column type is boolean
> -------------------------------------------------------------------------
>
>                 Key: HIVE-25645
>                 URL: https://issues.apache.org/jira/browse/HIVE-25645
>             Project: Hive
>          Issue Type: Task
>            Reporter: Denys Kuzmenko
>            Assignee: László Végh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-25894) Table migration to Iceberg doesn't remove HMS partitions
[ https://issues.apache.org/jira/browse/HIVE-25894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary resolved HIVE-25894.
-------------------------------
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for reporting [~boroknagyz] and for the review [~Marton Bod] and [~lpinter]!

> Table migration to Iceberg doesn't remove HMS partitions
> --------------------------------------------------------
>
>                 Key: HIVE-25894
>                 URL: https://issues.apache.org/jira/browse/HIVE-25894
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Repro:
> {code:java}
> create table ice_part_migrate (i int) partitioned by (p int) stored as parquet;
> insert into ice_part_migrate partition(p=1) values (1), (11), (111);
> insert into ice_part_migrate partition(p=2) values (2), (22), (222);
> ALTER TABLE ice_part_migrate SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
> {code}
> Then looking at the HMS database:
> {code:java}
> => select "PART_NAME" from "PARTITIONS" p, "TBLS" t where t."TBL_ID"=p."TBL_ID" and t."TBL_NAME"='ice_part_migrate';
>  PART_NAME
> -----------
>  p=1
>  p=2
> {code}
> This is weird because Iceberg tables are supposed to be unpartitioned. It also breaks some precondition checks in Impala. Is there a particular reason to keep the partitions in HMS?

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25894) Table migration to Iceberg doesn't remove HMS partitions
[ https://issues.apache.org/jira/browse/HIVE-25894?focusedWorklogId=736649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736649 ] ASF GitHub Bot logged work on HIVE-25894: - Author: ASF GitHub Bot Created on: 04/Mar/22 13:37 Start Date: 04/Mar/22 13:37 Worklog Time Spent: 10m Work Description: pvary merged pull request #3061: URL: https://github.com/apache/hive/pull/3061 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 736649) Time Spent: 40m (was: 0.5h) > Table migration to Iceberg doesn't remove HMS partitions > > > Key: HIVE-25894 > URL: https://issues.apache.org/jira/browse/HIVE-25894 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Repro: > {code:java} > create table ice_part_migrate (i int) partitioned by (p int) stored as > parquet; > insert into ice_part_migrate partition(p=1) values (1), (11), (111); > insert into ice_part_migrate partition(p=2) values (2), (22), (222); > ALTER TABLE ice_part_migrate SET TBLPROPERTIES > ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'); > {code} > Then looking at the HMS database: > {code:java} > => select "PART_NAME" from "PARTITIONS" p, "TBLS" t where > t."TBL_ID"=p."TBL_ID" and t."TBL_NAME"='ice_part_migrate'; > PART_NAME > --- > p=1 > p=2 > {code} > This is weird because Iceberg tables are supposed to be unpartitioned. It > also breaks some precondition checks in Impala. Is there a particular reason > to keep the partitions in HMS? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita updated HIVE-25975: -- Description: The first version of the ClusteredWriter in Hive-Iceberg will be lenient for bucketed tables: i.e. the records do not need to be ordered by the bucket values, the writer will just close its current file and open a new one for out-of-order records. This is suboptimal for the long-term due to creating many small files. Spark uses a UDF to compute the bucket value for each record and therefore it is able to order the records by bucket values, achieving optimal clustering. The proposed change adds a new UDF that uses Iceberg's bucket transformation function to produce bucket values from constants or any column input. All types that Iceberg buckets support are supported in this UDF too, except for UUID. This UDF is then used in SortedDynPartitionOptimizer to sort data during write if the target Iceberg target has bucket transform partitioning. To enable this, Hive has been extended with the feature that allows storage handlers to define custom sorting expressions, to be passed to FileSink operator's DynPartContext during dynamic partitioning write scenarios. The lenient version of ClusteredWriter in patched-iceberg-core has been disposed of as it is not needed anymore with this feature in. was: The first version of the ClusteredWriter in Hive-Iceberg will be lenient for bucketed tables: i.e. the records do not need to be ordered by the bucket values, the writer will just close its current file and open a new one for out-of-order records. This is suboptimal for the long-term due to creating many small files. Spark uses a UDF to compute the bucket value for each record and therefore it is able to order the records by bucket values, achieving optimal clustering. 
> Optimize ClusteredWriter for bucketed Iceberg tables > > > Key: HIVE-25975 > URL: https://issues.apache.org/jira/browse/HIVE-25975 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 6.5h > Remaining Estimate: 0h > > The first version of the ClusteredWriter in Hive-Iceberg will be lenient for > bucketed tables: i.e. the records do not need to be ordered by the bucket > values, the writer will just close its current file and open a new one for > out-of-order records. > This is suboptimal for the long-term due to creating many small files. Spark > uses a UDF to compute the bucket value for each record and therefore it is > able to order the records by bucket values, achieving optimal clustering. > The proposed change adds a new UDF that uses Iceberg's bucket transformation > function to produce bucket values from constants or any column input. All > types that Iceberg buckets support are supported in this UDF too, except for > UUID. > This UDF is then used in SortedDynPartitionOptimizer to sort data during > write if the target Iceberg target has bucket transform partitioning. > To enable this, Hive has been extended with the feature that allows storage > handlers to define custom sorting expressions, to be passed to FileSink > operator's DynPartContext during dynamic partitioning write scenarios. > The lenient version of ClusteredWriter in patched-iceberg-core has been > disposed of as it is not needed anymore with this feature in. -- This message was sent by Atlassian Jira (v8.20.1#820001)
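The clustering idea described above — compute each record's bucket value up front, then sort by it so a ClusteredWriter sees every bucket's rows contiguously and opens one file per bucket — can be sketched in plain Java. Note the hedge: Iceberg's real bucket transform applies a 32-bit Murmur3 hash before the modulo; the `hashCode()`-based `bucket` function here only keeps the sketch dependency-free and is not Iceberg's actual hash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketClusteringSketch {
  // Simplified stand-in for Iceberg's bucket transform: the real one hashes
  // the value with Murmur3 (32-bit) first; hashCode() is used here only for
  // illustration. Masking with MAX_VALUE keeps the result non-negative.
  static int bucket(Object value, int numBuckets) {
    return (value.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    // Grouping (or sorting) records by bucket value means each bucket's rows
    // arrive contiguously, so a writer opens exactly one file per bucket
    // instead of closing and reopening files for out-of-order records.
    List<String> records = List.of("a", "b", "c", "d", "e", "f");
    Map<Integer, List<String>> clustered = new TreeMap<>();
    for (String r : records) {
      clustered.computeIfAbsent(bucket(r, 4), k -> new ArrayList<>()).add(r);
    }
    clustered.forEach((b, rows) -> System.out.println("bucket " + b + " -> " + rows));
  }
}
```

This mirrors what the description attributes to Spark: a UDF computes the bucket value per record, and an ORDER/CLUSTER BY on that value gives the writer optimally clustered input.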
[jira] [Resolved] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita resolved HIVE-25975. --- Fix Version/s: 4.0.0 Resolution: Fixed Committed to master. Thanks for the thorough reviews from [~pvary] and [~Marton Bod] > Optimize ClusteredWriter for bucketed Iceberg tables > > > Key: HIVE-25975 > URL: https://issues.apache.org/jira/browse/HIVE-25975 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > The first version of the ClusteredWriter in Hive-Iceberg will be lenient for > bucketed tables: i.e. the records do not need to be ordered by the bucket > values, the writer will just close its current file and open a new one for > out-of-order records. > This is suboptimal for the long-term due to creating many small files. Spark > uses a UDF to compute the bucket value for each record and therefore it is > able to order the records by bucket values, achieving optimal clustering. > The proposed change adds a new UDF that uses Iceberg's bucket transformation > function to produce bucket values from constants or any column input. All > types that Iceberg buckets support are supported in this UDF too, except for > UUID. > This UDF is then used in SortedDynPartitionOptimizer to sort data during > write if the target Iceberg target has bucket transform partitioning. > To enable this, Hive has been extended with the feature that allows storage > handlers to define custom sorting expressions, to be passed to FileSink > operator's DynPartContext during dynamic partitioning write scenarios. > The lenient version of ClusteredWriter in patched-iceberg-core has been > disposed of as it is not needed anymore with this feature in. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime
[ https://issues.apache.org/jira/browse/HIVE-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gabrywu resolved HIVE-16352. Resolution: Won't Fix > Ability to skip or repair out of sync blocks with HIVE at runtime > - > > Key: HIVE-16352 > URL: https://issues.apache.org/jira/browse/HIVE-16352 > Project: Hive > Issue Type: New Feature > Components: Avro, File Formats, Reader >Affects Versions: 3.1.2 >Reporter: Navdeep Poonia >Assignee: gabrywu >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > When a file is corrupted it raises the error java.io.IOException: Invalid > sync! with hive. > Can we have some functionality to skip or repair such blocks at runtime to > make avro more error resilient in case of data corruption. > Error: java.io.IOException: java.io.IOException: java.io.IOException: While > processing file > s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42. > java.io.IOException: Invalid sync! > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=736591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736591 ]

ASF GitHub Bot logged work on HIVE-25975:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 04/Mar/22 11:27
            Start Date: 04/Mar/22 11:27
    Worklog Time Spent: 10m

Work Description: szlta commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r819490947

## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -307,6 +331,42 @@ public boolean supportsPartitionTransform() {
     }).collect(Collectors.toList());
   }

+  @Override
+  public DynamicPartitionCtx createDPContext(HiveConf hiveConf, org.apache.hadoop.hive.ql.metadata.Table hmsTable)
+      throws SemanticException {
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = IcebergTableUtil.getTable(conf, tableDesc.getProperties());
+    if (table.spec().isUnpartitioned()) {
+      return null;
+    }
+
+    // Iceberg currently doesn't have publicly accessible partition transform information, hence use above string parse
+    List partitionTransformSpecs = getPartitionTransformSpec(hmsTable);

Review comment:
As discussed offline this won't be addressed with this patch.

Issue Time Tracking
-------------------
Worklog Id: (was: 736591)
Time Spent: 6h 20m  (was: 6h 10m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> ----------------------------------------------------
>
>                 Key: HIVE-25975
>                 URL: https://issues.apache.org/jira/browse/HIVE-25975
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for bucketed tables: i.e. the records do not need to be ordered by the bucket values, the writer will just close its current file and open a new one for out-of-order records.
> This is suboptimal for the long-term due to creating many small files. Spark uses a UDF to compute the bucket value for each record and therefore it is able to order the records by bucket values, achieving optimal clustering.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=736593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736593 ] ASF GitHub Bot logged work on HIVE-25975: - Author: ASF GitHub Bot Created on: 04/Mar/22 11:27 Start Date: 04/Mar/22 11:27 Worklog Time Spent: 10m Work Description: szlta merged pull request #3060: URL: https://github.com/apache/hive/pull/3060 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 736593) Time Spent: 6.5h (was: 6h 20m) > Optimize ClusteredWriter for bucketed Iceberg tables > > > Key: HIVE-25975 > URL: https://issues.apache.org/jira/browse/HIVE-25975 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 6.5h > Remaining Estimate: 0h > > The first version of the ClusteredWriter in Hive-Iceberg will be lenient for > bucketed tables: i.e. the records do not need to be ordered by the bucket > values, the writer will just close its current file and open a new one for > out-of-order records. > This is suboptimal for the long-term due to creating many small files. Spark > uses a UDF to compute the bucket value for each record and therefore it is > able to order the records by bucket values, achieving optimal clustering. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25971) Tez task shutdown getting delayed due to cached thread pool not closed
[ https://issues.apache.org/jira/browse/HIVE-25971?focusedWorklogId=736569=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736569 ] ASF GitHub Bot logged work on HIVE-25971: - Author: ASF GitHub Bot Created on: 04/Mar/22 10:24 Start Date: 04/Mar/22 10:24 Worklog Time Spent: 10m Work Description: guptashailesh92 edited a comment on pull request #3046: URL: https://github.com/apache/hive/pull/3046#issuecomment-1059033143 @rbalamohan , added same patch for master as well. [CR](https://github.com/apache/hive/pull/3078). Here in jenkins link, it shows no new failures and in master all tests are successful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 736569) Time Spent: 1h 40m (was: 1.5h) > Tez task shutdown getting delayed due to cached thread pool not closed > -- > > Key: HIVE-25971 > URL: https://issues.apache.org/jira/browse/HIVE-25971 > Project: Hive > Issue Type: Improvement > Components: Tez >Affects Versions: 2.4.0, 3.1.2 >Reporter: Shailesh Gupta >Assignee: Shailesh Gupta >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > We are using > a[CachedThreadPool|https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ObjectCache.java] > but not closing it. CachedThreadPool creates non daemon threads, causing the > Tez Task JVM shutdown delayed upto 1 min, as default idle timeout is 1 min. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
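The problem described in HIVE-25971 — `Executors.newCachedThreadPool()` creates non-daemon worker threads with a default 60-second keep-alive, so an idle, never-shut-down pool holds JVM exit hostage for up to a minute — can be avoided with a daemon `ThreadFactory`. This sketch shows the general technique, not the exact patch applied to Hive's `ObjectCache` (the `newDaemonCachedPool` helper name is made up):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class DaemonPoolSketch {
  // A cached pool whose workers are daemon threads: idle workers can no
  // longer delay JVM shutdown while they wait out the keep-alive timeout.
  static ExecutorService newDaemonCachedPool(String namePrefix) {
    ThreadFactory factory = runnable -> {
      Thread t = new Thread(runnable);
      t.setName(namePrefix + "-" + t.getId());
      t.setDaemon(true);  // the key line: daemon threads don't block exit
      return t;
    };
    return Executors.newCachedThreadPool(factory);
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = newDaemonCachedPool("object-cache");
    pool.submit(() -> System.out.println("async work on " + Thread.currentThread().getName()));
    // Explicit shutdown is still good hygiene even with daemon threads:
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

Calling `shutdown()` when the owner is done (the other fix direction the ticket title suggests) releases the workers immediately, so either approach — and ideally both — removes the one-minute shutdown delay.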
[jira] [Work logged] (HIVE-25971) Tez task shutdown getting delayed due to cached thread pool not closed
[ https://issues.apache.org/jira/browse/HIVE-25971?focusedWorklogId=736568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736568 ] ASF GitHub Bot logged work on HIVE-25971: - Author: ASF GitHub Bot Created on: 04/Mar/22 10:24 Start Date: 04/Mar/22 10:24 Worklog Time Spent: 10m Work Description: guptashailesh92 commented on pull request #3046: URL: https://github.com/apache/hive/pull/3046#issuecomment-1059033143 @rbalamohan , added same patch for master as well. [CR](https://github.com/apache/hive/pull/3078) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 736568) Time Spent: 1.5h (was: 1h 20m) > Tez task shutdown getting delayed due to cached thread pool not closed > -- > > Key: HIVE-25971 > URL: https://issues.apache.org/jira/browse/HIVE-25971 > Project: Hive > Issue Type: Improvement > Components: Tez >Affects Versions: 2.4.0, 3.1.2 >Reporter: Shailesh Gupta >Assignee: Shailesh Gupta >Priority: Minor > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > We are using > a[CachedThreadPool|https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ObjectCache.java] > but not closing it. CachedThreadPool creates non daemon threads, causing the > Tez Task JVM shutdown delayed upto 1 min, as default idle timeout is 1 min. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-26003: Description: DROP FUNCTION silently passes when a function doesn't exist, which is bad, especially because hive has "DROP FUNCTION IF EXISTS". I was working with functions when I found that "DROP FUNCTION myfunc" passed, and I thought it simply dropped the function, but then it kept working. I realized I was supposed to call "DROP FUNCTION default.myfunc" because it's registered as "default.myfunc". This "default" usecase is just one example where DROP FUNCTION seems to work expected but silently causes confusion. {code} CREATE FUNCTION qtest_get_java_boolean AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean'; describe function extended qtest_get_java_boolean; drop function if exists qtest_get_java_boolean_typo; #PASS, find drop function qtest_get_java_boolean_typo; #PASS, should fail I believe {code} UPDATE: okay, I've just realized there is hive.exec.drop.ignorenonexistent=true which causes this I still don't like this, why do we ignore non-existent functions if we have a separate "if exist" clause? at least a message should appear that myfunc is invalid but we don't throw SemanticException was: DROP FUNCTION silently passes when a function doesn't exist, which is bad, especially because hive has "DROP FUNCTION IF EXISTS". I was working with functions when I found that "DROP FUNCTION myfunc" passed, and I thought it simply dropped the function, but then it kept working. I realized I was supposed to call "DROP FUNCTION default.myfunc" because it's registered as "default.myfunc". This "default" usecase is just one example where DROP FUNCTION seems to work expected but silently causes confusion. 
{code} CREATE FUNCTION qtest_get_java_boolean AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean'; describe function extended qtest_get_java_boolean; drop function if exists qtest_get_java_boolean_typo; #PASS, find drop function qtest_get_java_boolean_typo; #PASS, should fail I believe {code} UPDATE: okay, I've just realized there is hive.exec.drop.ignorenonexistent=true which causes this I still don't like this, why don't we ignore nonexistent if we have a separate "if exist" clause...at least a message should appear that myfunc is invalid but we doesn't throw SemanticException > DROP FUNCTION silently passes when function doesn't exist > - > > Key: HIVE-26003 > URL: https://issues.apache.org/jira/browse/HIVE-26003 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > DROP FUNCTION silently passes when a function doesn't exist, which is bad, > especially because hive has "DROP FUNCTION IF EXISTS". > I was working with functions when I found that "DROP FUNCTION myfunc" passed, > and I thought it simply dropped the function, but then it kept working. I > realized I was supposed to call "DROP FUNCTION default.myfunc" because it's > registered as "default.myfunc". This "default" usecase is just one example > where DROP FUNCTION seems to work expected but silently causes confusion. > {code} > CREATE FUNCTION qtest_get_java_boolean AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean'; > describe function extended qtest_get_java_boolean; > drop function if exists qtest_get_java_boolean_typo; #PASS, find > drop function qtest_get_java_boolean_typo; #PASS, should fail I believe > {code} > UPDATE: okay, I've just realized there is > hive.exec.drop.ignorenonexistent=true which causes this > I still don't like this, why do we ignore non-existent functions if we have a > separate "if exist" clause? 
at least a message should appear that myfunc is > invalid but we don't throw SemanticException -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-26003:

Description:
DROP FUNCTION silently passes when a function doesn't exist, which is bad, especially because Hive has "DROP FUNCTION IF EXISTS". I was working with functions when I found that "DROP FUNCTION myfunc" passed, and I thought it had simply dropped the function, but then the function kept working. I realized I was supposed to call "DROP FUNCTION default.myfunc" because it's registered as "default.myfunc". This "default" use case is just one example where DROP FUNCTION seems to work as expected but silently causes confusion.

{code}
CREATE FUNCTION qtest_get_java_boolean AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
describe function extended qtest_get_java_boolean;
drop function if exists qtest_get_java_boolean_typo; #PASS, fine
drop function qtest_get_java_boolean_typo; #PASS, but I believe it should fail
{code}

UPDATE: okay, I've just realized that hive.exec.drop.ignorenonexistent=true is what causes this. I still don't like it: why do we ignore non-existent functions when we have a separate "IF EXISTS" clause? At the very least a message should appear that the function is invalid, even if we don't throw a SemanticException.

> DROP FUNCTION silently passes when function doesn't exist
> ---------------------------------------------------------
>
> Key: HIVE-26003
> URL: https://issues.apache.org/jira/browse/HIVE-26003
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major

--
This message was sent by Atlassian Jira (v8.20.1#820001)
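For reference, the toggle named in the HIVE-26003 description above can be flipped per session. The following is a hedged sketch (behavior as described in the report, not verified here): with hive.exec.drop.ignorenonexistent set to false, dropping a non-existent function should raise an error, while the explicit IF EXISTS form should still pass quietly.

{code}
-- Sketch: make DROP of non-existent objects fail for this session.
SET hive.exec.drop.ignorenonexistent=false;

-- Should now fail instead of silently passing:
drop function qtest_get_java_boolean_typo;

-- Still passes quietly, as explicitly requested by IF EXISTS:
drop function if exists qtest_get_java_boolean_typo;
{code}

This is arguably the workaround implied by the UPDATE in the description; the design question raised there (why the default ignores missing functions when IF EXISTS already exists) remains open.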
[jira] [Commented] (HIVE-26004) Upgrade Iceberg to 0.13.1
[ https://issues.apache.org/jira/browse/HIVE-26004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501219#comment-17501219 ] Marton Bod commented on HIVE-26004: --- Pushed to master. Thanks [~pvary] for the review. > Upgrade Iceberg to 0.13.1 > - > > Key: HIVE-26004 > URL: https://issues.apache.org/jira/browse/HIVE-26004 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-26004) Upgrade Iceberg to 0.13.1
[ https://issues.apache.org/jira/browse/HIVE-26004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod resolved HIVE-26004. --- Resolution: Fixed > Upgrade Iceberg to 0.13.1 > - > > Key: HIVE-26004 > URL: https://issues.apache.org/jira/browse/HIVE-26004 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-26004) Upgrade Iceberg to 0.13.1
[ https://issues.apache.org/jira/browse/HIVE-26004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod reassigned HIVE-26004: - > Upgrade Iceberg to 0.13.1 > - > > Key: HIVE-26004 > URL: https://issues.apache.org/jira/browse/HIVE-26004 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned HIVE-26003: --- Assignee: László Bodor > DROP FUNCTION silently passes when function doesn't exist > - > > Key: HIVE-26003 > URL: https://issues.apache.org/jira/browse/HIVE-26003 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)