[jira] [Work logged] (HIVE-25935) Cleanup IMetaStoreClient#getPartitionsByNames APIs

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25935?focusedWorklogId=737010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-737010
 ]

ASF GitHub Bot logged work on HIVE-25935:
-

Author: ASF GitHub Bot
Created on: 05/Mar/22 03:34
Start Date: 05/Mar/22 03:34
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #3072:
URL: https://github.com/apache/hive/pull/3072#issuecomment-1059674601


   Overall looks good to me,  +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 737010)
Time Spent: 1h 20m  (was: 1h 10m)

> Cleanup IMetaStoreClient#getPartitionsByNames APIs
> --
>
> Key: HIVE-25935
> URL: https://issues.apache.org/jira/browse/HIVE-25935
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Stamatis Zampetakis
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently the 
> [IMetastoreClient|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java]
>  interface has 8 variants of the {{getPartitionsByNames}} method. Going 
> quickly over the concrete implementations, it appears that not all of them 
> are useful or necessary, so a bit of cleanup is needed.
> Below are a few potential problems I observed:
> * Some of the APIs are not used anywhere in the project (neither by 
> production nor by test code).
> * Some of the APIs are deprecated in some concrete implementations, but not 
> globally at the interface level, and without an explanation of why.
> * Some of the implementations simply throw without doing anything.
> * Many of the APIs are partially tested or not tested at all.
> HIVE-24743, HIVE-25281 are related since they introduce/deprecate some of the 
> aforementioned APIs.
> It would be good to review the aforementioned APIs, decide what needs to 
> stay and what needs to go, and complete the missing pieces where relevant.
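A common shape for this kind of cleanup is to keep one canonical request-based method and turn the legacy overloads into deprecated default methods that delegate to it. The sketch below illustrates the pattern only; the `Request` and `Client` types are simplified stand-ins, not Hive's actual `IMetaStoreClient` API.

```java
import java.util.Collections;
import java.util.List;

public class ApiConsolidationSketch {
    // Simplified stand-in for GetPartitionsByNamesRequest.
    static final class Request {
        final String db;
        final String table;
        final List<String> partNames;
        Request(String db, String table, List<String> partNames) {
            this.db = db; this.table = table; this.partNames = partNames;
        }
    }

    interface Client {
        List<String> getPartitionsByNames(Request req); // single canonical variant

        @Deprecated
        default List<String> getPartitionsByNames(String db, String table, List<String> names) {
            // Legacy overload delegates instead of duplicating logic,
            // so there is exactly one implementation to test.
            return getPartitionsByNames(new Request(db, table, names));
        }
    }

    public static void main(String[] args) {
        Client c = req -> req.partNames; // trivial implementation for the demo
        System.out.println(c.getPartitionsByNames("db", "t", Collections.singletonList("p=1")));
    }
}
```

The deprecated overloads can then be removed once callers migrate, without ever having had divergent behavior.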



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25935) Cleanup IMetaStoreClient#getPartitionsByNames APIs

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25935?focusedWorklogId=737004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-737004
 ]

ASF GitHub Bot logged work on HIVE-25935:
-

Author: ASF GitHub Bot
Created on: 05/Mar/22 02:28
Start Date: 05/Mar/22 02:28
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #3072:
URL: https://github.com/apache/hive/pull/3072#discussion_r820021805



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -4151,16 +4152,18 @@ public boolean dropPartition(String dbName, String 
tableName, List parti
   }
 
   if (nParts > nBatches * batchSize) {
-   String validWriteIdList = null;
-   Long tableId = null;
-   if (AcidUtils.isTransactionalTable(tbl)) {
-ValidWriteIdList vWriteIdList = getValidWriteIdList(tbl.getDbName(), 
tbl.getTableName());
-validWriteIdList = vWriteIdList != null ? vWriteIdList.toString() : 
null;
-tableId = tbl.getTTable().getId();
-  }
+String validWriteIdList = null;

Review comment:
   nit: I wonder if we can use the following:
   ```java
   GetPartitionsByNamesRequest req = MetaStoreUtils.convertToGetPartitionsByNamesRequest(
       tbl.getDbName(), tbl.getTableName(),
       partNames.subList(nBatches * batchSize, nParts), getColStats,
       Constants.HIVE_ENGINE, null, null);
   List<Partition> tParts = getPartitionsByNames(req, tbl);
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 737004)
Time Spent: 1h 10m  (was: 1h)

> Cleanup IMetaStoreClient#getPartitionsByNames APIs
> --
>
> Key: HIVE-25935
> URL: https://issues.apache.org/jira/browse/HIVE-25935
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Stamatis Zampetakis
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently the 
> [IMetastoreClient|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java]
>  interface has 8 variants of the {{getPartitionsByNames}} method. Going 
> quickly over the concrete implementations, it appears that not all of them 
> are useful or necessary, so a bit of cleanup is needed.
> Below are a few potential problems I observed:
> * Some of the APIs are not used anywhere in the project (neither by 
> production nor by test code).
> * Some of the APIs are deprecated in some concrete implementations, but not 
> globally at the interface level, and without an explanation of why.
> * Some of the implementations simply throw without doing anything.
> * Many of the APIs are partially tested or not tested at all.
> HIVE-24743, HIVE-25281 are related since they introduce/deprecate some of the 
> aforementioned APIs.
> It would be good to review the aforementioned APIs, decide what needs to 
> stay and what needs to go, and complete the missing pieces where relevant.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25989) CTLT HBaseStorageHandler is dropping underlying HBase table when failed

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25989?focusedWorklogId=736953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736953
 ]

ASF GitHub Bot logged work on HIVE-25989:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 23:00
Start Date: 04/Mar/22 23:00
Worklog Time Spent: 10m 
  Work Description: nareshpr commented on pull request #3076:
URL: https://github.com/apache/hive/pull/3076#issuecomment-1059591262


   Please update versionmap.txt as thrift is getting updated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 736953)
Time Spent: 20m  (was: 10m)

> CTLT HBaseStorageHandler is dropping underlying HBase table when failed
> ---
>
> Key: HIVE-25989
> URL: https://issues.apache.org/jira/browse/HIVE-25989
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With hive.strict.managed.tables & hive.create.as.acid enabled, the 
> Hive-HBase rollback code assumes the failure came from a plain CREATE TABLE 
> rather than a CTLT, and drops the underlying HBase table while rolling back, 
> here:
> [https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseMetaHook.java#L187-L195]
>  
> Repro
>  
> {code:java}
> hbase
> =
> hbase shell
> create 'hbase_hive_table', 'cf'
> beeline
> ===
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.managed.tables=true;
> set hive.create.as.acid=true;
> set hive.create.as.insert.only=true;
> set hive.default.fileformat.managed=ORC;
> > CREATE EXTERNAL TABLE `hbase_hive_table`(                       
>    `key` int COMMENT '',                            
>    `value` string COMMENT '')                       
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.hbase.HBaseSerDe'        
>  STORED BY                                          
>    'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
>  WITH SERDEPROPERTIES (                             
>    'hbase.columns.mapping'=':key,cf:cf')                      
>  TBLPROPERTIES ('hbase.table.name'='hbase_hive_table');
> > select * from hbase_hive_table;
> +---+-+
> | hbase_hive_table.key  | hbase_hive_table.value  |
> +---+-+
> +---+-+
> > create table new_hbase_hive_table like hbase_hive_table;
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: The table must 
> be stored using an ACID compliant format (such as ORC): 
> default.new_hbase_hive_table
> > select * from hbase_hive_table;
> Error: java.io.IOException: org.apache.hadoop.hbase.TableNotFoundException: 
> hbase_hive_table
> {code}
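A minimal sketch of the guarded rollback the report calls for. The class and method names below are illustrative only, not Hive's actual `HBaseMetaHook` API: the idea is that the handler records whether it created the underlying HBase table, so rollback only drops tables the handler itself created.

```java
// Hypothetical sketch: only drop the HBase table on rollback if this
// create-table call actually created it. A CTLT over an existing HBase
// table (the repro above) must leave the table alone.
public class RollbackGuardSketch {
    private boolean createdUnderlyingTable;

    public void preCreateTable(boolean hbaseTableAlreadyExists) {
        // Remember whether we created the table, so rollback can be a no-op
        // when the table pre-existed.
        createdUnderlyingTable = !hbaseTableAlreadyExists;
    }

    public String rollbackCreateTable() {
        return createdUnderlyingTable ? "dropTable" : "noop";
    }

    public static void main(String[] args) {
        RollbackGuardSketch guard = new RollbackGuardSketch();
        guard.preCreateTable(true); // table existed before the failed CTLT
        System.out.println(guard.rollbackCreateTable()); // prints "noop"
    }
}
```

With this guard, the failed `create table ... like` would leave `hbase_hive_table` queryable instead of raising `TableNotFoundException`.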



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26006) TopNKey and PTF with more than one column is failing with IOBE

2022-03-04 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-26006:
--
Description: 
{code:java}
java.lang.IndexOutOfBoundsException: toIndex = 2
at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
at java.util.ArrayList.subList(ArrayList.java:1006)
at org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
at 
org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
at 
org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
at 
org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.process(TopNKeyPushdownProcessor.java:57)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.runTopNKeyOptimization(TezCompiler.java:1305)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:173)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12646)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:283)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:215){code}

  was:
java.lang.IndexOutOfBoundsException: toIndex = 2
at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
at java.util.ArrayList.subList(ArrayList.java:1006)
at 
org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
at 
org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
at 
org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
at 
org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.process(TopNKeyPushdownProcessor.java:57)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.runTopNKeyOptimization(TezCompiler.java:1305)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:173)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12646)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:283)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:215)
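The failure mode in the trace is easy to reproduce in isolation: `ArrayList.subList` throws `IndexOutOfBoundsException` whenever `toIndex` exceeds the list size, which is what `TopNKeyDesc.combine` effectively hits when fewer key columns are available than expected. A standalone illustration (the one-element list is an assumed example):

```java
import java.util.ArrayList;
import java.util.List;

public class SubListIobeDemo {
    public static void main(String[] args) {
        List<String> keyColumns = new ArrayList<>();
        keyColumns.add("col0"); // only one key column available

        try {
            // Mirrors the failing call shape from the trace above:
            // subList with toIndex = 2 on a one-element list.
            keyColumns.subList(0, 2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage()); // "toIndex = 2" on OpenJDK
        }
    }
}
```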


> TopNKey and PTF with more than one column is failing with IOBE
> --
>
> Key: HIVE-26006
> URL: https://issues.apache.org/jira/browse/HIVE-26006
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Priority: Major
>
> {code:java}
> java.lang.IndexOutOfBoundsException: toIndex = 2
> at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
> at java.util.ArrayList.subList(ArrayList.java:1006)
> at org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
> at 
> org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
> at 
> org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
> at 
> 

[jira] [Work logged] (HIVE-26000) DirectSQL to prune partitions fails with postgres backend for Skewed-Partition tables

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26000?focusedWorklogId=736839&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736839
 ]

ASF GitHub Bot logged work on HIVE-26000:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 18:27
Start Date: 04/Mar/22 18:27
Worklog Time Spent: 10m 
  Work Description: nareshpr commented on pull request #3073:
URL: https://github.com/apache/hive/pull/3073#issuecomment-1059412819


   Thanks for the review @zabetak. I verified the fix locally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 736839)
Time Spent: 40m  (was: 0.5h)

> DirectSQL to prune partitions fails with postgres backend for 
> Skewed-Partition tables
> -
>
> Key: HIVE-26000
> URL: https://issues.apache.org/jira/browse/HIVE-26000
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  
>  
> {code:java}
> 2022-03-02 20:37:56,421 INFO  
> org.apache.hadoop.hive.metastore.PartFilterExprUtil: [pool-6-thread-200]: 
> Unable to make the expression tree from expression string [((ds = 
> '2008-04-08') and (UDFToDouble(hr) = 11.0D))]Error parsing partition filter; 
> lexer error: null; exception NoViableAltException(24@[])
> 2022-03-02 20:37:56,593 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-6-thread-200]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): Error executing SQL query "select 
> "SKEWED_COL_VALUE_LOC_MAP"."SD_ID", 
> "SKEWED_STRING_LIST_VALUES".STRING_LIST_ID, 
> "SKEWED_COL_VALUE_LOC_MAP"."LOCATION", 
> "SKEWED_STRING_LIST_VALUES"."STRING_LIST_VALUE" from 
> "SKEWED_COL_VALUE_LOC_MAP"  left outer join "SKEWED_STRING_LIST_VALUES" on 
> "SKEWED_COL_VALUE_LOC_MAP"."STRING_LIST_ID_KID" = 
> "SKEWED_STRING_LIST_VALUES"."STRING_LIST_ID" where 
> "SKEWED_COL_VALUE_LOC_MAP"."SD_ID" in (51010)  and 
> "SKEWED_COL_VALUE_LOC_MAP"."STRING_LIST_ID_KID" is not null order by 
> "SKEWED_COL_VALUE_LOC_MAP"."SD_ID" asc,  
> "SKEWED_STRING_LIST_VALUES"."STRING_LIST_ID" asc,  
> "SKEWED_STRING_LIST_VALUES"."INTEGER_IDX" asc". at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
>  at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391) at 
> org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216) at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.loopJoinOrderedResult(MetastoreDirectSqlUtils.java:131)
>  at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.loopJoinOrderedResult(MetastoreDirectSqlUtils.java:109)
>  at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.setSkewedColLocationMaps(MetastoreDirectSqlUtils.java:414)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:967)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:788)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$300(MetaStoreDirectSql.java:117)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql$1.run(MetaStoreDirectSql.java:530)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73) 
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:521)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$10.getSqlResult(ObjectStore.java:3722);
>  Caused by: ERROR: column SKEWED_STRING_LIST_VALUES.string_list_id does not 
> exist
> {code}
>  
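The root cause visible in the log is a PostgreSQL identifier rule: unquoted identifiers are folded to lower case, so the single unquoted column reference `"SKEWED_STRING_LIST_VALUES".STRING_LIST_ID` in the generated SQL no longer matches the quoted upper-case column the schema was created with. The sketch below models only that name-resolution rule; it is an illustration, not the actual fix in the PR.

```java
// Simplified model of PostgreSQL identifier resolution: quoted identifiers
// are kept verbatim, unquoted ones are case-folded to lower case.
public class IdentifierQuotingSketch {
    static String resolve(String identifier) {
        if (identifier.startsWith("\"") && identifier.endsWith("\"")) {
            return identifier.substring(1, identifier.length() - 1); // kept as-is
        }
        return identifier.toLowerCase(); // folded; misses "STRING_LIST_ID"
    }

    public static void main(String[] args) {
        System.out.println(resolve("STRING_LIST_ID"));     // string_list_id -> no such column
        System.out.println(resolve("\"STRING_LIST_ID\"")); // STRING_LIST_ID -> matches
    }
}
```

This is why the query works on databases that fold unquoted identifiers to upper case (or not at all) but fails only with a PostgreSQL backend.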



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.

2022-03-04 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala resolved HIVE-25988.
--
Resolution: Fixed

Fix committed to master. Closing the jira.

> CreateTableEvent should have database object as one of the hive privilege 
> object.
> -
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the 
> HivePrivilege Objects so that it is consistent with HS2's CreateTable Event.
> Also, we need to move the DFS_URI object into the InputList so that this is 
> also consistent with HS2's behavior.
> Having the database object among the create-table event's Hive privilege 
> objects helps determine whether a user has the right permissions to create a 
> table in a particular database via Ranger/Sentry.
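A hedged sketch of the idea: the types and names below are illustrative stand-ins, not Hive's actual `HivePrivilegeObject` API. The point is that the event exposes both the parent database and the new table, so an external authorizer can check CREATE on the database.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models the privilege-object list an authorizer would
// see for a create-table event.
public class CreateTablePrivilegeSketch {
    static List<String> privilegeObjects(String db, String table) {
        List<String> out = new ArrayList<>();
        out.add("DATABASE:" + db);            // lets Ranger/Sentry check CREATE on the database
        out.add("TABLE:" + db + "." + table); // the table being created
        return out;
    }

    public static void main(String[] args) {
        System.out.println(privilegeObjects("default", "t1"));
        // [DATABASE:default, TABLE:default.t1]
    }
}
```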



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25988?focusedWorklogId=736798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736798
 ]

ASF GitHub Bot logged work on HIVE-25988:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 17:06
Start Date: 04/Mar/22 17:06
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera closed pull request #3057:
URL: https://github.com/apache/hive/pull/3057


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 736798)
Time Spent: 40m  (was: 0.5h)

> CreateTableEvent should have database object as one of the hive privilege 
> object.
> -
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the 
> HivePrivilege Objects so that it is consistent with HS2's CreateTable Event.
> Also, we need to move the DFS_URI object into the InputList so that this is 
> also consistent with HS2's behavior.
> Having the database object among the create-table event's Hive privilege 
> objects helps determine whether a user has the right permissions to create a 
> table in a particular database via Ranger/Sentry.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25988?focusedWorklogId=736793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736793
 ]

ASF GitHub Bot logged work on HIVE-25988:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 17:02
Start Date: 04/Mar/22 17:02
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #3057:
URL: https://github.com/apache/hive/pull/3057#issuecomment-1059342769


   Fix has been merged to master. Please close the PR. Thank you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 736793)
Time Spent: 0.5h  (was: 20m)

> CreateTableEvent should have database object as one of the hive privilege 
> object.
> -
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the 
> HivePrivilege Objects so that it is consistent with HS2's CreateTable Event.
> Also, we need to move the DFS_URI object into the InputList so that this is 
> also consistent with HS2's behavior.
> Having the database object among the create-table event's Hive privilege 
> objects helps determine whether a user has the right permissions to create a 
> table in a particular database via Ranger/Sentry.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25943) Introduce compaction cleaner failed attempts threshold

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25943?focusedWorklogId=736761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736761
 ]

ASF GitHub Bot logged work on HIVE-25943:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 16:25
Start Date: 04/Mar/22 16:25
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #3034:
URL: https://github.com/apache/hive/pull/3034#discussion_r819678060



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -288,14 +285,30 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, 
boolean metricsEnabled
   if (metricsEnabled) {
 
Metrics.getOrCreateCounter(MetricsConstants.COMPACTION_CLEANER_FAILURE_COUNTER).inc();
   }
-  txnHandler.markFailed(ci);
-} finally {
+  handleCleanerAttemptFailure(ci);
+}  finally {
   if (metricsEnabled) {
 perfLogger.perfLogEnd(CLASS_NAME, cleanerMetric);
   }
 }
   }
 
+  private void handleCleanerAttemptFailure(CompactionInfo ci) throws 
MetaException {
+long defaultRetention = getTimeVar(conf, 
HIVE_COMPACTOR_CLEANER_RETRY_RETENTION_TIME, TimeUnit.MILLISECONDS);
+int cleanAttempts = 0;

Review comment:
   Shouldn't cleanAttempts be initialized to 1, since 
HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS >= 1 is guaranteed by its 
RangeValidator? (Speaking of, it might be good to widen the range to (0, 10) 
as a kind of feature flag, but I'll leave that up to you.)

##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -288,14 +285,30 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, 
boolean metricsEnabled
   if (metricsEnabled) {
 
Metrics.getOrCreateCounter(MetricsConstants.COMPACTION_CLEANER_FAILURE_COUNTER).inc();
   }
-  txnHandler.markFailed(ci);
-} finally {
+  handleCleanerAttemptFailure(ci);
+}  finally {
   if (metricsEnabled) {
 perfLogger.perfLogEnd(CLASS_NAME, cleanerMetric);
   }
 }
   }
 
+  private void handleCleanerAttemptFailure(CompactionInfo ci) throws 
MetaException {
+long defaultRetention = getTimeVar(conf, 
HIVE_COMPACTOR_CLEANER_RETRY_RETENTION_TIME, TimeUnit.MILLISECONDS);
+int cleanAttempts = 0;
+if (ci.retryRetention > 0) {
+  cleanAttempts = (int)(Math.log(ci.retryRetention / defaultRetention) / 
Math.log(2)) + 1;
+}
+if (cleanAttempts >= getIntVar(conf, 
HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS)) {
+  //Mark it as failed if the max attempt threshold is reached.
+  txnHandler.markFailed(ci);
+} else {
+  //Calculate retry retention time and update record.
+  ci.retryRetention = (long)Math.pow(2, cleanAttempts) * defaultRetention;

Review comment:
   So assuming HIVE_COMPACTOR_CLEANER_RETRY_RETENTION_TIME == 5m, we retry at 
CQ_COMMIT_TIME + 5m, then CQ_COMMIT_TIME + 2*5 minutes, then CQ_COMMIT_TIME + 
2^2*5 minutes?
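The arithmetic in the patch round-trips between the stored retention and the attempt count: the n-th failure stores 2^n * defaultRetention, and reading it back recovers n via log2. A standalone sketch of that math follows; the 5-minute default is an assumed example value, not necessarily Hive's configured default.

```java
public class RetryRetentionSketch {
    static long retentionForAttempt(int cleanAttempts, long defaultRetentionMs) {
        // What the patch stores in CQ_RETRY_RETENTION after a failed attempt.
        return (long) Math.pow(2, cleanAttempts) * defaultRetentionMs;
    }

    static int attemptsFromRetention(long retryRetentionMs, long defaultRetentionMs) {
        // What the patch recovers on the next cleaner cycle.
        if (retryRetentionMs <= 0) {
            return 0;
        }
        return (int) (Math.log((double) retryRetentionMs / defaultRetentionMs) / Math.log(2)) + 1;
    }

    public static void main(String[] args) {
        long def = 5 * 60 * 1000L; // assumed 5-minute default retention, in ms
        long retention = 0;
        for (int i = 0; i < 4; i++) {
            int attempts = attemptsFromRetention(retention, def);
            retention = retentionForAttempt(attempts, def);
            System.out.println("attempt " + attempts + " -> wait " + retention / 60000 + "m");
        }
        // attempt 0 -> wait 5m, attempt 1 -> wait 10m,
        // attempt 2 -> wait 20m, attempt 3 -> wait 40m
    }
}
```

This doubling also explains the reviewer's bigint concern above: after k attempts the stored value is 2^k times the configured retention, which grows quickly.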

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -353,11 +353,9 @@ public void markCompacted(CompactionInfo info) throws 
MetaException {
 if (minOpenTxnWaterMark > 0) {
   whereClause += " AND (\"CQ_NEXT_TXN_ID\" <= " + minOpenTxnWaterMark 
+ " OR \"CQ_NEXT_TXN_ID\" IS NULL)";
 }
-if (retentionTime > 0) {
-  whereClause += " AND \"CQ_COMMIT_TIME\" < (" + getEpochFn(dbProduct) 
+ " - " + retentionTime + ")";
-}
+whereClause += " AND (\"CQ_COMMIT_TIME\" < (" + getEpochFn(dbProduct) 
+ " - CQ_RETRY_RETENTION - " + retentionTime + ") OR \"CQ_COMMIT_TIME\" IS 
NULL)";

Review comment:
   It would probably be best to fix the `if (retentionTime > 0)` removal in 
a separate ticket since it fixes an unrelated bug

##
File path: 
standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql
##
@@ -629,7 +629,8 @@ CREATE TABLE COMPACTION_QUEUE (
   CQ_INITIATOR_ID varchar(128),
   CQ_INITIATOR_VERSION varchar(128),
   CQ_WORKER_VERSION varchar(128),
-  CQ_CLEANER_START bigint
+  CQ_CLEANER_START bigint,
+  CQ_RETRY_RETENTION integer NOT NULL DEFAULT 0

Review comment:
   I'm a little iffy about declaring this as an integer vs. a bigint, since 
we store milliseconds and this value could be 2^10 * 
hive.compactor.cleaner.retry.retentionTime which has no upper limit


[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736692
 ]

ASF GitHub Bot logged work on HIVE-25645:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 14:36
Start Date: 04/Mar/22 14:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819622903



##
File path: ql/src/test/org/apache/hadoop/hive/metastore/TestMetastoreExpr.java
##
@@ -183,12 +183,12 @@ public void checkExpr(int numParts,
   String dbName, String tblName, ExprNodeGenericFuncDesc expr, Table t) 
throws Exception {
 List<Partition> parts = new ArrayList<>();
 client.listPartitionsByExpr(dbName, tblName,
-SerializationUtilities.serializeExpressionToKryo(expr), null, 
(short)-1, parts);
+SerializationUtilities.serializeObjectWithTypeInformation(expr), null, 
(short)-1, parts);
 assertEquals("Partition check failed: " + expr.getExprString(), numParts, 
parts.size());
 
 // check with partition spec as well
 PartitionsByExprRequest req = new PartitionsByExprRequest(dbName, tblName,
-
ByteBuffer.wrap(SerializationUtilities.serializeExpressionToKryo(expr)));
+
ByteBuffer.wrap(SerializationUtilities.serializeObjectWithTypeInformation(expr)));

Review comment:
   nit: funny that the indentation is different from the one above.
   If we touch the line, please fix the indentation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 736692)
Time Spent: 1h 10m  (was: 1h)

> Query-based compaction doesn't work when partition column type is boolean
> -
>
> Key: HIVE-25645
> URL: https://issues.apache.org/jira/browse/HIVE-25645
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736690
 ]

ASF GitHub Bot logged work on HIVE-25645:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 14:36
Start Date: 04/Mar/22 14:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819622269



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilder.java
##
@@ -347,8 +347,16 @@ private void buildWhereClauseForInsert(StringBuilder 
query) {
 
   query.append(" where ");
   for (int i = 0; i < keys.size(); ++i) {
-query.append(i == 0 ? "`" : " and 
`").append(keys.get(i).getName()).append("`='")
-.append(vals.get(i)).append("'");
+FieldSchema keySchema = keys.get(i);
+boolean isBooleanKey = keySchema.getType().equalsIgnoreCase("boolean");
+query.append(i == 0 ? "`" : " and 
`").append(keySchema.getName()).append("`=");
+if (!isBooleanKey) {

Review comment:
   nit: Maybe easier to read:
   ```
   if (isBooleanKey) {
     query.append(vals.get(i));
   } else {
     query.append("'").append(vals.get(i)).append("'");
   }
   ```
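The effect of the branch can be sketched in isolation. The class below is a simplified stand-in for CompactionQueryBuilder's loop with assumed example inputs: string partition values stay quoted, while boolean values are emitted bare so the generated predicate type-checks.

```java
// Simplified stand-in for the predicate-building logic discussed above.
// Input values are assumed examples, not taken from a real compaction run.
public class BooleanKeyPredicateSketch {
    static String predicate(String name, String type, String value) {
        boolean isBooleanKey = type.equalsIgnoreCase("boolean");
        // Boolean partition values must not be quoted; otherwise the query
        // compares a boolean column against a string literal and fails.
        return "`" + name + "`=" + (isBooleanKey ? value : "'" + value + "'");
    }

    public static void main(String[] args) {
        System.out.println(predicate("ds", "string", "2008-04-08")); // `ds`='2008-04-08'
        System.out.println(predicate("flag", "boolean", "true"));    // `flag`=true
    }
}
```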




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 736690)
Time Spent: 1h  (was: 50m)

> Query-based compaction doesn't work when partition column type is boolean
> -
>
> Key: HIVE-25645
> URL: https://issues.apache.org/jira/browse/HIVE-25645
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736683
 ]

ASF GitHub Bot logged work on HIVE-25645:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 14:33
Start Date: 04/Mar/22 14:33
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819620506



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilder.java
##
@@ -347,8 +347,16 @@ private void buildWhereClauseForInsert(StringBuilder query) {
 
   query.append(" where ");
   for (int i = 0; i < keys.size(); ++i) {
-query.append(i == 0 ? "`" : " and `").append(keys.get(i).getName()).append("`='")
-.append(vals.get(i)).append("'");
+FieldSchema keySchema = keys.get(i);
+boolean isBooleanKey = keySchema.getType().equalsIgnoreCase("boolean");

Review comment:
   ColumnType.BOOLEAN_TYPE_NAME






Issue Time Tracking
---

Worklog Id: (was: 736683)
Time Spent: 50m  (was: 40m)

> Query-based compaction doesn't work when partition column type is boolean
> -
>
> Key: HIVE-25645
> URL: https://issues.apache.org/jira/browse/HIVE-25645
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736682
 ]

ASF GitHub Bot logged work on HIVE-25645:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 14:32
Start Date: 04/Mar/22 14:32
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819619418



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
##
@@ -498,7 +498,7 @@ private static PrunedPartitionList getPartitionsFromServer(Table tab, final Stri
* @return true iff the partition pruning expression contains non-partition columns.
*/
   static private boolean pruneBySequentialScan(Table tab, List<Partition> partitions,
-  ExprNodeGenericFuncDesc prunerExpr, HiveConf conf) throws HiveException, MetaException {
+   ExprNodeDesc prunerExpr, HiveConf conf) throws HiveException, MetaException {

Review comment:
   nit: line continuation is 4 spaces






Issue Time Tracking
---

Worklog Id: (was: 736682)
Time Spent: 40m  (was: 0.5h)

> Query-based compaction doesn't work when partition column type is boolean
> -
>
> Key: HIVE-25645
> URL: https://issues.apache.org/jira/browse/HIVE-25645
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736679
 ]

ASF GitHub Bot logged work on HIVE-25645:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 14:31
Start Date: 04/Mar/22 14:31
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819618344



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java
##
@@ -810,31 +811,44 @@ private static void serializeObjectByKryo(Kryo kryo, Object plan, OutputStream o
   }
 
   /**
-   * Serializes expression via Kryo.
-   * @param expr Expression.
+   * Serializes any object via Kryo. Type information will be serialized as well, allowing dynamic deserialization
+   * without the need to pass the class.
+   * @param object The object to serialize.
* @return Bytes.
*/
-  public static byte[] serializeExpressionToKryo(ExprNodeGenericFuncDesc expr) {
-return serializeObjectToKryo(expr);
+  public static byte[] serializeObjectWithTypeInformation(Serializable object) {
+ByteArrayOutputStream baos = new ByteArrayOutputStream();
+Kryo kryo = borrowKryo();
+try (Output output = new Output(baos)) {
+  kryo.writeClassAndObject(output, object);
+} finally {
+  releaseKryo(kryo);
+}
+return baos.toByteArray();
   }
 
   /**
* Deserializes expression from Kryo.
* @param bytes Bytes containing the expression.
* @return Expression; null if deserialization succeeded, but the result type is incorrect.
*/
-  public static ExprNodeGenericFuncDesc deserializeExpressionFromKryo(byte[] bytes) {
-return deserializeObjectFromKryo(bytes, ExprNodeGenericFuncDesc.class);
+  public static <T> T deserializeObjectWithTypeInformation(byte[] bytes) {
+Kryo kryo = borrowKryo();
+try (Input inp = new Input(new ByteArrayInputStream(bytes))) {
+  return (T) kryo.readClassAndObject(inp);
+} finally {
+  releaseKryo(kryo);
+}
   }
 
   public static String serializeExpression(ExprNodeGenericFuncDesc expr) {
-return new String(Base64.encodeBase64(serializeExpressionToKryo(expr)),
-StandardCharsets.UTF_8);
+return new String(Base64.encodeBase64(serializeObjectToKryo(expr)),
+StandardCharsets.UTF_8);

Review comment:
   nit: keep 4 spaces  
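
   The diff above replaces the type-specific Kryo helpers with `writeClassAndObject`/`readClassAndObject`, which embed the concrete class in the serialized stream so the caller no longer passes a `Class` at read time. The same idea can be sketched with JDK serialization (`ObjectOutputStream` always records class descriptors), shown here only to keep the example dependency-free; names are hypothetical, not Hive's:

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.ObjectInputStream;
   import java.io.ObjectOutputStream;
   import java.io.Serializable;

   public class TypeInfoSerdeSketch {
       // Serialize any Serializable; the stream itself records the concrete class.
       static byte[] serializeWithTypeInformation(Serializable object) throws IOException {
           ByteArrayOutputStream baos = new ByteArrayOutputStream();
           try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
               out.writeObject(object); // writes class descriptor + field data
           }
           return baos.toByteArray();
       }

       // No Class<?> argument needed: the type is recovered from the stream,
       // mirroring Kryo's readClassAndObject in the diff above.
       @SuppressWarnings("unchecked")
       static <T> T deserializeWithTypeInformation(byte[] bytes) throws IOException, ClassNotFoundException {
           try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
               return (T) in.readObject();
           }
       }

       public static void main(String[] args) throws Exception {
           String back = deserializeWithTypeInformation(serializeWithTypeInformation("HIVE-25645"));
           System.out.println(back); // prints HIVE-25645
       }
   }
   ```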






Issue Time Tracking
---

Worklog Id: (was: 736679)
Time Spent: 20m  (was: 10m)

> Query-based compaction doesn't work when partition column type is boolean
> -
>
> Key: HIVE-25645
> URL: https://issues.apache.org/jira/browse/HIVE-25645
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25645?focusedWorklogId=736680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736680
 ]

ASF GitHub Bot logged work on HIVE-25645:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 14:31
Start Date: 04/Mar/22 14:31
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3079:
URL: https://github.com/apache/hive/pull/3079#discussion_r819618843



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java
##
@@ -865,13 +879,13 @@ public static ExprNodeGenericFuncDesc deserializeExpression(String s) {
 
   public static String serializeObject(Serializable expr) {
 return new String(Base64.encodeBase64(serializeObjectToKryo(expr)),
-StandardCharsets.UTF_8);
+StandardCharsets.UTF_8);

Review comment:
   nit: remove formatting only changes. They kill you during backports






Issue Time Tracking
---

Worklog Id: (was: 736680)
Time Spent: 0.5h  (was: 20m)

> Query-based compaction doesn't work when partition column type is boolean
> -
>
> Key: HIVE-25645
> URL: https://issues.apache.org/jira/browse/HIVE-25645
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-25894) Table migration to Iceberg doesn't remove HMS partitions

2022-03-04 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25894.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.

Thanks for reporting [~boroknagyz] and for the review [~Marton Bod] and 
[~lpinter]!

> Table migration to Iceberg doesn't remove HMS partitions
> 
>
> Key: HIVE-25894
> URL: https://issues.apache.org/jira/browse/HIVE-25894
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Repro:
> {code:java}
> create table ice_part_migrate (i int) partitioned by (p int) stored as 
> parquet;
> insert into ice_part_migrate partition(p=1) values (1), (11), (111);
> insert into ice_part_migrate partition(p=2) values (2), (22), (222);
> ALTER TABLE ice_part_migrate  SET TBLPROPERTIES 
> ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
> {code}
> Then looking at the HMS database:
> {code:java}
> => select "PART_NAME" from "PARTITIONS" p, "TBLS" t where 
> t."TBL_ID"=p."TBL_ID" and t."TBL_NAME"='ice_part_migrate';
>  PART_NAME
> ---
>  p=1
>  p=2
> {code}
> This is weird because Iceberg tables are supposed to be unpartitioned. It 
> also breaks some precondition checks in Impala. Is there a particular reason 
> to keep the partitions in HMS?





[jira] [Work logged] (HIVE-25894) Table migration to Iceberg doesn't remove HMS partitions

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25894?focusedWorklogId=736649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736649
 ]

ASF GitHub Bot logged work on HIVE-25894:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 13:37
Start Date: 04/Mar/22 13:37
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #3061:
URL: https://github.com/apache/hive/pull/3061


   




Issue Time Tracking
---

Worklog Id: (was: 736649)
Time Spent: 40m  (was: 0.5h)

> Table migration to Iceberg doesn't remove HMS partitions
> 
>
> Key: HIVE-25894
> URL: https://issues.apache.org/jira/browse/HIVE-25894
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Repro:
> {code:java}
> create table ice_part_migrate (i int) partitioned by (p int) stored as 
> parquet;
> insert into ice_part_migrate partition(p=1) values (1), (11), (111);
> insert into ice_part_migrate partition(p=2) values (2), (22), (222);
> ALTER TABLE ice_part_migrate  SET TBLPROPERTIES 
> ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
> {code}
> Then looking at the HMS database:
> {code:java}
> => select "PART_NAME" from "PARTITIONS" p, "TBLS" t where 
> t."TBL_ID"=p."TBL_ID" and t."TBL_NAME"='ice_part_migrate';
>  PART_NAME
> ---
>  p=1
>  p=2
> {code}
> This is weird because Iceberg tables are supposed to be unpartitioned. It 
> also breaks some precondition checks in Impala. Is there a particular reason 
> to keep the partitions in HMS?





[jira] [Updated] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-25975:
--
Description: 
The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
bucketed tables: i.e. the records do not need to be ordered by the bucket 
values, the writer will just close its current file and open a new one for 
out-of-order records. 

This is suboptimal for the long-term due to creating many small files. Spark 
uses a UDF to compute the bucket value for each record and therefore it is able 
to order the records by bucket values, achieving optimal clustering.

The proposed change adds a new UDF that uses Iceberg's bucket transformation 
function to produce bucket values from constants or any column input. All types 
that Iceberg buckets support are supported in this UDF too, except for UUID.

This UDF is then used in SortedDynPartitionOptimizer to sort data during write 
if the target Iceberg table has bucket transform partitioning.

To enable this, Hive has been extended with the feature that allows storage 
handlers to define custom sorting expressions, to be passed to FileSink 
operator's DynPartContext during dynamic partitioning write scenarios.

The lenient version of ClusteredWriter in patched-iceberg-core has been 
disposed of, as it is no longer needed now that this feature is in.

  was:
The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
bucketed tables: i.e. the records do not need to be ordered by the bucket 
values, the writer will just close its current file and open a new one for 
out-of-order records. 

This is suboptimal for the long-term due to creating many small files. Spark 
uses a UDF to compute the bucket value for each record and therefore it is able 
to order the records by bucket values, achieving optimal clustering.


> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.
> The proposed change adds a new UDF that uses Iceberg's bucket transformation 
> function to produce bucket values from constants or any column input. All 
> types that Iceberg buckets support are supported in this UDF too, except for 
> UUID.
> This UDF is then used in SortedDynPartitionOptimizer to sort data during 
> write if the target Iceberg table has bucket transform partitioning.
> To enable this, Hive has been extended with the feature that allows storage 
> handlers to define custom sorting expressions, to be passed to FileSink 
> operator's DynPartContext during dynamic partitioning write scenarios.
> The lenient version of ClusteredWriter in patched-iceberg-core has been 
> disposed of as it is not needed anymore with this feature in.
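
The bucket transform described above hashes each value and takes a positive modulo over the bucket count, so equal keys always land in the same bucket; sorting by that bucket number is what lets the ClusteredWriter keep one file open per bucket. A minimal JDK-only sketch of the idea follows. Note: real Iceberg hashes with 32-bit Murmur3 per its spec; CRC32 stands in here so the example needs no dependency, and its bucket numbers therefore do NOT match Iceberg's.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class BucketSketch {
    // Iceberg-style bucketing shape: hash the value, clear the sign bit,
    // then take it modulo numBuckets.
    // Assumption: CRC32 replaces Iceberg's Murmur3 for illustration only.
    static int bucket(String value, int numBuckets) {
        CRC32 crc = new CRC32();
        crc.update(value.getBytes(StandardCharsets.UTF_8));
        int hash = (int) crc.getValue();
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Deterministic: equal keys cluster into the same bucket.
        System.out.println(bucket("a", 16) == bucket("a", 16));
        // Always within [0, numBuckets).
        System.out.println(bucket("a", 16) >= 0 && bucket("a", 16) < 16);
    }
}
```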





[jira] [Resolved] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita resolved HIVE-25975.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master. Thanks for the thorough reviews from [~pvary] and [~Marton 
Bod] 

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.
> The proposed change adds a new UDF that uses Iceberg's bucket transformation 
> function to produce bucket values from constants or any column input. All 
> types that Iceberg buckets support are supported in this UDF too, except for 
> UUID.
> This UDF is then used in SortedDynPartitionOptimizer to sort data during 
> write if the target Iceberg table has bucket transform partitioning.
> To enable this, Hive has been extended with the feature that allows storage 
> handlers to define custom sorting expressions, to be passed to FileSink 
> operator's DynPartContext during dynamic partitioning write scenarios.
> The lenient version of ClusteredWriter in patched-iceberg-core has been 
> disposed of as it is not needed anymore with this feature in.





[jira] [Resolved] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime

2022-03-04 Thread gabrywu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gabrywu resolved HIVE-16352.

Resolution: Won't Fix

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -
>
> Key: HIVE-16352
> URL: https://issues.apache.org/jira/browse/HIVE-16352
> Project: Hive
>  Issue Type: New Feature
>  Components: Avro, File Formats, Reader
>Affects Versions: 3.1.2
>Reporter: Navdeep Poonia
>Assignee: gabrywu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When a file is corrupted, Hive raises the error java.io.IOException: Invalid 
> sync!
>  Can we have some functionality to skip or repair such blocks at runtime to 
> make Avro more error-resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While 
> processing file 
> s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42.
>  java.io.IOException: Invalid sync!
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)





[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=736591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736591
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 11:27
Start Date: 04/Mar/22 11:27
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r819490947



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -307,6 +331,42 @@ public boolean supportsPartitionTransform() {
 }).collect(Collectors.toList());
   }
 
+  @Override
+  public DynamicPartitionCtx createDPContext(HiveConf hiveConf, org.apache.hadoop.hive.ql.metadata.Table hmsTable)
+  throws SemanticException {
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = IcebergTableUtil.getTable(conf, tableDesc.getProperties());
+if (table.spec().isUnpartitioned()) {
+  return null;
+}
+
+// Iceberg currently doesn't have publicly accessible partition transform information, hence use above string parse
+List<PartitionTransformSpec> partitionTransformSpecs = getPartitionTransformSpec(hmsTable);

Review comment:
   As discussed offline this won't be addressed with this patch.






Issue Time Tracking
---

Worklog Id: (was: 736591)
Time Spent: 6h 20m  (was: 6h 10m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.





[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=736593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736593
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 11:27
Start Date: 04/Mar/22 11:27
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #3060:
URL: https://github.com/apache/hive/pull/3060


   




Issue Time Tracking
---

Worklog Id: (was: 736593)
Time Spent: 6.5h  (was: 6h 20m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.





[jira] [Work logged] (HIVE-25971) Tez task shutdown getting delayed due to cached thread pool not closed

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25971?focusedWorklogId=736569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736569
 ]

ASF GitHub Bot logged work on HIVE-25971:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 10:24
Start Date: 04/Mar/22 10:24
Worklog Time Spent: 10m 
  Work Description: guptashailesh92 edited a comment on pull request #3046:
URL: https://github.com/apache/hive/pull/3046#issuecomment-1059033143


   @rbalamohan , added the same patch for master as well: 
[CR](https://github.com/apache/hive/pull/3078).
   The Jenkins link here shows no new failures, and on master all tests are 
successful. 




Issue Time Tracking
---

Worklog Id: (was: 736569)
Time Spent: 1h 40m  (was: 1.5h)

> Tez task shutdown getting delayed due to cached thread pool not closed
> --
>
> Key: HIVE-25971
> URL: https://issues.apache.org/jira/browse/HIVE-25971
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.4.0, 3.1.2
>Reporter: Shailesh Gupta
>Assignee: Shailesh Gupta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We are using a 
> [CachedThreadPool|https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ObjectCache.java]
>  but not closing it. CachedThreadPool creates non-daemon threads, delaying the 
> Tez task JVM shutdown by up to 1 min, as the default idle timeout is 1 min.
>  





[jira] [Work logged] (HIVE-25971) Tez task shutdown getting delayed due to cached thread pool not closed

2022-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25971?focusedWorklogId=736568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-736568
 ]

ASF GitHub Bot logged work on HIVE-25971:
-

Author: ASF GitHub Bot
Created on: 04/Mar/22 10:24
Start Date: 04/Mar/22 10:24
Worklog Time Spent: 10m 
  Work Description: guptashailesh92 commented on pull request #3046:
URL: https://github.com/apache/hive/pull/3046#issuecomment-1059033143


   @rbalamohan , added same patch for master as well. 
[CR](https://github.com/apache/hive/pull/3078)




Issue Time Tracking
---

Worklog Id: (was: 736568)
Time Spent: 1.5h  (was: 1h 20m)

> Tez task shutdown getting delayed due to cached thread pool not closed
> --
>
> Key: HIVE-25971
> URL: https://issues.apache.org/jira/browse/HIVE-25971
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.4.0, 3.1.2
>Reporter: Shailesh Gupta
>Assignee: Shailesh Gupta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We are using a 
> [CachedThreadPool|https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ObjectCache.java]
>  but not closing it. CachedThreadPool creates non-daemon threads, delaying the 
> Tez task JVM shutdown by up to 1 min, as the default idle timeout is 1 min.
>  
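
As the description notes, the lingering threads are non-daemon workers of an unclosed cached pool. Two fixes follow from that: shut the pool down explicitly, or make its workers daemon threads so an idle worker cannot hold the JVM open through the 60s keep-alive. A minimal sketch of the daemon-thread-factory variant (names here are illustrative, not from the actual HIVE-25971 patch):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DaemonPoolSketch {
    // A cached pool whose worker threads are daemons: after the last task
    // finishes, idle workers no longer delay JVM exit for the default
    // 60-second keep-alive.
    static ExecutorService newDaemonCachedThreadPool() {
        return Executors.newCachedThreadPool(runnable -> {
            Thread t = new Thread(runnable, "object-cache-worker");
            t.setDaemon(true);
            return t;
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newDaemonCachedThreadPool();
        System.out.println(pool.submit(() -> 21 * 2).get()); // prints 42
        pool.shutdown(); // explicit shutdown remains the cleaner fix
    }
}
```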





[jira] [Updated] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist

2022-03-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26003:

Description: 
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" use case is just one example 
where DROP FUNCTION seems to work as expected but silently causes confusion. 

{code}
CREATE FUNCTION qtest_get_java_boolean AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
describe function extended qtest_get_java_boolean;

drop function if exists qtest_get_java_boolean_typo; #PASS, fine

drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
{code}

UPDATE: okay, I've just realized there is hive.exec.drop.ignorenonexistent=true 
which causes this.
I still don't like this: why do we ignore non-existent functions if we have a 
separate "IF EXISTS" clause? At least a message should appear that myfunc is 
invalid, but we don't throw SemanticException.

  was:
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" usecase is just one example 
where DROP FUNCTION seems to work expected but silently causes confusion. 

{code}
CREATE FUNCTION qtest_get_java_boolean AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
describe function extended qtest_get_java_boolean;

drop function if exists qtest_get_java_boolean_typo; #PASS, find

drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
{code}

UPDATE: okay, I've just realized there is hive.exec.drop.ignorenonexistent=true 
which causes this
I still don't like this, why don't we ignore nonexistent if we have a separate 
"if exist" clause...at least a message should appear that myfunc is invalid but 
we doesn't throw SemanticException


> DROP FUNCTION silently passes when function doesn't exist
> -
>
> Key: HIVE-26003
> URL: https://issues.apache.org/jira/browse/HIVE-26003
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
> especially because hive has "DROP FUNCTION IF EXISTS".
> I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
> and I thought it simply dropped the function, but then it kept working. I 
> realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
> registered as "default.myfunc". This "default" use case is just one example 
> where DROP FUNCTION seems to work as expected but silently causes confusion. 
> {code}
> CREATE FUNCTION qtest_get_java_boolean AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
> describe function extended qtest_get_java_boolean;
> drop function if exists qtest_get_java_boolean_typo; #PASS, fine
> drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
> {code}
> UPDATE: okay, I've just realized there is 
> hive.exec.drop.ignorenonexistent=true which causes this
> I still don't like this: why do we ignore non-existent functions if we have a 
> separate "IF EXISTS" clause? At least a message should appear that myfunc is 
> invalid, but we don't throw a SemanticException



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist

2022-03-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26003:

Description: 
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" use case is just one example 
where DROP FUNCTION seems to work as expected but silently causes confusion. 

{code}
CREATE FUNCTION qtest_get_java_boolean AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
describe function extended qtest_get_java_boolean;

drop function if exists qtest_get_java_boolean_typo; #PASS, fine

drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
{code}

UPDATE: okay, I've just realized there is hive.exec.drop.ignorenonexistent=true, 
which causes this.
I still don't like this: why do we ignore non-existent functions if we have a 
separate "IF EXISTS" clause? At least a message should appear that the function 
is invalid, but we don't throw a SemanticException.

  was:
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" use case is just one example 
where DROP FUNCTION seems to work as expected but silently causes confusion. 

{code}
CREATE FUNCTION qtest_get_java_boolean AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
describe function extended qtest_get_java_boolean;

drop function if exists qtest_get_java_boolean_typo; #PASS, fine

drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
{code}

UPDATE: okay, I've just realized there is hive.exec.drop.ignorenonexistent=true, 
which causes this.
I still don't like this: why do we ignore non-existent functions if we have a 
separate "IF EXISTS" clause? At least a message should appear that myfunc is 
invalid, but we don't throw a SemanticException.


> DROP FUNCTION silently passes when function doesn't exist
> -
>
> Key: HIVE-26003
> URL: https://issues.apache.org/jira/browse/HIVE-26003
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
> especially because hive has "DROP FUNCTION IF EXISTS".
> I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
> and I thought it simply dropped the function, but then it kept working. I 
> realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
> registered as "default.myfunc". This "default" use case is just one example 
> where DROP FUNCTION seems to work as expected but silently causes confusion. 
> {code}
> CREATE FUNCTION qtest_get_java_boolean AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
> describe function extended qtest_get_java_boolean;
> drop function if exists qtest_get_java_boolean_typo; #PASS, fine
> drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
> {code}
> UPDATE: okay, I've just realized there is 
> hive.exec.drop.ignorenonexistent=true which causes this
> I still don't like this: why do we ignore non-existent functions if we have a 
> separate "IF EXISTS" clause? At least a message should appear that the 
> function is invalid, but we don't throw a SemanticException





[jira] [Updated] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist

2022-03-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26003:

Description: 
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" use case is just one example 
where DROP FUNCTION seems to work as expected but silently causes confusion. 

{code}
CREATE FUNCTION qtest_get_java_boolean AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
describe function extended qtest_get_java_boolean;

drop function if exists qtest_get_java_boolean_typo; #PASS, fine

drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
{code}

UPDATE: okay, I've just realized there is hive.exec.drop.ignorenonexistent=true, 
which causes this.
I still don't like this: why do we ignore non-existent functions if we have a 
separate "IF EXISTS" clause? At least a message should appear that myfunc is 
invalid, but we don't throw a SemanticException.

  was:
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" use case is just one example 
where DROP FUNCTION seems to work as expected but silently causes confusion. 

{code}
CREATE FUNCTION qtest_get_java_boolean AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
describe function extended qtest_get_java_boolean;

drop function if exists qtest_get_java_boolean_typo; #PASS, fine

drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
{code}

UPDATE: okay, I've just realized there is hive.exec.drop.ignorenonexistent=true, 
which causes this.
I still don't like this: why do we ignore non-existent functions if we have an 
"IF EXISTS" clause?


> DROP FUNCTION silently passes when function doesn't exist
> -
>
> Key: HIVE-26003
> URL: https://issues.apache.org/jira/browse/HIVE-26003
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
> especially because hive has "DROP FUNCTION IF EXISTS".
> I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
> and I thought it simply dropped the function, but then it kept working. I 
> realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
> registered as "default.myfunc". This "default" use case is just one example 
> where DROP FUNCTION seems to work as expected but silently causes confusion. 
> {code}
> CREATE FUNCTION qtest_get_java_boolean AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
> describe function extended qtest_get_java_boolean;
> drop function if exists qtest_get_java_boolean_typo; #PASS, fine
> drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
> {code}
> UPDATE: okay, I've just realized there is 
> hive.exec.drop.ignorenonexistent=true which causes this
> I still don't like this: why do we ignore non-existent functions if we have a 
> separate "IF EXISTS" clause? At least a message should appear that myfunc is 
> invalid, but we don't throw a SemanticException





[jira] [Updated] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist

2022-03-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26003:

Description: 
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" use case is just one example 
where DROP FUNCTION seems to work as expected but silently causes confusion. 

{code}
CREATE FUNCTION qtest_get_java_boolean AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
describe function extended qtest_get_java_boolean;

drop function if exists qtest_get_java_boolean_typo; #PASS, fine

drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
{code}

UPDATE: okay, I've just realized there is hive.exec.drop.ignorenonexistent=true, 
which causes this.
I still don't like this: why do we ignore non-existent functions if we have an 
"IF EXISTS" clause?

  was:
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" use case is just one example 
where DROP FUNCTION seems to work as expected but silently causes confusion. 

{code}
CREATE FUNCTION qtest_get_java_boolean AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
describe function extended qtest_get_java_boolean;

drop function if exists qtest_get_java_boolean_typo; #PASS, fine

drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
{code}


> DROP FUNCTION silently passes when function doesn't exist
> -
>
> Key: HIVE-26003
> URL: https://issues.apache.org/jira/browse/HIVE-26003
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
> especially because hive has "DROP FUNCTION IF EXISTS".
> I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
> and I thought it simply dropped the function, but then it kept working. I 
> realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
> registered as "default.myfunc". This "default" use case is just one example 
> where DROP FUNCTION seems to work as expected but silently causes confusion. 
> {code}
> CREATE FUNCTION qtest_get_java_boolean AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
> describe function extended qtest_get_java_boolean;
> drop function if exists qtest_get_java_boolean_typo; #PASS, fine
> drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
> {code}
> UPDATE: okay, I've just realized there is 
> hive.exec.drop.ignorenonexistent=true which causes this
> I still don't like this: why do we ignore non-existent functions if we have an 
> "IF EXISTS" clause?





[jira] [Commented] (HIVE-26004) Upgrade Iceberg to 0.13.1

2022-03-04 Thread Marton Bod (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501219#comment-17501219
 ] 

Marton Bod commented on HIVE-26004:
---

Pushed to master. Thanks [~pvary] for the review.

> Upgrade Iceberg to 0.13.1
> -
>
> Key: HIVE-26004
> URL: https://issues.apache.org/jira/browse/HIVE-26004
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>






[jira] [Resolved] (HIVE-26004) Upgrade Iceberg to 0.13.1

2022-03-04 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-26004.
---
Resolution: Fixed

> Upgrade Iceberg to 0.13.1
> -
>
> Key: HIVE-26004
> URL: https://issues.apache.org/jira/browse/HIVE-26004
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>






[jira] [Assigned] (HIVE-26004) Upgrade Iceberg to 0.13.1

2022-03-04 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-26004:
-


> Upgrade Iceberg to 0.13.1
> -
>
> Key: HIVE-26004
> URL: https://issues.apache.org/jira/browse/HIVE-26004
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>






[jira] [Updated] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist

2022-03-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26003:

Description: 
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" use case is just one example 
where DROP FUNCTION seems to work as expected but silently causes confusion. 

{code}
CREATE FUNCTION qtest_get_java_boolean AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
describe function extended qtest_get_java_boolean;

drop function if exists qtest_get_java_boolean_typo; #PASS, fine

drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
{code}

  was:
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" use case is just one example 
where DROP FUNCTION seems to work as expected but silently causes confusion. 


> DROP FUNCTION silently passes when function doesn't exist
> -
>
> Key: HIVE-26003
> URL: https://issues.apache.org/jira/browse/HIVE-26003
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
> especially because hive has "DROP FUNCTION IF EXISTS".
> I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
> and I thought it simply dropped the function, but then it kept working. I 
> realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
> registered as "default.myfunc". This "default" use case is just one example 
> where DROP FUNCTION seems to work as expected but silently causes confusion. 
> {code}
> CREATE FUNCTION qtest_get_java_boolean AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFTestGetJavaBoolean';
> describe function extended qtest_get_java_boolean;
> drop function if exists qtest_get_java_boolean_typo; #PASS, fine
> drop function qtest_get_java_boolean_typo; #PASS, should fail I believe
> {code}





[jira] [Updated] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist

2022-03-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26003:

Description: 
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This "default" use case is just one example 
where DROP FUNCTION seems to work as expected but silently causes confusion. 

  was:
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This is just one example where DROP FUNCTION 
seems to work as expected, but silently causes confusion. 


> DROP FUNCTION silently passes when function doesn't exist
> -
>
> Key: HIVE-26003
> URL: https://issues.apache.org/jira/browse/HIVE-26003
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
> especially because hive has "DROP FUNCTION IF EXISTS".
> I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
> and I thought it simply dropped the function, but then it kept working. I 
> realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
> registered as "default.myfunc". This "default" use case is just one example 
> where DROP FUNCTION seems to work as expected but silently causes confusion. 





[jira] [Updated] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist

2022-03-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26003:

Description: 
DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
especially because hive has "DROP FUNCTION IF EXISTS".
I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
and I thought it simply dropped the function, but then it kept working. I 
realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
registered as "default.myfunc". This is just one example where DROP FUNCTION 
seems to work as expected, but silently causes confusion. 

> DROP FUNCTION silently passes when function doesn't exist
> -
>
> Key: HIVE-26003
> URL: https://issues.apache.org/jira/browse/HIVE-26003
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> DROP FUNCTION silently passes when a function doesn't exist, which is bad, 
> especially because hive has "DROP FUNCTION IF EXISTS".
> I was working with functions when I found that "DROP FUNCTION myfunc" passed, 
> and I thought it simply dropped the function, but then it kept working. I 
> realized I was supposed to call  "DROP FUNCTION default.myfunc" because it's 
> registered as "default.myfunc". This is just one example where DROP FUNCTION 
> seems to work as expected, but silently causes confusion. 





[jira] [Assigned] (HIVE-26003) DROP FUNCTION silently passes when function doesn't exist

2022-03-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-26003:
---

Assignee: László Bodor

> DROP FUNCTION silently passes when function doesn't exist
> -
>
> Key: HIVE-26003
> URL: https://issues.apache.org/jira/browse/HIVE-26003
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>



