[jira] [Work logged] (HIVE-26136) Implement UPDATE statements for Iceberg tables

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26136?focusedWorklogId=756179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756179
 ]

ASF GitHub Bot logged work on HIVE-26136:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 05:32
Start Date: 13/Apr/22 05:32
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on code in PR #3204:
URL: https://github.com/apache/hive/pull/3204#discussion_r849090121


##
iceberg/iceberg-handler/src/test/queries/positive/update_iceberg_unpartitioned_parquet.q:
##
@@ -0,0 +1,26 @@
+drop table if exists tbl_ice;
+create external table tbl_ice(a int, b string, c int) stored by iceberg 
tblproperties ('format-version'='2');
+
+

Issue Time Tracking
---

Worklog Id: (was: 756179)
Time Spent: 0.5h  (was: 20m)

> Implement UPDATE statements for Iceberg tables
> --
>
> Key: HIVE-26136
> URL: https://issues.apache.org/jira/browse/HIVE-26136
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26136) Implement UPDATE statements for Iceberg tables

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26136?focusedWorklogId=756178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756178
 ]

ASF GitHub Bot logged work on HIVE-26136:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 05:27
Start Date: 13/Apr/22 05:27
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on code in PR #3204:
URL: https://github.com/apache/hive/pull/3204#discussion_r849088082


##
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergV2.java:
##
@@ -310,6 +310,39 @@ public void testDeleteStatementWithOtherTable() {
 
HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
 objects), 0);
   }
 
+  @Test
+  public void testUpdateStatementUnpartitioned() {
+Assume.assumeFalse("Iceberg UPDATEs are only implemented for 
non-vectorized mode for now", isVectorized);

Review Comment:
   Will the reader go via the non-vect* path when reading from an updated table?





Issue Time Tracking
---

Worklog Id: (was: 756178)
Time Spent: 20m  (was: 10m)

> Implement UPDATE statements for Iceberg tables
> --
>
> Key: HIVE-26136
> URL: https://issues.apache.org/jira/browse/HIVE-26136
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25907) IOW Directory queries fail to write data to final path when query result cache is enabled

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25907?focusedWorklogId=756170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756170
 ]

ASF GitHub Bot logged work on HIVE-25907:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 04:29
Start Date: 13/Apr/22 04:29
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on PR #2978:
URL: https://github.com/apache/hive/pull/2978#issuecomment-1097543679

   @kgyrtkirk - Can you please review the changes?




Issue Time Tracking
---

Worklog Id: (was: 756170)
Time Spent: 1h 40m  (was: 1.5h)

> IOW Directory queries fail to write data to final path when query result 
> cache is enabled
> --
>
> Key: HIVE-25907
> URL: https://issues.apache.org/jira/browse/HIVE-25907
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> INSERT OVERWRITE DIRECTORY queries fail to write the data to the specified 
> directory location when the query result cache is enabled.
> *Steps to reproduce*
> {code:java}
> 1. create a data file with the following data
> 1 abc 10.5
> 2 def 11.5
> 2. create table pointing to that data
> create external table iowd(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location '';
> 3. run the following query
> set hive.query.results.cache.enabled=true;
> INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
> {code}
> After executing the above query, the destination directory is expected to 
> contain data from the table iowd, but due to HIVE-21386 this no longer 
> happens.
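> A plausible session-level workaround sketch until the fix lands (the JDBC URL 
> and output path below are placeholders): disable the results cache before 
> running the INSERT OVERWRITE DIRECTORY query.
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.Statement;
> 
> public class IowdWorkaround {
>   public static void main(String[] args) throws Exception {
>     Connection con = DriverManager.getConnection("jdbc:hive2://host:10000/");
>     try (Statement stmt = con.createStatement()) {
>       // The bug only manifests when the results cache is enabled.
>       stmt.execute("set hive.query.results.cache.enabled=false");
>       stmt.execute("INSERT OVERWRITE DIRECTORY '/tmp/iowd_out' SELECT * FROM iowd");
>     }
>   }
> }
> {code}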



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25855) Make a branch-3 release

2022-04-12 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-25855.
--
Resolution: Fixed

Apache Hive 3.1.3 has been released.

> Make a branch-3 release 
> 
>
> Key: HIVE-25855
> URL: https://issues.apache.org/jira/browse/HIVE-25855
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>
> This jira is to track commits for a hive release off branch-3



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25416) Hive metastore memory leak because datanucleus-api-jdo bug

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25416?focusedWorklogId=756116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756116
 ]

ASF GitHub Bot logged work on HIVE-25416:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 00:47
Start Date: 13/Apr/22 00:47
Worklog Time Spent: 10m 
  Work Description: ming95 commented on PR #2555:
URL: https://github.com/apache/hive/pull/2555#issuecomment-1097435214

   @kasakrisz  @marton-bod @kgyrtkirk @pvary 
   
   The CI build fails, but not because of this PR. Can you guys merge this PR 
into master? I think it's a necessary patch.




Issue Time Tracking
---

Worklog Id: (was: 756116)
Time Spent: 1h  (was: 50m)

> Hive metastore memory leak because datanucleus-api-jdo bug
> --
>
> Key: HIVE-25416
> URL: https://issues.apache.org/jira/browse/HIVE-25416
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.2
>Reporter: shezm
>Assignee: shezm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.3
>
> Attachments: leak.jpg
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I encountered a memory leak case. The MAT info:
> !leak.jpg!
> The full error message is:
> {code:java}
> Cannot get Long result for param = 8 for column "`FUNCS`.`FUNC_ID`" : 
> Operation not allowed after ResultSet closed{code}
> This is because there is a bug in the JDOPersistenceManager.retrieveAll code:
> {code:java}
> JDOPersistenceManager {
>     public void retrieveAll(Collection pcs, boolean useFetchPlan) {
>         this.assertIsOpen();
>         ArrayList failures = new ArrayList();
>         Iterator i = pcs.iterator();
>         while (i.hasNext()) {
>             try {
>                 this.jdoRetrieve(i.next(), useFetchPlan);
>             } catch (RuntimeException var6) {
>                 failures.add(var6);
>             }
>         }
>         if (!failures.isEmpty()) {
>             throw new JDOUserException(Localiser.msg("010038"),
>                 (Exception[]) ((Exception[]) failures.toArray(new Exception[failures.size()])));
>         }
>     }
> }
> {code}
> In some extreme cases the call to next() does not work (it keeps throwing 
> without advancing), which results in a very large failures ArrayList, as 
> shown above.
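> For illustration, a minimal, self-contained sketch (with a hypothetical 
> FailingIterator standing in for the closed-ResultSet-backed iterator) of why 
> this loop pattern leaks:
> {code:java}
> import java.util.ArrayList;
> import java.util.Iterator;
> import java.util.List;
> 
> public class RetrieveAllLeakDemo {
>   // Stand-in for an iterator whose next() keeps throwing after the backing
>   // ResultSet is closed, while hasNext() still reports true.
>   static class FailingIterator implements Iterator<Object> {
>     public boolean hasNext() { return true; }
>     public Object next() {
>       throw new RuntimeException("Operation not allowed after ResultSet closed");
>     }
>   }
> 
>   public static void main(String[] args) {
>     List<RuntimeException> failures = new ArrayList<>();
>     Iterator<Object> i = new FailingIterator();
>     while (i.hasNext()) {        // never becomes false
>       try {
>         i.next();                // throws every time, without advancing
>       } catch (RuntimeException e) {
>         failures.add(e);         // grows without bound, as in the MAT dump
>       }
>     }
>   }
> }
> {code}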
>  
> The bug details can be seen here: 
> [https://github.com/datanucleus/datanucleus-api-jdo/issues/106]
> This problem is fixed in datanucleus-api-jdo version 5.2.6, so we should 
> upgrade to it.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26114) jdbc connection hiveserver2 using dfs command with prefix space will cause exception

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26114?focusedWorklogId=756109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756109
 ]

ASF GitHub Bot logged work on HIVE-26114:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 00:37
Start Date: 13/Apr/22 00:37
Worklog Time Spent: 10m 
  Work Description: ming95 commented on PR #3176:
URL: https://github.com/apache/hive/pull/3176#issuecomment-1097429520

   @pvary 
   
   sorry for the late response...  I'll add some UTs later.
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 756109)
Time Spent: 40m  (was: 0.5h)

> jdbc connection hiveserver2 using dfs command with prefix space will cause 
> exception
> -
>
> Key: HIVE-26114
> URL: https://issues.apache.org/jira/browse/HIVE-26114
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.3.8, 3.1.2
>Reporter: shezm
>Assignee: shezm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:java}
>         Connection con = 
> DriverManager.getConnection("jdbc:hive2://10.214.35.115:1/");
>         Statement stmt = con.createStatement();
>         // dfs command with prefix space or "\n"
>         ResultSet res = stmt.executeQuery(" dfs -ls /");
>         //ResultSet res = stmt.executeQuery("\ndfs -ls /"); {code}
> it will cause exception
> {code:java}
> Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: 
> Error while processing statement: null
>     at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
>     at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
>     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
>     at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:375)
>     at com.ne.gdc.whitemane.shezm.TestJdbc.main(TestJdbc.java:30)
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> processing statement: null
>     at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>     at 
> org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:118)
>     at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>     at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>     at com.sun.proxy.$Proxy43.executeStatementAsync(Unknown Source)
>     at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>     at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605)
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
> But when I execute SQL with the prefix "\n", it works fine:
> {code:java}
> ResultSet res = stmt.executeQuery("\n select 1"); {code}
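> A minimal client-side workaround sketch until the server-side fix lands (host 
> and port here are placeholders): trim the statement before submitting it, so 
> the command dispatch does not see a leading-whitespace token.
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
> 
> public class TrimmedDfsCommand {
>   public static void main(String[] args) throws Exception {
>     Connection con = DriverManager.getConnection("jdbc:hive2://host:10000/");
>     Statement stmt = con.createStatement();
>     // " dfs -ls /" fails as-is; trimming removes the offending prefix.
>     ResultSet res = stmt.executeQuery(" dfs -ls /".trim());
>   }
> }
> {code}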



--
This message was sent by Atlassian Jira

[jira] [Work logged] (HIVE-24969) Predicates may be removed when decorrelating subqueries with lateral

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24969?focusedWorklogId=756103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756103
 ]

ASF GitHub Bot logged work on HIVE-24969:
-

Author: ASF GitHub Bot
Created on: 13/Apr/22 00:19
Start Date: 13/Apr/22 00:19
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #3018:
URL: https://github.com/apache/hive/pull/3018#issuecomment-1097420492

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 756103)
Time Spent: 2h 50m  (was: 2h 40m)

> Predicates may be removed when decorrelating subqueries with lateral
> 
>
> Key: HIVE-24969
> URL: https://issues.apache.org/jira/browse/HIVE-24969
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> {code:java}
> select count(distinct logItem.triggerId)
> from service_stat_log LATERAL VIEW explode(logItems) LogItemTable AS logItem
> where logItem.dsp in ('delivery', 'ocpa')
> and logItem.iswin = true
> and logItem.adid in (
>  select distinct adId
>  from ad_info
>  where subAccountId in (16010, 14863));  {code}
> The predicates _logItem.dsp in ('delivery', 'ocpa')_ and _logItem.iswin = 
> true_ are removed when doing PPD: JOIN -> RS -> LVJ. The JOIN has 
> candidates: logitem -> [logItem.dsp in ('delivery', 'ocpa'), logItem.iswin = 
> true]; when pushing them to the RS followed by the LVJ, none of them are 
> pushed, and the candidates of logitem are finally removed by default, which 
> causes the wrong result.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26117?focusedWorklogId=755982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755982
 ]

ASF GitHub Bot logged work on HIVE-26117:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 19:23
Start Date: 12/Apr/22 19:23
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera commented on code in PR #3179:
URL: https://github.com/apache/hive/pull/3179#discussion_r848794055


##
ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslatorHelper.java:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.metadata.HiveUtils;
+
+/**
+ * Class containing static methods that help populate the UnparseTranslator.
+ */
+public class UnparseTranslatorHelper {
+
+  /**
+   * Adds translation to the unparseTranslator for the RexNode if it is a 
RexInputRef.
+   * Grabs the inputRef information from the given RowResolver.
+   */
+  public static void addTranslationIfNeeded(ASTNode astNode, RexNode rexNode, 
RowResolver rr,

Review Comment:
   I have mixed feelings about this.
   
   I don't particularly like making UnparseTranslator dependent on some of 
these other classes. I suppose RexNode and ASTNode do not matter too much since 
they will always be dependencies, but I'm not thrilled about adding a 
dependency on RowResolver.
   
   Having said that, they are in the same package right now and it will be easy 
enough to separate them, so I don't mind doing what you propose.





Issue Time Tracking
---

Worklog Id: (was: 755982)
Time Spent: 1h  (was: 50m)

> Remove 2 superfluous lines of code in genJoinRelNode
> 
>
> Key: HIVE-26117
> URL: https://issues.apache.org/jira/browse/HIVE-26117
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The code was rewritten to associate ASTNodes to RexNodes.  Some code was left 
> behind that doesn't add any value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26117?focusedWorklogId=755980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755980
 ]

ASF GitHub Bot logged work on HIVE-26117:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 19:21
Start Date: 12/Apr/22 19:21
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera commented on code in PR #3179:
URL: https://github.com/apache/hive/pull/3179#discussion_r848792532


##
ql/src/test/results/clientpositive/llap/views_explain_ddl.q.out:
##
@@ -305,7 +305,7 @@ TBLPROPERTIES (
 ALTER TABLE db1.table2_n13 UPDATE STATISTICS 
SET('numRows'='0','rawDataSize'='0' );
 ALTER TABLE db1.table1_n19 UPDATE STATISTICS 
SET('numRows'='0','rawDataSize'='0' );
 
-CREATE VIEW `db1`.`v3_n3` AS SELECT `t1`.`key`, `t1`.`value`, `t2`.`key` `k` 
FROM `db1`.`table1_n19` `t1` JOIN `db1`.`table2_n13` `t2` ON `t1`.`key` = 
`t2`.`key`;
+CREATE VIEW `db1`.`v3_n3` AS SELECT `t1`.`key`, `t1`.`value`, `t2`.`key` `k` 
FROM `db1`.`table1_n19` `t1` JOIN `db1`.`table2_n13` `t2` ON t1.key = t2.key;

Review Comment:
   Makes sense. I didn't know what the unparse translator does until now, 
thanks!





Issue Time Tracking
---

Worklog Id: (was: 755980)
Time Spent: 50m  (was: 40m)

> Remove 2 superfluous lines of code in genJoinRelNode
> 
>
> Key: HIVE-26117
> URL: https://issues.apache.org/jira/browse/HIVE-26117
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The code was rewritten to associate ASTNodes to RexNodes.  Some code was left 
> behind that doesn't add any value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26117?focusedWorklogId=755979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755979
 ]

ASF GitHub Bot logged work on HIVE-26117:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 19:20
Start Date: 12/Apr/22 19:20
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera commented on code in PR #3179:
URL: https://github.com/apache/hive/pull/3179#discussion_r848792227


##
ql/src/test/results/clientnegative/joinneg.q.out:
##
@@ -1 +1 @@
-FAILED: SemanticException [Error 10004]: Line 6:12 Invalid table alias or 
column reference 'b': (possible column names are: x.key, x.value, y.key, 
y.value)
+FAILED: SemanticException [Error 10009]: Line 6:12 Invalid table alias 'b'

Review Comment:
   Agreed, but I think that's an issue for the new code. I would rather remove 
the lines of code that convert to ExprNodes (but keep the unparse translator 
part).





Issue Time Tracking
---

Worklog Id: (was: 755979)
Time Spent: 40m  (was: 0.5h)

> Remove 2 superfluous lines of code in genJoinRelNode
> 
>
> Key: HIVE-26117
> URL: https://issues.apache.org/jira/browse/HIVE-26117
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The code was rewritten to associate ASTNodes to RexNodes.  Some code was left 
> behind that doesn't add any value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26139) URL Encoding from HIVE-26015 was a bit too aggressive

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26139:
--
Labels: pull-request-available  (was: )

> URL Encoding from HIVE-26015 was a bit too aggressive
> -
>
> Key: HIVE-26139
> URL: https://issues.apache.org/jira/browse/HIVE-26139
> Project: Hive
>  Issue Type: Bug
>Reporter: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The fix for HIVE-26015 was a bit too aggressive in the URL encoding. 
> We should only encode space characters for now since this was the bug that 
> was originally reported.
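> A sketch of the narrower encoding described above (the class and method names 
> are illustrative, not the actual patch): replace only spaces and leave 
> characters such as '#' untouched.
> {code:java}
> public final class UrlSpaceEncoder {
>   private UrlSpaceEncoder() {}
> 
>   static String encodeSpacesOnly(String url) {
>     // Encode only the space character; '#' and friends pass through.
>     return url.replace(" ", "%20");
>   }
> 
>   public static void main(String[] args) {
>     // Prints "hbase://tbl/row%20key#col" -- the hash tag is preserved.
>     System.out.println(encodeSpacesOnly("hbase://tbl/row key#col"));
>   }
> }
> {code}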



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26139) URL Encoding from HIVE-26015 was a bit too aggressive

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26139?focusedWorklogId=755941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755941
 ]

ASF GitHub Bot logged work on HIVE-26139:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 18:35
Start Date: 12/Apr/22 18:35
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera opened a new pull request, #3206:
URL: https://github.com/apache/hive/pull/3206

   A unit test for hash tag URL encoding already exists in TestHBaseStorageHandler
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 755941)
Remaining Estimate: 0h
Time Spent: 10m

> URL Encoding from HIVE-26015 was a bit too aggressive
> -
>
> Key: HIVE-26139
> URL: https://issues.apache.org/jira/browse/HIVE-26139
> Project: Hive
>  Issue Type: Bug
>Reporter: Steve Carlin
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The fix for HIVE-26015 was a bit too aggressive in the URL encoding. 
> We should only encode space characters for now since this was the bug that 
> was originally reported.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-24299) hive-ql guava versions and vulneralities

2022-04-12 Thread shahbaz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahbaz updated HIVE-24299:
---
Summary: hive-ql guava versions and vulneralities  (was: hive-ql guava 
versions and vulnerabilities)

> hive-ql guava versions and vulneralities
> 
>
> Key: HIVE-24299
> URL: https://issues.apache.org/jira/browse/HIVE-24299
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Affects Versions: 3.1.2
>Reporter: openlookeng
>Priority: Blocker
>
> hive-ql shades Google's guava 19.0 component, which has the vulnerability 
> CVE-2018-10237. Does the team have a plan to update it?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-24299) hive-ql guava versions and vulnerabilities

2022-04-12 Thread shahbaz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahbaz updated HIVE-24299:
---
Summary: hive-ql guava versions and vulnerabilities  (was: hive-ql guava 
versions and vulneralities)

> hive-ql guava versions and vulnerabilities
> --
>
> Key: HIVE-24299
> URL: https://issues.apache.org/jira/browse/HIVE-24299
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Affects Versions: 3.1.2
>Reporter: openlookeng
>Priority: Blocker
>
> hive-ql shades Google's guava 19.0 component, which has the vulnerability 
> CVE-2018-10237. Does the team have a plan to update it?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26127) Insert overwrite throws FileNotFound when destination partition is deleted

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26127?focusedWorklogId=755910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755910
 ]

ASF GitHub Bot logged work on HIVE-26127:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 17:29
Start Date: 12/Apr/22 17:29
Worklog Time Spent: 10m 
  Work Description: hsnusonic commented on code in PR #3198:
URL: https://github.com/apache/hive/pull/3198#discussion_r848688003


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -5399,7 +5399,12 @@ public void cleanUpOneDirectoryForReplace(Path path, 
FileSystem fs,
 if (isNeedRecycle && conf.getBoolVar(HiveConf.ConfVars.REPLCMENABLED)) {
   recycleDirToCmPath(path, purge);

Review Comment:
   Thank you for catching this. It looks like `recycleDirToCmPath` might also 
fail if a path does not exist. The intention was to avoid one additional file 
system call; however, it seems unavoidable. I added the path check before 
calling `cleanUpOneDirectoryForReplace`.
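   For context, a sketch of the guard described here, assuming the check simply 
short-circuits when the directory is already gone (the class and method names 
are illustrative):
   ```java
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   class ReplaceCleanupGuard {
     static void cleanUpIfExists(Path path, FileSystem fs) throws Exception {
       if (!fs.exists(path)) {
         // The partition directory was deleted out-of-band; there is nothing
         // to clean up, so the insert overwrite should proceed instead of
         // throwing FileNotFoundException.
         return;
       }
       // ... delegate to the existing cleanUpOneDirectoryForReplace logic ...
     }
   }
   ```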





Issue Time Tracking
---

Worklog Id: (was: 755910)
Time Spent: 0.5h  (was: 20m)

> Insert overwrite throws FileNotFound when destination partition is deleted 
> ---
>
> Key: HIVE-26127
> URL: https://issues.apache.org/jira/browse/HIVE-26127
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  # create external table src (col int) partitioned by (year int);
>  # create external table dest (col int) partitioned by (year int);
>  # insert into src partition (year=2022) values (1);
>  # insert into dest partition (year=2022) values (2);
>  # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022
>  # insert overwrite table dest select * from src;
> We will get FileNotFoundException as below.
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory 
> file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1
>  could not be cleaned up.
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387)
>     at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) {code}
> It is because it calls listStatus on a path that doesn't exist. We should not 
> fail the insert overwrite, because there is nothing to clean up.
> {code:java}
> fs.listStatus(path, pathFilter){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25091) Implement connector provider for MSSQL and Oracle

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25091?focusedWorklogId=755904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755904
 ]

ASF GitHub Bot logged work on HIVE-25091:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 17:23
Start Date: 12/Apr/22 17:23
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on code in PR #3167:
URL: https://github.com/apache/hive/pull/3167#discussion_r848677371


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/AbstractJDBCConnectorProvider.java:
##
@@ -27,6 +27,7 @@
 import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
 import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
 import 
org.apache.hadoop.hive.metastore.dataconnector.AbstractDataConnectorProvider;
+import org.apache.hadoop.hive.metastore.dataconnector.IDataConnectorProvider;

Review Comment:
   nit: Unnecessary import?



##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MSSQLConnectorProvider.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore.dataconnector.jdbc;
+
+import org.apache.hadoop.hive.metastore.ColumnType;
+import org.apache.hadoop.hive.metastore.api.DataConnector;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.sql.ResultSet;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.List;
+
+public class MSSQLConnectorProvider extends AbstractJDBCConnectorProvider {
+  private static Logger LOG = LoggerFactory.getLogger(MySQLConnectorProvider.class);
+  private static final String DRIVER_CLASS = "com.microsoft.sqlserver.jdbc.SQLServerDriver".intern();
+
+  public MSSQLConnectorProvider(String dbName, DataConnector dataConn) {
+    super(dbName, dataConn, DRIVER_CLASS);
+    driverClassName = DRIVER_CLASS;
+  }
+
+  @Override protected ResultSet fetchTableMetadata(String tableName) throws MetaException {
+    ResultSet rs = null;
+    try {
+      rs = getConnection().getMetaData().getColumns(null, scoped_db, tableName, null);
+    } catch (Exception ex) {
+      LOG.warn("Could not retrieve table names from remote datasource, cause:" + ex.getMessage());
+      throw new MetaException("Could not retrieve table names from remote datasource, cause:" + ex);
+    }
+    return rs;
+  }
+
+  @Override protected ResultSet fetchTableNames() throws MetaException {
+    ResultSet rs = null;
+    try {
+      rs = getConnection().getMetaData().getTables(null, scoped_db, null, new String[] { "TABLE" });
+    } catch (SQLException sqle) {
+      LOG.warn("Could not retrieve table names from remote datasource, cause:" + sqle.getMessage());
+      throw new MetaException("Could not retrieve table names from remote datasource, cause:" + sqle);
+    }
+    return rs;
+  }
+
+  @Override protected String getCatalogName() {
+    return null;
+  }
+
+  @Override protected String getDatabaseName() {
+    return scoped_db;
+  }
+
+  protected String getDataType(String dbDataType, int size) {
+    String mappedType = super.getDataType(dbDataType, size);
+    if (!mappedType.equalsIgnoreCase(ColumnType.VOID_TYPE_NAME)) {
+      return mappedType;
+    }
+
+    // map any db specific types here.
+    //TODO: bit data types of oracle needs to be supported.

Review Comment:
   nit: TODO comment references oracle but this is MSSQL



##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/OracleConnectorProvider.java:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * 

[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=755876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755876
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 16:41
Start Date: 12/Apr/22 16:41
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r848648030


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##
@@ -422,21 +415,46 @@ void findUnknownPartitions(Table table, Set<Path> partPaths, byte[] filterExp,
   }
   allPartDirs = partDirs;
 }
-// don't want the table dir
-allPartDirs.remove(tablePath);
-
-// remove the partition paths we know about
-allPartDirs.removeAll(partPaths);
-
 Set<String> partColNames = Sets.newHashSet();
 for (FieldSchema fSchema : getPartCols(table)) {
   partColNames.add(fSchema.getName());
 }
 
 Map<String, String> partitionColToTypeMap = getPartitionColtoTypeMap(table.getPartitionKeys());
+
+FileSystem fs = tablePath.getFileSystem(conf);
+Set<Path> correctPartPathsInMS = new HashSet<>(partPathsInMS);

Review Comment:
   At this place we have 4 more-or-less similar copies of the file listing in 
memory:
   1. `partPaths` - Path objects from the HMS and every parent of the partitions
   2. `partPathsInMS` - Path objects from the HMS
   3. `correctPartPathsInMS` - This will be the final result, but at this point 
it is a duplicate of `partPathsInMS`
   4. `allPartDirs` - Recursive listing of the table root dir(?)
   
   Do we need all of these? Would it be better to store only the difference of 
the current `partPaths` and `partPathsInMS` in a list instead of storing the 
full list again?
   
   Could we build up the `correctPartPathsInMS` when we are iterating through 
the `partPathsInMS`? Would that be comparable in time complexity and more 
optimal in space complexity?
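   A sketch of the suggested single-pass alternative (variable names follow the 
review; the exact shape is an assumption):
   ```java
   import java.util.HashSet;
   import java.util.Set;
   import org.apache.hadoop.fs.Path;

   class PartitionDiff {
     // Build only the set of directories unknown to the HMS in one pass,
     // instead of materializing correctPartPathsInMS as a full copy up front.
     static Set<Path> unknownPartDirs(Set<Path> allPartDirs, Set<Path> partPathsInMS, Path tablePath) {
       Set<Path> unknown = new HashSet<>();
       for (Path dir : allPartDirs) {
         if (!dir.equals(tablePath) && !partPathsInMS.contains(dir)) {
           unknown.add(dir);
         }
       }
       return unknown;
     }
   }
   ```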





Issue Time Tracking
---

Worklog Id: (was: 755876)
Time Spent: 5h 40m  (was: 5.5h)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on 
> cloud storage such as S3; one case where we observed slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> 

[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=755865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755865
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 16:25
Start Date: 12/Apr/22 16:25
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r848634890


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##
@@ -356,7 +347,7 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
   }
 }
 
-findUnknownPartitions(table, partPaths, filterExp, result);
+findUnknownPartitions(table, partPaths, partPathsInMS, filterExp, result);

Review Comment:
   Am I right in thinking that `partPaths` and `partPathsInMS` contain 
basically the same path objects?
   If the dir layout is like this:
   ```
   root
   +---P=1
   |   +---K=1
   |   \---K=2
   \---P=3
       +---K=1
       \---K=3
   ```
   `partPathsInMS`:
   - root/P=1/K=1
   - root/P=1/K=2
   - root/P=3/K=1
   - root/P=3/K=3
   
   `partPath`:
   - root
   - root/P=1
   - root/P=1/K=1
   - root/P=1/K=2
   - root/P=3
   - root/P=3/K=1
   - root/P=3/K=3
   
   Do I understand the above correctly?
   
   Do we need to send all of these to the filter? Isn't this duplicated info?
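   To make the suspected duplication concrete, a sketch (assuming the reading 
above is correct) of how a parent-inclusive set like `partPaths` relates to the 
leaf-only `partPathsInMS`:
   ```java
   import java.util.HashSet;
   import java.util.Set;
   import org.apache.hadoop.fs.Path;

   class ParentExpansion {
     static Set<Path> withParents(Set<Path> leafPartitions, Path root) {
       Set<Path> all = new HashSet<>();
       for (Path p : leafPartitions) {
         // Walk up from each leaf: root/P=1/K=1 -> root/P=1 -> (stop at root).
         for (Path cur = p; cur != null && !cur.equals(root); cur = cur.getParent()) {
           all.add(cur);
         }
       }
       all.add(root);
       return all;
     }
   }
   ```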





Issue Time Tracking
---

Worklog Id: (was: 755865)
Time Spent: 5.5h  (was: 5h 20m)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on 
> cloud storage such as S3; one case where we observed slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> 

[jira] [Work logged] (HIVE-26134) Remove Hive on Spark from the main branch

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26134?focusedWorklogId=755856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755856
 ]

ASF GitHub Bot logged work on HIVE-26134:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 16:01
Start Date: 12/Apr/22 16:01
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3201:
URL: https://github.com/apache/hive/pull/3201#discussion_r848609807


##
pom.xml:
##
@@ -1627,24 +1585,19 @@
 true
> Remove Hive on Spark from the main branch
> -
>
> Key: HIVE-26134
> URL: https://issues.apache.org/jira/browse/HIVE-26134
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Based on this discussion 
> [here|https://lists.apache.org/thread/nxg2jpngp72t6clo90407jgqxnmdm5g4] there 
> is no activity on keeping the feature up-to-date.
> We should remove it from the main line to help ongoing development efforts 
> and keep the testing cheaper/faster.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26135) Invalid Anti join conversion may cause missing results

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26135?focusedWorklogId=755848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755848
 ]

ASF GitHub Bot logged work on HIVE-26135:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 15:44
Start Date: 12/Apr/22 15:44
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request, #3205:
URL: https://github.com/apache/hive/pull/3205

   
   
   ### What changes were proposed in this pull request?
   
   restricts the optimization to only happen in case `X IS NULL` is present, 
`X` is Strong, and there is no `CAST` in `X`.
   ...in fact in Hive `CAST` is not `Strong`; but it's not possible to change 
how `CAST` is treated in the current version of Calcite
   
   ### Why are the changes needed?
   without this patch an incorrect optimization may happen
   
   
   ### Does this PR introduce _any_ user-facing change?
   not really - it only makes the optimization less zealous
   
   ### How was this patch tested?
   a qtest was added to cover it
   




Issue Time Tracking
---

Worklog Id: (was: 755848)
Remaining Estimate: 0h
Time Spent: 10m

> Invalid Anti join conversion may cause missing results
> --
>
> Key: HIVE-26135
> URL: https://issues.apache.org/jira/browse/HIVE-26135
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> right now I think the following is needed to trigger the issue:
> * left outer join
> * only select left-hand-side columns
> * a conditional which uses some UDF
> * the nullness of the UDF is checked
> repro SQL; in case the conversion happens, the row with 'a' will be missing
> {code}
> drop table if exists t;
> drop table if exists n;
> create table t(a string) stored as orc;
> create table n(a string) stored as orc;
> insert into t values ('a'),('1'),('2'),(null);
> insert into n values ('a'),('b'),('1'),('3'),(null);
> explain select n.* from n left outer join t on (n.a=t.a) where 
> assert_true(t.a is null) is null;
> explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
> float) is null;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> set hive.auto.convert.anti.join=false;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> {code}
> workaround could be to disable the feature:
> {code}
> set hive.auto.convert.anti.join=false;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26135) Invalid Anti join conversion may cause missing results

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26135:
--
Labels: pull-request-available  (was: )

> Invalid Anti join conversion may cause missing results
> --
>
> Key: HIVE-26135
> URL: https://issues.apache.org/jira/browse/HIVE-26135
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> right now I think the following is needed to trigger the issue:
> * left outer join
> * only select left-hand-side columns
> * a conditional which uses some UDF
> * the nullness of the UDF is checked
> repro SQL; in case the conversion happens, the row with 'a' will be missing
> {code}
> drop table if exists t;
> drop table if exists n;
> create table t(a string) stored as orc;
> create table n(a string) stored as orc;
> insert into t values ('a'),('1'),('2'),(null);
> insert into n values ('a'),('b'),('1'),('3'),(null);
> explain select n.* from n left outer join t on (n.a=t.a) where 
> assert_true(t.a is null) is null;
> explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
> float) is null;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> set hive.auto.convert.anti.join=false;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> {code}
> workaround could be to disable the feature:
> {code}
> set hive.auto.convert.anti.join=false;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26136) Implement UPDATE statements for Iceberg tables

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26136?focusedWorklogId=755840&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755840
 ]

ASF GitHub Bot logged work on HIVE-26136:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 15:38
Start Date: 12/Apr/22 15:38
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request, #3204:
URL: https://github.com/apache/hive/pull/3204

   ### What changes were proposed in this pull request?
   Implemented the UPDATE for Iceberg
   
   
   ### Why are the changes needed?
   Iceberg update
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Update runs
   
   ### How was this patch tested?
   Unit tests
   




Issue Time Tracking
---

Worklog Id: (was: 755840)
Remaining Estimate: 0h
Time Spent: 10m

> Implement UPDATE statements for Iceberg tables
> --
>
> Key: HIVE-26136
> URL: https://issues.apache.org/jira/browse/HIVE-26136
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26136) Implement UPDATE statements for Iceberg tables

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26136:
--
Labels: pull-request-available  (was: )

> Implement UPDATE statements for Iceberg tables
> --
>
> Key: HIVE-26136
> URL: https://issues.apache.org/jira/browse/HIVE-26136
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26137) Optimized transfer of Iceberg residual expressions from AM to execution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26137?focusedWorklogId=755822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755822
 ]

ASF GitHub Bot logged work on HIVE-26137:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 15:16
Start Date: 12/Apr/22 15:16
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3203:
URL: https://github.com/apache/hive/pull/3203#discussion_r848560366


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/orc/VectorizedReadUtils.java:
##
@@ -160,8 +161,9 @@ public static void handleIcebergProjection(FileScanTask 
task, JobConf job, TypeD
 job.set(ColumnProjectionUtils.ORC_SCHEMA_STRING, readOrcSchema.toString());
 
 // Predicate pushdowns needs to be adjusted too in case of column renames, 
we let Iceberg generate this into job
-if (task.residual() != null) {
-  Expression boundFilter = Binder.bind(currentSchema.asStruct(), 
task.residual(), false);
+Expression residual = HiveIcebergInputFormat.residualForTask(task, job);
+if (residual != null) {

Review Comment:
   will this ever be null?





Issue Time Tracking
---

Worklog Id: (was: 755822)
Time Spent: 20m  (was: 10m)

> Optimized transfer of Iceberg residual expressions from AM to execution
> ---
>
> Key: HIVE-26137
> URL: https://issues.apache.org/jira/browse/HIVE-26137
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-25967 introduced a hack to prevent Iceberg filter expressions from being 
> serialized into splits. This temporary fix was to avoid OOM problems on the 
> Tez AM side, but at the same time it prevented predicate pushdown from 
> working on the execution side too.
> This ticket intends to incorporate the long-term solution. It turns out that 
> the file scan tasks created by Iceberg actually don't contain a "residual" 
> expression, but rather the complete/original one. It becomes residual only 
> when it is evaluated against the task's partition value, which only happens 
> on the execution side. This means that the original filter is the same 
> expression for all splits in the Tez AM, so we can transfer it via the job 
> conf instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26135) Invalid Anti join conversion may cause missing results

2022-04-12 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521205#comment-17521205
 ] 

Zoltan Haindrich commented on HIVE-26135:
-

I wanted to add a check for "Strong"-ness; however, consider:
{code}
(leftCol + rightCol) IS NULL
{code}
From the nullness of the expression we want to deduce that `rightCol` cannot 
be anything other than `null`... like:
{code}
(a + null) IS NULL
{code}

however, if the left-hand side is null, that could also make the expression 
null; and in case rightCol is not in the join keys, we could lose correct 
results...


> Invalid Anti join conversion may cause missing results
> --
>
> Key: HIVE-26135
> URL: https://issues.apache.org/jira/browse/HIVE-26135
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> right now I think the following is needed to trigger the issue:
> * left outer join
> * only select left-hand-side columns
> * a conditional which uses some UDF
> * the nullness of the UDF is checked
> repro SQL; in case the conversion happens, the row with 'a' will be missing
> {code}
> drop table if exists t;
> drop table if exists n;
> create table t(a string) stored as orc;
> create table n(a string) stored as orc;
> insert into t values ('a'),('1'),('2'),(null);
> insert into n values ('a'),('b'),('1'),('3'),(null);
> explain select n.* from n left outer join t on (n.a=t.a) where 
> assert_true(t.a is null) is null;
> explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
> float) is null;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> set hive.auto.convert.anti.join=false;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> {code}
> workaround could be to disable the feature:
> {code}
> set hive.auto.convert.anti.join=false;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26137) Optimized transfer of Iceberg residual expressions from AM to execution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26137:
--
Labels: pull-request-available  (was: )

> Optimized transfer of Iceberg residual expressions from AM to execution
> ---
>
> Key: HIVE-26137
> URL: https://issues.apache.org/jira/browse/HIVE-26137
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-25967 introduced a hack to prevent Iceberg filter expressions from being 
> serialized into splits. This temporary fix was to avoid OOM problems on the 
> Tez AM side, but at the same time it prevented predicate pushdown from 
> working on the execution side too.
> This ticket intends to incorporate the long-term solution. It turns out that 
> the file scan tasks created by Iceberg actually don't contain a "residual" 
> expression, but rather the complete/original one. It becomes residual only 
> when it is evaluated against the task's partition value, which only happens 
> on the execution side. This means that the original filter is the same 
> expression for all splits in the Tez AM, so we can transfer it via the job 
> conf instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26137) Optimized transfer of Iceberg residual expressions from AM to execution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26137?focusedWorklogId=755789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755789
 ]

ASF GitHub Bot logged work on HIVE-26137:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 14:36
Start Date: 12/Apr/22 14:36
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request, #3203:
URL: https://github.com/apache/hive/pull/3203

   The filter expression that goes with the file scan tasks is actually not a 
"residual" one, but rather the original data filter. This is good for us, as 
now we know that for any Hive job the expression is the same object - so we can 
transfer it to the Hive execution processes another way:
   
   The expression itself is generated via 
https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java#L82-L93
 before split generation within the AM. There's nothing to prevent us from 
reusing this same logic on the executors.
   At the same time we can call ignoreResiduals() on the table scan, so that 
Iceberg only uses the filter for split generation, but won't actually attach it 
to the file scan tasks, and therefore to their wrapping splits. On the execution 
side we can just simply retrieve the original filter expression by the logic 
above and evaluate it against the current task (whose spec and partition value 
information are present anyway), ending up with the actual residual expression 
for the task. This is then passed to the underlying file formats the same way 
as before.
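
   For illustration, here is a minimal sketch of the approach described above, 
assuming a hypothetical config key and helper names (the actual Hive/Iceberg 
integration code may differ):

{code:java}
// Sketch of the conf-based filter transfer. FILTER_EXPR_KEY and the class
// name are illustrative assumptions, not the actual Hive configuration keys.
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.ResidualEvaluator;
import org.apache.iceberg.util.SerializationUtil;

final class FilterTransferSketch {
  private static final String FILTER_EXPR_KEY = "iceberg.mr.filter.expression"; // hypothetical

  // Tez AM side: the filter is the same object for every split, so it is
  // stored once in the job conf instead of being serialized into each split.
  static void publishFilter(Configuration conf, Expression filter) {
    conf.set(FILTER_EXPR_KEY, SerializationUtil.serializeToBase64(filter));
  }

  // Execution side: re-derive the per-task residual by evaluating the
  // original filter against the task's spec and partition value.
  static Expression residualFor(Configuration conf, FileScanTask task) {
    Expression filter = SerializationUtil.deserializeFromBase64(conf.get(FILTER_EXPR_KEY));
    return ResidualEvaluator.of(task.spec(), filter, /* caseSensitive= */ true)
        .residualFor(task.file().partition());
  }
}
{code}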




Issue Time Tracking
---

Worklog Id: (was: 755789)
Remaining Estimate: 0h
Time Spent: 10m

> Optimized transfer of Iceberg residual expressions from AM to execution
> ---
>
> Key: HIVE-26137
> URL: https://issues.apache.org/jira/browse/HIVE-26137
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-25967 introduced a hack to prevent Iceberg filter expressions to be 
> serialized into splits. This temporary fix was to avoid OOM problems on Tez 
> AM side, but at the same time prevented predicate pushdowns to work on the 
> execution side too.
> This ticket intends to incorporate the long term solution. It turns out that 
> the file scan tasks created by Iceberg actually don't contain a "residual" 
> expression, but rather the complete/original one. It becomes residual only 
> when it is evaluated against the task's partition value, which only happens 
> on the execution side. This means that the original filter is the same 
> expression for all splits in Tez AM, so we can transfer it via job conf 
> instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-23885) Remove Hive on Spark

2022-04-12 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-23885.
---
Resolution: Duplicate

After the discussion upstream we still see no ongoing development on the 
Hive on Spark engine, so we will remove it.

> Remove Hive on Spark
> 
>
> Key: HIVE-23885
> URL: https://issues.apache.org/jira/browse/HIVE-23885
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-23885) Remove Hive on Spark

2022-04-12 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521179#comment-17521179
 ] 

Peter Vary commented on HIVE-23885:
---

Made a mistake to create a new jira :(
HIVE-26134

> Remove Hive on Spark
> 
>
> Key: HIVE-23885
> URL: https://issues.apache.org/jira/browse/HIVE-23885
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26134) Remove Hive on Spark from the main branch

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26134?focusedWorklogId=755771=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755771
 ]

ASF GitHub Bot logged work on HIVE-26134:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 14:05
Start Date: 12/Apr/22 14:05
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on code in PR #3201:
URL: https://github.com/apache/hive/pull/3201#discussion_r848448818


##
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java:
##
@@ -34,7 +34,6 @@ public enum OperatorType implements org.apache.thrift.TEnum {
   ORCFILEMERGE(22),
   RCFILEMERGE(23),
   MERGEJOIN(24),
-  SPARKPRUNINGSINK(25),

Review Comment:
   this is a generated file and I don't see any changes in the thrift file - 
have you regenerated these files?



##
pom.xml:
##
@@ -1627,24 +1585,19 @@




Issue Time Tracking
---

Worklog Id: (was: 755771)
Time Spent: 20m  (was: 10m)

> Remove Hive on Spark from the main branch
> -
>
> Key: HIVE-26134
> URL: https://issues.apache.org/jira/browse/HIVE-26134
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Based on this discussion 
> [here|https://lists.apache.org/thread/nxg2jpngp72t6clo90407jgqxnmdm5g4] there 
> is no activity on keeping the feature up-to-date.
> We should remove it from the main line to help ongoing development efforts 
> and keep the testing cheaper/faster.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?focusedWorklogId=755770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755770
 ]

ASF GitHub Bot logged work on HIVE-26133:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 14:04
Start Date: 12/Apr/22 14:04
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3202:
URL: https://github.com/apache/hive/pull/3202#discussion_r848477387


##
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInserts.java:
##
@@ -183,6 +183,22 @@ public void 
testInsertOverwriteBucketPartitionedTableThrowsError() {
 
testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, 
target, true)));
   }
 
+  @Test
+  public void testInsertOverwriteWithPartitionEvolutionThrowsError() throws 
IOException {

Review Comment:
   Sadly this is the case.
   We should state this in the error message, and when we have the tools we 
can add a test.





Issue Time Tracking
---

Worklog Id: (was: 755770)
Time Spent: 1h 40m  (was: 1.5h)

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?focusedWorklogId=755763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755763
 ]

ASF GitHub Bot logged work on HIVE-26133:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 13:52
Start Date: 12/Apr/22 13:52
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3202:
URL: https://github.com/apache/hive/pull/3202#discussion_r848463614


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -460,6 +461,13 @@ public void validateSinkDesc(FileSinkDesc sinkDesc) throws 
SemanticException {
   if (IcebergTableUtil.isBucketed(table)) {
 throw new SemanticException("Cannot perform insert overwrite query on 
bucket partitioned Iceberg table.");
   }
+  if (table.currentSnapshot() != null) {
+if 
(table.currentSnapshot().allManifests().parallelStream().map(ManifestFile::partitionSpecId)
+.filter(id -> id < table.spec().specId()).findAny().isPresent()) {
+  throw new SemanticException(
+  "Cannot perform insert overwrite query on Iceberg table where 
partition evolution happened.");

Review Comment:
   I guess you're right. We can only resolve this using merge + compaction as 
far as I know (neither of which are available in Hive currently). So let's 
leave the message as it is, and then we can later extend it by adding some 
useful tips on how to do the rewrite





Issue Time Tracking
---

Worklog Id: (was: 755763)
Time Spent: 1.5h  (was: 1h 20m)

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?focusedWorklogId=755759=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755759
 ]

ASF GitHub Bot logged work on HIVE-26133:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 13:46
Start Date: 12/Apr/22 13:46
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on code in PR #3202:
URL: https://github.com/apache/hive/pull/3202#discussion_r848456955


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -460,6 +461,13 @@ public void validateSinkDesc(FileSinkDesc sinkDesc) throws 
SemanticException {
   if (IcebergTableUtil.isBucketed(table)) {
 throw new SemanticException("Cannot perform insert overwrite query on 
bucket partitioned Iceberg table.");
   }
+  if (table.currentSnapshot() != null) {
+if 
(table.currentSnapshot().allManifests().parallelStream().map(ManifestFile::partitionSpecId)
+.filter(id -> id < table.spec().specId()).findAny().isPresent()) {
+  throw new SemanticException(
+  "Cannot perform insert overwrite query on Iceberg table where 
partition evolution happened.");

Review Comment:
   I didn't add any recommendations, because I don't know how we can enforce 
the rewrite of the data. Do we have an example query? 





Issue Time Tracking
---

Worklog Id: (was: 755759)
Time Spent: 1h 20m  (was: 1h 10m)

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?focusedWorklogId=755757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755757
 ]

ASF GitHub Bot logged work on HIVE-26133:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 13:45
Start Date: 12/Apr/22 13:45
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3202:
URL: https://github.com/apache/hive/pull/3202#discussion_r848455472


##
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInserts.java:
##
@@ -183,6 +183,22 @@ public void 
testInsertOverwriteBucketPartitionedTableThrowsError() {
 
testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, 
target, true)));
   }
 
+  @Test
+  public void testInsertOverwriteWithPartitionEvolutionThrowsError() throws 
IOException {

Review Comment:
   Actually, maybe the merge won't help here either using the current logic, 
because it won't delete the old data files, just remove the rows using delete 
files...
   So only compaction 
(https://iceberg.apache.org/docs/latest/maintenance/#compact-data-files) will 
help? - @pvary any thoughts ?





Issue Time Tracking
---

Worklog Id: (was: 755757)
Time Spent: 1h 10m  (was: 1h)

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?focusedWorklogId=755756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755756
 ]

ASF GitHub Bot logged work on HIVE-26133:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 13:44
Start Date: 12/Apr/22 13:44
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3202:
URL: https://github.com/apache/hive/pull/3202#discussion_r848455472


##
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInserts.java:
##
@@ -183,6 +183,22 @@ public void 
testInsertOverwriteBucketPartitionedTableThrowsError() {
 
testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, 
target, true)));
   }
 
+  @Test
+  public void testInsertOverwriteWithPartitionEvolutionThrowsError() throws 
IOException {

Review Comment:
   Actually, maybe the merge won't help here either using the current logic, 
because it won't delete the old data files, just remove the rows using delete 
files...
   So only compaction 
(https://iceberg.apache.org/docs/latest/maintenance/#compact-data-files) will 
help - @pvary any thoughts ?





Issue Time Tracking
---

Worklog Id: (was: 755756)
Time Spent: 1h  (was: 50m)

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?focusedWorklogId=755754=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755754
 ]

ASF GitHub Bot logged work on HIVE-26133:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 13:42
Start Date: 12/Apr/22 13:42
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on code in PR #3202:
URL: https://github.com/apache/hive/pull/3202#discussion_r848453269


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -460,6 +461,13 @@ public void validateSinkDesc(FileSinkDesc sinkDesc) throws 
SemanticException {
   if (IcebergTableUtil.isBucketed(table)) {
 throw new SemanticException("Cannot perform insert overwrite query on 
bucket partitioned Iceberg table.");
   }
+  if (table.currentSnapshot() != null) {
+if 
(table.currentSnapshot().allManifests().parallelStream().map(ManifestFile::partitionSpecId)
+.filter(id -> id < table.spec().specId()).findAny().isPresent()) {

Review Comment:
   You are right, this can be simplified. 





Issue Time Tracking
---

Worklog Id: (was: 755754)
Time Spent: 50m  (was: 40m)

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?focusedWorklogId=755753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755753
 ]

ASF GitHub Bot logged work on HIVE-26133:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 13:39
Start Date: 12/Apr/22 13:39
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3202:
URL: https://github.com/apache/hive/pull/3202#discussion_r848449913


##
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInserts.java:
##
@@ -183,6 +183,22 @@ public void 
testInsertOverwriteBucketPartitionedTableThrowsError() {
 
testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, 
target, true)));
   }
 
+  @Test
+  public void testInsertOverwriteWithPartitionEvolutionThrowsError() throws 
IOException {

Review Comment:
   It would be great to have a test where we:
   - create table and insert data
   - evolve partition spec
   - insert more data
   - try IOW -> fails
   - rewrite old data files
   - try IOW -> succeeds
   
   Right now we don't have the tools for this, because rewriting the old data 
will be best achieved using the MERGE statement - so in the meantime, can we 
insert a TODO line to add some testing for this once MERGE is available?





Issue Time Tracking
---

Worklog Id: (was: 755753)
Time Spent: 40m  (was: 0.5h)

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?focusedWorklogId=755751=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755751
 ]

ASF GitHub Bot logged work on HIVE-26133:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 13:34
Start Date: 12/Apr/22 13:34
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3202:
URL: https://github.com/apache/hive/pull/3202#discussion_r848443468


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -460,6 +461,13 @@ public void validateSinkDesc(FileSinkDesc sinkDesc) throws 
SemanticException {
   if (IcebergTableUtil.isBucketed(table)) {
 throw new SemanticException("Cannot perform insert overwrite query on 
bucket partitioned Iceberg table.");
   }
+  if (table.currentSnapshot() != null) {
+if 
(table.currentSnapshot().allManifests().parallelStream().map(ManifestFile::partitionSpecId)
+.filter(id -> id < table.spec().specId()).findAny().isPresent()) {
+  throw new SemanticException(
+  "Cannot perform insert overwrite query on Iceberg table where 
partition evolution happened.");

Review Comment:
   Can you please add a note that the table must be rewritten according to the 
last spec in order to be able to start using IOW again? IOW can be an unsafe 
operation when there are different data files written out according to 
different partition specs





Issue Time Tracking
---

Worklog Id: (was: 755751)
Time Spent: 0.5h  (was: 20m)

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?focusedWorklogId=755750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755750
 ]

ASF GitHub Bot logged work on HIVE-26133:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 13:32
Start Date: 12/Apr/22 13:32
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3202:
URL: https://github.com/apache/hive/pull/3202#discussion_r848441593


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -460,6 +461,13 @@ public void validateSinkDesc(FileSinkDesc sinkDesc) throws 
SemanticException {
   if (IcebergTableUtil.isBucketed(table)) {
 throw new SemanticException("Cannot perform insert overwrite query on 
bucket partitioned Iceberg table.");
   }
+  if (table.currentSnapshot() != null) {
+if 
(table.currentSnapshot().allManifests().parallelStream().map(ManifestFile::partitionSpecId)
+.filter(id -> id < table.spec().specId()).findAny().isPresent()) {

Review Comment:
   nit: you could use `anyMatch(filter)` instead of 
`filter(filter).findAny().isPresent()`. Just maybe a little bit more readable.
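
   For illustration, the suggested form would read roughly like this (a sketch 
of the review suggestion, not the committed change):

{code:java}
// anyMatch collapses filter(...).findAny().isPresent() into a single call.
boolean hasOlderSpecData = table.currentSnapshot().allManifests().parallelStream()
    .map(ManifestFile::partitionSpecId)
    .anyMatch(id -> id < table.spec().specId());
{code}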





Issue Time Tracking
---

Worklog Id: (was: 755750)
Time Spent: 20m  (was: 10m)

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?focusedWorklogId=755745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755745
 ]

ASF GitHub Bot logged work on HIVE-26133:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 13:27
Start Date: 12/Apr/22 13:27
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request, #3202:
URL: https://github.com/apache/hive/pull/3202

   
   
   
   ### What changes were proposed in this pull request?
   
   
   When running an IOW operation we should check whether partition evolution 
happened or not. If it is the case, we should validate that the specIds of the 
data manifest files are the same as the current specId. If all the specIds 
match the current one, it means that the data was rewritten and we are safe to 
complete the IOW operation. Otherwise, we should throw an exception. 
   
   ### Why are the changes needed?
   
   Avoid data duplication when running IOW on Iceberg tables with partition 
evolution
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Manual test, unit test
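
   For illustration, a self-contained sketch of the validation described above 
(class and method names are illustrative, not the exact patch):

{code:java}
// Sketch: IOW is only safe when every manifest was written with the table's
// current partition spec, i.e. no data file remains on an older spec.
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.Table;

final class IowValidationSketch {
  static boolean hasDataFromOlderSpec(Table table) {
    if (table.currentSnapshot() == null) {
      return false; // empty table: no data written under any spec yet
    }
    return table.currentSnapshot().allManifests().parallelStream()
        .map(ManifestFile::partitionSpecId)
        .anyMatch(id -> id < table.spec().specId());
  }
}
{code}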




Issue Time Tracking
---

Worklog Id: (was: 755745)
Remaining Estimate: 0h
Time Spent: 10m

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26133:
--
Labels: pull-request-available  (was: )

> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26136) Implement UPDATE statements for Iceberg tables

2022-04-12 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26136:
-


> Implement UPDATE statements for Iceberg tables
> --
>
> Key: HIVE-26136
> URL: https://issues.apache.org/jira/browse/HIVE-26136
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26134) Remove Hive on Spark from the main branch

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26134?focusedWorklogId=755733=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755733
 ]

ASF GitHub Bot logged work on HIVE-26134:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 12:55
Start Date: 12/Apr/22 12:55
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request, #3201:
URL: https://github.com/apache/hive/pull/3201

   ### What changes were proposed in this pull request?
   Remove Hive on Spark feature from Hive
   
   ### Why are the changes needed?
   The feature is not maintained, and continuously costs us in several ways:
   - development effort
   - testing effort
   - complexity
   The feature never reached the level of maturity of Tez, and based on the 
current development activity it never will.
   We should remove the feature to allow faster development of the features 
that are actually used.
   
   ### Does this PR introduce _any_ user-facing change?
   Removes the `spark` execution engine
   
   ### How was this patch tested?
   Will see the effect in unit tests




Issue Time Tracking
---

Worklog Id: (was: 755733)
Remaining Estimate: 0h
Time Spent: 10m

> Remove Hive on Spark from the main branch
> -
>
> Key: HIVE-26134
> URL: https://issues.apache.org/jira/browse/HIVE-26134
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Based on this discussion 
> [here|https://lists.apache.org/thread/nxg2jpngp72t6clo90407jgqxnmdm5g4] there 
> is no activity on keeping the feature up-to-date.
> We should remove it from the main line to help ongoing development efforts 
> and keep the testing cheaper/faster.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26134) Remove Hive on Spark from the main branch

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26134:
--
Labels: pull-request-available  (was: )

> Remove Hive on Spark from the main branch
> -
>
> Key: HIVE-26134
> URL: https://issues.apache.org/jira/browse/HIVE-26134
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Based on this discussion 
> [here|https://lists.apache.org/thread/nxg2jpngp72t6clo90407jgqxnmdm5g4] there 
> is no activity on keeping the feature up-to-date.
> We should remove it from the main line to help ongoing development efforts 
> and keep the testing cheaper/faster.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26135) Invalid Anti join conversion may cause missing results

2022-04-12 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-26135:
---


> Invalid Anti join conversion may cause missing results
> --
>
> Key: HIVE-26135
> URL: https://issues.apache.org/jira/browse/HIVE-26135
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> right now I think the following is needed to trigger the issue:
> * left outer join
> * only select left hand side columns
> * conditional which is using some udf
> * the nullness of the udf is checked
> repro sql; in case the conversion happens the row with 'a' will be missing
> {code}
> drop table if exists t;
> drop table if exists n;
> create table t(a string) stored as orc;
> create table n(a string) stored as orc;
> insert into t values ('a'),('1'),('2'),(null);
> insert into n values ('a'),('b'),('1'),('3'),(null);
> explain select n.* from n left outer join t on (n.a=t.a) where 
> assert_true(t.a is null) is null;
> explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
> float) is null;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> set hive.auto.convert.anti.join=false;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> {code}
> workaround could be to disable the feature:
> {code}
> set hive.auto.convert.anti.join=false;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26134) Remove Hive on Spark from the main branch

2022-04-12 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-26134:
--
Summary: Remove Hive on Spark from the main branch  (was: Remove Hive on 
Spark from the main branch branch)

> Remove Hive on Spark from the main branch
> -
>
> Key: HIVE-26134
> URL: https://issues.apache.org/jira/browse/HIVE-26134
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Priority: Major
>
> Based on this discussion 
> [here|https://lists.apache.org/thread/nxg2jpngp72t6clo90407jgqxnmdm5g4] there 
> is no activity on keeping the feature up-to-date.
> We should remove it from the main line to help ongoing development efforts 
> and keep the testing cheaper/faster.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26133) Insert overwrite on Iceberg tables can result in duplicate entries after partition evolution

2022-04-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-26133:



> Insert overwrite on Iceberg tables can result in duplicate entries after 
> partition evolution
> 
>
> Key: HIVE-26133
> URL: https://issues.apache.org/jira/browse/HIVE-26133
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> Insert overwrite commands in Hive only rewrite partitions affected by the 
> query.
> If we write out a record with specA (e.g. day(ts)), resulting in a datafile:
> "/tableRoot/data/ts_day="2020-10-24"/.orc
> If you then change to specB (e.g. day(ts), name), the same record would go to 
> a different partition:
> "/tableRoot/data/ts_day="2020-10-24"/name="Mike"/.orc
> If you then want to overwrite the table with itself, it will detect these two 
> records to belong to different partitions (as they do), and therefore does 
> not overwrite the original record with the new one, resulting in duplicate 
> entries.
> {code:java}
> create table testice1000 (a int, b string) stored by iceberg stored as orc 
> location 'file:/tmp/testice1000';
> insert into testice1000 values (11, 'ddd'), (22, 'ttt');
> alter table testice1000 set partition spec(truncate(2, b));
> insert into testice1000 values (33, 'rrfdfdf');
> insert overwrite table testice1000 select * from testice1000;
> --+
> testice1000.a testice1000.b
> --+
> 11 ddd   
> 11 ddd   
> 22 ttt   
> 22 ttt   
> 33 rrfdfdf
> --+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26132) Schematool upgradeSchema fails with nullPointerException

2022-04-12 Thread David (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David updated HIVE-26132:
-
Description: 
When running schematool upgradeSchema against a mysql database with a 
metastore_db, I get a nullPointerException. The command is:

 

{{schematool -dbType mysql -upgradeSchema -verbose}}

 

The same exception can be created by running the relevant hive upgrade script 
directly in beeline with the following command:

 

{{beeline -u jdbc:mysql://mysql:3306/metastore_db -n [USER] -p[PASS] -f 
/usr/local/hive/scripts/metastore/upgrade/mysql/upgrade-2.3.0-to-3.0.0.mysql.sql}}

 

Removing the following lines from the sql script fixes this (as does replacing 
`AS ' '` with `AS 'something'`):

 

{{SELECT 'Upgrading MetaStore schema from 2.3.0 to 3.0.0' AS ' ';}}

{{SELECT 'Finished upgrading MetaStore schema from 2.3.0 to 3.0.0' AS ' ';}}

 

The beeline exception is:
{quote}Connecting to jdbc:mysql://mysql:3306/metastore_db
Connected to: MySQL (version 5.6.51)
Driver: MySQL Connector/J (version mysql-connector-java-8.0.28 (Revision: 
7ff2161da3899f379fb3171b6538b191b1c5c7e2))
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:mysql://mysql:3306/metastore_db> SELECT 'Finished upgrading MetaStore 
schema from 2.3.0 to 3.0.0' AS ' ';
The statement instance is not HiveStatement type: class 
com.mysql.cj.jdbc.StatementImpl
The statement instance is not HiveStatement type: class 
com.mysql.cj.jdbc.StatementImpl
java.lang.NullPointerException
    at java.lang.StringBuilder.<init>(StringBuilder.java:112)
    at org.apache.hive.beeline.ColorBuffer.center(ColorBuffer.java:81)
    at 
org.apache.hive.beeline.TableOutputFormat.getOutputString(TableOutputFormat.java:123)
    at 
org.apache.hive.beeline.TableOutputFormat.getOutputString(TableOutputFormat.java:108)
    at 
org.apache.hive.beeline.TableOutputFormat.print(TableOutputFormat.java:51)
    at org.apache.hive.beeline.BeeLine.print(BeeLine.java:2257)
    at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1026)
    at org.apache.hive.beeline.Commands.execute(Commands.java:1201)
    at org.apache.hive.beeline.Commands.sql(Commands.java:1130)
    at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1425)
    at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1287)
    at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:1261)
    at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1064)
    at 
org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:538)
    at org.apache.hive.beeline.BeeLine.main(BeeLine.java:520)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Closing: 0: jdbc:mysql://mysql:3306/metastore_db
{quote}
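
For context, a minimal sketch of the failure mode the trace points at (assuming 
the column label for the ' ' alias arrives as null in beeline's formatter):

{code:java}
// StringBuilder's String constructor dereferences its argument, so a null
// column label produces exactly the NPE seen at StringBuilder.<init>.
String header = null;                          // assumed value for the ' ' alias
StringBuilder sb = new StringBuilder(header);  // throws java.lang.NullPointerException
{code}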

  was:
When running schematool upgradeSchema against a mysql database with a 
metastore_db, I get a nullPointerException. The command is:

 

{{schematool -dbType mysql -upgradeSchema -verbose}}

 

The same exception can be created by running the relevant hive upgrade script 
directly in beeline with the following command:

 

{{beeline -u jdbc:mysql://mysql:3306/metastore_db -n [USER] -p[PASS] -f 
/usr/local/hive/scripts/metastore/upgrade/mysql/upgrade-2.3.0-to-3.0.0.mysql.sql}}

 

Removing the following lines from the sql script fixes this (as does replacing 
the space after AS with other characters):

 

{{SELECT 'Upgrading MetaStore schema from 2.3.0 to 3.0.0' AS ' ';}}

{{SELECT 'Finished upgrading MetaStore schema from 2.3.0 to 3.0.0' AS ' ';}}

 

The beeline exception is:
{quote}Connecting to jdbc:mysql://mysql:3306/metastore_db
Connected to: MySQL (version 5.6.51)
Driver: MySQL Connector/J (version mysql-connector-java-8.0.28 (Revision: 
7ff2161da3899f379fb3171b6538b191b1c5c7e2))
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:mysql://mysql:3306/metastore_db> SELECT 'Finished upgrading MetaStore 
schema from 2.3.0 to 3.0.0' AS ' ';
The statement instance is not HiveStatement type: class 
com.mysql.cj.jdbc.StatementImpl
The statement instance is not HiveStatement type: class 
com.mysql.cj.jdbc.StatementImpl
java.lang.NullPointerException
    at java.lang.StringBuilder.<init>(StringBuilder.java:112)
    at org.apache.hive.beeline.ColorBuffer.center(ColorBuffer.java:81)
    at 
org.apache.hive.beeline.TableOutputFormat.getOutputString(TableOutputFormat.java:123)
    at 
org.apache.hive.beeline.TableOutputFormat.getOutputString(TableOutputFormat.java:108)
    at 
org.apache.hive.beeline.TableOutputFormat.print(TableOutputFormat.java:51)
    at org.apache.hive.beeline.BeeLine.print(BeeLine.java:2257)
    at 

[jira] [Updated] (HIVE-26132) Schematool upgradeSchema fails with nullPointerException

2022-04-12 Thread David (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David updated HIVE-26132:
-
Description: 
When running schematool upgradeSchema against a mysql database with a 
metastore_db, I get a nullPointerException. The command is:

 

{{schematool -dbType mysql -upgradeSchema -verbose}}

 

The same exception can be created by running the relevant hive upgrade script 
directly in beeline with the following command:

 

{{beeline -u jdbc:mysql://mysql:3306/metastore_db -n [USER] -p[PASS] -f 
/usr/local/hive/scripts/metastore/upgrade/mysql/upgrade-2.3.0-to-3.0.0.mysql.sql}}

 

Removing the following lines from the sql script fixes this (as does replacing 
the space after AS with other characters):

 

{{SELECT 'Upgrading MetaStore schema from 2.3.0 to 3.0.0' AS ' ';}}

{{SELECT 'Finished upgrading MetaStore schema from 2.3.0 to 3.0.0' AS ' ';}}

 

The beeline exception is:
{quote}Connecting to jdbc:mysql://mysql:3306/metastore_db
Connected to: MySQL (version 5.6.51)
Driver: MySQL Connector/J (version mysql-connector-java-8.0.28 (Revision: 
7ff2161da3899f379fb3171b6538b191b1c5c7e2))
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:mysql://mysql:3306/metastore_db> SELECT 'Finished upgrading MetaStore 
schema from 2.3.0 to 3.0.0' AS ' ';
The statement instance is not HiveStatement type: class 
com.mysql.cj.jdbc.StatementImpl
The statement instance is not HiveStatement type: class 
com.mysql.cj.jdbc.StatementImpl
java.lang.NullPointerException
    at java.lang.StringBuilder.<init>(StringBuilder.java:112)
    at org.apache.hive.beeline.ColorBuffer.center(ColorBuffer.java:81)
    at 
org.apache.hive.beeline.TableOutputFormat.getOutputString(TableOutputFormat.java:123)
    at 
org.apache.hive.beeline.TableOutputFormat.getOutputString(TableOutputFormat.java:108)
    at 
org.apache.hive.beeline.TableOutputFormat.print(TableOutputFormat.java:51)
    at org.apache.hive.beeline.BeeLine.print(BeeLine.java:2257)
    at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1026)
    at org.apache.hive.beeline.Commands.execute(Commands.java:1201)
    at org.apache.hive.beeline.Commands.sql(Commands.java:1130)
    at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1425)
    at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1287)
    at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:1261)
    at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1064)
    at 
org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:538)
    at org.apache.hive.beeline.BeeLine.main(BeeLine.java:520)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Closing: 0: jdbc:mysql://mysql:3306/metastore_db
{quote}

  was:
When running schematool upgradeSchema against a mysql database with a 
metastore_db, I get a nullPointerException. The command is:

 

{{schematool -dbType mysql -upgradeSchema -verbose}}

 

The same exception can be created by running the relevant hive upgrade script 
directly in beeline with the following command:

 

{{beeline -u jdbc:mysql://mysql:3306/metastore_db -n [USER] -p[PASS] -f 
/usr/local/hive/scripts/metastore/upgrade/mysql/upgrade-2.3.0-to-3.0.0.mysql.sql}}

 

Removing the following lines from the sql script fixes this:

 

{{SELECT 'Upgrading MetaStore schema from 2.3.0 to 3.0.0' AS ' ';}}

{{SELECT 'Finished upgrading MetaStore schema from 2.3.0 to 3.0.0' AS ' ';}}

 

The beeline exception is:
{quote}Connecting to jdbc:mysql://mysql:3306/metastore_db
Connected to: MySQL (version 5.6.51)
Driver: MySQL Connector/J (version mysql-connector-java-8.0.28 (Revision: 
7ff2161da3899f379fb3171b6538b191b1c5c7e2))
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:mysql://mysql:3306/metastore_db> SELECT 'Finished upgrading MetaStore 
schema from 2.3.0 to 3.0.0' AS ' ';
The statement instance is not HiveStatement type: class 
com.mysql.cj.jdbc.StatementImpl
The statement instance is not HiveStatement type: class 
com.mysql.cj.jdbc.StatementImpl
java.lang.NullPointerException
    at java.lang.StringBuilder.<init>(StringBuilder.java:112)
    at org.apache.hive.beeline.ColorBuffer.center(ColorBuffer.java:81)
    at 
org.apache.hive.beeline.TableOutputFormat.getOutputString(TableOutputFormat.java:123)
    at 
org.apache.hive.beeline.TableOutputFormat.getOutputString(TableOutputFormat.java:108)
    at 
org.apache.hive.beeline.TableOutputFormat.print(TableOutputFormat.java:51)
    at org.apache.hive.beeline.BeeLine.print(BeeLine.java:2257)
    at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1026)
    at 

[jira] [Work logged] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?focusedWorklogId=755726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755726
 ]

ASF GitHub Bot logged work on HIVE-26131:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 12:32
Start Date: 12/Apr/22 12:32
Worklog Time Spent: 10m 
  Work Description: zhangbutao commented on PR #3200:
URL: https://github.com/apache/hive/pull/3200#issuecomment-109823

   @kgyrtkirk @nrg4878 Could you please take a look? thx




Issue Time Tracking
---

Worklog Id: (was: 755726)
Time Spent: 20m  (was: 10m)

> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
> Attachments: image-2022-04-12-13-07-09-647.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Steps to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}
> You can see the incorrect OutputFormat info:
> !image-2022-04-12-13-07-09-647.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
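
A note on the mechanics behind this bug report: for storage-handler-backed tables such as the remote JDBC tables above, the OutputFormat that DESCRIBE FORMATTED reports should come from the handler itself rather than from a default metastore descriptor. The sketch below only illustrates that idea; the class and helper name are invented for this example (getStorageHandler, getOutputFormatClass and getSd().getOutputFormat() are existing Hive APIs), and it is not the actual patch in PR #3200.

{code:java}
import org.apache.hadoop.hive.ql.metadata.HiveStorageHandler;
import org.apache.hadoop.hive.ql.metadata.Table;

public class DescribeOutputFormatSketch {
  // Hypothetical helper: resolve the OutputFormat that DESCRIBE should show.
  static String resolveOutputFormat(Table table) {
    HiveStorageHandler handler = table.getStorageHandler();
    if (handler != null) {
      // Storage-handler tables (e.g. JDBC storage handler tables) own their
      // I/O classes, so ask the handler rather than the stored descriptor.
      return handler.getOutputFormatClass().getName();
    }
    // Native tables: fall back to what the metastore descriptor recorded.
    return table.getSd().getOutputFormat();
  }
}
{code}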


[jira] [Work logged] (HIVE-26130) Incorrect matching of external table when validating NOT NULL constraints

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26130?focusedWorklogId=755723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755723
 ]

ASF GitHub Bot logged work on HIVE-26130:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 12:29
Start Date: 12/Apr/22 12:29
Worklog Time Spent: 10m 
  Work Description: zhangbutao commented on PR #3199:
URL: https://github.com/apache/hive/pull/3199#issuecomment-1096663026

   Could you merge this pr? Or is there anything else that needs fixing? thx 
@kgyrtkirk 




Issue Time Tracking
---

Worklog Id: (was: 755723)
Time Spent: 0.5h  (was: 20m)

> Incorrect matching of external table when validating NOT NULL constraints
> -
>
> Key: HIVE-26130
> URL: https://issues.apache.org/jira/browse/HIVE-26130
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> _AbstractAlterTablePropertiesAnalyzer.validate_ uses an incorrect check to 
> identify external tables:
> {code:java}
> else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
> {code}
> In the current Hive code, tblproperties ('EXTERNAL'='true' or 
> 'EXTERNAL'='TRUE') may mark a table as external, so the check must match 
> case-insensitively.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
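
The point of the fix, in self-contained form: the table property may be stored with either casing, so both the key and the value need a case-insensitive comparison. A minimal sketch (the class and method names are made up for illustration):

{code:java}
import java.util.Map;

public class ExternalTableCheckSketch {
  // Matches both 'external'='true' and 'EXTERNAL'='TRUE', unlike the
  // case-sensitive equals() calls quoted in the issue description.
  static boolean isExternalTrue(Map.Entry<String, String> entry) {
    return "external".equalsIgnoreCase(entry.getKey())
        && "true".equalsIgnoreCase(entry.getValue());
  }
}
{code}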


[jira] [Updated] (HIVE-26132) Schematool upgradeSchema fails with nullPointerException

2022-04-12 Thread David (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David updated HIVE-26132:
-
Component/s: Hive

> Schematool upgradeSchema fails with nullPointerException 
> -
>
> Key: HIVE-26132
> URL: https://issues.apache.org/jira/browse/HIVE-26132
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: David
>Priority: Major
>
> When running schematool upgradeSchema against a mysql database with a 
> metastore_db, I get a nullPointerException. The command is:
>  
> {{schematool -dbType mysql -upgradeSchema -verbose}}
>  
> The same exception can be created by running the relevant hive upgrade script 
> directly in beeline with the following command:
>  
> {{beeline -u jdbc:mysql://mysql:3306/metastore_db -n [USER] -p[PASS] -f 
> /usr/local/hive/scripts/metastore/upgrade/mysql/upgrade-2.3.0-to-3.0.0.mysql.sql}}
>  
> Removing the following lines from the SQL script fixes this:
>  
> {{SELECT 'Upgrading MetaStore schema from 2.3.0 to 3.0.0' AS ' ';}}
> {{SELECT 'Finished upgrading MetaStore schema from 2.3.0 to 3.0.0' AS ' ';}}
>  
> The beeline exception is:
> {quote}Connecting to jdbc:mysql://mysql:3306/metastore_db
> Connected to: MySQL (version 5.6.51)
> Driver: MySQL Connector/J (version mysql-connector-java-8.0.28 (Revision: 
> 7ff2161da3899f379fb3171b6538b191b1c5c7e2))
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:mysql://mysql:3306/metastore_db> SELECT 'Finished upgrading MetaStore 
> schema from 2.3.0 to 3.0.0' AS ' ';
> The statement instance is not HiveStatement type: class 
> com.mysql.cj.jdbc.StatementImpl
> The statement instance is not HiveStatement type: class 
> com.mysql.cj.jdbc.StatementImpl
> java.lang.NullPointerException
>     at java.lang.StringBuilder.<init>(StringBuilder.java:112)
>     at org.apache.hive.beeline.ColorBuffer.center(ColorBuffer.java:81)
>     at 
> org.apache.hive.beeline.TableOutputFormat.getOutputString(TableOutputFormat.java:123)
>     at 
> org.apache.hive.beeline.TableOutputFormat.getOutputString(TableOutputFormat.java:108)
>     at 
> org.apache.hive.beeline.TableOutputFormat.print(TableOutputFormat.java:51)
>     at org.apache.hive.beeline.BeeLine.print(BeeLine.java:2257)
>     at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1026)
>     at org.apache.hive.beeline.Commands.execute(Commands.java:1201)
>     at org.apache.hive.beeline.Commands.sql(Commands.java:1130)
>     at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1425)
>     at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1287)
>     at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:1261)
>     at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1064)
>     at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:538)
>     at org.apache.hive.beeline.BeeLine.main(BeeLine.java:520)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
> Closing: 0: jdbc:mysql://mysql:3306/metastore_db
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
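
For context on the stack trace above: java.lang.StringBuilder(String) dereferences its argument to size the buffer, so it throws NullPointerException before anything is appended when BeeLine's ColorBuffer.center is handed a null string (here, plausibly the ' ' column label coming back as null through the non-HiveStatement path). The snippet below is a minimal, standalone illustration of that failure mode, not BeeLine code; the padding loop is invented for the example:

{code:java}
public class CenterNpeSketch {
  // Loosely modeled on a centering routine: pad str with spaces to width.
  static String center(String str, int width) {
    StringBuilder sb = new StringBuilder(str); // NPE here when str == null
    while (sb.length() < width) {
      sb.insert(0, ' ').append(' ');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(center(null, 10)); // throws NullPointerException
  }
}
{code}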


[jira] [Updated] (HIVE-26132) Schematool upgradeSchema fails with nullPointerException

2022-04-12 Thread David (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David updated HIVE-26132:
-
Affects Version/s: 3.1.3

> Schematool upgradeSchema fails with nullPointerException 
> -
>
> Key: HIVE-26132
> URL: https://issues.apache.org/jira/browse/HIVE-26132
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: David
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26117?focusedWorklogId=755714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755714
 ]

ASF GitHub Bot logged work on HIVE-26117:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 11:59
Start Date: 12/Apr/22 11:59
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on code in PR #3179:
URL: https://github.com/apache/hive/pull/3179#discussion_r848327228


##
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##
@@ -2643,15 +2643,17 @@ private RelNode genJoinRelNode(RelNode leftRel, String leftTableAlias, RelNode r
     count++;
   }
   joinCond = count > 1 ? and : equal;
-} else if (unparseTranslator != null && unparseTranslator.isEnabled()) {
-  genAllExprNodeDesc(joinCond, input, jCtx);
 }
 Map<ASTNode, RexNode> exprNodes = RexNodeTypeCheck.genExprNodeJoinCond(
     joinCond, jCtx, cluster.getRexBuilder());
 if (jCtx.getError() != null) {
   throw new SemanticException(SemanticAnalyzer.generateErrorMessage(jCtx.getErrorSrcNode(),
       jCtx.getError()));
 }
+for (Map.Entry<ASTNode, RexNode> entry : exprNodes.entrySet()) {
+  UnparseTranslatorHelper.addTranslationIfNeeded(entry.getKey(), entry.getValue(),

Review Comment:
   I think we are kind of approaching a point where we should have the unparse 
translator enabled at all times...
   I wonder how much we gain by keeping the support to have it disabled...



##
ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslatorHelper.java:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.metadata.HiveUtils;
+
+/**
+ * Class containing static methods that help populate the UnparseTranslator.
+ */
+public class UnparseTranslatorHelper {
+
+  /**
+   * Adds translation to the unparseTranslator for the RexNode if it is a RexInputRef.
+   * Grabs the inputRef information from the given RowResolver.
+   */
+  public static void addTranslationIfNeeded(ASTNode astNode, RexNode rexNode, RowResolver rr,

Review Comment:
   I think it would make sense to have this inside the `UnparseTranslator` 
class instead... (I don't think there will be more functions added in the near 
future)





Issue Time Tracking
---

Worklog Id: (was: 755714)
Time Spent: 0.5h  (was: 20m)

> Remove 2 superfluous lines of code in genJoinRelNode
> 
>
> Key: HIVE-26117
> URL: https://issues.apache.org/jira/browse/HIVE-26117
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The code was rewritten to associate ASTNodes to RexNodes.  Some code was left 
> behind that doesn't add any value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
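
For readers following the review above: the diff shows only the new helper's javadoc and the start of its signature, not its body. Below is a hedged reconstruction of what such a method would have to do, based on the imports in the quoted diff and on existing UnparseTranslator/RowResolver APIs. The trailing parameters (UnparseTranslator, HiveConf) are guessed, since the diff cuts off, and the body is a sketch, not the code under review; it assumes the same package and imports as the quoted file:

{code:java}
public static void addTranslationIfNeeded(ASTNode astNode, RexNode rexNode,
    RowResolver rowResolver, UnparseTranslator unparseTranslator, HiveConf conf) {
  // Only column references need a translation entry, and only when the
  // unparse translator is actually collecting translations.
  if (!unparseTranslator.isEnabled() || !(rexNode instanceof RexInputRef)) {
    return;
  }
  RexInputRef inputRef = (RexInputRef) rexNode;
  ColumnInfo columnInfo = rowResolver.getColumnInfos().get(inputRef.getIndex());
  // Resolve back to the table/column aliases and quote them, following the
  // usual unparse pattern; null handling is omitted in this sketch.
  String[] tableAndCol = rowResolver.reverseLookup(columnInfo.getInternalName());
  String replacement = HiveUtils.unparseIdentifier(tableAndCol[0], conf) + "."
      + HiveUtils.unparseIdentifier(tableAndCol[1], conf);
  unparseTranslator.addTranslation(astNode, replacement);
}
{code}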


[jira] [Work logged] (HIVE-26092) Fix javadoc errors for the 4.0.0 release

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26092?focusedWorklogId=755710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755710
 ]

ASF GitHub Bot logged work on HIVE-26092:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 11:52
Start Date: 12/Apr/22 11:52
Worklog Time Spent: 10m 
  Work Description: pvary merged PR #3185:
URL: https://github.com/apache/hive/pull/3185




Issue Time Tracking
---

Worklog Id: (was: 755710)
Time Spent: 50m  (was: 40m)

> Fix javadoc errors for the 4.0.0 release
> 
>
> Key: HIVE-26092
> URL: https://issues.apache.org/jira/browse/HIVE-26092
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently there are plenty of errors in the javadoc.
> We should fix those before a final release



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26092) Fix javadoc errors for the 4.0.0 release

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26092?focusedWorklogId=755709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755709
 ]

ASF GitHub Bot logged work on HIVE-26092:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 11:51
Start Date: 12/Apr/22 11:51
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on code in PR #3185:
URL: https://github.com/apache/hive/pull/3185#discussion_r848342119


##
common/src/java/org/apache/hadoop/hive/common/type/Date.java:
##
@@ -39,7 +39,7 @@
 /**
  * This is the internal type for Date. The full qualified input format of Date
  * is "yyyy-MM-dd". For example: "2021-02-11".
- * 
+ * 

Review Comment:
   jeez! this problem will make a comeback in jdk11 :facepalm: 
   
https://stackoverflow.com/questions/22528767/how-to-work-around-the-stricter-java-8-javadoc-when-using-maven





Issue Time Tracking
---

Worklog Id: (was: 755709)
Time Spent: 40m  (was: 0.5h)

> Fix javadoc errors for the 4.0.0 release
> 
>
> Key: HIVE-26092
> URL: https://issues.apache.org/jira/browse/HIVE-26092
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently there are plenty of errors in the javadoc.
> We should fix those before a final release



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode

2022-04-12 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-26117:
---

Assignee: Steve Carlin

> Remove 2 superfluous lines of code in genJoinRelNode
> 
>
> Key: HIVE-26117
> URL: https://issues.apache.org/jira/browse/HIVE-26117
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The code was rewritten to associate ASTNodes to RexNodes.  Some code was left 
> behind that doesn't add any value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-12 Thread Marton Bod (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521023#comment-17521023
 ] 

Marton Bod commented on HIVE-26102:
---

Pushed to master.

Thanks [~pvary] for the thorough review!

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-12 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-26102.
---
Resolution: Fixed

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=755652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755652
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 08:54
Start Date: 12/Apr/22 08:54
Worklog Time Spent: 10m 
  Work Description: marton-bod merged PR #3131:
URL: https://github.com/apache/hive/pull/3131




Issue Time Tracking
---

Worklog Id: (was: 755652)
Time Spent: 17h 40m  (was: 17.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26127) Insert overwrite throws FileNotFound when destination partition is deleted

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26127?focusedWorklogId=755651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755651
 ]

ASF GitHub Bot logged work on HIVE-26127:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 08:54
Start Date: 12/Apr/22 08:54
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3198:
URL: https://github.com/apache/hive/pull/3198#discussion_r848169084


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -5399,7 +5399,12 @@ public void cleanUpOneDirectoryForReplace(Path path, FileSystem fs,
 if (isNeedRecycle && conf.getBoolVar(HiveConf.ConfVars.REPLCMENABLED)) {
   recycleDirToCmPath(path, purge);

Review Comment:
   Do we want to call `recycleDirToCmPath` for a path that does not exist? More 
generally, do we want to call `deleteOldPathForReplace` when the path does not 
exist?





Issue Time Tracking
---

Worklog Id: (was: 755651)
Time Spent: 20m  (was: 10m)

> Insert overwrite throws FileNotFound when destination partition is deleted 
> ---
>
> Key: HIVE-26127
> URL: https://issues.apache.org/jira/browse/HIVE-26127
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  # create external table src (col int) partitioned by (year int);
>  # create external table dest (col int) partitioned by (year int);
>  # insert into src partition (year=2022) values (1);
>  # insert into dest partition (year=2022) values (2);
>  # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022
>  # insert overwrite table dest select * from src;
> We will get FileNotFoundException as below.
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory 
> file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1
>  could not be cleaned up.
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387)
>     at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) {code}
> It is because it calls listStatus on a path that doesn't exist. We should not 
> fail insert overwrite because there is nothing to clean up.
> {code:java}
> fs.listStatus(path, pathFilter){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
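
The proposed behavior, sketched outside of Hive.java: treat a missing destination directory as an empty cleanup set instead of letting listStatus throw FileNotFoundException. The guard below uses only stock Hadoop FileSystem APIs; the wrapper class and method are hypothetical, a sketch of the idea rather than the actual fix in PR #3198:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class CleanupGuardSketch {
  // If the destination partition directory was removed out-of-band, there is
  // nothing to clean up, so report an empty listing rather than failing the
  // INSERT OVERWRITE.
  static FileStatus[] listForCleanup(FileSystem fs, Path path, PathFilter filter)
      throws IOException {
    if (!fs.exists(path)) {
      return new FileStatus[0];
    }
    return fs.listStatus(path, filter);
  }
}
{code}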


[jira] [Resolved] (HIVE-25941) Long compilation time of complex query due to analysis for materialized view rewrite

2022-04-12 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-25941.
---
Resolution: Fixed

Pushed to master. Thanks [~kgyrtkirk] and [~amansinha100] for review.

> Long compilation time of complex query due to analysis for materialized view 
> rewrite
> 
>
> Key: HIVE-25941
> URL: https://issues.apache.org/jira/browse/HIVE-25941
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: sample.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When compiling a query, the optimizer tries to rewrite the query plan or 
> subtrees of the plan to use materialized view scans.
> If
> {code}
> set hive.materializedview.rewriting.sql.subquery=false;
> {code}
> the compilation succeeds in less than 10 sec; otherwise it takes several 
> minutes (~5 min) depending on the hardware.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25941) Long compilation time of complex query due to analysis for materialized view rewrite

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25941?focusedWorklogId=755635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755635
 ]

ASF GitHub Bot logged work on HIVE-25941:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 08:20
Start Date: 12/Apr/22 08:20
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged PR #3014:
URL: https://github.com/apache/hive/pull/3014




Issue Time Tracking
---

Worklog Id: (was: 755635)
Time Spent: 1h 50m  (was: 1h 40m)

> Long compilation time of complex query due to analysis for materialized view 
> rewrite
> 
>
> Key: HIVE-25941
> URL: https://issues.apache.org/jira/browse/HIVE-25941
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: sample.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When compiling a query, the optimizer tries to rewrite the query plan or 
> subtrees of the plan to use materialized view scans.
> If
> {code}
> set hive.materializedview.rewriting.sql.subquery=false;
> {code}
> the compilation succeeds in less than 10 sec; otherwise it takes several 
> minutes (~5 min) depending on the hardware.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26092) Fix javadoc errors for the 4.0.0 release

2022-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26092?focusedWorklogId=755618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755618
 ]

ASF GitHub Bot logged work on HIVE-26092:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 06:36
Start Date: 12/Apr/22 06:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3185:
URL: https://github.com/apache/hive/pull/3185#discussion_r848025954


##
Jenkinsfile:
##
@@ -350,6 +350,18 @@ tar -xzf 
packaging/target/apache-hive-*-nightly-*-src.tar.gz
   }
 }
   }
+  branches['javadoc-check'] = {
+executorNode {
+  stage('Prepare') {
+  loadWS();
+  }
+  stage('Generate javadoc') {
+  sh """#!/bin/bash -e
+mvn clean install javadoc:javadoc javadoc:aggregate -DskipTests

Review Comment:
   I had to exclude some of the itest modules, because those have some weird 
compilation issues when generating javadoc. Since we are not really interested 
in the itest javadoc, I think it is good enough for now.
   
   What do you think? 





Issue Time Tracking
---

Worklog Id: (was: 755618)
Time Spent: 0.5h  (was: 20m)

> Fix javadoc errors for the 4.0.0 release
> 
>
> Key: HIVE-26092
> URL: https://issues.apache.org/jira/browse/HIVE-26092
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently there are plenty of errors in the javadoc.
> We should fix those before a final release



--
This message was sent by Atlassian Jira
(v8.20.1#820001)