[jira] [Updated] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26131:
--
Labels: pull-request-available  (was: )

> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
> Attachments: image-2022-04-12-13-07-09-647.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}
> You can see the incorrect OutputFormat info:
> !image-2022-04-12-13-07-09-647.png!
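
For context, a minimal sketch of where such a value can come from, assuming it
is derived from the storage handler's declared OutputFormat (the helper class
and method names below are hypothetical; only
HiveStorageHandler#getOutputFormatClass is the real API). For the JDBC storage
handler the expected class is org.apache.hive.storage.jdbc.JdbcOutputFormat,
not a file-format default:
{code:java}
import org.apache.hadoop.hive.ql.metadata.HiveStorageHandler;
import org.apache.hadoop.mapred.OutputFormat;

// Hypothetical helper, not the HIVE-26131 patch itself.
public final class OutputFormatProbe {
  private OutputFormatProbe() {
  }

  // DESCRIBE FORMATTED should report the handler's own OutputFormat; the bug
  // was that a different class name was shown for jdbc connector tables.
  public static String outputFormatName(HiveStorageHandler handler) {
    Class<? extends OutputFormat> clazz = handler.getOutputFormatClass();
    return clazz.getName();
  }
}
{code}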



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?focusedWorklogId=755609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755609
 ]

ASF GitHub Bot logged work on HIVE-26131:
-

Author: ASF GitHub Bot
Created on: 12/Apr/22 05:33
Start Date: 12/Apr/22 05:33
Worklog Time Spent: 10m 
  Work Description: zhangbutao opened a new pull request, #3200:
URL: https://github.com/apache/hive/pull/3200

   
   
   ### What changes were proposed in this pull request?
   
   Use correct OutputFormat when describing jdbc connector table
   
   ### Why are the changes needed?
   
   Incorrect OutputFormat when describing jdbc connector table
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Small fix; tested on a local cluster only.
   After the fix:
   
![image](https://user-images.githubusercontent.com/9760681/162887429-5e30dd2f-8b0f-49b6-8e74-150b9a569632.png)
   




Issue Time Tracking
---

Worklog Id: (was: 755609)
Remaining Estimate: 0h
Time Spent: 10m

> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
> Fix For: 4.0.0-alpha-2
>
> Attachments: image-2022-04-12-13-07-09-647.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}
> You can see the incorrect OutputFormat info:
> !image-2022-04-12-13-07-09-647.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-26131:
--
Attachment: (was: image-2022-04-12-13-07-36-876.png)

> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
> Fix For: 4.0.0-alpha-2
>
> Attachments: image-2022-04-12-13-07-09-647.png
>
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}
> You can see the incorrect OutputFormat info:
> !image-2022-04-12-13-07-09-647.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26131 started by zhangbutao.
-
> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
> Attachments: image-2022-04-12-13-07-09-647.png, 
> image-2022-04-12-13-07-36-876.png
>
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}
> You can see the incorrect OutputFormat info:
> !image-2022-04-12-13-07-09-647.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-26131:
--
Fix Version/s: 4.0.0-alpha-2

> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
> Fix For: 4.0.0-alpha-2
>
> Attachments: image-2022-04-12-13-07-09-647.png, 
> image-2022-04-12-13-07-36-876.png
>
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}
> You can see incorrect OuptputFormat info:
> !image-2022-04-12-13-07-09-647.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-26131:
--
Description: 
Step to repro:
{code:java}
CREATE CONNECTOR mysql_qtest
TYPE 'mysql'
URL 'jdbc:mysql://localhost:3306/testdb'
WITH DCPROPERTIES (
"hive.sql.dbcp.username"="root",
"hive.sql.dbcp.password"="");

CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
DBPROPERTIES("connector.remoteDbName"="testdb"); 

describe formatted db_mysql.test;{code}
You can see the incorrect OutputFormat info:

!image-2022-04-12-13-07-09-647.png!

  was:
Step to repro:
{code:java}
CREATE CONNECTOR mysql_qtest
TYPE 'mysql'
URL 'jdbc:mysql://localhost:3306/testdb'
WITH DCPROPERTIES (
"hive.sql.dbcp.username"="root",
"hive.sql.dbcp.password"="");

CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
DBPROPERTIES("connector.remoteDbName"="testdb"); 

describe formatted db_mysql.test;{code}
You can see incorrect 

!image-2022-04-12-13-07-09-647.png!


> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
> Attachments: image-2022-04-12-13-07-09-647.png, 
> image-2022-04-12-13-07-36-876.png
>
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}
> You can see the incorrect OutputFormat info:
> !image-2022-04-12-13-07-09-647.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-26131:
--
Description: 
Step to repro:
{code:java}
CREATE CONNECTOR mysql_qtest
TYPE 'mysql'
URL 'jdbc:mysql://localhost:3306/testdb'
WITH DCPROPERTIES (
"hive.sql.dbcp.username"="root",
"hive.sql.dbcp.password"="");

CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
DBPROPERTIES("connector.remoteDbName"="testdb"); 

describe formatted db_mysql.test;{code}
You can see incorrect 

!image-2022-04-12-13-07-09-647.png!

  was:
Step to repro:
{code:java}
CREATE CONNECTOR mysql_qtest
TYPE 'mysql'
URL 'jdbc:mysql://localhost:3306/testdb'
WITH DCPROPERTIES (
"hive.sql.dbcp.username"="root",
"hive.sql.dbcp.password"="");

CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
DBPROPERTIES("connector.remoteDbName"="testdb"); 

describe formatted db_mysql.test;{code}


> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
> Attachments: image-2022-04-12-13-07-09-647.png, 
> image-2022-04-12-13-07-36-876.png
>
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}
> You can see incorrect 
> !image-2022-04-12-13-07-09-647.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-26131:
--
Attachment: image-2022-04-12-13-07-09-647.png

> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
> Attachments: image-2022-04-12-13-07-09-647.png, 
> image-2022-04-12-13-07-36-876.png
>
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-26131:
--
Attachment: image-2022-04-12-13-07-36-876.png

> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
> Attachments: image-2022-04-12-13-07-09-647.png, 
> image-2022-04-12-13-07-36-876.png
>
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}
> You can see incorrect 
> !image-2022-04-12-13-07-09-647.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-26131:
-

Assignee: zhangbutao

> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26131) Incorrect OutputFormat when describing jdbc connector table

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-26131:
--
Description: 
Step to repro:
{code:java}
CREATE CONNECTOR mysql_qtest
TYPE 'mysql'
URL 'jdbc:mysql://localhost:3306/testdb'
WITH DCPROPERTIES (
"hive.sql.dbcp.username"="root",
"hive.sql.dbcp.password"="");

CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
DBPROPERTIES("connector.remoteDbName"="testdb"); 

describe formatted db_mysql.test;{code}

> Incorrect OutputFormat when describing jdbc connector table 
> 
>
> Key: HIVE-26131
> URL: https://issues.apache.org/jira/browse/HIVE-26131
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Priority: Minor
>
> Step to repro:
> {code:java}
> CREATE CONNECTOR mysql_qtest
> TYPE 'mysql'
> URL 'jdbc:mysql://localhost:3306/testdb'
> WITH DCPROPERTIES (
> "hive.sql.dbcp.username"="root",
> "hive.sql.dbcp.password"="");
> CREATE REMOTE DATABASE db_mysql USING mysql_qtest with 
> DBPROPERTIES("connector.remoteDbName"="testdb"); 
> describe formatted db_mysql.test;{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=755376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755376
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 18:49
Start Date: 11/Apr/22 18:49
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r847635937


##
standalone-metastore/pom.xml:
##
@@ -361,6 +362,12 @@
 runtime
 true
   
> Hive Metastore Thrift over HTTP
> ---
>
> Key: HIVE-21456
> URL: https://issues.apache.org/jira/browse/HIVE-21456
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Standalone Metastore
>Reporter: Amit Khanna
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, 
> HIVE-21456.4.patch, HIVE-21456.patch
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Hive Metastore currently doesn't support HTTP transport, which makes it 
> impossible to access via Knox. Adding support for Thrift over HTTP 
> transport will allow clients to access the Metastore via Knox.
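
As a concrete illustration, here is a minimal sketch of switching both sides
to HTTP mode, using the MetastoreConf variables that appear in the test code
later in this thread; treat it as a sketch under those assumptions, not the
final configuration surface:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;

public final class HttpTransportSketch {
  public static Configuration httpModeConf() {
    Configuration conf = MetastoreConf.newMetastoreConf();
    // Server side: serve Thrift over HTTP instead of the binary transport.
    MetastoreConf.setVar(conf, ConfVars.THRIFT_TRANSPORT_MODE, "http");
    // Client side: make HiveMetaStoreClient speak HTTP as well.
    MetastoreConf.setVar(conf, ConfVars.METASTORE_CLIENT_THRIFT_TRANSPORT_MODE, "http");
    return conf;
  }
}
{code}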



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=755368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755368
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 18:20
Start Date: 11/Apr/22 18:20
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r847613439


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:
##
@@ -343,21 +366,162 @@ public static void startMetaStore(int port, 
HadoopThriftAuthBridge bridge,
 startMetaStore(port, bridge, conf, false, null);
   }
 
-  /**
-   * Start Metastore based on a passed {@link HadoopThriftAuthBridge}.
-   *
-   * @param port The port on which the Thrift server will start to serve
-   * @param bridge
-   * @param conf Configuration overrides
-   * @param startMetaStoreThreads Start the background threads (initiator, 
cleaner, statsupdater, etc.)
-   * @param startedBackgroundThreads If startMetaStoreThreads is true, this 
AtomicBoolean will be switched to true,
-   *  when all of the background threads are scheduled. Useful for testing 
purposes to wait
-   *  until the MetaStore is fully initialized.
-   * @throws Throwable
-   */
-  public static void startMetaStore(int port, HadoopThriftAuthBridge bridge,
-  Configuration conf, boolean startMetaStoreThreads, AtomicBoolean 
startedBackgroundThreads) throws Throwable {
-isMetaStoreRemote = true;
+  public static boolean isThriftServerRunning() {
+return thriftServer != null && thriftServer.isRunning();
+  }
+
+  // TODO: Is it worth trying to use a server that supports HTTP/2?
+  //  Does the Thrift http client support this?
+
+  public static ThriftServer startHttpMetastore(int port, Configuration conf)
+  throws Exception {
+LOG.info("Attempting to start http metastore server on port: {}", port);

Review Comment:
   @pvary : Thanks for the pointers. I have addressed disabling TRACE for the 
HMS http server.
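   
   One common way to do this at the servlet level is sketched below; whether 
the PR disables TRACE this way or via Jetty configuration is not shown here, 
so take it as an assumption-laden illustration only.
   ```java
   import java.io.IOException;
   import javax.servlet.ServletException;
   import javax.servlet.http.HttpServlet;
   import javax.servlet.http.HttpServletRequest;
   import javax.servlet.http.HttpServletResponse;

   // Sketch: an HttpServlet subclass can refuse TRACE outright.
   public class NoTraceServlet extends HttpServlet {
     @Override
     protected void doTrace(HttpServletRequest req, HttpServletResponse resp)
         throws ServletException, IOException {
       resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED, "TRACE is disabled");
     }
   }
   ```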





Issue Time Tracking
---

Worklog Id: (was: 755368)
Time Spent: 4h 50m  (was: 4h 40m)

> Hive Metastore Thrift over HTTP
> ---
>
> Key: HIVE-21456
> URL: https://issues.apache.org/jira/browse/HIVE-21456
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Standalone Metastore
>Reporter: Amit Khanna
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, 
> HIVE-21456.4.patch, HIVE-21456.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Hive Metastore currently doesn't support HTTP transport, which makes it 
> impossible to access via Knox. Adding support for Thrift over HTTP 
> transport will allow clients to access the Metastore via Knox.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=755367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755367
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 18:19
Start Date: 11/Apr/22 18:19
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r847612658


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HmsThriftHttpServlet.java:
##
@@ -0,0 +1,116 @@
+/* * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import java.io.IOException;
+import java.security.PrivilegedExceptionAction;
+import java.util.Enumeration;
+
+import javax.servlet.ServletException;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.thrift.TProcessor;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.server.TServlet;
+
+public class HmsThriftHttpServlet extends TServlet {
+
+  private static final Logger LOG = LoggerFactory
+  .getLogger(HmsThriftHttpServlet.class);
+
+  private static final String X_USER = MetaStoreUtils.USER_NAME_HTTP_HEADER;
+
+  private final boolean isSecurityEnabled;
+
+  public HmsThriftHttpServlet(TProcessor processor,
+  TProtocolFactory inProtocolFactory, TProtocolFactory outProtocolFactory) {
+super(processor, inProtocolFactory, outProtocolFactory);
+// This should ideally be receiving an instance of the Configuration which is used for the check
+isSecurityEnabled = UserGroupInformation.isSecurityEnabled();
+  }
+
+  public HmsThriftHttpServlet(TProcessor processor,
+  TProtocolFactory protocolFactory) {
+super(processor, protocolFactory);
+isSecurityEnabled = UserGroupInformation.isSecurityEnabled();
+  }
+
+  @Override
+  protected void doPost(HttpServletRequest request,
+  HttpServletResponse response) throws ServletException, IOException {
+
+Enumeration<String> headerNames = request.getHeaderNames();
+if (LOG.isDebugEnabled()) {
+  LOG.debug("Logging headers in request");
+  while (headerNames.hasMoreElements()) {
+String headerName = headerNames.nextElement();
+LOG.debug("Header: [{}], Value: [{}]", headerName,
+request.getHeader(headerName));
+  }
+}
+String userFromHeader = request.getHeader(X_USER);
+if (userFromHeader == null || userFromHeader.isEmpty()) {
+  LOG.error("No user header: {} found", X_USER);
+  response.sendError(HttpServletResponse.SC_FORBIDDEN,
+  "User Header missing");
+  return;
+}
+
+// TODO: These should ideally be in some kind of a Cache with Weak references.
+// If HMS were to set up some kind of a session, this would go into the 
session by having
+// this filter work with a custom Processor / or set the username into the 
session
+// as is done for HS2.
+// In case of HMS, it looks like each request is independent, and there is 
no session
+// information, so the UGI needs to be set up in the Connection layer 
itself.
+UserGroupInformation clientUgi;
+// Temporary, and useless for now. Here only to allow this to work on an 
otherwise kerberized
+// server.
+if (isSecurityEnabled) {
+  LOG.info("Creating proxy user for: {}", userFromHeader);
+  clientUgi = UserGroupInformation.createProxyUser(userFromHeader, 
UserGroupInformation.getLoginUser());
+} else {
+  LOG.info("Creating remote user for: {}", userFromHeader);
+  clientUgi = UserGroupInformation.createRemoteUser(userFromHeader);
+}
+
+
+PrivilegedExceptionAction<Void> action = new PrivilegedExceptionAction<Void>() {
+  @Override
+  public Void run() throws Exception {
+HmsThriftHttpServlet.super.doPost(request, response);
+return null;

[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=755363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755363
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 18:15
Start Date: 11/Apr/22 18:15
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r847609721


##
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestRemoteHiveHttpMetaStore.java:
##
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore;
+
+import org.apache.hadoop.hive.metastore.annotation.MetastoreUnitTest;
+import org.junit.experimental.categories.Category;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;
+
+@Category(MetastoreCheckinTest.class)
+public class TestRemoteHiveHttpMetaStore extends TestRemoteHiveMetaStore {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(TestRemoteHiveHttpMetaStore.class);
+
+  @Override
+  public void start() throws Exception {
+MetastoreConf.setVar(conf, ConfVars.THRIFT_TRANSPORT_MODE, "http");
+LOG.info("Attempting to start test remote metastore in http mode");
+super.start();
+LOG.info("Successfully started test remote metastore in http mode");
+  }
+
+  @Override
+  protected HiveMetaStoreClient createClient() throws Exception {
+MetastoreConf.setVar(conf, 
ConfVars.METASTORE_CLIENT_THRIFT_TRANSPORT_MODE, "http");
+return super.createClient();
+  }
+}

Review Comment:
   Done



##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HmsThriftHttpServlet.java:
##
@@ -0,0 +1,116 @@
+/* * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import java.io.IOException;
+import java.security.PrivilegedExceptionAction;
+import java.util.Enumeration;
+
+import javax.servlet.ServletException;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.thrift.TProcessor;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.server.TServlet;
+
+public class HmsThriftHttpServlet extends TServlet {
+
+  private static final Logger LOG = LoggerFactory
+  .getLogger(HmsThriftHttpServlet.class);
+
+  private static final String X_USER = MetaStoreUtils.USER_NAME_HTTP_HEADER;
+
+  private final boolean isSecurityEnabled;
+
+  public HmsThriftHttpServlet(TProcessor processor,
+  TProtocolFactory inProtocolFactory, TProtocolFactory outProtocolFactory) {
+super(processor, inProtocolFactory, outProtocolFactory);
+// This should ideally be receiving an instance of the Configuration which is used for the check
+isSecurityEnabled = UserGroupInformation.isSecurityEnabled();
+  }
+
+  public HmsThriftHttpServlet(TProcessor 

[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=755362&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755362
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 18:14
Start Date: 11/Apr/22 18:14
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r847609041


##
standalone-metastore/pom.xml:
##
@@ -361,6 +362,12 @@
 runtime
 true
   
> Hive Metastore Thrift over HTTP
> ---
>
> Key: HIVE-21456
> URL: https://issues.apache.org/jira/browse/HIVE-21456
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Standalone Metastore
>Reporter: Amit Khanna
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, 
> HIVE-21456.4.patch, HIVE-21456.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Hive Metastore currently doesn't support HTTP transport, which makes it 
> impossible to access via Knox. Adding support for Thrift over HTTP 
> transport will allow clients to access the Metastore via Knox.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-21456) Hive Metastore Thrift over HTTP

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21456?focusedWorklogId=755360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755360
 ]

ASF GitHub Bot logged work on HIVE-21456:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 18:14
Start Date: 11/Apr/22 18:14
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on code in PR #3105:
URL: https://github.com/apache/hive/pull/3105#discussion_r847608730


##
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestSSL.java:
##
@@ -437,15 +439,36 @@ public void testConnectionWrongCertCN() throws Exception {
* Test HMS server with SSL
* @throws Exception
*/
+  @Ignore
   @Test
   public void testMetastoreWithSSL() throws Exception {
 testSSLHMS(true);
   }
 
+  /**
+   * Test HMS server with Http + SSL
+   * @throws Exception
+   */
+  @Test
+  public void testMetastoreWithHttps() throws Exception {
+// MetastoreConf.setBoolVar(conf, 
MetastoreConf.ConfVars.EVENT_DB_NOTIFICATION_API_AUTH, false);
+//MetastoreConf.setVar(conf, 
MetastoreConf.ConfVars.METASTORE_CLIENT_TRANSPORT_MODE, "http");
+SSLTestUtils.setMetastoreHttpsConf(conf);
+MetastoreConf.setVar(conf, 
MetastoreConf.ConfVars.SSL_TRUSTMANAGERFACTORY_ALGORITHM,
+KEY_MANAGER_FACTORY_ALGORITHM);
+MetastoreConf.setVar(conf, MetastoreConf.ConfVars.SSL_TRUSTSTORE_TYPE, 
KEY_STORE_TRUST_STORE_TYPE);
+MetastoreConf.setVar(conf, MetastoreConf.ConfVars.SSL_KEYSTORE_TYPE, 
KEY_STORE_TRUST_STORE_TYPE);
+MetastoreConf.setVar(conf, 
MetastoreConf.ConfVars.SSL_KEYMANAGERFACTORY_ALGORITHM,
+KEY_MANAGER_FACTORY_ALGORITHM);
+
+testSSLHMS(false);

Review Comment:
   Thanks for pointing it out. I am setting the conf 
`MetastoreConf.ConfVars.SSL_KEYSTORE_TYPE`  in testSSLHMS(false) now.





Issue Time Tracking
---

Worklog Id: (was: 755360)
Time Spent: 4h 10m  (was: 4h)

> Hive Metastore Thrift over HTTP
> ---
>
> Key: HIVE-21456
> URL: https://issues.apache.org/jira/browse/HIVE-21456
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Standalone Metastore
>Reporter: Amit Khanna
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21456.2.patch, HIVE-21456.3.patch, 
> HIVE-21456.4.patch, HIVE-21456.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Hive Metastore currently doesn't support HTTP transport, which makes it 
> impossible to access via Knox. Adding support for Thrift over HTTP 
> transport will allow clients to access the Metastore via Knox.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=755303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755303
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 16:37
Start Date: 11/Apr/22 16:37
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r847528556


##
ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java:
##
@@ -187,6 +188,14 @@ public void parseRecordIdentifier(Configuration 
configuration) {
 }
   }
 
+  public void parsePositionDeleteInfo(Configuration configuration) {
+this.pdi = PositionDeleteInfo.parseFromConf(configuration);

Review Comment:
   Would it be worth setting the `pdi` fields one by one instead of creating a 
new object for every row?
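   
   For illustration, the allocation-free variant being suggested might look 
like the sketch below; the holder type, fields, and setter are hypothetical, 
not the actual PositionDeleteInfo API.
   ```java
   // Hypothetical holder reused across rows instead of re-created per row.
   final class PositionDeleteHolder {
     private int specId;
     private long filePosition;

     // In-place update; contrast with allocating a fresh PositionDeleteInfo
     // via parseFromConf(configuration) for every row.
     void update(int specId, long filePosition) {
       this.specId = specId;
       this.filePosition = filePosition;
     }
   }
   ```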





Issue Time Tracking
---

Worklog Id: (was: 755303)
Time Spent: 17.5h  (was: 17h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=755302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755302
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 16:34
Start Date: 11/Apr/22 16:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r847525880


##
ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java:
##
@@ -673,7 +674,31 @@ private String toErrorMessage(Writable value, Object row, 
ObjectInspector inspec
 ctx.getIoCxt().setRecordIdentifier(null);//so we don't 
accidentally cache the value; shouldn't
 //happen since IO layer either knows how to produce ROW__ID or not 
- but to be safe
   }
- break;
+  break;
+case PARTITION_SPEC_ID:

Review Comment:
   Ok.. I would have accepted the change in the `Deserializer` for this, but I 
do not see how we can extend the `VirtualColumn` to allow columns from the 
Deserializer...
   
   Any ideas are welcome; until then we will work with this.





Issue Time Tracking
---

Worklog Id: (was: 755302)
Time Spent: 17h 20m  (was: 17h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=755298&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755298
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 16:23
Start Date: 11/Apr/22 16:23
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r847515950


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -468,14 +475,17 @@ private CloseableIterable newOrcIterable(InputFile 
inputFile, FileScanTask ta
  Set<Integer> idColumns = spec.identitySourceIds();
   Schema partitionSchema = TypeUtil.select(expectedSchema, idColumns);
   boolean projectsIdentityPartitionColumns = 
!partitionSchema.columns().isEmpty();
-  if (projectsIdentityPartitionColumns) {
+  if (expectedSchema.findField(MetadataColumns.PARTITION_COLUMN_ID) != 
null) {

Review Comment:
   Why is this change needed?





Issue Time Tracking
---

Worklog Id: (was: 755298)
Time Spent: 17h 10m  (was: 17h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=755280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755280
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 15:44
Start Date: 11/Apr/22 15:44
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r847473881


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -325,9 +327,40 @@ private void commitTable(FileIO io, ExecutorService 
executor, JobContext jobCont
   "numReduceTasks/numMapTasks", jobContext.getJobID(), name);
   return conf.getNumReduceTasks() > 0 ? conf.getNumReduceTasks() : 
conf.getNumMapTasks();
 });
-Collection<DataFile> dataFiles = dataFiles(numTasks, executor, location, 
jobContext, io, true);
 
-boolean isOverwrite = conf.getBoolean(InputFormatConfig.IS_OVERWRITE, 
false);
+if (HiveIcebergStorageHandler.isDelete(conf, name)) {
+  Collection writeResults = collectResults(numTasks, 
executor, location, jobContext, io, true);
+  commitDelete(jobContext, table, startTime, writeResults);
+} else if (HiveIcebergStorageHandler.isWrite(conf, name)) {
+  Collection writeResults = collectResults(numTasks, 
executor, location, jobContext, io, true);
+  boolean isOverwrite = conf.getBoolean(InputFormatConfig.IS_OVERWRITE, 
false);
+  commitInsert(jobContext, table, startTime, writeResults, isOverwrite);
+} else {
+  LOG.info("Unable to determine commit operation type for table: {}, 
jobID: {}. Will not create a commit.",
+  table, jobContext.getJobID());
+}
+  }
+
+  private void commitDelete(JobContext jobContext, Table table, long 
startTime, Collection results) {

Review Comment:
   That should allow you to do something like:
   ```
   // update
   Transaction transaction = table.newTransaction();
   commitDelete(table, Optional.of(transaction), startTime, deleteWriteResults);
   commitInsert(table, Optional.of(transaction), startTime, insertWriteResults, 
isOverwrite);
   transaction.commitTransaction();
   ```





Issue Time Tracking
---

Worklog Id: (was: 755280)
Time Spent: 17h  (was: 16h 50m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=755279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755279
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 15:44
Start Date: 11/Apr/22 15:44
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r847473881


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -325,9 +327,40 @@ private void commitTable(FileIO io, ExecutorService 
executor, JobContext jobCont
   "numReduceTasks/numMapTasks", jobContext.getJobID(), name);
   return conf.getNumReduceTasks() > 0 ? conf.getNumReduceTasks() : 
conf.getNumMapTasks();
 });
-Collection<DataFile> dataFiles = dataFiles(numTasks, executor, location, 
jobContext, io, true);
 
-boolean isOverwrite = conf.getBoolean(InputFormatConfig.IS_OVERWRITE, 
false);
+if (HiveIcebergStorageHandler.isDelete(conf, name)) {
+  Collection writeResults = collectResults(numTasks, 
executor, location, jobContext, io, true);
+  commitDelete(jobContext, table, startTime, writeResults);
+} else if (HiveIcebergStorageHandler.isWrite(conf, name)) {
+  Collection writeResults = collectResults(numTasks, 
executor, location, jobContext, io, true);
+  boolean isOverwrite = conf.getBoolean(InputFormatConfig.IS_OVERWRITE, 
false);
+  commitInsert(jobContext, table, startTime, writeResults, isOverwrite);
+} else {
+  LOG.info("Unable to determine commit operation type for table: {}, 
jobID: {}. Will not create a commit.",
+  table, jobContext.getJobID());
+}
+  }
+
+  private void commitDelete(JobContext jobContext, Table table, long 
startTime, Collection results) {

Review Comment:
   That should allow you to do something like:
   ```
   // update
   Transaction transaction = table.newTransaction();
   commitDelete(table, Optional.of(transaction), startTime, deleteWriteResults);
   commitInsert(table, Optional.of(transaction), startTime, insertWriteResults);
   transaction.commitTransaction();
   ```





Issue Time Tracking
---

Worklog Id: (was: 755279)
Time Spent: 16h 50m  (was: 16h 40m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26129) Non blocking DROP CONNECTOR

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26129:
--
Labels: pull-request-available  (was: )

> Non blocking DROP CONNECTOR
> ---
>
> Key: HIVE-26129
> URL: https://issues.apache.org/jira/browse/HIVE-26129
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Use a less restrictive lock for data connectors; they do not have any 
> dependencies on other tables. 
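
Illustratively, the gist is choosing a weaker DDL lock type for connector 
operations. The sketch below assumes WriteEntity.WriteType.DDL_NO_LOCK as one 
plausible weaker choice; the constant the patch actually uses is not shown in 
this thread.
{code:java}
import org.apache.hadoop.hive.ql.hooks.WriteEntity.WriteType;

// Hedged sketch, not the HIVE-26129 patch itself.
public final class DropConnectorLockSketch {
  public static WriteType lockTypeForDropConnector() {
    // Connectors have no dependencies on other tables, so an exclusive DDL
    // lock is unnecessary.
    return WriteType.DDL_NO_LOCK;
  }
}
{code}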



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26129) Non blocking DROP CONNECTOR

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26129?focusedWorklogId=755238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755238
 ]

ASF GitHub Bot logged work on HIVE-26129:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 13:59
Start Date: 11/Apr/22 13:59
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on code in PR #3173:
URL: https://github.com/apache/hive/pull/3173#discussion_r847360100


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/dataconnector/drop/DropDataConnectorAnalyzer.java:
##
@@ -18,13 +18,15 @@
 
 package org.apache.hadoop.hive.ql.ddl.dataconnector.drop;
 
+import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.api.DataConnector;
 import org.apache.hadoop.hive.ql.QueryState;
 import org.apache.hadoop.hive.ql.exec.TaskFactory;
 import org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.DDLType;
 import org.apache.hadoop.hive.ql.ddl.DDLWork;
 import org.apache.hadoop.hive.ql.hooks.ReadEntity;
 import org.apache.hadoop.hive.ql.hooks.WriteEntity;
+import org.apache.hadoop.hive.ql.io.AcidUtils;

Review Comment:
   nit: Appears to be unnecessary import



##
ql/src/java/org/apache/hadoop/hive/ql/ddl/dataconnector/drop/DropDataConnectorAnalyzer.java:
##
@@ -18,13 +18,15 @@
 
 package org.apache.hadoop.hive.ql.ddl.dataconnector.drop;
 
+import org.apache.hadoop.hive.conf.HiveConf;

Review Comment:
   nit: Appears to be unnecessary import.





Issue Time Tracking
---

Worklog Id: (was: 755238)
Remaining Estimate: 0h
Time Spent: 10m

> Non blocking DROP CONNECTOR
> ---
>
> Key: HIVE-26129
> URL: https://issues.apache.org/jira/browse/HIVE-26129
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Use a less restrictive lock for data connectors; they do not have any 
> dependencies on other tables. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25941) Long compilation time of complex query due to analysis for materialized view rewrite

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25941?focusedWorklogId=755236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755236
 ]

ASF GitHub Bot logged work on HIVE-25941:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 13:58
Start Date: 11/Apr/22 13:58
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3014:
URL: https://github.com/apache/hive/pull/3014#discussion_r847360214


##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveMaterializedViewASTSubQueryRewriteShuttle.java:
##
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.hadoop.hive.common.TableName;
+import org.apache.hadoop.hive.ql.lockmgr.HiveTxnManager;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.HiveRelOptMaterialization;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject;
+import 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewUtils;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.CalcitePlanner;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.EnumSet;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.Stack;
+import java.util.function.Predicate;
+
+import static java.util.Collections.singletonList;
+import static java.util.Collections.unmodifiableMap;
+import static java.util.Collections.unmodifiableSet;
+import static 
org.apache.hadoop.hive.ql.metadata.HiveRelOptMaterialization.RewriteAlgorithm.NON_CALCITE;
+import static 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewUtils.extractTable;
+
+/**
+ * Traverses the plan and tries to rewrite subtrees of the plan into 
materialized view scans.
+ *
+ * The rewrite depends on whether a subtree's corresponding AST matches any 
materialized view definition's AST.
+ */
+public class HiveMaterializedViewASTSubQueryRewriteShuttle extends 
HiveRelShuttleImpl {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveMaterializedViewASTSubQueryRewriteShuttle.class);
+
+  private final Map<RelNode, ASTNode> subQueryMap;
+  private final ASTNode originalAST;
+  private final ASTNode expandedAST;
+  private final RelBuilder relBuilder;
+  private final Hive db;
+  private final Set<TableName> tablesUsedByOriginalPlan;
+  private final HiveTxnManager txnManager;
+
+  public HiveMaterializedViewASTSubQueryRewriteShuttle(
+  Map<RelNode, ASTNode> subQueryMap,
+  ASTNode originalAST,
+  ASTNode expandedAST,
+  RelBuilder relBuilder,
+  Hive db,
+  Set<TableName> tablesUsedByOriginalPlan,
+  HiveTxnManager txnManager) {
+this.subQueryMap = unmodifiableMap(subQueryMap);
+this.originalAST = originalAST;
+this.expandedAST = expandedAST;
+this.relBuilder = relBuilder;
+this.db = db;
+this.tablesUsedByOriginalPlan = unmodifiableSet(tablesUsedByOriginalPlan);
+this.txnManager = txnManager;
+  }
+
+  public RelNode rewrite(RelNode relNode) {
+return relNode.accept(this);
+  }
+
+  @Override
+  public RelNode visit(HiveProject project) {
+if (!subQueryMap.containsKey(project)) {
+  // No AST is found for this subtree
+  return super.visit(project);
+}
+
+// The AST associated to the RelNode is part of the original AST, but we 
need the expanded one
+// 1. Collect the path elements of this node in the original AST
+Stack<Integer> path = new Stack<>();
+ASTNode curr = subQueryMap.get(project);
+while (curr != null && curr != originalAST) {
+  path.push(curr.getType());
+  curr = (ASTNode) curr.getParent();

[jira] [Work logged] (HIVE-25941) Long compilation time of complex query due to analysis for materialized view rewrite

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25941?focusedWorklogId=755235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755235
 ]

ASF GitHub Bot logged work on HIVE-25941:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 13:58
Start Date: 11/Apr/22 13:58
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3014:
URL: https://github.com/apache/hive/pull/3014#discussion_r847359651


##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveMaterializedViewASTSubQueryRewriteShuttle.java:
##
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.hadoop.hive.common.TableName;
+import org.apache.hadoop.hive.ql.lockmgr.HiveTxnManager;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.HiveRelOptMaterialization;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject;
+import 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewUtils;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.CalcitePlanner;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.EnumSet;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.Stack;
+import java.util.function.Predicate;
+
+import static java.util.Collections.singletonList;
+import static java.util.Collections.unmodifiableMap;
+import static java.util.Collections.unmodifiableSet;
+import static 
org.apache.hadoop.hive.ql.metadata.HiveRelOptMaterialization.RewriteAlgorithm.NON_CALCITE;
+import static 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewUtils.extractTable;
+
+/**
+ * Traverses the plan and tries to rewrite subtrees of the plan into 
materialized view scans.
+ *
+ * The rewrite depends on whether a subtree's corresponding AST matches any 
materialized view definition's AST.
+ */
+public class HiveMaterializedViewASTSubQueryRewriteShuttle extends 
HiveRelShuttleImpl {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveMaterializedViewASTSubQueryRewriteShuttle.class);
+
+  private final Map<RelNode, ASTNode> subQueryMap;
+  private final ASTNode originalAST;
+  private final ASTNode expandedAST;
+  private final RelBuilder relBuilder;
+  private final Hive db;
+  private final Set<TableName> tablesUsedByOriginalPlan;
+  private final HiveTxnManager txnManager;
+
+  public HiveMaterializedViewASTSubQueryRewriteShuttle(
+  Map<RelNode, ASTNode> subQueryMap,
+  ASTNode originalAST,
+  ASTNode expandedAST,
+  RelBuilder relBuilder,
+  Hive db,
+  Set<TableName> tablesUsedByOriginalPlan,
+  HiveTxnManager txnManager) {
+this.subQueryMap = unmodifiableMap(subQueryMap);
+this.originalAST = originalAST;
+this.expandedAST = expandedAST;
+this.relBuilder = relBuilder;
+this.db = db;
+this.tablesUsedByOriginalPlan = unmodifiableSet(tablesUsedByOriginalPlan);
+this.txnManager = txnManager;
+  }
+
+  public RelNode rewrite(RelNode relNode) {
+return relNode.accept(this);
+  }
+
+  @Override
+  public RelNode visit(HiveProject project) {
+if (!subQueryMap.containsKey(project)) {

Review Comment:
   Added check





Issue Time Tracking
---

Worklog Id: (was: 755235)
Time Spent: 1.5h  (was: 1h 20m)

> Long compilation time of complex query due to analysis for materialized view 
> rewrite
> 
>
> Key: HIVE-25941
> URL: https://issues.apache.org/jira/browse/HIVE-25941
> Project: Hive

[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=755231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755231
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 13:46
Start Date: 11/Apr/22 13:46
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on PR #3131:
URL: https://github.com/apache/hive/pull/3131#issuecomment-1095073865

   @pvary I've refactored the `UpdateDeleteSemanticAnalyzer` to obtain the 
selectColumns and sortColumns during query rewriting from the 
`HiveStorageHandler` (see HiveStorageHandler#acidSelectColumns and 
HiveStorageHandler#acidSortColumns in 
[509c58b](https://github.com/apache/hive/pull/3131/commits/509c58b94693394e031b8780d3e6805286c85262))
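
   For readers following along, a minimal sketch of what such storage-handler hooks could look like; the interface name and signatures below are illustrative assumptions, not the exact API added in that commit:

{code:java}
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the HiveStorageHandler additions described above.
interface AcidRewriteHooks {

  // Columns the rewritten DELETE/UPDATE SELECT should project for this table,
  // e.g. file path / row position metadata instead of Hive's native ROW__ID.
  List<String> acidSelectColumns(String tableName);

  // Sort order to apply before writing delete files; empty means no sorting.
  default List<String> acidSortColumns(String tableName) {
    return Collections.emptyList();
  }
}
{code}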




Issue Time Tracking
---

Worklog Id: (was: 755231)
Time Spent: 16h 40m  (was: 16.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=755230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755230
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 13:43
Start Date: 11/Apr/22 13:43
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r847343603


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -325,9 +327,40 @@ private void commitTable(FileIO io, ExecutorService executor, JobContext jobCont
           "numReduceTasks/numMapTasks", jobContext.getJobID(), name);
       return conf.getNumReduceTasks() > 0 ? conf.getNumReduceTasks() : conf.getNumMapTasks();
 });
-    Collection<DataFile> dataFiles = dataFiles(numTasks, executor, location, jobContext, io, true);
 
-    boolean isOverwrite = conf.getBoolean(InputFormatConfig.IS_OVERWRITE, false);
+    if (HiveIcebergStorageHandler.isDelete(conf, name)) {
+      Collection writeResults = collectResults(numTasks, executor, location, jobContext, io, true);
+      commitDelete(jobContext, table, startTime, writeResults);
+    } else if (HiveIcebergStorageHandler.isWrite(conf, name)) {
+      Collection writeResults = collectResults(numTasks, executor, location, jobContext, io, true);
+      boolean isOverwrite = conf.getBoolean(InputFormatConfig.IS_OVERWRITE, false);
+      commitInsert(jobContext, table, startTime, writeResults, isOverwrite);
+    } else {
+      LOG.info("Unable to determine commit operation type for table: {}, jobID: {}. Will not create a commit.",
+          table, jobContext.getJobID());
+    }
+  }
+
+  private void commitDelete(JobContext jobContext, Table table, long startTime, Collection results) {

Review Comment:
   Thanks for checking! I've refactored the `commitDelete` and `commitInsert` 
to use an optional Transaction object, which can be passed in case of an update 
or merge query.
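
   A sketch of the shape such a refactor can take (an illustration under assumptions, not the PR's exact code): the delete commit either goes directly against the table, or joins a caller-provided Iceberg transaction when it is part of an update/merge.

{code:java}
import java.util.Optional;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.RowDelta;
import org.apache.iceberg.Table;
import org.apache.iceberg.Transaction;

class DeleteCommitSketch {

  // Commit position-delete files, either standalone or inside an enclosing transaction.
  static void commitDelete(Table table, Optional<Transaction> txn, Iterable<DeleteFile> deleteFiles) {
    RowDelta rowDelta = txn.map(Transaction::newRowDelta).orElseGet(table::newRowDelta);
    deleteFiles.forEach(rowDelta::addDeletes);
    // With a transaction, the new snapshot only becomes visible on txn.commitTransaction().
    rowDelta.commit();
  }
}
{code}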





Issue Time Tracking
---

Worklog Id: (was: 755230)
Time Spent: 16.5h  (was: 16h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=755228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755228
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 13:39
Start Date: 11/Apr/22 13:39
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r847339367


##
ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java:
##
@@ -673,7 +674,31 @@ private String toErrorMessage(Writable value, Object row, ObjectInspector inspec
         ctx.getIoCxt().setRecordIdentifier(null); // so we don't accidentally cache the value; shouldn't
         // happen since IO layer either knows how to produce ROW__ID or not - but to be safe
   }
- break;
+  break;
+case PARTITION_SPEC_ID:

Review Comment:
   Unfortunately we don't have the Table object anywhere around this area as 
far as I can tell, so I'm not sure how we could inject the logic using the 
storage handler. Besides, this method is called `populateVirtualColumns` where 
all the other virtual cols are filled out too, so right now I don't see a 
better place to put it





Issue Time Tracking
---

Worklog Id: (was: 755228)
Time Spent: 16h 20m  (was: 16h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26130) Incorrect matching of external table when validating NOT NULL constraints

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26130?focusedWorklogId=755225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755225
 ]

ASF GitHub Bot logged work on HIVE-26130:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 13:37
Start Date: 11/Apr/22 13:37
Worklog Time Spent: 10m 
  Work Description: zhangbutao commented on PR #3199:
URL: https://github.com/apache/hive/pull/3199#issuecomment-1095063659

   Failed tests unrelated




Issue Time Tracking
---

Worklog Id: (was: 755225)
Time Spent: 20m  (was: 10m)

> Incorrect matching of external table when validating NOT NULL constraints
> -
>
> Key: HIVE-26130
> URL: https://issues.apache.org/jira/browse/HIVE-26130
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> _AbstractAlterTablePropertiesAnalyzer.validate_ uses incorrect external table 
> judgment statement:
> {code:java}
> else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
> {code}
> In current hive code, we use hive tblproperties('EXTERNAL'='true' or 
> 'EXTERNAL'='TRUE') to validate external table.
>  
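
A case-insensitive comparison along the following lines would match the 'EXTERNAL'='true' / 'EXTERNAL'='TRUE' convention (a sketch of the likely fix, not necessarily the patch verbatim):

{code:java}
import java.util.Map;

class ExternalTableCheckSketch {
  // Hive stores the flag as tblproperties('EXTERNAL'='true'/'TRUE'), so both
  // the key and the value should be compared case-insensitively.
  static boolean isExternalTableProperty(Map.Entry<String, String> entry) {
    return entry.getKey().equalsIgnoreCase("external") && entry.getValue().equalsIgnoreCase("true");
  }
}
{code}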



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25941) Long compilation time of complex query due to analysis for materialized view rewrite

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25941?focusedWorklogId=755214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755214
 ]

ASF GitHub Bot logged work on HIVE-25941:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 13:06
Start Date: 11/Apr/22 13:06
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3014:
URL: https://github.com/apache/hive/pull/3014#discussion_r847305506


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewsCache.java:
##
@@ -205,4 +212,52 @@ HiveRelOptMaterialization get(String dbName, String viewName) {
   public boolean isEmpty() {
     return materializedViews.isEmpty();
   }
+
+
+  private static class ASTKey {
+    private final ASTNode root;
+
+    public ASTKey(ASTNode root) {
+      this.root = root;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) return true;
+      if (o == null || getClass() != o.getClass()) return false;
+      ASTKey that = (ASTKey) o;
+      return equals(root, that.root);
+    }
+
+    private boolean equals(ASTNode astNode1, ASTNode astNode2) {
+      if (!(astNode1.getType() == astNode2.getType() &&
+          astNode1.getText().equals(astNode2.getText()) &&
+          astNode1.getChildCount() == astNode2.getChildCount())) {
+        return false;
+      }
+
+      for (int i = 0; i < astNode1.getChildCount(); ++i) {
+        if (!equals((ASTNode) astNode1.getChild(i), (ASTNode) astNode2.getChild(i))) {
+          return false;
+        }
+      }
+
+      return true;
+    }
+
+    @Override
+    public int hashCode() {
+      return hashcode(root);

Review Comment:
   * The hashcode of the ASTs stored in the `MaterializedViewsCache` is calculated only once, when the MVs are loaded at HS2 startup or when a new MV is created, because the Java HashMap implementation caches each key's hashcode.
   * When we look up a materialization, the hashcode of the key is calculated every time the get method is called; this happens only once for the entire tree per query.
   * To find sub-query rewrites, the look-up is done on sub-ASTs and the hashcode is also calculated for the subtrees, but when I ran performance tests locally I did not find this to be a bottleneck.
   
   This solution is still much faster than generating the expanded query text of every possible sub-query using `UnparseTranslator` and `TokenRewriteStream`.





Issue Time Tracking
---

Worklog Id: (was: 755214)
Time Spent: 1h 20m  (was: 1h 10m)

> Long compilation time of complex query due to analysis for materialized view 
> rewrite
> 
>
> Key: HIVE-25941
> URL: https://issues.apache.org/jira/browse/HIVE-25941
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: sample.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When compiling query the optimizer tries to rewrite the query plan or 
> subtrees of the plan to use materialized view scans.
> If
> {code}
> set hive.materializedview.rewriting.sql.subquery=false;
> {code}
> the compilation succeeds in less than 10 sec, otherwise it takes several 
> minutes (~5 min) depending on the hardware.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=755195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755195
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 12:07
Start Date: 11/Apr/22 12:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3168:
URL: https://github.com/apache/hive/pull/3168#discussion_r847253069


##
standalone-metastore/pom.xml:
##
@@ -531,6 +531,29 @@
 
   
 
+    <profile>
+      <id>javadoc</id>
+      <build>
+        <plugins>
+          <plugin>
+            <groupId>org.apache.maven.plugins</groupId>
+            <artifactId>maven-javadoc-plugin</artifactId>

Review Comment:
   Since the javadoc generation is a big mess ATM, I would suggest keeping it as it is, and if the tests are failing then we can decide what we want to do with them.
   
   See also: #3185





Issue Time Tracking
---

Worklog Id: (was: 755195)
Time Spent: 2h 10m  (was: 2h)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently we define 
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 
> places:
> - 
> ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - 
> ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) 
> on project hive: An error has occurred in Javadoc report generation: 
> [ERROR] Exit code: 1 - 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8:
>  warning: a package-info.java file has already been seen for package 
> org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for 
> package "org.apache.hive.streaming"
> [ERROR] 
> /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556:
>  error: type MapSerializer does not take parameters
> [ERROR]   com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR]  ^
> [ERROR] 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4:
>  error: package org.apache.hadoop.hive.metastore.annotation has already been 
> annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", 
> shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR]   at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR]   at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR]   at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR]   at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR]   at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR]   at 
> com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR]   at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR]   at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR]   at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR]   at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR]   at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR]   at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR]   at com.sun.tools.javadoc.Main.main(Main.java:54)
> [ERROR] javadoc: error - fatal error

[jira] [Resolved] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-11 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26093.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~zabetak]!

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently we define 
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 
> places:
> - 
> ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - 
> ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) 
> on project hive: An error has occurred in Javadoc report generation: 
> [ERROR] Exit code: 1 - 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8:
>  warning: a package-info.java file has already been seen for package 
> org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for 
> package "org.apache.hive.streaming"
> [ERROR] 
> /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556:
>  error: type MapSerializer does not take parameters
> [ERROR]   com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR]  ^
> [ERROR] 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4:
>  error: package org.apache.hadoop.hive.metastore.annotation has already been 
> annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", 
> shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR]   at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR]   at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR]   at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR]   at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR]   at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR]   at 
> com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR]   at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR]   at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR]   at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR]   at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR]   at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR]   at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR]   at com.sun.tools.javadoc.Main.main(Main.java:54)
> [ERROR] javadoc: error - fatal error
> [ERROR] 
> [ERROR] Command line was: 
> /usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/../bin/javadoc
>  @options @packages
> [ERROR] 
> [ERROR] Refer to the generated Javadoc files in 
> '/Users/pvary/dev/upstream/hive/target/site/apidocs' dir.
> {code}
> We should fix this by removing one of the above



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26093) Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26093?focusedWorklogId=755194&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755194
 ]

ASF GitHub Bot logged work on HIVE-26093:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 12:04
Start Date: 11/Apr/22 12:04
Worklog Time Spent: 10m 
  Work Description: pvary merged PR #3168:
URL: https://github.com/apache/hive/pull/3168




Issue Time Tracking
---

Worklog Id: (was: 755194)
Time Spent: 2h  (was: 1h 50m)

> Deduplicate org.apache.hadoop.hive.metastore.annotation package-info.java
> -
>
> Key: HIVE-26093
> URL: https://issues.apache.org/jira/browse/HIVE-26093
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently we define 
> org.apache.hadoop.hive.metastore.annotation.MetastoreVersionAnnotation in 2 
> places:
> - 
> ./standalone-metastore/metastore-common/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> - 
> ./standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java
> This causes javadoc generation to fail with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:aggregate (default-cli) 
> on project hive: An error has occurred in Javadoc report generation: 
> [ERROR] Exit code: 1 - 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:8:
>  warning: a package-info.java file has already been seen for package 
> org.apache.hadoop.hive.metastore.annotation
> [ERROR] package org.apache.hadoop.hive.metastore.annotation;
> [ERROR] ^
> [ERROR] javadoc: warning - Multiple sources of package comments found for 
> package "org.apache.hive.streaming"
> [ERROR] 
> /Users/pvary/dev/upstream/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java:556:
>  error: type MapSerializer does not take parameters
> [ERROR]   com.esotericsoftware.kryo.serializers.MapSerializer {
> [ERROR]  ^
> [ERROR] 
> /Users/pvary/dev/upstream/hive/standalone-metastore/metastore-server/src/gen/version/org/apache/hadoop/hive/metastore/annotation/package-info.java:4:
>  error: package org.apache.hadoop.hive.metastore.annotation has already been 
> annotated
> [ERROR] @MetastoreVersionAnnotation(version="4.0.0-alpha-1", 
> shortVersion="4.0.0-alpha-1",
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR]   at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR]   at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR]   at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
> [ERROR]   at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR]   at 
> com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR]   at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR]   at 
> com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR]   at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR]   at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR]   at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR]   at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR]   at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR]   at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR]   at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR]   at com.sun.tools.javadoc.Main.main(Main.java:54)
> [ERROR] javadoc: error - fatal error
> [ERROR] 
> [ERROR] Command line was: 
> /usr/local/Cellar/openjdk@8/1.8.0+302/libexec/openjdk.jdk/Contents/Home/jre/../bin/javadoc
>  @options @packages
> [ERROR] 
> [ERROR] Refer to the generated Javadoc files in 
> '/Users/pvary/dev/upstream/hive/target/site/apidocs' dir.
> {code}
> We should fix this by removing one of the above



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=755191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755191
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 11:47
Start Date: 11/Apr/22 11:47
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r847237421


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.data.GenericRecord;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructProjection;
+
+public class IcebergAcidUtil {
+
+  private IcebergAcidUtil() {
+  }
+
+  private static final Types.NestedField PARTITION_STRUCT_META_COL = null; // placeholder value in the map
+  private static final Map<Types.NestedField, Integer> DELETE_FILE_READ_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+    DELETE_FILE_READ_META_COLS.put(PARTITION_STRUCT_META_COL, 1);
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+    DELETE_FILE_READ_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  private static final Types.NestedField PARTITION_HASH_META_COL = Types.NestedField.required(
+      MetadataColumns.PARTITION_COLUMN_ID, MetadataColumns.PARTITION_COLUMN_NAME, Types.LongType.get());
+  private static final Map<Types.NestedField, Integer> DELETE_SERDE_META_COLS = Maps.newLinkedHashMap();
+
+  static {
+    DELETE_SERDE_META_COLS.put(MetadataColumns.SPEC_ID, 0);
+    DELETE_SERDE_META_COLS.put(PARTITION_HASH_META_COL, 1);
+    DELETE_SERDE_META_COLS.put(MetadataColumns.FILE_PATH, 2);
+    DELETE_SERDE_META_COLS.put(MetadataColumns.ROW_POSITION, 3);
+  }
+
+  /**
+   * @param dataCols The columns of the original file read schema
+   * @param table The table object - it is used for populating the partition struct meta column
+   * @return The schema for reading files, extended with metadata columns needed for deletes
+   */
+  public static Schema createFileReadSchemaForDelete(List<Types.NestedField> dataCols, Table table) {
+    List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_FILE_READ_META_COLS.size());
+    DELETE_FILE_READ_META_COLS.forEach((metaCol, index) -> {
+      if (metaCol == PARTITION_STRUCT_META_COL) {
+        cols.add(MetadataColumns.metadataColumn(table, MetadataColumns.PARTITION_COLUMN_NAME));
+      } else {
+        cols.add(metaCol);
+      }
+    });
+    cols.addAll(dataCols);
+    return new Schema(cols);
+  }
+
+  /**
+   * @param dataCols The columns of the serde projection schema
+   * @return The schema for SerDe operations, extended with metadata columns needed for deletes
+   */
+  public static Schema createSerdeSchemaForDelete(List<Types.NestedField> dataCols) {
+    List<Types.NestedField> cols = Lists.newArrayListWithCapacity(dataCols.size() + DELETE_SERDE_META_COLS.size());
+    DELETE_SERDE_META_COLS.forEach((metaCol, index) -> cols.add(metaCol));
+    cols.addAll(dataCols);
+    return new Schema(cols);
+  }
+
+  /**
+   * @param rec The record read by the file scan task, which contains both the metadata fields and the row data fields
+   * @param rowData The record object to populate with the rowData fields only
+   * @return The position delete object
+   */
+  public static PositionDelete<Record> getPositionDelete(Record rec, GenericRecord rowData) {
+    PositionDelete<Record> positionDelete = PositionDelete.create();
+    String filePath = rec.get(DELETE_SERDE_META_COLS.get(MetadataColumns.FILE_PATH), String.class);
+

[jira] [Updated] (HIVE-26130) Incorrect matching of external table when validating NOT NULL constraints

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-26130:
--
Fix Version/s: 4.0.0-alpha-2
Affects Version/s: 4.0.0-alpha-1
   4.0.0-alpha-2

> Incorrect matching of external table when validating NOT NULL constraints
> -
>
> Key: HIVE-26130
> URL: https://issues.apache.org/jira/browse/HIVE-26130
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _AbstractAlterTablePropertiesAnalyzer.validate_ uses incorrect external table 
> judgment statement:
> {code:java}
> else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
> {code}
> In current hive code, we use hive tblproperties('EXTERNAL'='true' or 
> 'EXTERNAL'='TRUE') to validate external table.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (HIVE-26130) Incorrect matching of external table when validating NOT NULL constraints

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26130 started by zhangbutao.
-
> Incorrect matching of external table when validating NOT NULL constraints
> -
>
> Key: HIVE-26130
> URL: https://issues.apache.org/jira/browse/HIVE-26130
> Project: Hive
>  Issue Type: Bug
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _AbstractAlterTablePropertiesAnalyzer.validate_ uses incorrect external table 
> judgment statement:
> {code:java}
> else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
> {code}
> In current hive code, we use hive tblproperties('EXTERNAL'='true' or 
> 'EXTERNAL'='TRUE') to validate external table.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25492) Major query-based compaction is skipped if partition is empty

2022-04-11 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits resolved HIVE-25492.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks for the review [~dkuzmenko]

> Major query-based compaction is skipped if partition is empty
> -
>
> Key: HIVE-25492
> URL: https://issues.apache.org/jira/browse/HIVE-25492
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: Karen Coppage
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently if the result of query-based compaction is an empty base, delta, or 
> delete delta, the empty directory is deleted.
> This is because of minor compaction – if there are only deltas to compact, 
> then no compacted delete delta should be created (only a compacted delta). In 
> the same way, if there are only delete deltas to compact, then no compacted 
> delta should be created (only a compacted delete delta).
> There is an issue with major compaction. If all the data in the partition has 
> been deleted, then we should get an empty base directory after compaction. 
> Instead, the empty base directory is deleted because it's empty and 
> compaction claims to succeed but we end up with the same deltas/delete deltas 
> we started with – basically compaction does not run.
> Where to start? MajorQueryCompactor#commitCompaction
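
A sketch of the direction hinted at above (names other than MajorQueryCompactor#commitCompaction are illustrative assumptions): keep an empty result directory for major compaction instead of deleting it, so an empty base replaces the obsolete deltas.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class CommitCompactionSketch {
  // For MINOR compaction an empty result dir should be dropped; for MAJOR it
  // should be kept, since an empty base means every row in the partition was deleted.
  static void commitResultDir(FileSystem fs, Path resultDir, boolean isMajor) throws IOException {
    boolean empty = !fs.listFiles(resultDir, false).hasNext();
    if (empty && !isMajor) {
      fs.delete(resultDir, true);
    }
  }
}
{code}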



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25492) Major query-based compaction is skipped if partition is empty

2022-04-11 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-25492:
---
Affects Version/s: 4.0.0-alpha-1
   4.0.0-alpha-2

> Major query-based compaction is skipped if partition is empty
> -
>
> Key: HIVE-25492
> URL: https://issues.apache.org/jira/browse/HIVE-25492
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: Karen Coppage
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently if the result of query-based compaction is an empty base, delta, or 
> delete delta, the empty directory is deleted.
> This is because of minor compaction – if there are only deltas to compact, 
> then no compacted delete delta should be created (only a compacted delta). In 
> the same way, if there are only delete deltas to compact, then no compacted 
> delta should be created (only a compacted delete delta).
> There is an issue with major compaction. If all the data in the partition has 
> been deleted, then we should get an empty base directory after compaction. 
> Instead, the empty base directory is deleted because it's empty and 
> compaction claims to succeed but we end up with the same deltas/delete deltas 
> we started with – basically compaction does not run.
> Where to start? MajorQueryCompactor#commitCompaction



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25492) Major query-based compaction is skipped if partition is empty

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25492?focusedWorklogId=755170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755170
 ]

ASF GitHub Bot logged work on HIVE-25492:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 10:12
Start Date: 11/Apr/22 10:12
Worklog Time Spent: 10m 
  Work Description: asinkovits merged PR #3157:
URL: https://github.com/apache/hive/pull/3157




Issue Time Tracking
---

Worklog Id: (was: 755170)
Time Spent: 2h 20m  (was: 2h 10m)

> Major query-based compaction is skipped if partition is empty
> -
>
> Key: HIVE-25492
> URL: https://issues.apache.org/jira/browse/HIVE-25492
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently if the result of query-based compaction is an empty base, delta, or 
> delete delta, the empty directory is deleted.
> This is because of minor compaction – if there are only deltas to compact, 
> then no compacted delete delta should be created (only a compacted delta). In 
> the same way, if there are only delete deltas to compact, then no compacted 
> delta should be created (only a compacted delete delta).
> There is an issue with major compaction. If all the data in the partition has 
> been deleted, then we should get an empty base directory after compaction. 
> Instead, the empty base directory is deleted because it's empty and 
> compaction claims to succeed but we end up with the same deltas/delete deltas 
> we started with – basically compaction does not run.
> Where to start? MajorQueryCompactor#commitCompaction



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26123) Introduce test coverage for sysdb for the different metastores

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26123?focusedWorklogId=755169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755169
 ]

ASF GitHub Bot logged work on HIVE-26123:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 10:09
Start Date: 11/Apr/22 10:09
Worklog Time Spent: 10m 
  Work Description: asolimando commented on PR #3196:
URL: https://github.com/apache/hive/pull/3196#issuecomment-1094860222

   > I really don't like to grow the number of core cli test drivers; why do we 
need separate for oracle/etc?
   > 
   > can't we use a qoption instead of a whole set of new drivers?
   > 
   > I wonder if we really need to have mile long q.out results for these kind 
of things. I think these kind of things should be run as part of some automated 
smoke tests for the release - with a real installation underneath
   
   The existing test infra could be improved in many ways; regarding the 
metastore bit, there is already a ticket: 
https://issues.apache.org/jira/browse/HIVE-26005. 
   
   We have discussed this offline with @zabetak and we thought it would be better to 
move forward with what we have and tackle the improvement in the other 
ticket, since at the moment we already have broken support for mysql for sysdb: 
https://issues.apache.org/jira/browse/HIVE-26125. 
   
   If you think HIVE-26005 is a must-do, please update the link between the 
tickets accordingly and I will pause this until I or somebody else finds the 
time to tackle it; otherwise, if you agree that it is better to have more coverage 
now and improve the tests later, I am open to suggestions on how to improve the 
current proposal.
   
   I have explored the qoption way you suggest (adding a metastore option to 
the sysdb qoption): I can start the required docker containers, but changing the 
configuration properties to use another metastore failed, so the new cli driver 
was the only way I could make these tests work.




Issue Time Tracking
---

Worklog Id: (was: 755169)
Time Spent: 1h  (was: 50m)

> Introduce test coverage for sysdb for the different metastores
> --
>
> Key: HIVE-26123
> URL: https://issues.apache.org/jira/browse/HIVE-26123
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> _sydb_ exposes (some of) the metastore tables from Hive via JDBC queries. 
> Existing tests are running only against Derby, meaning that any change 
> against sysdb query mapping is not covered by CI.
> The present ticket aims at bridging this gap by introducing test coverage for 
> the different supported metastore for sydb.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26130) Incorrect matching of external table when validating NOT NULL constraints

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-26130:
--
Description: 
_AbstractAlterTablePropertiesAnalyzer.validate_ uses incorrect external table 
judgment statement:
{code:java}
else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
{code}
In current hive code, we use hive tblproperties('EXTERNAL'='true' or 
'EXTERNAL'='TRUE') to validate external table.

 

  was:
_AbstractAlterTablePropertiesAnalyzer.validate_ uses incorrect external table 
judgment statement:

 
{code:java}
else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
{code}
In current hive code, we use hive tblproperties('EXTERNAL'='true' or 
'EXTERNAL'='TRUE') to validate external table.

 


> Incorrect matching of external table when validating NOT NULL constraints
> -
>
> Key: HIVE-26130
> URL: https://issues.apache.org/jira/browse/HIVE-26130
> Project: Hive
>  Issue Type: Bug
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _AbstractAlterTablePropertiesAnalyzer.validate_ uses incorrect external table 
> judgment statement:
> {code:java}
> else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
> {code}
> In current hive code, we use hive tblproperties('EXTERNAL'='true' or 
> 'EXTERNAL'='TRUE') to validate external table.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26130) Incorrect matching of external table when validating NOT NULL constraints

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26130?focusedWorklogId=755162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755162
 ]

ASF GitHub Bot logged work on HIVE-26130:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 09:56
Start Date: 11/Apr/22 09:56
Worklog Time Spent: 10m 
  Work Description: zhangbutao opened a new pull request, #3199:
URL: https://github.com/apache/hive/pull/3199

   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 755162)
Remaining Estimate: 0h
Time Spent: 10m

> Incorrect matching of external table when validating NOT NULL constraints
> -
>
> Key: HIVE-26130
> URL: https://issues.apache.org/jira/browse/HIVE-26130
> Project: Hive
>  Issue Type: Bug
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _AbstractAlterTablePropertiesAnalyzer.validate_ uses incorrect external table 
> judgment statement:
>  
> {code:java}
> else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
> {code}
> In current hive code, we use hive tblproperties('EXTERNAL'='true' or 
> 'EXTERNAL'='TRUE') to validate external table.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26130) Incorrect matching of external table when validating NOT NULL constraints

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26130:
--
Labels: pull-request-available  (was: )

> Incorrect matching of external table when validating NOT NULL constraints
> -
>
> Key: HIVE-26130
> URL: https://issues.apache.org/jira/browse/HIVE-26130
> Project: Hive
>  Issue Type: Bug
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _AbstractAlterTablePropertiesAnalyzer.validate_ uses incorrect external table 
> judgment statement:
>  
> {code:java}
> else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
> {code}
> In current hive code, we use hive tblproperties('EXTERNAL'='true' or 
> 'EXTERNAL'='TRUE') to validate external table.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26130) Incorrect matching of external table when validating NOT NULL constraints

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-26130:
--
Description: 
_AbstractAlterTablePropertiesAnalyzer.validate_ uses incorrect external table 
judgment statement:

 
{code:java}
else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
{code}
In current hive code, we use hive tblproperties('EXTERNAL'='true' or 
'EXTERNAL'='TRUE') to validate external table.

 

> Incorrect matching of external table when validating NOT NULL constraints
> -
>
> Key: HIVE-26130
> URL: https://issues.apache.org/jira/browse/HIVE-26130
> Project: Hive
>  Issue Type: Bug
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> _AbstractAlterTablePropertiesAnalyzer.validate_ uses incorrect external table 
> judgment statement:
>  
> {code:java}
> else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
> {code}
> In current hive code, we use hive tblproperties('EXTERNAL'='true' or 
> 'EXTERNAL'='TRUE') to validate external table.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25941) Long compilation time of complex query due to analysis for materialized view rewrite

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25941?focusedWorklogId=755148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755148
 ]

ASF GitHub Bot logged work on HIVE-25941:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 09:48
Start Date: 11/Apr/22 09:48
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on code in PR #3014:
URL: https://github.com/apache/hive/pull/3014#discussion_r847121787


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewsCache.java:
##
@@ -205,4 +212,52 @@ HiveRelOptMaterialization get(String dbName, String 
viewName) {
   public boolean isEmpty() {
 return materializedViews.isEmpty();
   }
+
+
+  private static class ASTKey {
+    private final ASTNode root;
+
+    public ASTKey(ASTNode root) {
+      this.root = root;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) return true;
+      if (o == null || getClass() != o.getClass()) return false;
+      ASTKey that = (ASTKey) o;
+      return equals(root, that.root);
+    }
+
+    private boolean equals(ASTNode astNode1, ASTNode astNode2) {
+      if (!(astNode1.getType() == astNode2.getType() &&
+          astNode1.getText().equals(astNode2.getText()) &&
+          astNode1.getChildCount() == astNode2.getChildCount())) {
+        return false;
+      }
+
+      for (int i = 0; i < astNode1.getChildCount(); ++i) {
+        if (!equals((ASTNode) astNode1.getChild(i), (ASTNode) astNode2.getChild(i))) {
+          return false;
+        }
+      }
+
+      return true;
+    }
+
+    @Override
+    public int hashCode() {
+      return hashcode(root);

Review Comment:
   you could probably cache the hashcode - so that it's not necessary to 
compute it multiple times
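
   A minimal sketch of that caching, assuming the ASTKey class quoted above (the field name is made up):

{code:java}
// Inside ASTKey: compute the recursive hashcode once and reuse it afterwards.
// Like java.lang.String, this recomputes in the rare case the hash is exactly 0.
private int cachedHashCode;  // 0 means "not computed yet"

@Override
public int hashCode() {
  if (cachedHashCode == 0) {
    cachedHashCode = hashcode(root);
  }
  return cachedHashCode;
}
{code}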



##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveMaterializedViewASTSubQueryRewriteShuttle.java:
##
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.hadoop.hive.common.TableName;
+import org.apache.hadoop.hive.ql.lockmgr.HiveTxnManager;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.HiveRelOptMaterialization;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewUtils;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.CalcitePlanner;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.EnumSet;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.Stack;
+import java.util.function.Predicate;
+
+import static java.util.Collections.singletonList;
+import static java.util.Collections.unmodifiableMap;
+import static java.util.Collections.unmodifiableSet;
+import static org.apache.hadoop.hive.ql.metadata.HiveRelOptMaterialization.RewriteAlgorithm.NON_CALCITE;
+import static org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewUtils.extractTable;
+
+/**
+ * Traverses the plan and tries to rewrite subtrees of the plan into materialized view scans.
+ *
+ * The rewrite depends on whether the subtree's corresponding AST matches any materialized view
+ * definition's AST.
+ */
+public class HiveMaterializedViewASTSubQueryRewriteShuttle extends HiveRelShuttleImpl {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveMaterializedViewASTSubQueryRewriteShuttle.class);
+
+  private final Map<RelNode, ASTNode> subQueryMap;
+  private final ASTNode originalAST;
+  private final ASTNode expandedAST;
+  private final RelBuilder relBuilder;
+  

[jira] [Assigned] (HIVE-26130) Incorrect matching of external table when validating NOT NULL constraints

2022-04-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-26130:
-


> Incorrect matching of external table when validating NOT NULL constraints
> -
>
> Key: HIVE-26130
> URL: https://issues.apache.org/jira/browse/HIVE-26130
> Project: Hive
>  Issue Type: Bug
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26092) Fix javadoc errors for the 4.0.0 release

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26092?focusedWorklogId=755140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755140
 ]

ASF GitHub Bot logged work on HIVE-26092:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 09:13
Start Date: 11/Apr/22 09:13
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on code in PR #3185:
URL: https://github.com/apache/hive/pull/3185#discussion_r847111093


##
Jenkinsfile:
##
@@ -350,6 +350,18 @@ tar -xzf packaging/target/apache-hive-*-nightly-*-src.tar.gz
   }
 }
   }
+  branches['javadoc-check'] = {
+    executorNode {
+      stage('Prepare') {
+        loadWS();
+      }
+      stage('Generate javadoc') {
+        sh """#!/bin/bash -e
+          mvn clean install javadoc:javadoc javadoc:aggregate -DskipTests

Review Comment:
   this is not good; please look at how other parts of this file are using maven.
   





Issue Time Tracking
---

Worklog Id: (was: 755140)
Time Spent: 20m  (was: 10m)

> Fix javadoc errors for the 4.0.0 release
> 
>
> Key: HIVE-26092
> URL: https://issues.apache.org/jira/browse/HIVE-26092
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently there are plenty of errors in the javadoc.
> We should fix those before a final release



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26123) Introduce test coverage for sysdb for the different metastores

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26123?focusedWorklogId=755139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755139
 ]

ASF GitHub Bot logged work on HIVE-26123:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 09:10
Start Date: 11/Apr/22 09:10
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on code in PR #3196:
URL: https://github.com/apache/hive/pull/3196#discussion_r847106908


##
itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestPostgresMetastoreCliDriver.java:
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.cli;
+
+import org.apache.hadoop.hive.cli.control.CliAdapter;
+import org.apache.hadoop.hive.cli.control.CliConfigs;
+import org.apache.hadoop.hive.cli.control.SplitSupport;
+import org.junit.ClassRule;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TestRule;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.junit.runners.Parameterized.Parameters;
+import org.junit.runners.model.Statement;
+
+import java.io.File;
+import java.util.List;
+
+@RunWith(Parameterized.class)
+public class TestPostgresMetastoreCliDriver {
+
+  static CliAdapter adapter = new CliConfigs.PostgresMetastoreCliConfig().getCliAdapter();
+
+  private static final int N_SPLITS = 32;

Review Comment:
   seems like copy-paste? do you know what you are doing?



##
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java:
##
@@ -240,6 +240,101 @@ public MiniLlapLocalCliConfig() {
 }
   }
 
+  public static class PostgresMetastoreCliConfig extends AbstractCliConfig {
+    public PostgresMetastoreCliConfig() {
+      super(CoreCliDriver.class);
+      try {
+        setQueryDir("ql/src/test/queries/clientpositive");
+        includesFrom(testConfigProps, "ms.postgres.query.files");
+        setResultsDir("ql/src/test/results/clientpositive/mspostgres");
+        setLogDir("itests/qtest/target/qfile-results/mspostgres");
+        setInitScript("q_test_init.sql");
+        setCleanupScript("q_test_cleanup.sql");
+        setHiveConfDir("data/conf/llap");
+        setClusterType(MiniClusterType.LLAP);
+        setMetastoreType("postgres");
+      } catch (Exception e) {
+        throw new RuntimeException("can't construct cliconfig", e);
+      }
+    }
+  }
+
+  public static class MssqlMetastoreCliConfig extends AbstractCliConfig {
+    public MssqlMetastoreCliConfig() {
+      super(CoreCliDriver.class);
+      try {
+        setQueryDir("ql/src/test/queries/clientpositive");
+        includesFrom(testConfigProps, "ms.mssql.query.files");
+        setResultsDir("ql/src/test/results/clientpositive/msmssql");
+        setLogDir("itests/qtest/target/qfile-results/msmssql");
+        setInitScript("q_test_init.sql");
+        setCleanupScript("q_test_cleanup.sql");
+        setHiveConfDir("data/conf/llap");
+        setClusterType(MiniClusterType.LLAP);
+        setMetastoreType("mssql");
+      } catch (Exception e) {
+        throw new RuntimeException("can't construct cliconfig", e);
+      }
+    }
+  }
+
+  public static class OracleMetastoreCliConfig extends AbstractCliConfig {
+    public OracleMetastoreCliConfig() {
+      super(CoreCliDriver.class);
+      try {
+        setQueryDir("ql/src/test/queries/clientpositive");
+        includesFrom(testConfigProps, "ms.oracle.query.files");
+        setResultsDir("ql/src/test/results/clientpositive/msoracle");
+        setLogDir("itests/qtest/target/qfile-results/msoracle");
+        setInitScript("q_test_init.sql");
+        setCleanupScript("q_test_cleanup.sql");
+        setHiveConfDir("data/conf/llap");
+        setClusterType(MiniClusterType.LLAP);
+        setMetastoreType("oracle");
+      } catch (Exception e) {
+        throw new RuntimeException("can't construct cliconfig", e);
+      }
+    }
+  }
+
+  public static class MysqlMetastoreCliConfig extends AbstractCliConfig {
+    public MysqlMetastoreCliConfig() {
+      super(CoreCliDriver.class);
+      try {
+

[jira] [Work logged] (HIVE-26123) Introduce test coverage for sysdb for the different metastores

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26123?focusedWorklogId=755138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755138
 ]

ASF GitHub Bot logged work on HIVE-26123:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 09:09
Start Date: 11/Apr/22 09:09
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on PR #3196:
URL: https://github.com/apache/hive/pull/3196#issuecomment-1094773625

   I really don't like growing the number of core CLI test drivers; why do we
   need a separate one for Oracle, etc.?
   
   Can't we use a qoption instead of a whole set of new drivers?
   
   I wonder if we really need mile-long q.out results for this kind of thing.
   I think these things should be run as part of some automated smoke tests
   for the release, with a real installation underneath.
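
   For illustration of the duplication concern raised above: the four config
   classes in the diff differ only in the metastore type string and the
   directory names derived from it. A minimal sketch, assuming the same
   AbstractCliConfig API shown in the diff (the class name MetastoreCliConfig
   is invented here, not part of the PR), of how they could collapse into one
   parameterized class:

{code:java}
// Hypothetical sketch, not the PR's code: one parameterized config in place
// of the four near-identical Postgres/Mssql/Oracle/Mysql classes above.
public static class MetastoreCliConfig extends AbstractCliConfig {
  public MetastoreCliConfig(String metastoreType) {
    super(CoreCliDriver.class);
    try {
      setQueryDir("ql/src/test/queries/clientpositive");
      // e.g. "ms.postgres.query.files" when metastoreType is "postgres"
      includesFrom(testConfigProps, "ms." + metastoreType + ".query.files");
      setResultsDir("ql/src/test/results/clientpositive/ms" + metastoreType);
      setLogDir("itests/qtest/target/qfile-results/ms" + metastoreType);
      setInitScript("q_test_init.sql");
      setCleanupScript("q_test_cleanup.sql");
      setHiveConfDir("data/conf/llap");
      setClusterType(MiniClusterType.LLAP);
      setMetastoreType(metastoreType);
    } catch (Exception e) {
      throw new RuntimeException("can't construct cliconfig", e);
    }
  }
}
{code}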




Issue Time Tracking
---

Worklog Id: (was: 755138)
Time Spent: 40m  (was: 0.5h)

> Introduce test coverage for sysdb for the different metastores
> --
>
> Key: HIVE-26123
> URL: https://issues.apache.org/jira/browse/HIVE-26123
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> _sysdb_ exposes (some of) the metastore tables from Hive via JDBC queries. 
> Existing tests run only against Derby, meaning that any change 
> to the sysdb query mapping is not covered by CI.
> The present ticket aims at bridging this gap by introducing test coverage for 
> the different supported metastores for sysdb.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26121) Hive transaction rollback should be thread-safe

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26121?focusedWorklogId=755133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755133
 ]

ASF GitHub Bot logged work on HIVE-26121:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 08:34
Start Date: 11/Apr/22 08:34
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on PR #3181:
URL: https://github.com/apache/hive/pull/3181#issuecomment-1094709982

   > endTransactionAndCleanup
   
   Yes, we definitely need to synchronize
   `DriverTxnHandler.endTransactionAndCleanup`. As for
   `DbTxnManager.stopHeartbeat`, it was already somewhat synchronized, but
   concurrent execution might still occur when the shutdown hook is invoked.




Issue Time Tracking
---

Worklog Id: (was: 755133)
Time Spent: 50m  (was: 40m)

> Hive transaction rollback should be thread-safe
> ---
>
> Key: HIVE-26121
> URL: https://issues.apache.org/jira/browse/HIVE-26121
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When Hive query is being interrupted via cancel request, both the background 
> pool thread (HiveServer2-Background) executing the query and the HttpHandler 
> thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic 
> will eventually trigger the below method:
> {code}
> DriverTxnHandler.endTransactionAndCleanup(boolean commit)
> {code}
> Since this method could be invoked concurrently, we need to synchronize access 
> to it, so that only one thread attempts to abort the transaction and stop 
> the heartbeat.
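
For illustration of the race described above, a minimal sketch of one way to
make such a cleanup path safe under concurrent invocation (all names below are
invented stand-ins, not the actual HIVE-26121 patch):

{code:java}
// Hypothetical sketch: guard the cleanup so that of two racing threads
// (HiveServer2-Background and HiveServer2-Handler) only one ends the txn.
class TxnHandlerSketch {
  private final Object txnLock = new Object();
  private boolean txnEnded = false;

  void endTransactionAndCleanup(boolean commit) {
    synchronized (txnLock) {
      if (txnEnded) {
        return; // the other thread already ended the transaction
      }
      txnEnded = true;
      if (commit) {
        commitTxn();
      } else {
        rollbackTxn();   // abort the transaction exactly once
        stopHeartbeat(); // stop the heartbeat exactly once
      }
    }
  }

  private void commitTxn() { /* stand-in for DbTxnManager.commitTxn */ }
  private void rollbackTxn() { /* stand-in for DbTxnManager.rollbackTxn */ }
  private void stopHeartbeat() { /* stand-in for DbTxnManager.stopHeartbeat */ }
}
{code}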



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26123) Introduce test coverage for sysdb for the different metastores

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26123?focusedWorklogId=755127=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755127
 ]

ASF GitHub Bot logged work on HIVE-26123:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 08:00
Start Date: 11/Apr/22 08:00
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3196:
URL: https://github.com/apache/hive/pull/3196#discussion_r847047102


##
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java:
##
@@ -240,6 +240,101 @@ public MiniLlapLocalCliConfig() {
     }
   }
 
+  public static class PostgresMetastoreCliConfig extends AbstractCliConfig {
+    public PostgresMetastoreCliConfig() {
+      super(CoreCliDriver.class);
+      try {
+        setQueryDir("ql/src/test/queries/clientpositive");
+        includesFrom(testConfigProps, "ms.postgres.query.files");
+        setResultsDir("ql/src/test/results/clientpositive/mspostgres");
+        setLogDir("itests/qtest/target/qfile-results/mspostgres");
+        setInitScript("q_test_init.sql");
+        setCleanupScript("q_test_cleanup.sql");
+        setHiveConfDir("data/conf/llap");
+        setClusterType(MiniClusterType.LLAP);
Review Comment:
   Which is the least costly (wrt resources) cluster type?
   Do we need to initialize LLAP for this?





Issue Time Tracking
---

Worklog Id: (was: 755127)
Time Spent: 0.5h  (was: 20m)

> Introduce test coverage for sysdb for the different metastores
> --
>
> Key: HIVE-26123
> URL: https://issues.apache.org/jira/browse/HIVE-26123
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> _sysdb_ exposes (some of) the metastore tables from Hive via JDBC queries. 
> Existing tests run only against Derby, meaning that any change 
> to the sysdb query mapping is not covered by CI.
> The present ticket aims at bridging this gap by introducing test coverage for 
> the different supported metastores for sysdb.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=755120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755120
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 11/Apr/22 07:45
Start Date: 11/Apr/22 07:45
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r847034829


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##
@@ -425,11 +425,11 @@ void findUnknownPartitions(Table table, Set<Path> partPaths, Set<Path> partPath
     Set<Path> correctPartPathsInMS = new HashSet<>(partPathsInMS);
     // remove partition paths in partPathsInMS, to getPartitionsNotOnFs
     partPathsInMS.removeAll(allPartDirs);
-    FileSystem fs = tablePath.getFileSystem(conf);
     // There can be edge case where user can define partition directory outside of table directory
     // to avoid eviction of such partitions
     // we check for partition path not exists and add to result for getPartitionsNotOnFs.
     for (Path partPath : partPathsInMS) {
+      FileSystem fs = partPath.getFileSystem(conf);

Review Comment:
   This could be costly to do it every time.
   Are we expecting this to be different for every `partPath`?
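
For context on the question above: `Path.getFileSystem(conf)` resolves to a
different FileSystem only when the scheme or authority of the path differs, so
one common mitigation is to memoize the lookup per scheme/authority. A minimal
sketch under that assumption (not the PR's code; the class and method names
are invented):

{code:java}
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: pay the FileSystem resolution cost once per
// scheme/authority instead of once per partition path in the loop.
class FsCacheSketch {
  private final Map<String, FileSystem> bySchemeAndAuthority = new HashMap<>();
  private final Configuration conf;

  FsCacheSketch(Configuration conf) {
    this.conf = conf;
  }

  FileSystem fsFor(Path partPath) throws IOException {
    URI uri = partPath.toUri();
    String key = uri.getScheme() + "://" + uri.getAuthority();
    FileSystem fs = bySchemeAndAuthority.get(key);
    if (fs == null) {
      fs = partPath.getFileSystem(conf); // resolve once for this key
      bySchemeAndAuthority.put(key, fs);
    }
    return fs;
  }
}
{code}

(Hadoop's own FileSystem cache keys on roughly the same tuple, so repeated
getFileSystem calls are typically cache hits, but the per-call overhead is
avoided entirely with an explicit map like this.)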





Issue Time Tracking
---

Worklog Id: (was: 755120)
Time Spent: 5h 20m  (was: 5h 10m)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on cloud 
> storage such as S3; one case where we observed slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at 
> 

[jira] [Comment Edited] (HIVE-23010) IllegalStateException in tez.ReduceRecordProcessor when containers are being reused

2022-04-11 Thread Wei Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520364#comment-17520364
 ] 

Wei Zhang edited comment on HIVE-23010 at 4/11/22 7:40 AM:
---

This is because the merge join operator is added as the dummy operator's 
child during the first attempt, and the merge work list is cached across 
different attempts.


was (Author: zhangweilst):
This is because the merge join operator is added as the dummy operator's 
child, and the merge work list is cached across different attempts.

> IllegalStateException in tez.ReduceRecordProcessor when containers are being 
> reused
> ---
>
> Key: HIVE-23010
> URL: https://issues.apache.org/jira/browse/HIVE-23010
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Sebastian Klemke
>Priority: Major
> Attachments: simplified-explain.txt
>
>
> When executing a query in Hive that runs a filesink, mergejoin and two group 
> by operators in a single reduce vertex (reducer 2 in 
> [^simplified-explain.txt]), the following exception occurs 
> non-deterministically:
> {code:java}
> java.lang.RuntimeException: java.lang.IllegalStateException: Was expecting 
> dummy store operator but found: FS[17]
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Was expecting dummy store 
> operator but found: FS[17]
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:421)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:148)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> {code}
> Looking at Yarn logs, IllegalStateException occurs in a container if and only 
> if
>  * the container has been running a task attempt of "Reducer 2" successfully 
> before
>  * the container is then being reused for another task attempt of the same 
> "Reducer 2" vertex
> The same query runs fine with tez.am.container.reuse.enabled=false.
> Apparently, this error occurs deterministically within a container that is 
> being reused for multiple task attempts of the same reduce vertex.
> We have not been able to reproduce this error deterministically or with a 
> smaller execution plan due to the low probability of container reuse for the 
> same vertex.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

