[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614598 ]

ASF GitHub Bot logged work on HIVE-25276:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:11
Start Date: 24/Jun/21 16:11
Worklog Time Spent: 10m

Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658090073

## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
## @@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
   Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
 }

+  @Test
+  public void testStatWithInsert() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+    shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+    testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+    if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+      // If the location is set and we have to gather stats, then we have to update the table stats now
+      shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+    }
+
+    String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+    shell.executeStatement(insert);
+
+    checkColStat(identifier.name(), "customer_id");

Review comment: Ok, thanks, just wanted to double-check my understanding.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614598)
Time Spent: 1h (was: 50m)

> Enable automatic statistics generation for Iceberg tables
> ---------------------------------------------------------
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
> Issue Type: Improvement
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614638 ]

ASF GitHub Bot logged work on HIVE-25268:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 17:43
Start Date: 24/Jun/21 17:43
Worklog Time Spent: 10m

Work Description: sankarh commented on a change in pull request #2409:
URL: https://github.com/apache/hive/pull/2409#discussion_r658156023

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java
## @@ -51,18 +50,17 @@
  */
 @Description(name = "date_format", value = "_FUNC_(date/timestamp/string, fmt) - converts a date/timestamp/string "
     + "to a value of string in the format specified by the date format fmt.",
-    extended = "Supported formats are SimpleDateFormat formats - "
-    + "https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html. "
+    extended = "Supported formats are DateTimeFormatter formats - "
+    + "https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html. "

Review comment: @zabetak I think, if these 2 are the only changes, then we can go ahead with DateTimeFormatter. What is your opinion?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614638)
Time Spent: 8h 20m (was: 8h 10m)

> date_format udf doesn't work for dates prior to 1900 if the timezone is
> different from UTC
> -----------------------------------------------------------------------
>
> Key: HIVE-25268
> URL: https://issues.apache.org/jira/browse/HIVE-25268
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0
> Reporter: Nikhil Gupta
> Assignee: Nikhil Gupta
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 8h 20m
> Remaining Estimate: 0h
>
> *Hive 1.2.1*:
> {code:java}
> select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1400-01-14 01:00:00 ICT  |
> +--------------------------+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1800-01-14 01:00:00 ICT  |
> +--------------------------+
> {code}
> *Hive 3.1, Hive 4.0:*
> {code:java}
> select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1400-01-06 01:17:56 ICT  |
> +--------------------------+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1800-01-14 01:17:56 ICT  |
> +--------------------------+
> {code}
> VM timezone is set to 'Asia/Bangkok'

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
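[Editor's note] The odd 17:56 shift reported above is the difference between the modern ICT offset (UTC+7) and Bangkok's pre-1920 local mean time (UTC+6:42:04) in the tz database; `java.time` applies the historical offset, while wall-clock formatting with `DateTimeFormatter` keeps the original fields. A minimal illustrative sketch, not Hive's code:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class Pre1900Offsets {
    public static void main(String[] args) {
        ZoneId bangkok = ZoneId.of("Asia/Bangkok");

        // Before 1920 the tz database assigns Bangkok its local mean time,
        // UTC+6:42:04, not the modern UTC+7 (ICT). 7:00:00 - 6:42:04 = 17:56,
        // exactly the shift seen in the bug report.
        ZonedDateTime old = LocalDateTime.of(1800, 1, 14, 1, 0).atZone(bangkok);
        System.out.println(old.getOffset());    // +06:42:04

        ZonedDateTime modern = LocalDateTime.of(2000, 1, 14, 1, 0).atZone(bangkok);
        System.out.println(modern.getOffset()); // +07:00

        // Formatting the ZonedDateTime with DateTimeFormatter keeps the
        // original wall-clock fields rather than shifting them.
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        System.out.println(fmt.format(old));    // 1800-01-14 01:00:00
    }
}
```

The legacy path goes wrong when the value round-trips through an epoch-based type (e.g. `java.util.Date`) that re-resolves the instant against the historical offset.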
[jira] [Work logged] (HIVE-25213) Implement List<Table> getTables() for existing connectors.
[ https://issues.apache.org/jira/browse/HIVE-25213?focusedWorklogId=614665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614665 ]

ASF GitHub Bot logged work on HIVE-25213:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 18:43
Start Date: 24/Jun/21 18:43
Worklog Time Spent: 10m

Work Description: dantongdong commented on a change in pull request #2371:
URL: https://github.com/apache/hive/pull/2371#discussion_r658196705

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/AbstractJDBCConnectorProvider.java
## @@ -192,6 +215,7 @@ protected Connection getConnection() {
   }
   table = buildTableFromColsList(tableName, cols);
+  table.setDbName(scoped_db);

Review comment: Good catch. Will change!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614665)
Time Spent: 1.5h (was: 1h 20m)

> Implement List<Table> getTables() for existing connectors.
> ----------------------------------------------------------
>
> Key: HIVE-25213
> URL: https://issues.apache.org/jira/browse/HIVE-25213
> Project: Hive
> Issue Type: Sub-task
> Reporter: Naveen Gangam
> Assignee: Dantong Dong
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> In the initial implementation, connector providers do not implement the
> getTables(string pattern) spi. We had deferred it for later. Only
> getTableNames() and getTable() were implemented.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null
[ https://issues.apache.org/jira/browse/HIVE-25243?focusedWorklogId=614485&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614485 ]

ASF GitHub Bot logged work on HIVE-25243:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 12:54
Start Date: 24/Jun/21 12:54
Worklog Time Spent: 10m

Work Description: maheshk114 merged pull request #2391:
URL: https://github.com/apache/hive/pull/2391

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614485)
Time Spent: 1.5h (was: 1h 20m)

> Llap external client - Handle nested values when the parent struct is null
> --------------------------------------------------------------------------
>
> Key: HIVE-25243
> URL: https://issues.apache.org/jira/browse/HIVE-25243
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Shubham Chaurasia
> Assignee: Shubham Chaurasia
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Consider the following table in text format -
> {code}
> +-------------------------------+
> |              c8               |
> +-------------------------------+
> | NULL                          |
> | {"r":null,"s":null,"t":null}  |
> | {"r":"a","s":9,"t":2.2}       |
> +-------------------------------+
> {code}
> When we query above table via llap external client, it throws following
> exception -
> {code:java}
> Caused by: java.lang.NullPointerException: src
>   at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33)
>   at io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537)
>   at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199)
>   at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486)
>   at io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34)
>   at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933)
>   at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191)
>   at org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213)
>   at org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135)
> {code}
> Created a test to repro it -
> {code:java}
> /**
>  * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while
>  * testing LLAP external client flow.
>  * The aim of turning off LLAP IO is -
>  * when we create table through this test, LLAP caches them and returns the same
>  * when we do a read query, due to this we miss some code paths which may
>  * have been hit otherwise.
>  */
> public class TestMiniLlapVectorArrowWithLlapIODisabled extends BaseJdbcWithMiniLlap {
>   @BeforeClass
>   public static void beforeTest() throws Exception {
>     HiveConf conf = defaultConf();
>     conf.setBoolVar(ConfVars.LLAP_OUTPUT_FORMAT_ARROW, true);
>     conf.setBoolVar(ConfVars.HIVE_VECTORIZATION_FILESINK_ARROW_NATIVE_ENABLED, true);
>     conf.set(ConfVars.LLAP_IO_ENABLED.varname, "false");
>     BaseJdbcWithMiniLlap.beforeTest(conf);
>   }
>   @Override
>   protected InputFormat getInputFormat() {
>     //For unit testing, no harm in hard-coding allocator ceiling to LONG.MAX_VALUE
>     return new LlapArrowRowInputFormat(Long.MAX_VALUE);
>   }
>   @Test
>   public void testNullsInStructFields() throws Exception {
>     createDataTypesTable("datatypes");
>     RowCollector2 rowCollector = new RowCollector2();
>     // c8 struct
>     String query = "select c8 from datatypes";
>     int rowCount = processQuery(query, 1, rowCollector);
>     assertEquals(3, rowCount);
>   }
> }
> {code}
> Cause - As we see in the table above, first row of the table is NULL, and
> correspondingly
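[Editor's note] The stack trace shows the serializer recursing into a struct's child fields even when the struct value itself is null, so a null byte array ("src") reaches the Arrow buffer. The guard pattern involved can be sketched generically; this is an illustration, not Hive's `Serializer`:

```java
import java.util.Arrays;
import java.util.List;

public class NullStructGuard {
    // A toy struct value: a list of child field values (any of which may be null).
    record Struct(List<Object> fields) {}

    // Writing a struct column: a null parent must emit a null marker and must
    // NOT recurse into child values, because there are none to dereference.
    static String write(Struct s) {
        if (s == null) {
            return "null"; // mark this row null in the struct vector and every child vector
        }
        StringBuilder out = new StringBuilder("{");
        for (int i = 0; i < s.fields().size(); i++) {
            Object f = s.fields().get(i);
            out.append(f == null ? "null" : f.toString());
            if (i < s.fields().size() - 1) {
                out.append(",");
            }
        }
        return out.append("}").toString();
    }

    public static void main(String[] args) {
        // The three rows from the repro table above:
        System.out.println(write(null));                                  // null
        System.out.println(write(new Struct(Arrays.asList(null, null, null)))); // {null,null,null}
        System.out.println(write(new Struct(List.of("a", 9, 2.2))));      // {a,9,2.2}
    }
}
```

Without the first check, the writer would try to serialize child bytes for the NULL row, which is the shape of the NPE in the report.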
[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG
[ https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=614500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614500 ]

ASF GitHub Bot logged work on HIVE-25272:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 13:16
Start Date: 24/Jun/21 13:16
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #2413:
URL: https://github.com/apache/hive/pull/2413#discussion_r657937252

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
## @@ -176,6 +177,41 @@ public void testReplOperationsNotCapturedInNotificationLog() throws Throwable {
   assert lastEventId == currentEventId;
 }

+  @Test
+  public void testReadOperationsNotCapturedInNotificationLog() throws Throwable {
+    //Perform empty bootstrap dump and load
+    String dbName = testName.getMethodName();
+    String replDbName = "replicated_" + testName.getMethodName();
+    try {
+      primary.run("CREATE DATABASE " + dbName + " WITH DBPROPERTIES ( '"
+          + SOURCE_OF_REPLICATION + "' = '1,2,3')");
+      primary.hiveConf.set("hive.txn.readonly.enabled", "true");
+      primary.run("CREATE TABLE " + dbName + ".t1 (id int)");
+      primary.dump(dbName);
+      replica.run("REPL LOAD " + dbName + " INTO " + replDbName);
+      //Perform empty incremental dump and load so that all db level properties are altered.
+      primary.dump(dbName);
+      replica.run("REPL LOAD " + dbName + " INTO " + replDbName);
+      primary.run("INSERT INTO " + dbName + ".t1 VALUES(1)");
+      long lastEventId = primary.getCurrentNotificationEventId().getEventId();
+      primary.run("USE " + dbName);
+      primary.run("DESCRIBE DATABASE " + dbName);
+      primary.run("DESCRIBE "+ dbName + ".t1");
+      primary.run("SELECT * FROM " + dbName + ".t1");
+      primary.run("SHOW TABLES " + dbName);
+      primary.run("SHOW TABLE EXTENDED LIKE 't1'");
+      primary.run("SHOW TBLPROPERTIES t1");
+      primary.run("EXPLAIN SELECT * from " + dbName + ".t1");
+      primary.run("SHOW LOCKS");
+      primary.run("EXPLAIN SHOW LOCKS");

Review comment: Could you please add a test case for 'EXPLAIN LOCKS' that is widely used?
```
EXPLAIN LOCKS UPDATE target SET b = 1 WHERE p IN (SELECT t.q1 FROM source t WHERE t.a1=5)
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614500)
Time Spent: 3h (was: 2h 50m)

> READ transactions are getting logged in NOTIFICATION LOG
> ---------------------------------------------------------
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
> Issue Type: Bug
> Reporter: Pravin Sinha
> Assignee: Pravin Sinha
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h
> Remaining Estimate: 0h
>
> While READ transactions are already skipped from getting logged in
> NOTIFICATION logs, a few are still getting logged. Need to skip those
> transactions as well.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=614531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614531 ]

ASF GitHub Bot logged work on HIVE-24484:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 14:11
Start Date: 24/Jun/21 14:11
Worklog Time Spent: 10m

Work Description: belugabehr opened a new pull request #1742:
URL: https://github.com/apache/hive/pull/1742

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614531)
Time Spent: 2h 20m (was: 2h 10m)

> Upgrade Hadoop to 3.3.1
> ------------------------
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614590 ]

ASF GitHub Bot logged work on HIVE-25276:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:04
Start Date: 24/Jun/21 16:04
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658084643

## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
## @@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
   Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
 }

+  @Test
+  public void testStatWithInsert() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+    shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+    testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+    if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+      // If the location is set and we have to gather stats, then we have to update the table stats now
+      shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+    }
+
+    String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+    shell.executeStatement(insert);
+
+    checkColStat(identifier.name(), "customer_id");

Review comment: They are basically the same, and generated as well. Only in the partitioned case did I go for checking the partition columns too. Just to be sure.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614590)
Time Spent: 40m (was: 0.5h)

> Enable automatic statistics generation for Iceberg tables
> ---------------------------------------------------------
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
> Issue Type: Improvement
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25213) Implement List<Table> getTables() for existing connectors.
[ https://issues.apache.org/jira/browse/HIVE-25213?focusedWorklogId=614663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614663 ]

ASF GitHub Bot logged work on HIVE-25213:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 18:42
Start Date: 24/Jun/21 18:42
Worklog Time Spent: 10m

Work Description: dantongdong commented on a change in pull request #2371:
URL: https://github.com/apache/hive/pull/2371#discussion_r658196105

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/AbstractJDBCConnectorProvider.java
## @@ -53,7 +53,6 @@
   private static final String JDBC_OUTPUTFORMAT_CLASS = "org.apache.hive.storage.jdbc.JdbcOutputFormat".intern();
   String type = null; // MYSQL, POSTGRES, ORACLE, DERBY, MSSQL, DB2 etc.
-  String driverClassName = null;

Review comment: The AbstractJDBCConnectorProvider class extends the AbstractDataConnectorProvider (super) class, and driverClassName is already a field in the super class. Having `driverClassName = null` here overwrites the driverClassName passed in by the specific provider, because the AbstractJDBCConnectorProvider constructor calls its super class constructor first. This line is the root cause of the "Could not find a provider for remote database" error; removing it stops the overwrite. All providers can set driverClassName by calling AbstractDataConnectorProvider's constructor, which is what AbstractJDBCConnectorProvider does for all the providers.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614663)
Time Spent: 1h 20m (was: 1h 10m)

> Implement List<Table> getTables() for existing connectors.
> ----------------------------------------------------------
>
> Key: HIVE-25213
> URL: https://issues.apache.org/jira/browse/HIVE-25213
> Project: Hive
> Issue Type: Sub-task
> Reporter: Naveen Gangam
> Assignee: Dantong Dong
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> In the initial implementation, connector providers do not implement the
> getTables(string pattern) spi. We had deferred it for later. Only
> getTableNames() and getTable() were implemented.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
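[Editor's note] The reviewer's explanation above is an instance of Java field shadowing: a subclass field with the same name hides the superclass field, so unqualified reads in subclass code see the subclass field (initialized to null after the super constructor runs). A minimal sketch with illustrative class names, not Hive's actual hierarchy:

```java
// Field shadowing: the subclass re-declares a field the superclass
// constructor already populates, so subclass reads always see null.
class BaseProvider {
    protected String driverClassName;

    BaseProvider(String driverClassName) {
        this.driverClassName = driverClassName; // set by super constructor
    }
}

class JdbcProvider extends BaseProvider {
    String driverClassName = null; // shadows BaseProvider.driverClassName

    JdbcProvider(String driverClassName) {
        super(driverClassName);
    }

    String loadedDriver() {
        // Unqualified access resolves to JdbcProvider's own field -> null.
        return driverClassName;
    }
}

public class ShadowDemo {
    public static void main(String[] args) {
        JdbcProvider p = new JdbcProvider("org.postgresql.Driver");
        System.out.println(p.loadedDriver());                   // null
        System.out.println(((BaseProvider) p).driverClassName); // org.postgresql.Driver
    }
}
```

Deleting the shadowing declaration, as the PR does, makes the subclass read the field the super constructor set.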
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=614615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614615 ]

ASF GitHub Bot logged work on HIVE-25277:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:51
Start Date: 24/Jun/21 16:51
Worklog Time Spent: 10m

Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r658120606

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
## @@ -5088,10 +5089,8 @@ private boolean isDatabaseRemote(String name) {
   private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle)

Review comment: I think it works if deleteDir fails without side effect when the dir is non-empty. But we need to make sure all filesystems that Hive supports work in this way. Also I noticed that deleteDir calls moveToTrash first, so it could be more complex: https://github.com/apache/hive/blob/f2de30ca8bc2b63887496775f9a0769057a17ee0/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreFsImpl.java#L41. Avoiding duplicated checks seems to be safer.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614615)
Time Spent: 1h (was: 50m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> ------------------------------------------------------------------------------
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: All Versions
> Reporter: Zhou Fang
> Assignee: Zhou Fang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store, for which
> ListFiles is expensive, as the warehouse. A root cause is that the recursive
> parent dir deletion is very inefficient: there are many duplicated calls to
> isEmpty (ListFiles is called at the end). This fix sorts the parents to
> delete according to path length, and always processes the longest one first
> (e.g., a/b/c always comes before a/b). As a result, each parent path only
> needs to be checked once.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
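[Editor's note] The ordering described in the issue (handle the deepest candidate parent directory first so each directory is listed at most once) can be sketched with plain path strings; the class and method names here are illustrative, not Hive's:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeepestFirst {
    // Order candidate parent dirs so "a/b/c" is processed before "a/b":
    // once a child has been handled, each ancestor needs only one
    // emptiness check (one ListFiles call) instead of repeated ones.
    static List<String> deepestFirst(Set<String> parents) {
        List<String> ordered = new ArrayList<>(parents);
        // The issue sorts by path length; longer paths are deeper.
        ordered.sort(Comparator.comparingInt(String::length).reversed());
        return ordered;
    }

    public static void main(String[] args) {
        Set<String> parents = new LinkedHashSet<>(
                List.of("a/b", "a/b/c", "a", "a/b/c/d"));
        System.out.println(deepestFirst(parents));
        // [a/b/c/d, a/b/c, a/b, a]
    }
}
```

With this order, deletion can walk the list once and stop ascending as soon as a directory turns out to be non-empty, which is where the duplicated isEmpty/ListFiles calls were coming from.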
[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614593 ]

ASF GitHub Bot logged work on HIVE-25276:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:05
Start Date: 24/Jun/21 16:05
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658085550

## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
## @@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
   Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
 }

+  @Test
+  public void testStatWithInsert() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+    shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+    testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+    if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+      // If the location is set and we have to gather stats, then we have to update the table stats now
+      shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+    }
+
+    String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+    shell.executeStatement(insert);
+
+    checkColStat(identifier.name(), "customer_id");
+  }
+
+  @Test
+  public void testStatWithInsertOverwrite() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+    shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+    testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,

Review comment: My thought process was the following:
- We need a basic test - so created unpartitioned insert
- We need to test partitioned tables (what happens with the partition columns) - so created partitioned test (not sure that this is strictly needed)
- We need to test IOW - so created IOW

I am open to discussions either way.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614593)
Time Spent: 50m (was: 40m)

> Enable automatic statistics generation for Iceberg tables
> ---------------------------------------------------------
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
> Issue Type: Improvement
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=614601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614601 ]

ASF GitHub Bot logged work on HIVE-25277:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:14
Start Date: 24/Jun/21 16:14
Worklog Time Spent: 10m

Work Description: medb commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r658092035

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
## @@ -5088,10 +5089,8 @@ private boolean isDatabaseRemote(String name) {
   private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle)

Review comment: I see. Did we consider just calling a non-recursive `deleteDir` on the parent instead of checking if it's empty, and based on the delete success/failure trying to delete its parent recursively?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614601)
Time Spent: 40m (was: 0.5h)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> ------------------------------------------------------------------------------
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: All Versions
> Reporter: Zhou Fang
> Assignee: Zhou Fang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store, for which
> ListFiles is expensive, as the warehouse. A root cause is that the recursive
> parent dir deletion is very inefficient: there are many duplicated calls to
> isEmpty (ListFiles is called at the end). This fix sorts the parents to
> delete according to path length, and always processes the longest one first
> (e.g., a/b/c always comes before a/b). As a result, each parent path only
> needs to be checked once.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=614614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614614 ]

ASF GitHub Bot logged work on HIVE-25277:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:49
Start Date: 24/Jun/21 16:49
Worklog Time Spent: 10m

Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r658120606

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
## @@ -5088,10 +5089,8 @@ private boolean isDatabaseRemote(String name) {
   private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle)

Review comment: I think it works if deleteDir fails when the dir is non-empty with no side effect. But we need to make sure all filesystems that Hive supports work in this way. Also I noticed that deleteDir calls moveToTrash first, so it could be more complex: https://github.com/apache/hive/blob/f2de30ca8bc2b63887496775f9a0769057a17ee0/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreFsImpl.java#L41. Avoiding duplicated checks seems to be safer.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614614)
Time Spent: 50m (was: 40m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> ------------------------------------------------------------------------------
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: All Versions
> Reporter: Zhou Fang
> Assignee: Zhou Fang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store, for which
> ListFiles is expensive, as the warehouse. A root cause is that the recursive
> parent dir deletion is very inefficient: there are many duplicated calls to
> isEmpty (ListFiles is called at the end). This fix sorts the parents to
> delete according to path length, and always processes the longest one first
> (e.g., a/b/c always comes before a/b). As a result, each parent path only
> needs to be checked once.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-25137) getAllWriteEventInfo should go through the HMS client instead of using RawStore directly
[ https://issues.apache.org/jira/browse/HIVE-25137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369003#comment-17369003 ]

Yu-Wen Lai commented on HIVE-25137:
-----------------------------------
[~pmadhukar] There has been no update on the PR for several weeks. Will you follow up on the PR? If you won't, could I pick up this task?

> getAllWriteEventInfo should go through the HMS client instead of using
> RawStore directly
> ----------------------------------------------------------------------
>
> Key: HIVE-25137
> URL: https://issues.apache.org/jira/browse/HIVE-25137
> Project: Hive
> Issue Type: Improvement
> Reporter: Pratyush Madhukar
> Assignee: Pratyush Madhukar
> Priority: Major
>
> {code:java}
> private List<WriteEventInfo> getAllWriteEventInfo(Context withinContext) throws Exception {
>   String contextDbName = StringUtils.normalizeIdentifier(withinContext.replScope.getDbName());
>   RawStore rawStore = HiveMetaStore.HMSHandler.getMSForConf(withinContext.hiveConf);
>   List<WriteEventInfo> writeEventInfoList = rawStore.getAllWriteEventInfo(eventMessage.getTxnId(), contextDbName, null);
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614640 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 24/Jun/21 17:51 Start Date: 24/Jun/21 17:51 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658162341 ## File path: ql/src/test/queries/clientpositive/udf_date_format.q ## @@ -78,3 +78,16 @@ select date_format("2015-04-08 10:30:45","-MM-dd HH:mm:ss.SSS z"); --julian date set hive.local.time.zone=UTC; select date_format("1001-01-05","dd---MM--"); + +--dates prior to 1900 +set hive.local.time.zone=Asia/Bangkok; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Europe/Berlin; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Africa/Johannesburg; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); Review comment: @guptanikhil007 I think @zabetak's point is: what will be the interpretation of the timezone for a timestamp value stored in a table? Will it be treated as UTC or the local timezone? If the local timezone, will the output change when we change the local timezone config? From the code, it seems `yes`. But I guess the behavior is the same with `SimpleDateFormat` as well. Pls confirm. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614640) Time Spent: 8.5h (was: 8h 20m) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC > -- > > Key: HIVE-25268 > URL: https://issues.apache.org/jira/browse/HIVE-25268 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 8.5h > Remaining Estimate: 0h > > *Hive 1.2.1*: > {code:java} > select date_format('1400-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+--+ > | _c0| > +--+--+ > | 1400-01-14 01:00:00 ICT | > +--+--+ > select date_format('1800-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+--+ > | _c0| > +--+--+ > | 1800-01-14 01:00:00 ICT | > +--+--+ > {code} > *Hive 3.1, Hive 4.0:* > {code:java} > select date_format('1400-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1400-01-06 01:17:56 ICT | > +--+ > select date_format('1800-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1800-01-14 01:17:56 ICT | > +--+ > {code} > VM timezone is set to 'Asia/Bangkok' -- This message was sent by Atlassian Jira (v8.3.4#803005)
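The pre-1900 drift discussed in the HIVE-25268 thread above comes from historical timezone rules: many zones used local mean time before standardization (Asia/Bangkok was UTC+06:42:04 until 1920, and 07:00:00 minus 06:42:04 is exactly the 17:56 shift in the bug report). The sketch below is a minimal illustration of that failure mode using plain `java.time`, not Hive's actual UDF code path; the specific conversion through epoch millis is an assumption chosen to expose the offset.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class Pre1900Zones {
    public static void main(String[] args) {
        ZoneId bangkok = ZoneId.of("Asia/Bangkok");
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        LocalDateTime literal = LocalDateTime.parse("1800-01-14T01:00:00");

        // Attaching the session zone to the literal wall-clock value keeps it intact.
        String direct = literal.atZone(bangkok).format(fmt);

        // Round-tripping through epoch millis interpreted in a different zone
        // drags in the pre-1920 local-mean-time offset (+06:42:04 for Bangkok).
        long millis = literal.toInstant(ZoneOffset.UTC).toEpochMilli();
        String shifted = Instant.ofEpochMilli(millis).atZone(bangkok).format(fmt);

        System.out.println(direct);   // 1800-01-14 01:00:00
        System.out.println(shifted);  // 1800-01-14 07:42:04
    }
}
```

Any formatter that re-derives the wall-clock time from an epoch instant with modern tzdb rules will show this kind of sub-hour offset for dates before zone standardization.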
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614458 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 12:11 Start Date: 24/Jun/21 12:11 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657889399 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -353,6 +375,23 @@ private RelNode toJoinInsertIncremental( basePlan, mdProvider, executorProvider, HiveJoinInsertIncrementalRewritingRule.INSTANCE); } +private RelNode toPartitionInsertOverwrite( +RelNode basePlan, RelMetadataProvider mdProvider, RexExecutor executorProvider, +HiveRelOptMaterialization materialization, RelNode calcitePreMVRewritingPlan) { + + if (materialization.isSourceTablesUpdateDeleteModified()) { +return calcitePreMVRewritingPlan; + } + + RelOptHiveTable hiveTable = (RelOptHiveTable) materialization.tableRel.getTable(); + if (!AcidUtils.isInsertOnlyTable(hiveTable.getHiveTableMD())) { +return applyPreJoinOrderingTransforms(basePlan, mdProvider, executorProvider); + } + + return toIncrementalRebuild( + basePlan, mdProvider, executorProvider, HiveAggregatePartitionIncrementalRewritingRule.INSTANCE); Review comment: I have found that currently we can handle non-aggregate cases via the record-level incremental join rules. If those are not applicable, we do not even have a union-based plan to transform further into an incremental one. However, I'm happy to include the non-aggregate variant if a use case is found. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614458) Time Spent: 2h (was: 1h 50m) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614459 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 12:11 Start Date: 24/Jun/21 12:11 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657889592 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -353,6 +375,23 @@ private RelNode toJoinInsertIncremental( basePlan, mdProvider, executorProvider, HiveJoinInsertIncrementalRewritingRule.INSTANCE); } +private RelNode toPartitionInsertOverwrite( +RelNode basePlan, RelMetadataProvider mdProvider, RexExecutor executorProvider, +HiveRelOptMaterialization materialization, RelNode calcitePreMVRewritingPlan) { + + if (materialization.isSourceTablesUpdateDeleteModified()) { +return calcitePreMVRewritingPlan; + } + + RelOptHiveTable hiveTable = (RelOptHiveTable) materialization.tableRel.getTable(); + if (!AcidUtils.isInsertOnlyTable(hiveTable.getHiveTableMD())) { Review comment: I tested scenarios where the view definition has aggregate functions like `avg`, `std`, `variance`. These functions are represented in the Calcite plan by a formula whose inputs are usually `sum` and `count`. Example: ``` HiveProject(b=[$0], avgc=[/(CAST($2):DOUBLE, $3)], a=[$1]) HiveAggregate(group=[{0, 1}], agg#0=[sum($2)], agg#1=[count($2)]) HiveProject($f0=[$1], $f1=[$0], $f2=[$2]) HiveTableScan(table=[[default, t1]], table:alias=[t1]) ``` This type of plan is not converted to a Union-based MV rewrite, so execution doesn't even reach the incremental rewriting rules. It seems that the rules that generate the Union-based MV rewrite should be improved first. Adding TODO. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614459) Time Spent: 2h 10m (was: 2h) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-21614) Derby does not support CLOB comparisons
[ https://issues.apache.org/jira/browse/HIVE-21614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368778#comment-17368778 ] Piotr Findeisen commented on HIVE-21614: [~hankfanchiu], in Trino I worked around this limitation by issuing the filter with a `LIKE` predicate instead of `=`. The code is here: https://github.com/trinodb/trino/blob/d95eafe397fe4b476b2b1a73baeb7643349d4bdb/plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastore.java#L973-L981 > Derby does not support CLOB comparisons > --- > > Key: HIVE-21614 > URL: https://issues.apache.org/jira/browse/HIVE-21614 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.4, 3.0.0 >Reporter: Vlad Rozov >Priority: Major > > HiveMetaStoreClient.listTableNamesByFilter() with non empty filter causes > exception with Derby DB: > {noformat} > Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB > (UCS_BASIC)' are not supported. Types must be comparable. String types must > also have matching collation. If collation does not match, a possible > solution is to cast operands to force them to the default collation (e.g. 
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = > 'T1') > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown > Source) > at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown > Source) > at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown > Source) > at > org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown > Source) > at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown > Source) > at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown > Source) > at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source) > at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source) > at > org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown > Source) > ... 42 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
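The `LIKE` workaround mentioned in the HIVE-21614 comment above can be illustrated with a small predicate-rewriting helper. Derby rejects `=` between CLOB operands but accepts `LIKE` on them, so an equality filter can be expressed as a `LIKE` whose metacharacters are escaped. This is a hypothetical sketch: `toLikePredicate`, its escaping scheme, and the column name are assumptions, not Hive or Trino code, and a real implementation would also have to handle embedded quotes.

```java
public class ClobEqualityWorkaround {

    // Rewrite an equality test on a CLOB column into a LIKE predicate that
    // matches only the literal value, by escaping LIKE's metacharacters
    // (% and _) and the escape character itself.
    static String toLikePredicate(String column, String value) {
        String escaped = value
            .replace("\\", "\\\\")
            .replace("_", "\\_")
            .replace("%", "\\%");
        return column + " LIKE '" + escaped + "' ESCAPE '\\'";
    }

    public static void main(String[] args) {
        System.out.println(toLikePredicate("\"PARAM_VALUE\"", "100%_done"));
        // "PARAM_VALUE" LIKE '100\%\_done' ESCAPE '\'
    }
}
```

Without the escaping step, a stored value containing `%` or `_` would turn the rewritten filter into a wildcard match rather than an exact comparison.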
[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG
[ https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=614498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614498 ] ASF GitHub Bot logged work on HIVE-25272: - Author: ASF GitHub Bot Created on: 24/Jun/21 13:15 Start Date: 24/Jun/21 13:15 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2413: URL: https://github.com/apache/hive/pull/2413#discussion_r657937252 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java ## @@ -176,6 +177,41 @@ public void testReplOperationsNotCapturedInNotificationLog() throws Throwable { assert lastEventId == currentEventId; } + @Test + public void testReadOperationsNotCapturedInNotificationLog() throws Throwable { +//Perform empty bootstrap dump and load +String dbName = testName.getMethodName(); +String replDbName = "replicated_" + testName.getMethodName(); +try { + primary.run("CREATE DATABASE " + dbName + " WITH DBPROPERTIES ( '" + + SOURCE_OF_REPLICATION + "' = '1,2,3')"); + primary.hiveConf.set("hive.txn.readonly.enabled", "true"); + primary.run("CREATE TABLE " + dbName + ".t1 (id int)"); + primary.dump(dbName); + replica.run("REPL LOAD " + dbName + " INTO " + replDbName); + //Perform empty incremental dump and load so that all db level properties are altered. 
+ primary.dump(dbName); + replica.run("REPL LOAD " + dbName + " INTO " + replDbName); + primary.run("INSERT INTO " + dbName + ".t1 VALUES(1)"); + long lastEventId = primary.getCurrentNotificationEventId().getEventId(); + primary.run("USE " + dbName); + primary.run("DESCRIBE DATABASE " + dbName); + primary.run("DESCRIBE "+ dbName + ".t1"); + primary.run("SELECT * FROM " + dbName + ".t1"); + primary.run("SHOW TABLES " + dbName); + primary.run("SHOW TABLE EXTENDED LIKE 't1'"); + primary.run("SHOW TBLPROPERTIES t1"); + primary.run("EXPLAIN SELECT * from " + dbName + ".t1"); + primary.run("SHOW LOCKS"); + primary.run("EXPLAIN SHOW LOCKS"); Review comment: Could you please add a test case for 'EXPLAIN LOCKS ', which is widely used? ``` EXPLAIN LOCKS UPDATE target SET b = 1 WHERE p IN (SELECT t.q1 FROM source t WHERE t.a1=5)' ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614498) Time Spent: 2h 50m (was: 2h 40m) > READ transactions are getting logged in NOTIFICATION LOG > > > Key: HIVE-25272 > URL: https://issues.apache.org/jira/browse/HIVE-25272 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > While READ transactions are already skipped from getting logged in > NOTIFICATION logs, a few are still getting logged. Need to skip those > transactions as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=614550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614550 ] ASF GitHub Bot logged work on HIVE-25277: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:04 Start Date: 24/Jun/21 15:04 Worklog Time Spent: 10m Work Description: medb commented on a change in pull request #2421: URL: https://github.com/apache/hive/pull/2421#discussion_r658031926 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -5088,10 +5089,8 @@ private boolean isDatabaseRemote(String name) { private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle) Review comment: Is there a reason why it can not use [HCFS API to delete dir recursively](https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#delete-org.apache.hadoop.fs.Path-boolean-)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614550) Time Spent: 20m (was: 10m) > Slow Hive partition deletion for Cloud object stores with expensive ListFiles > - > > Key: HIVE-25277 > URL: https://issues.apache.org/jira/browse/HIVE-25277 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: All Versions >Reporter: Zhou Fang >Assignee: Zhou Fang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Deleting a Hive partition is slow when use a Cloud object store as the > warehouse for which ListFiles is expensive. A root cause is that the > recursive parent dir deletion is very inefficient: there are many duplicated > calls to isEmpty (ListFiles is called at the end). 
This fix sorts the parents > to delete according to the path size, and always processes the longest one > (e.g., a/b/c is always before a/b). As a result, each parent path is only > needed to be checked once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614553 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:13 Start Date: 24/Jun/21 15:13 Worklog Time Spent: 10m Work Description: guptanikhil007 commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658040342 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java ## @@ -106,25 +102,18 @@ public Object evaluate(DeferredObject[] arguments) throws HiveException { return null; } -ZoneId id = (SessionState.get() == null) ? new HiveConf().getLocalTimeZone() : SessionState.get().getConf() -.getLocalTimeZone(); // the function should support both short date and full timestamp format // time part of the timestamp should not be skipped Timestamp ts = getTimestampValue(arguments, 0, tsConverters); + if (ts == null) { - Date d = getDateValue(arguments, 0, dtInputTypes, dtConverters); - if (d == null) { -return null; - } - ts = Timestamp.ofEpochMilli(d.toEpochMilli(id), id); + return null; Review comment: It works with Timestamp converter ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java ## @@ -17,31 +17,30 @@ */ package org.apache.hadoop.hive.ql.udf.generic; -import static org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.PrimitiveGrouping.DATE_GROUP; -import static org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.PrimitiveGrouping.STRING_GROUP; - -import java.text.SimpleDateFormat; -import java.time.Instant; -import java.time.LocalDateTime; -import java.time.ZoneId; -import java.time.ZoneOffset; - -import org.apache.hadoop.hive.common.type.Date; import org.apache.hadoop.hive.common.type.Timestamp; +import org.apache.hadoop.hive.common.type.TimestampTZUtil; Review comment: ok -- This is an 
automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614553) Time Spent: 8h (was: 7h 50m) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC > -- > > Key: HIVE-25268 > URL: https://issues.apache.org/jira/browse/HIVE-25268 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 8h > Remaining Estimate: 0h > > *Hive 1.2.1*: > {code:java} > select date_format('1400-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+--+ > | _c0| > +--+--+ > | 1400-01-14 01:00:00 ICT | > +--+--+ > select date_format('1800-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+--+ > | _c0| > +--+--+ > | 1800-01-14 01:00:00 ICT | > +--+--+ > {code} > *Hive 3.1, Hive 4.0:* > {code:java} > select date_format('1400-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1400-01-06 01:17:56 ICT | > +--+ > select date_format('1800-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1800-01-14 01:17:56 ICT | > +--+ > {code} > VM timezone is set to 'Asia/Bangkok' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25286) Set stats to inaccurate when an Iceberg table is modified outside Hive
[ https://issues.apache.org/jira/browse/HIVE-25286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-25286: - > Set stats to inaccurate when an Iceberg table is modified outside Hive > -- > > Key: HIVE-25286 > URL: https://issues.apache.org/jira/browse/HIVE-25286 > Project: Hive > Issue Type: New Feature >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > > When an Iceberg table is modified outside of Hive then the stats should be > set to inaccurate since there is no way to ensure that the HMS stats are > updated correctly and this could cause incorrect query results. > The proposed solution is only working for HiveCatalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25285) Retire HiveProjectJoinTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-25285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368783#comment-17368783 ] Zoltan Haindrich commented on HIVE-25285: - suggested by Jesus [here|https://github.com/apache/hive/pull/2423#issuecomment-867355015] we could probably also remove some other rules which were copied > Retire HiveProjectJoinTransposeRule > --- > > Key: HIVE-25285 > URL: https://issues.apache.org/jira/browse/HIVE-25285 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Priority: Major > > we don't neccessary need our own rule anymore - a plain > ProjectJoinTransposeRule could probably work -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions
[ https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=614460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614460 ] ASF GitHub Bot logged work on HIVE-25246: - Author: ASF GitHub Bot Created on: 24/Jun/21 12:15 Start Date: 24/Jun/21 12:15 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2396: URL: https://github.com/apache/hive/pull/2396#discussion_r657888141 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -1122,6 +1154,49 @@ public void abortTxns(AbortTxnsRequest rqst) throws MetaException { } } + private void markDbAsReplIncompatible(Connection dbConn, String database) throws SQLException, MetaException { Review comment: We needn't have almost a copy of updateReplId(). If you need similar code for both, externalize that part. I was also wondering if we would ever need an update. ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -124,6 +124,11 @@ public int execute() { try { long loadTaskStartTime = System.currentTimeMillis(); SecurityUtils.reloginExpiringKeytabUser(); + //Don't proceed if target db is replication incompatible. 
+ Database targetDb = getHive().getDatabase(work.dbNameToLoadIn); + if (targetDb != null && MetaStoreUtils.isDbReplIncompatible(targetDb)) { +throw new SemanticException(ErrorMsg.REPL_INCOMPATIBLE_EXCEPTION.getMsg()); Review comment: Add DB name as well ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java ## @@ -139,6 +142,40 @@ public void tearDown() throws Throwable { primary.run("drop database if exists " + primaryDbName + "_extra cascade"); } + @Test + public void testTargetDbReplIncompatible() throws Throwable { +HiveConf primaryConf = primary.getConf(); +TxnStore txnHandler = TxnUtils.getTxnStore(primary.getConf()); + +primary.run("use " + primaryDbName) +.run("CREATE TABLE t1(a string) STORED AS TEXTFILE") +.dump(primaryDbName); +replica.load(replicatedDbName, primaryDbName); + + assertFalse(MetaStoreUtils.isDbReplIncompatible(replica.getDatabase(replicatedDbName))); + +Long sourceTxnId = openTxns(1, txnHandler, primaryConf).get(0); +txnHandler.abortTxn(new AbortTxnRequest(sourceTxnId)); + +sourceTxnId = openTxns(1, txnHandler, primaryConf).get(0); + +primary.dump(primaryDbName); +replica.load(replicatedDbName, primaryDbName); + assertFalse(MetaStoreUtils.isDbReplIncompatible(replica.getDatabase(replicatedDbName))); + +Long targetTxnId = txnHandler.getTargetTxnId(HiveUtils.getReplPolicy(replicatedDbName), sourceTxnId); +txnHandler.abortTxn(new AbortTxnRequest(targetTxnId)); + assertTrue(MetaStoreUtils.isDbReplIncompatible(replica.getDatabase(replicatedDbName))); + +WarehouseInstance.Tuple dumpData = primary.dump(primaryDbName); + +assertFalse(ReplUtils.failedWithNonRecoverableError(new Path(dumpData.dumpLocation), conf)); +replica.loadFailure(replicatedDbName, primaryDbName); +assertTrue(ReplUtils.failedWithNonRecoverableError(new Path(dumpData.dumpLocation), conf)); + +primary.dumpFailure(primaryDbName); Review comment: Check for this : assertTrue(ReplUtils.failedWithNonRecoverableError(new 
Path(dumpData.dumpLocation), conf)); even after dump failure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614460) Time Spent: 2h 40m (was: 2.5h) > Fix the clean up of open repl created transactions > -- > > Key: HIVE-25246 > URL: https://issues.apache.org/jira/browse/HIVE-25246 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368871#comment-17368871 ] David Mollitor commented on HIVE-24484: --- [HADOOP-17367] is another biting issue for Hive. In 3.1.0, the return value from {{ProxyUsers#getDefaultImpersonationProvider}} changed. In 3.1.0, the method could return a {{null}} value and then it was up to the caller to create a new one and initialize it. It seems like in 3.3.1, it always returns a value, but it looks like the initialization isn't what Hive is expecting. The initialization {{refreshSuperUserGroupsConfiguration}} creates its own configuration whereas before Hive was passing in its own Configuration. > Upgrade Hadoop to 3.3.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368871#comment-17368871 ] David Mollitor edited comment on HIVE-24484 at 6/24/21, 2:05 PM: - [HADOOP-17367] is another biting issue for Hive. In 3.1.0, the return value from {{ProxyUsers#getDefaultImpersonationProvider}} changed. In 3.1.0, the method could return a {{null}} value and then it was up to the caller to create a new one and initialize it (providing the Configuration object). It seems like in 3.3.1, it now always returns a value, but it looks like the initialization isn't what Hive is expecting. The initialization {{refreshSuperUserGroupsConfiguration}} creates its own (empty) configuration whereas before Hive was passing in its own Configuration. was (Author: belugabehr): [HADOOP-17367] is another biting issue for Hive. In 3.1.0, the the return value from {{ProxyUsers# getDefaultImpersonationProvider}} changed. In 3.1.0, the method could return a {{null}} value and then it was up to the caller to create a new one and initialize it. It seems like in 3.3.1, it always returns a value, but it looks like the initialization isn't what Hive is expecting. The initialization {{refreshSuperUserGroupsConfiguration}} creates its own configuration whereas before Hive was passing in its own Configuration. > Upgrade Hadoop to 3.3.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=614530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614530 ] ASF GitHub Bot logged work on HIVE-24484: - Author: ASF GitHub Bot Created on: 24/Jun/21 14:08 Start Date: 24/Jun/21 14:08 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1742: URL: https://github.com/apache/hive/pull/1742 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614530) Time Spent: 2h 10m (was: 2h) > Upgrade Hadoop to 3.3.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=614561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614561 ] ASF GitHub Bot logged work on HIVE-25277: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:26 Start Date: 24/Jun/21 15:26 Worklog Time Spent: 10m Work Description: ranu010101 commented on a change in pull request #2421: URL: https://github.com/apache/hive/pull/2421#discussion_r658051894 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -5088,10 +5089,8 @@ private boolean isDatabaseRemote(String name) { private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle) Review comment: HCFS Api deletes all children recursively while this method (deleteParentRecursive) deletes a file and keeps on deleting parent directories if they are empty. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614561) Time Spent: 0.5h (was: 20m) > Slow Hive partition deletion for Cloud object stores with expensive ListFiles > - > > Key: HIVE-25277 > URL: https://issues.apache.org/jira/browse/HIVE-25277 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: All Versions >Reporter: Zhou Fang >Assignee: Zhou Fang >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Deleting a Hive partition is slow when use a Cloud object store as the > warehouse for which ListFiles is expensive. A root cause is that the > recursive parent dir deletion is very inefficient: there are many duplicated > calls to isEmpty (ListFiles is called at the end). 
This fix sorts the parents > to delete according to the path size, and always processes the longest one > (e.g., a/b/c is always before a/b). As a result, each parent path only > needs to be checked once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
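The ordering idea in the fix above can be sketched with a small standalone helper (a hypothetical illustration; the real HMSHandler code operates on Hadoop Path objects, not strings):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class ParentDeletionOrder {
    // Hypothetical helper illustrating the fix: order candidate parent
    // directories so that deeper paths come first (a/b/c before a/b, a/b before a).
    static List<String> orderForDeletion(Collection<String> parents) {
        List<String> sorted = new ArrayList<>(parents);
        // A longer path string is deeper in the tree, so sort by length descending.
        sorted.sort((p1, p2) -> Integer.compare(p2.length(), p1.length()));
        return sorted;
    }

    public static void main(String[] args) {
        // a/b/c must be checked (and deleted if empty) before we even look at a/b.
        System.out.println(orderForDeletion(Arrays.asList("a/b", "a", "a/b/c")));
        // prints [a/b/c, a/b, a]
    }
}
```

Sorting by descending path length suffices here because every ancestor of a path is a strict prefix of it, so a descendant always sorts before its ancestors; each directory's emptiness then needs to be examined at most once.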
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614585 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:50 Start Date: 24/Jun/21 15:50 Worklog Time Spent: 10m Work Description: guptanikhil007 commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658073147 ## File path: ql/src/test/queries/clientpositive/udf_date_format.q ## @@ -78,3 +78,16 @@ select date_format("2015-04-08 10:30:45","yyyy-MM-dd HH:mm:ss.SSS z"); --julian date set hive.local.time.zone=UTC; select date_format("1001-01-05","dd---MM--"); + +--dates prior to 1900 +set hive.local.time.zone=Asia/Bangkok; +select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Europe/Berlin; +select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Africa/Johannesburg; +select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z'); Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614585) Time Spent: 8h 10m (was: 8h) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC > -- > > Key: HIVE-25268 > URL: https://issues.apache.org/jira/browse/HIVE-25268 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 8h 10m > Remaining Estimate: 0h > > *Hive 1.2.1*: > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1400-01-14 01:00:00 ICT | > +--------------------------+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1800-01-14 01:00:00 ICT | > +--------------------------+ > {code} > *Hive 3.1, Hive 4.0:* > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1400-01-06 01:17:56 ICT | > +--------------------------+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1800-01-14 01:17:56 ICT | > +--------------------------+ > {code} > VM timezone is set to 'Asia/Bangkok' -- This message was sent by Atlassian Jira (v8.3.4#803005)
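The 17 minute 56 second shift visible in the Hive 3.1/4.0 output above matches the gap between Bangkok's modern standard offset (+07:00) and its pre-standard local mean time in the tz database (assumed here to be +06:42:04, consistent with the 01:00:00 to 01:17:56 shift in the bug report). A self-contained java.time sketch (illustrative only, not Hive code) that surfaces the two offsets:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class Pre1900OffsetDemo {
    public static void main(String[] args) {
        ZoneId bangkok = ZoneId.of("Asia/Bangkok");
        // Modern instant: standard time applies, offset is +07:00.
        ZoneOffset modern = bangkok.getRules().getOffset(Instant.parse("2021-06-24T00:00:00Z"));
        // 1800s instant: java.time falls back to the zone's historical local
        // mean time, which differs from +07:00 by the 17m56s seen in the report.
        ZoneOffset old = bangkok.getRules().getOffset(Instant.parse("1800-01-14T01:00:00Z"));
        // prints the modern offset (+07:00) followed by the pre-1900 one
        System.out.println(modern + " vs " + old);
    }
}
```

Formatters built on java.time therefore render pre-1900 timestamps with the LMT-based offset, whereas the legacy SimpleDateFormat path in Hive 1.2.1 did not, which is the discrepancy this ticket addresses.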
[jira] [Work logged] (HIVE-25278) HiveProjectJoinTransposeRule may do invalid transformations with windowing expressions
[ https://issues.apache.org/jira/browse/HIVE-25278?focusedWorklogId=614450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614450 ] ASF GitHub Bot logged work on HIVE-25278: - Author: ASF GitHub Bot Created on: 24/Jun/21 11:19 Start Date: 24/Jun/21 11:19 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #2423: URL: https://github.com/apache/hive/pull/2423#issuecomment-867554905 definitely; the two rules seemed almost identical! opened: HIVE-25285 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614450) Time Spent: 0.5h (was: 20m) > HiveProjectJoinTransposeRule may do invalid transformations with windowing > expressions > --- > > Key: HIVE-25278 > URL: https://issues.apache.org/jira/browse/HIVE-25278 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > running > {code} > create table table1 (acct_num string, interest_rate decimal(10,7)) stored as > orc; > create table table2 (act_id string) stored as orc; > CREATE TABLE temp_output AS > SELECT act_nbr, row_num > FROM (SELECT t2.act_id as act_nbr, > row_number() over (PARTITION BY trim(acct_num) ORDER BY interest_rate DESC) > AS row_num > FROM table1 t1 > INNER JOIN table2 t2 > ON trim(acct_num) = t2.act_id) t > WHERE t.row_num = 1; > {code} > may result in error like: > {code} > Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 > Invalid column reference 'interest_rate': (possible column names are: > interest_rate, trim) (state=42000,code=4) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork
[ https://issues.apache.org/jira/browse/HIVE-25208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-25208. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the review [~kuczoram] and [~Marton Bod]! > Refactor Iceberg commit to the MoveTask/MoveWork > > > Key: HIVE-25208 > URL: https://issues.apache.org/jira/browse/HIVE-25208 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we > should commit in MoveWork so we are using the same flow as normal tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25080) Create metric about oldest entry in "ready for cleaning" state
[ https://issues.apache.org/jira/browse/HIVE-25080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-25080: - Parent: HIVE-24824 Issue Type: Sub-task (was: Bug) > Create metric about oldest entry in "ready for cleaning" state > -- > > Key: HIVE-25080 > URL: https://issues.apache.org/jira/browse/HIVE-25080 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > When a compaction txn commits, COMPACTION_QUEUE.CQ_COMMIT_TIME is updated > with the current time. Then the compaction state is set to "ready for > cleaning". (... and then the Cleaner runs and the state is set to "succeeded" > hopefully) > Based on this we know (roughly) how long a compaction has been in state > "ready for cleaning". > We should create a metric similar to compaction_oldest_enqueue_age_in_sec > that would show that the cleaner is blocked by something i.e. find the > compaction in "ready for cleaning" that has the oldest commit time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368895#comment-17368895 ] Nikhil Gupta edited comment on HIVE-25104 at 6/24/21, 2:44 PM: --- I have a list of issues which we can club together: (These are after hive 3.1 release): # HIVE-25268: date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC # HIVE-25093: date_format() UDF is returning output in UTC time zone only (Ashish Sharma, reviewed by Adesh Rao, Nikhil Gupta, Sankar Hariappan) # HIVE-25104: Backward incompatible timestamp serialization in Parquet for certain timezones (Stamatis Zampetakis, reviewed by Jesus Camacho Rodriguez) # HIVE-24113: NPE in GenericUDFToUnixTimeStamp (#1460) (Raj Kumar Singh, reviewed by Zoltan Haindrich and Laszlo Pinter) # HIVE-24074: Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x (Jesus Camacho Rodriguez, reviewed by Prasanth Jayachandran) # HIVE-22840: Race condition in formatters of TimestampColumnVector and DateColumnVector (Shubham Chaurasia, reviewed by Jesus Camacho Rodriguez) # HIVE-22589: Add storage support for ProlepticCalendar in ORC, Parquet, and Avro (Jesus Camacho Rodriguez, reviewed by David Lavati, László Bodor, Prasanth Jayachandran) # HIVE-22405: Add ColumnVector support for ProlepticCalendar (László Bodor via Owen O'Malley, Jesus Camacho Rodriguez) # HIVE-22331: unix_timestamp without argument returns timestamp in millisecond instead of second (Naresh P R, reviewed Jesus Camacho Rodriguez) # HIVE-22170: from_unixtime and unix_timestamp should use user session time zone (Jesus Camacho Rodriguez, reviewed by Vineet Garg) # HIVE-21729: Arrow serializer sometimes shifts timestamp by one second (Shubham Chaurasia, reviewed by Sankar Hariappan) # HIVE-21291: Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time (Karen Coppage, reviewed by Jesus
Camacho Rodriguez) was (Author: gupta.nikhil0007): I have a list of issues which we can club together: (These are after hive 3.1 release): # HIVE-25093: date_format() UDF is returning output in UTC time zone only (Ashish Sharma, reviewed by Adesh Rao, Nikhil Gupta, Sankar Hariappan) # HIVE-25104: Backward incompatible timestamp serialization in Parquet for certain timezones (Stamatis Zampetakis, reviewed by Jesus Camacho Rodriguez) # HIVE-24113: NPE in GenericUDFToUnixTimeStamp (#1460) (Raj Kumar Singh, reviewed by Zoltan Haindrich and Laszlo Pinter) # HIVE-24074: Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x (Jesus Camacho Rodriguez, reviewed by Prasanth Jayachandran) # HIVE-22840: Race condition in formatters of TimestampColumnVector and DateColumnVector (Shubham Chaurasia, reviewed by Jesus Camacho Rodriguez) # HIVE-22589: Add storage support for ProlepticCalendar in ORC, Parquet, and Avro (Jesus Camacho Rodriguez, reviewed by David Lavati, László Bodor, Prasanth Jayachandran) # HIVE-22405: Add ColumnVector support for ProlepticCalendar (László Bodor via Owen O'Malley, Jesus Camacho Rodriguez) # HIVE-22331: unix_timestamp without argument returns timestamp in millisecond instead of second (Naresh P R, reviewed Jesus Camacho Rodriguez) # HIVE-22170: from_unixtime and unix_timestamp should use user session time zone (Jesus Camacho Rodriguez, reviewed by Vineet Garg) # HIVE-21729: Arrow serializer sometimes shifts timestamp by one second (Shubham Chaurasia, reviewed by Sankar Hariappan) # HIVE-21291: Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time (Karen Coppage, reviewed by Jesus Camacho Rodriguez) > Backward incompatible timestamp serialization in Parquet for certain timezones > -- > > Key: HIVE-25104 > URL: https://issues.apache.org/jira/browse/HIVE-25104 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions:
3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extent how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in > Parquet files is not backwards compatible. In other words writing timestamps > with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them > with another (not including the previous issues) may lead to different > results depending on the default timezone of the system.
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614548=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614548 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 24/Jun/21 14:59 Start Date: 24/Jun/21 14:59 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658025667 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java ## @@ -106,25 +102,18 @@ public Object evaluate(DeferredObject[] arguments) throws HiveException { return null; } -ZoneId id = (SessionState.get() == null) ? new HiveConf().getLocalTimeZone() : SessionState.get().getConf() -.getLocalTimeZone(); // the function should support both short date and full timestamp format // time part of the timestamp should not be skipped Timestamp ts = getTimestampValue(arguments, 0, tsConverters); + if (ts == null) { - Date d = getDateValue(arguments, 0, dtInputTypes, dtConverters); - if (d == null) { -return null; - } - ts = Timestamp.ofEpochMilli(d.toEpochMilli(id), id); + return null; Review comment: Why is DateConverter removed? Isn't it needed to convert input like "2021-06-24" or does it work with TimestampConverter?
## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java ## @@ -17,31 +17,30 @@ */ package org.apache.hadoop.hive.ql.udf.generic; -import static org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.PrimitiveGrouping.DATE_GROUP; -import static org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.PrimitiveGrouping.STRING_GROUP; - -import java.text.SimpleDateFormat; -import java.time.Instant; -import java.time.LocalDateTime; -import java.time.ZoneId; -import java.time.ZoneOffset; - -import org.apache.hadoop.hive.common.type.Date; import org.apache.hadoop.hive.common.type.Timestamp; +import org.apache.hadoop.hive.common.type.TimestampTZUtil; Review comment: Unused class. Can be removed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614548) Time Spent: 7h 50m (was: 7h 40m) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC > -- > > Key: HIVE-25268 > URL: https://issues.apache.org/jira/browse/HIVE-25268 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > *Hive 1.2.1*: > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1400-01-14 01:00:00 ICT | > +--------------------------+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1800-01-14 01:00:00 ICT | > +--------------------------+ > {code} > *Hive 3.1, Hive 4.0:* > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 |
+--------------------------+ > | 1400-01-06 01:17:56 ICT | > +--------------------------+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1800-01-14 01:17:56 ICT | > +--------------------------+ > {code} > VM timezone is set to 'Asia/Bangkok' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614577 ] ASF GitHub Bot logged work on HIVE-25276: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:38 Start Date: 24/Jun/21 15:38 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2419: URL: https://github.com/apache/hive/pull/2419#discussion_r658062987 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java ## @@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException { Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1)); } + @Test + public void testStatWithInsert() { +TableIdentifier identifier = TableIdentifier.of("default", "customers"); + +shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true); +testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of()); + +if (testTableType != TestTables.TestTableType.HIVE_CATALOG) { + // If the location is set and we have to gather stats, then we have to update the table stats now + shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS"); +} + +String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false); +shell.executeStatement(insert); + +checkColStat(identifier.name(), "customer_id"); Review comment: Are the columns stats gathered for `first_name` and `last_name` as well, we're just saving on the number of describe calls? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614577) Time Spent: 0.5h (was: 20m) > Enable automatic statistics generation for Iceberg tables > - > > Key: HIVE-25276 > URL: https://issues.apache.org/jira/browse/HIVE-25276 > Project: Hive > Issue Type: Improvement >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > During inserts we should calculate the column statistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25284) There was an issue in the software
[ https://issues.apache.org/jira/browse/HIVE-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-25284. --- Resolution: Information Provided Apache Hive project developers can not help with TSplus purchases > There was an issue in the software > --- > > Key: HIVE-25284 > URL: https://issues.apache.org/jira/browse/HIVE-25284 > Project: Hive > Issue Type: Bug > Components: Security >Reporter: GillAdam >Priority: Minor > > i had purchased a software of TSplus using the [TSplus coupon > code|https://discountshelp.com/coupon-store/tsplus-coupon-code/] and then the > code was not applied to my purchase so tell me what is the issue -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614462 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 12:24 Start Date: 24/Jun/21 12:24 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867594564 you have modified the default value of retries inside `jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java` in an unrelated changeset - and this changeset follows that pattern; I think going from 5 to 3 would be okayish but going from 1 to anything else is not the same - because you will enable the retries by default. I don't think we should change things like that...that change should have been done in a separate patch - because it touched production code; and it changed the default behaviour as well. It seems like `maxRetries` is not documented here https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients I think the correct approach is to set it back to 1 and change the retry number for the tests thru the jdbc url. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614462) Time Spent: 1h 40m (was: 1.5h) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23456) Upgrade Calcite version to 1.25.0
[ https://issues.apache.org/jira/browse/HIVE-23456?focusedWorklogId=614541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614541 ] ASF GitHub Bot logged work on HIVE-23456: - Author: ASF GitHub Bot Created on: 24/Jun/21 14:32 Start Date: 24/Jun/21 14:32 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #2203: URL: https://github.com/apache/hive/pull/2203 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614541) Time Spent: 40m (was: 0.5h) > Upgrade Calcite version to 1.25.0 > - > > Key: HIVE-23456 > URL: https://issues.apache.org/jira/browse/HIVE-23456 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Soumyakanti Das >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23456.01.patch, HIVE-23456.02.patch > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368871#comment-17368871 ] David Mollitor edited comment on HIVE-24484 at 6/24/21, 1:55 PM: - [HADOOP-17367] is another biting issue for Hive. In 3.1.0, the return value from {{ProxyUsers#getDefaultImpersonationProvider}} changed. In 3.1.0, the method could return a {{null}} value and then it was up to the caller to create a new one and initialize it. It seems like in 3.3.1, it always returns a value, but it looks like the initialization isn't what Hive is expecting. The initialization {{refreshSuperUserGroupsConfiguration}} creates its own configuration whereas before Hive was passing in its own Configuration. was (Author: belugabehr): [HADOOP-17367] is another biting issue for Hive. In 3.1.0, the return value from {{ProxyUsers#getDefaultImpersonationProvider}} changed. In 3.1.0, the method could return a {{null}} value and then it was up to the caller to create a new one and initialize it. It seems like in 3.3.1, it always returns a value, but it looks like the initialization isn't what Hive is expecting. The initialization {{refreshSuperUserGroupsConfiguration}} creates its own configuration whereas before Hive was passing in its own Configuration. > Upgrade Hadoop to 3.3.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
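The contract change described above can be illustrated with a toy model (stand-in classes only, not Hadoop's actual ProxyUsers API): under the old contract the caller sees null and initializes a provider with its own configuration, while under the new contract a provider always comes back, but built from a fresh default configuration that never sees the caller's settings.

```java
import java.util.Properties;

public class ProxyUsersSketch {
    // Toy stand-in for an impersonation provider holding its configuration.
    static class Provider {
        final Properties conf;
        Provider(Properties conf) { this.conf = conf; }
    }

    private static Provider defaultProvider;

    // Old contract: may return null; the caller then builds and configures
    // a provider itself, using its own configuration.
    static Provider getProviderOldContract() {
        return defaultProvider;
    }

    // New contract: never null, but initialized from a fresh default
    // configuration rather than one the caller passes in.
    static Provider getProviderNewContract() {
        if (defaultProvider == null) {
            defaultProvider = new Provider(new Properties()); // caller's conf ignored
        }
        return defaultProvider;
    }

    public static void main(String[] args) {
        Properties hiveConf = new Properties();
        hiveConf.setProperty("proxyuser.hive.hosts", "*");

        // Old flow: the caller notices the null and injects its own settings.
        Provider p = getProviderOldContract();
        if (p == null) {
            p = new Provider(hiveConf);
        }
        System.out.println(p.conf.containsKey("proxyuser.hive.hosts")); // true

        // New flow: the provider exists, but the caller's settings never reach it.
        Provider q = getProviderNewContract();
        System.out.println(q.conf.containsKey("proxyuser.hive.hosts")); // false
    }
}
```

This is the mismatch the comment describes: code written against the old contract silently loses its chance to supply a Configuration once the provider starts self-initializing.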
[jira] [Updated] (HIVE-23456) Upgrade Calcite version to 1.25.0
[ https://issues.apache.org/jira/browse/HIVE-23456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-23456: --- Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master, thanks [~soumyakanti.das], [~zabetak]! > Upgrade Calcite version to 1.25.0 > - > > Key: HIVE-23456 > URL: https://issues.apache.org/jira/browse/HIVE-23456 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Soumyakanti Das >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23456.01.patch, HIVE-23456.02.patch > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-22254) Mappings.NoElementException: no target in mapping, in `MaterializedViewAggregateRule
[ https://issues.apache.org/jira/browse/HIVE-22254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-22254. Fix Version/s: 4.0.0 Resolution: Fixed > Mappings.NoElementException: no target in mapping, in > `MaterializedViewAggregateRule > > > Key: HIVE-22254 > URL: https://issues.apache.org/jira/browse/HIVE-22254 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Minor > Fix For: 4.0.0 > > Attachments: ojoin_full.sql > > > A Mappings.NoElementException happens on an edge condition for a query using > a materialized view. > The query contains a "group by" clause which contains fields from both sides > of a join. There is no real reason to group by this same field twice, but > there is also no reason that this shouldn't succeed. > Attached is a script which causes this failure. The query causing the > problem looks like this: > explain extended select sum(1) > from fact inner join dim1 > on fact.f1 = dim1.pk1 > group by f1, pk1; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368895#comment-17368895 ] Nikhil Gupta commented on HIVE-25104: - I have a list of issues which we can club together: (These are after hive 3.1 release): # HIVE-25093: date_format() UDF is returning output in UTC time zone only (Ashish Sharma, reviewed by Adesh Rao, Nikhil Gupta, Sankar Hariappan) # HIVE-25104: Backward incompatible timestamp serialization in Parquet for certain timezones (Stamatis Zampetakis, reviewed by Jesus Camacho Rodriguez) # HIVE-24113: NPE in GenericUDFToUnixTimeStamp (#1460) (Raj Kumar Singh, reviewed by Zoltan Haindrich and Laszlo Pinter) # HIVE-24074: Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x (Jesus Camacho Rodriguez, reviewed by Prasanth Jayachandran) # HIVE-22840: Race condition in formatters of TimestampColumnVector and DateColumnVector (Shubham Chaurasia, reviewed by Jesus Camacho Rodriguez) # HIVE-22589: Add storage support for ProlepticCalendar in ORC, Parquet, and Avro (Jesus Camacho Rodriguez, reviewed by David Lavati, László Bodor, Prasanth Jayachandran) # HIVE-22405: Add ColumnVector support for ProlepticCalendar (László Bodor via Owen O'Malley, Jesus Camacho Rodriguez) # HIVE-22331: unix_timestamp without argument returns timestamp in millisecond instead of second (Naresh P R, reviewed Jesus Camacho Rodriguez) # HIVE-22170: from_unixtime and unix_timestamp should use user session time zone (Jesus Camacho Rodriguez, reviewed by Vineet Garg) # HIVE-21729: Arrow serializer sometimes shifts timestamp by one second (Shubham Chaurasia, reviewed by Sankar Hariappan) # HIVE-21291: Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time (Karen Coppage, reviewed by Jesus Camacho Rodriguez) > Backward incompatible timestamp serialization in Parquet for certain timezones > -- > > Key: HIVE-25104 > URL:
https://issues.apache.org/jira/browse/HIVE-25104 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extent how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in > Parquet files is not backwards compatible. In other words writing timestamps > with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them > with another (not including the previous issues) may lead to different > results depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. > At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET > LOCATION '/tmp/hiveexttbl/employee'; > SELECT * FROM employee; > {code} > |1|1879-12-31 23:52:58| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614574=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614574 ] ASF GitHub Bot logged work on HIVE-25276: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:36 Start Date: 24/Jun/21 15:36 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2419: URL: https://github.com/apache/hive/pull/2419#discussion_r658061851 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java ## @@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException { Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1)); } + @Test + public void testStatWithInsert() { +TableIdentifier identifier = TableIdentifier.of("default", "customers"); + +shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true); +testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of()); + +if (testTableType != TestTables.TestTableType.HIVE_CATALOG) { + // If the location is set and we have to gather stats, then we have to update the table stats now + shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS"); +} + +String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false); +shell.executeStatement(insert); + +checkColStat(identifier.name(), "customer_id"); + } + + @Test + public void testStatWithInsertOverwrite() { +TableIdentifier identifier = TableIdentifier.of("default", "customers"); + +shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true); +testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, Review comment: If we have test cases for unpartitioned insert, partitioned insert, unpartitioned IOW, should we 
have a test case for partitioned IOW as well? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614574) Time Spent: 20m (was: 10m) > Enable automatic statistics generation for Iceberg tables > - > > Key: HIVE-25276 > URL: https://issues.apache.org/jira/browse/HIVE-25276 > Project: Hive > Issue Type: Improvement >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > During inserts we should have calculate the column statistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
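The partitioned insert-overwrite case the reviewer asks about would exercise roughly the following statements. This is a sketch only: the storage-handler clause and the table/column names are illustrative assumptions, not the test's actual code.

```sql
-- Enable automatic stats gathering (HiveConf.ConfVars.HIVESTATSAUTOGATHER)
SET hive.stats.autogather=true;

-- Illustrative partitioned Iceberg table (handler class per the iceberg-mr module)
CREATE TABLE customers (customer_id BIGINT, first_name STRING)
PARTITIONED BY (last_name STRING)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

-- INSERT OVERWRITE on the partitioned table should refresh column stats
INSERT OVERWRITE TABLE customers SELECT * FROM customers_src;

-- Inspect the gathered stats for a single column
DESCRIBE FORMATTED customers customer_id;
```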
[jira] [Work logged] (HIVE-25286) Set stats to inaccurate when an Iceberg table is modified outside Hive
[ https://issues.apache.org/jira/browse/HIVE-25286?focusedWorklogId=614575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614575 ] ASF GitHub Bot logged work on HIVE-25286: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:36 Start Date: 24/Jun/21 15:36 Worklog Time Spent: 10m Work Description: pvary opened a new pull request #2427: URL: https://github.com/apache/hive/pull/2427 ### What changes were proposed in this pull request? - Introduce a new configuration property: `iceberg.hive.keep.stats`. If this property is set, we keep the statistics at a new Iceberg commit. Otherwise we invalidate the stats, as we cannot make sure that they are correct. - Fix a NullPointerException in HiveIcebergMetaHook which happens when we are storing stats. - Add new unit tests and enhance the check that the stat values are accurate. Also contains HIVE-25276 as a base. ### Why are the changes needed? When someone modifies the Iceberg table outside of Hive then we should make sure that the stats are marked invalid ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Extra unit test -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614575) Remaining Estimate: 0h Time Spent: 10m > Set stats to inaccurate when an Iceberg table is modified outside Hive > -- > > Key: HIVE-25286 > URL: https://issues.apache.org/jira/browse/HIVE-25286 > Project: Hive > Issue Type: New Feature >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When an Iceberg table is modified outside of Hive then the stats should be > set to inaccurate since there is no way to ensure that the HMS stats are > updated correctly and this could cause incorrect query results. > The proposed solution is only working for HiveCatalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
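In configuration terms, the behaviour proposed in the PR above would look roughly like this. A sketch based only on the property name in the description; the default of invalidating stats is stated there, the session-level usage is an assumption.

```sql
-- Keep HMS column statistics across Iceberg commits; when unset, Hive
-- invalidates the stats because it cannot verify externally written data
SET iceberg.hive.keep.stats=true;

-- After an external commit has invalidated the stats, they can be rebuilt:
ANALYZE TABLE default.customers COMPUTE STATISTICS FOR COLUMNS;
```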
[jira] [Updated] (HIVE-25286) Set stats to inaccurate when an Iceberg table is modified outside Hive
[ https://issues.apache.org/jira/browse/HIVE-25286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25286: -- Labels: pull-request-available (was: ) > Set stats to inaccurate when an Iceberg table is modified outside Hive > -- > > Key: HIVE-25286 > URL: https://issues.apache.org/jira/browse/HIVE-25286 > Project: Hive > Issue Type: New Feature >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When an Iceberg table is modified outside of Hive then the stats should be > set to inaccurate since there is no way to ensure that the HMS stats are > updated correctly and this could cause incorrect query results. > The proposed solution is only working for HiveCatalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614833 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 25/Jun/21 04:32 Start Date: 25/Jun/21 04:32 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658156023 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java ## @@ -51,18 +50,17 @@ */ @Description(name = "date_format", value = "_FUNC_(date/timestamp/string, fmt) - converts a date/timestamp/string " + "to a value of string in the format specified by the date format fmt.", -extended = "Supported formats are SimpleDateFormat formats - " -+ "https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html. " +extended = "Supported formats are DateTimeFormatter formats - " ++ "https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html. " Review comment: @zabetak I think, if these 2 are the only difference, then we can go ahead with DateTimeFormatter. What is your opinion? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614833) Time Spent: 8h 40m (was: 8.5h) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC > -- > > Key: HIVE-25268 > URL: https://issues.apache.org/jira/browse/HIVE-25268 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > *Hive 1.2.1*: > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--+--+ | _c0| +--+--+ | 1400-01-14 01:00:00 ICT | +--+--+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--+--+ | _c0| +--+--+ | 1800-01-14 01:00:00 ICT | +--+--+ > {code} > *Hive 3.1, Hive 4.0:* > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1400-01-06 01:17:56 ICT | > +--+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1800-01-14 01:17:56 ICT | > +--+ > {code} > VM timezone is set to 'Asia/Bangkok' -- This message was sent by Atlassian Jira (v8.3.4#803005)
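The 17m56s drift in the Hive 3.1/4.0 output above matches the gap between Asia/Bangkok's pre-1880 local mean time (+06:42:04) and the modern ICT offset (+07:00): a formatter that consults historical zone rules and one that assumes the fixed offset disagree for old dates. A standalone java.time sketch (not Hive code) showing the two offsets:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Sketch: the offset java.time's historical zone rules report for
// Asia/Bangkok at a given wall-clock time.
class BangkokOffsets {
    static ZoneOffset offsetAt(String wallClock) {
        return ZoneId.of("Asia/Bangkok").getRules()
                .getOffset(LocalDateTime.parse(wallClock));
    }

    public static void main(String[] args) {
        // Pre-1880 Bangkok is recorded as local mean time, +06:42:04 --
        // exactly 17m56s short of the modern +07:00, which is why the broken
        // output above shows 01:17:56 instead of 01:00:00.
        System.out.println(offsetAt("1800-01-14T01:00:00")); // +06:42:04
        System.out.println(offsetAt("2021-06-24T00:00:00")); // +07:00
    }
}
```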
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614835 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 25/Jun/21 05:05 Start Date: 25/Jun/21 05:05 Worklog Time Spent: 10m Work Description: guptanikhil007 commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658476821 ## File path: ql/src/test/queries/clientpositive/udf_date_format.q ## @@ -78,3 +78,16 @@ select date_format("2015-04-08 10:30:45","-MM-dd HH:mm:ss.SSS z"); --julian date set hive.local.time.zone=UTC; select date_format("1001-01-05","dd---MM--"); + +--dates prior to 1900 +set hive.local.time.zone=Asia/Bangkok; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Europe/Berlin; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Africa/Johannesburg; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); Review comment: All the timestamps are considered to be in local time zone in both Hive 1.2 and Hive master (patched). Both Hive 1.2 and 3.1 have similar results for the date_format function. 
E.g.: Hive 1.2.1 (with timezone set to Asia/Bangkok) ``` 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select * from test_tbl_orc; +--+--+--+ | test_tbl_orc.clusterversion |test_tbl_orc.col2 | +--+--+--+ | Hive 4.0 | 1800-01-14 01:01:10.123 | +--+--+--+ 1 row selected (0.372 seconds) 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> desc test_tbl_orc; +-++--+--+ |col_name | data_type | comment | +-++--+--+ | clusterversion | string | | | col2| timestamp | | +-++--+--+ 2 rows selected (0.65 seconds) 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select date_format(col2, "-MM-dd HH:mm:ss.SSS z") from test_tbl_orc; +--+--+ | _c0 | +--+--+ | 1800-01-14 01:01:10.123 ICT | +--+--+ ``` Hive master (using MiniHS2 and same orc file from Hive 1.2) ``` 0: jdbc:hive2://localhost:1/> select * from test_tbl_orc; +--+--+ | test_tbl_orc.clusterversion |test_tbl_orc.col2 | +--+--+ | Hive 4.0 | 1800-01-14 01:01:10.123 | +--+--+ 1 row selected (0.23 seconds) 0: jdbc:hive2://localhost:1/> desc test_tbl_orc; +-++--+ |col_name | data_type | comment | +-++--+ | clusterversion | string | | | col2| timestamp | | +-++--+ 2 rows selected (0.24 seconds) 0: jdbc:hive2://localhost:1/> set hive.local.time.zone=Asia/Bangkok; No rows affected (0.102 seconds) 0: jdbc:hive2://localhost:1/> select date_format(col2, "-MM-dd HH:mm:ss.SSS z") from test_tbl_orc; +--+ | _c0 | +--+ | 1800-01-14 01:01:10.123 ICT | +--+ 1 row selected (0.261 seconds) 0: jdbc:hive2://localhost:1/> set hive.local.time.zone=LOCAL; No rows affected (0.036 seconds) 0: jdbc:hive2://localhost:1/> select date_format(col2, "-MM-dd HH:mm:ss.SSS z") from test_tbl_orc; +--+ | _c0 | +--+ | 1800-01-14 01:01:10.123 PST | +--+ 1 row selected (0.492 seconds) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614835) Time Spent: 8h 50m (was: 8h 40m) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC >
[jira] [Work logged] (HIVE-25283) Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup conversion fails on output mismatch after alter table
[ https://issues.apache.org/jira/browse/HIVE-25283?focusedWorklogId=614794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614794 ] ASF GitHub Bot logged work on HIVE-25283: - Author: ASF GitHub Bot Created on: 25/Jun/21 00:53 Start Date: 25/Jun/21 00:53 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #2426: URL: https://github.com/apache/hive/pull/2426 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614794) Time Spent: 0.5h (was: 20m) > Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup > conversion fails on output mismatch after alter table > --- > > Key: HIVE-25283 > URL: https://issues.apache.org/jira/browse/HIVE-25283 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25283) Schema evolution fails on output mismatch after alter table
[ https://issues.apache.org/jira/browse/HIVE-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-25283. Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master, thanks [~scarlin]! > Schema evolution fails on output mismatch after alter table > --- > > Key: HIVE-25283 > URL: https://issues.apache.org/jira/browse/HIVE-25283 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25283) Schema evolution fails on output mismatch after alter table
[ https://issues.apache.org/jira/browse/HIVE-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-25283: --- Summary: Schema evolution fails on output mismatch after alter table (was: Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup conversion fails on output mismatch after alter table) > Schema evolution fails on output mismatch after alter table > --- > > Key: HIVE-25283 > URL: https://issues.apache.org/jira/browse/HIVE-25283 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25283) Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup conversion fails on output mismatch after alter table
[ https://issues.apache.org/jira/browse/HIVE-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-25283: -- Assignee: Steve Carlin > Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup > conversion fails on output mismatch after alter table > --- > > Key: HIVE-25283 > URL: https://issues.apache.org/jira/browse/HIVE-25283 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission
[ https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=614773=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614773 ] ASF GitHub Bot logged work on HIVE-25026: - Author: ASF GitHub Bot Created on: 25/Jun/21 00:07 Start Date: 25/Jun/21 00:07 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #2189: URL: https://github.com/apache/hive/pull/2189#issuecomment-868086290 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614773) Time Spent: 1h 20m (was: 1h 10m) > hive sql result is duplicate data cause of same task resubmission > - > > Key: HIVE-25026 > URL: https://issues.apache.org/jira/browse/HIVE-25026 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.1 >Reporter: hezhang >Assignee: hezhang >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-25026.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > This issue is the same with hive-24577 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-287) support count(*) and count distinct on multiple columns
[ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368683#comment-17368683 ] GillAdam commented on HIVE-287: --- > support count(*) and count distinct on multiple columns > --- > > Key: HIVE-287 > URL: https://issues.apache.org/jira/browse/HIVE-287 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Namit Jain >Assignee: Arvind Prabhakar >Priority: Major > Fix For: 0.6.0 > > Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, > HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, > HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch > > > The following query does not work: > select count(distinct col1, col2) from Tbl -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614411 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:31 Start Date: 24/Jun/21 09:31 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657787170 ## File path: ql/src/test/queries/clientpositive/materialized_view_partitioned_create_rewrite_agg.q ## @@ -0,0 +1,44 @@ +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; + +CREATE TABLE t1(a int, b int,c int) STORED AS ORC TBLPROPERTIES ('transactional' = 'true'); + +INSERT INTO t1(a, b, c) VALUES +(1, 1, 1), +(1, 1, 4), +(2, 1, 2), +(1, 2, 10), +(2, 2, 11), +(1, 3, 100), +(null, 4, 200); + +CREATE MATERIALIZED VIEW mat1 PARTITIONED ON (a) STORED AS ORC TBLPROPERTIES ("transactional"="true", "transactional_properties"="insert_only") AS Review comment: added test for ii) since it also includes i) ## File path: ql/src/test/results/clientpositive/llap/masking_mv_by_text_2.q.out ## @@ -25,6 +25,7 @@ POSTHOOK: type: CREATE_MATERIALIZED_VIEW POSTHOOK: Input: default@masking_test_n_mv POSTHOOK: Output: database:default POSTHOOK: Output: default@masking_test_view_n_mv +POSTHOOK: Lineage: masking_test_view_n_mv.col0 EXPRESSION [(masking_test_n_mv)masking_test_n_mv.FieldSchema(name:key, type:int, comment:null), (masking_test_n_mv)masking_test_n_mv.FieldSchema(name:value, type:string, comment:null), ] Review comment: reverted -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614411) Time Spent: 1h 50m (was: 1h 40m) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25279) Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229
[ https://issues.apache.org/jira/browse/HIVE-25279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25279. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you [~pvary]! > Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229 > > > Key: HIVE-25279 > URL: https://issues.apache.org/jira/browse/HIVE-25279 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > HIVE-25240 added new q.out files, HIVE-25229 modified the lineage output. > Both test were successful without the other, but when they were committed > query tests are failing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25279) Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229
[ https://issues.apache.org/jira/browse/HIVE-25279?focusedWorklogId=614395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614395 ] ASF GitHub Bot logged work on HIVE-25279: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:08 Start Date: 24/Jun/21 09:08 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #2424: URL: https://github.com/apache/hive/pull/2424 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614395) Time Spent: 0.5h (was: 20m) > Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229 > > > Key: HIVE-25279 > URL: https://issues.apache.org/jira/browse/HIVE-25279 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > HIVE-25240 added new q.out files, HIVE-25229 modified the lineage output. > Both test were successful without the other, but when they were committed > query tests are failing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614397 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:08 Start Date: 24/Jun/21 09:08 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657770038 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -353,6 +375,23 @@ private RelNode toJoinInsertIncremental( basePlan, mdProvider, executorProvider, HiveJoinInsertIncrementalRewritingRule.INSTANCE); } +private RelNode toPartitionInsertOverwrite( +RelNode basePlan, RelMetadataProvider mdProvider, RexExecutor executorProvider, +HiveRelOptMaterialization materialization, RelNode calcitePreMVRewritingPlan) { + + if (materialization.isSourceTablesUpdateDeleteModified()) { +return calcitePreMVRewritingPlan; + } + + RelOptHiveTable hiveTable = (RelOptHiveTable) materialization.tableRel.getTable(); + if (!AcidUtils.isInsertOnlyTable(hiveTable.getHiveTableMD())) { +return applyPreJoinOrderingTransforms(basePlan, mdProvider, executorProvider); + } + + return toIncrementalRebuild( + basePlan, mdProvider, executorProvider, HiveAggregatePartitionIncrementalRewritingRule.INSTANCE); +} + private RelNode toIncrementalRebuild( Review comment: renamed all to* methods to apply* in this class -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614397) Time Spent: 40m (was: 0.5h) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614394 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:08 Start Date: 24/Jun/21 09:08 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657769522 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -261,41 +263,61 @@ protected RelNode applyMaterializedViewRewriting(RelOptPlanner planner, RelNode if (materialization.isSourceTablesCompacted()) { return calcitePreMVRewritingPlan; } -// First we need to check if it is valid to convert to MERGE/INSERT INTO. -// If we succeed, we modify the plan and afterwards the AST. -// MV should be an acid table. -MaterializedViewRewritingRelVisitor visitor = new MaterializedViewRewritingRelVisitor(); -visitor.go(basePlan); -if (visitor.isRewritingAllowed()) { - if (materialization.isSourceTablesUpdateDeleteModified()) { -if (visitor.isContainsAggregate()) { - if (visitor.getCountIndex() < 0) { -// count(*) is necessary for determine which rows should be deleted from the view -// if view definition does not have it incremental rebuild can not be performed, bail out -return calcitePreMVRewritingPlan; - } - return toAggregateInsertDeleteIncremental(basePlan, mdProvider, executorProvider); -} else { - return toJoinInsertDeleteIncremental( - basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} - } else { -// Trigger rewriting to remove UNION branch with MV -if (visitor.isContainsAggregate()) { - return toAggregateInsertIncremental(basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} else { - return toJoinInsertIncremental(basePlan, mdProvider, executorProvider); -} - } -} else if 
(materialization.isSourceTablesUpdateDeleteModified()) { - return calcitePreMVRewritingPlan; + +RelNode incrementalRebuildPlan = toIncrementalRebuild( +basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan, materialization); +if (mvRebuildMode != MaterializationRebuildMode.INSERT_OVERWRITE_REBUILD) { + return incrementalRebuildPlan; } + +return toPartitionInsertOverwrite( +basePlan, mdProvider, executorProvider, materialization, calcitePreMVRewritingPlan); } // Now we trigger some needed optimization rules again return applyPreJoinOrderingTransforms(basePlan, mdProvider, executorProvider); } +private RelNode toIncrementalRebuild( +RelNode basePlan, +RelMetadataProvider mdProvider, +RexExecutor executorProvider, +RelOptCluster optCluster, +RelNode calcitePreMVRewritingPlan, +HiveRelOptMaterialization materialization) { + // First we need to check if it is valid to convert to MERGE/INSERT INTO. + // If we succeed, we modify the plan and afterwards the AST. + // MV should be an acid table. 
+ MaterializedViewRewritingRelVisitor visitor = new MaterializedViewRewritingRelVisitor(); + visitor.go(basePlan); + if (visitor.isRewritingAllowed()) { +if (materialization.isSourceTablesUpdateDeleteModified()) { + if (visitor.isContainsAggregate()) { +if (visitor.getCountIndex() < 0) { + // count(*) is necessary for determine which rows should be deleted from the view + // if view definition does not have it incremental rebuild can not be performed, bail out + return calcitePreMVRewritingPlan; +} +return toAggregateInsertDeleteIncremental(basePlan, mdProvider, executorProvider); + } else { +return toJoinInsertDeleteIncremental( +basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); + } +} else { + // Trigger rewriting to remove UNION branch with MV + if (visitor.isContainsAggregate()) { +return toAggregateInsertIncremental(basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); + } else { +return toJoinInsertIncremental(basePlan, mdProvider, executorProvider); + } +} + } else if
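The dispatch logic in the diff above boils down to the following decision tree. This is a standalone sketch with simplified types and illustrative names, not the actual Calcite-based code in AlterMaterializedViewRebuildAnalyzer.

```java
// Simplified model of rebuild-plan selection for a materialized view.
enum Plan { FULL_REBUILD, AGG_INSERT_DELETE, JOIN_INSERT_DELETE, AGG_INSERT, JOIN_INSERT }

final class RebuildChooser {
    static Plan choose(boolean rewritingAllowed, boolean updateDeleteModified,
                       boolean containsAggregate, int countIndex) {
        if (!rewritingAllowed) {
            // Without a valid MERGE/INSERT rewrite, fall back to a full rebuild
            return Plan.FULL_REBUILD;
        }
        if (updateDeleteModified) {
            if (containsAggregate) {
                // count(*) is needed to determine which rows to delete from
                // the view; without it, incremental rebuild cannot be done
                return countIndex < 0 ? Plan.FULL_REBUILD : Plan.AGG_INSERT_DELETE;
            }
            return Plan.JOIN_INSERT_DELETE;
        }
        // Insert-only source changes: trigger rewriting to drop the UNION
        // branch that re-reads the MV
        return containsAggregate ? Plan.AGG_INSERT : Plan.JOIN_INSERT;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, true, -1)); // AGG_INSERT
    }
}
```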
[jira] [Commented] (HIVE-21614) Derby does not support CLOB comparisons
[ https://issues.apache.org/jira/browse/HIVE-21614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368642#comment-17368642 ] Hank Fanchiu commented on HIVE-21614: - I've run into this issue, in an attempt to push the filtering -- for the Iceberg table type – to the Metastore: https://github.com/apache/iceberg/pull/2722. The Iceberg tests using Derby failed for the same reason as described above: https://github.com/apache/iceberg/pull/2722#issuecomment-867363019. > Derby does not support CLOB comparisons > --- > > Key: HIVE-21614 > URL: https://issues.apache.org/jira/browse/HIVE-21614 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.4, 3.0.0 >Reporter: Vlad Rozov >Priority: Major > > HiveMetaStoreClient.listTableNamesByFilter() with non empty filter causes > exception with Derby DB: > {noformat} > Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB > (UCS_BASIC)' are not supported. Types must be comparable. String types must > also have matching collation. If collation does not match, a possible > solution is to cast operands to force them to the default collation (e.g. 
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = > 'T1') > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown > Source) > at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown > Source) > at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown > Source) > at > org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown > Source) > at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown > Source) > at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown > Source) > at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source) > at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source) > at > org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown > Source) > ... 42 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
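The error text's own suggestion, spelled out as a runnable Derby statement, is the usual workaround for such filters (the 128-character bound matches Derby's identifier length; apply the same CAST to whichever CLOB column the Metastore filter compares):

```sql
-- Fails on Derby: CLOB = CLOB comparisons are unsupported
-- SELECT tablename FROM sys.systables WHERE tablename = 'T1';

-- Works: casting the CLOB operand to VARCHAR forces the default
-- collation, making the comparison legal
SELECT tablename
FROM sys.systables
WHERE CAST(tablename AS VARCHAR(128)) = 'T1';
```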
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614383 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:01 Start Date: 24/Jun/21 09:01 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867467938 @kgyrtkirk Thanks for pointing to flaky check pipeline. I ran and got the green build there http://ci.hive.apache.org/job/hive-flaky-check/269/ Please review and merge the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614383) Time Spent: 40m (was: 0.5h) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
[ https://issues.apache.org/jira/browse/HIVE-25242?focusedWorklogId=614414&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614414 ] ASF GitHub Bot logged work on HIVE-25242: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:37 Start Date: 24/Jun/21 09:37 Worklog Time Spent: 10m Work Description: abstractdog merged pull request #2390: URL: https://github.com/apache/hive/pull/2390 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614414) Time Spent: 50m (was: 40m) > Query performs extremely slow with hive.vectorized.adaptor.usage.mode = > chosen > --- > > Key: HIVE-25242 > URL: https://issues.apache.org/jira/browse/HIVE-25242 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are > vectorized through the vectorized adaptor. > Queries like this one perform very slowly because concat is not chosen > to be vectorized. > {code:java} > select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) > between to_date('2018-12-01') and to_date('2021-03-01'); {code} > The patch whitelists the concat UDF so that it uses the vectorized adaptor in > chosen mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614427 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:59 Start Date: 24/Jun/21 09:59 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867506954 @kgyrtkirk Could you please merge the PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614427) Time Spent: 1.5h (was: 1h 20m) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614391 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:07 Start Date: 24/Jun/21 09:07 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657768898 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -261,41 +263,61 @@ protected RelNode applyMaterializedViewRewriting(RelOptPlanner planner, RelNode if (materialization.isSourceTablesCompacted()) { return calcitePreMVRewritingPlan; } -// First we need to check if it is valid to convert to MERGE/INSERT INTO. -// If we succeed, we modify the plan and afterwards the AST. -// MV should be an acid table. -MaterializedViewRewritingRelVisitor visitor = new MaterializedViewRewritingRelVisitor(); -visitor.go(basePlan); -if (visitor.isRewritingAllowed()) { - if (materialization.isSourceTablesUpdateDeleteModified()) { -if (visitor.isContainsAggregate()) { - if (visitor.getCountIndex() < 0) { -// count(*) is necessary for determine which rows should be deleted from the view -// if view definition does not have it incremental rebuild can not be performed, bail out -return calcitePreMVRewritingPlan; - } - return toAggregateInsertDeleteIncremental(basePlan, mdProvider, executorProvider); -} else { - return toJoinInsertDeleteIncremental( - basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} - } else { -// Trigger rewriting to remove UNION branch with MV -if (visitor.isContainsAggregate()) { - return toAggregateInsertIncremental(basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} else { - return toJoinInsertIncremental(basePlan, mdProvider, executorProvider); -} - } -} else if 
(materialization.isSourceTablesUpdateDeleteModified()) { - return calcitePreMVRewritingPlan; + +RelNode incrementalRebuildPlan = toIncrementalRebuild( Review comment: renamed ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -261,41 +263,61 @@ protected RelNode applyMaterializedViewRewriting(RelOptPlanner planner, RelNode if (materialization.isSourceTablesCompacted()) { return calcitePreMVRewritingPlan; } -// First we need to check if it is valid to convert to MERGE/INSERT INTO. -// If we succeed, we modify the plan and afterwards the AST. -// MV should be an acid table. -MaterializedViewRewritingRelVisitor visitor = new MaterializedViewRewritingRelVisitor(); -visitor.go(basePlan); -if (visitor.isRewritingAllowed()) { - if (materialization.isSourceTablesUpdateDeleteModified()) { -if (visitor.isContainsAggregate()) { - if (visitor.getCountIndex() < 0) { -// count(*) is necessary for determine which rows should be deleted from the view -// if view definition does not have it incremental rebuild can not be performed, bail out -return calcitePreMVRewritingPlan; - } - return toAggregateInsertDeleteIncremental(basePlan, mdProvider, executorProvider); -} else { - return toJoinInsertDeleteIncremental( - basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} - } else { -// Trigger rewriting to remove UNION branch with MV -if (visitor.isContainsAggregate()) { - return toAggregateInsertIncremental(basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} else { - return toJoinInsertIncremental(basePlan, mdProvider, executorProvider); -} - } -} else if (materialization.isSourceTablesUpdateDeleteModified()) { - return calcitePreMVRewritingPlan; + +RelNode incrementalRebuildPlan = toIncrementalRebuild( +basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan, materialization); +if (mvRebuildMode != 
MaterializationRebuildMode.INSERT_OVERWRITE_REBUILD) { + return incrementalRebuildPlan; } + +return
[jira] [Resolved] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
[ https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-25242. -- Resolution: Fixed > Query performs extremely slow with hive.vectorized.adaptor.usage.mode = > chosen > --- > > Key: HIVE-25242 > URL: https://issues.apache.org/jira/browse/HIVE-25242 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are > vectorized through the vectorized adaptor. > Queries like this one perform very slowly because concat is not chosen > to be vectorized. > {code:java} > select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) > between to_date('2018-12-01') and to_date('2021-03-01'); {code} > The patch whitelists the concat UDF so that it uses the vectorized adaptor in > chosen mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25279) Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229
[ https://issues.apache.org/jira/browse/HIVE-25279?focusedWorklogId=614388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614388 ] ASF GitHub Bot logged work on HIVE-25279: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:03 Start Date: 24/Jun/21 09:03 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #2424: URL: https://github.com/apache/hive/pull/2424#issuecomment-867469912 I think we should merge this sooner rather than later - it will just cause test failures in innocent PRs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614388) Time Spent: 20m (was: 10m) > Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229 > > > Key: HIVE-25279 > URL: https://issues.apache.org/jira/browse/HIVE-25279 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-25240 added new q.out files, HIVE-25229 modified the lineage output. > Both tests were successful without the other, but when they were committed > query tests are failing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614404 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:23 Start Date: 24/Jun/21 09:23 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657780988 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java ## @@ -1252,4 +1252,18 @@ public static ImmutableBitSet extractRefs(Aggregate aggregate) { } return refs.build(); } + + public static Set<RexTableInputRef> findRexTableInputRefs(RexNode rexNode) { Review comment: `RexUtil.gatherTableReferences` returns `Set<RelTableRef>` and not `Set<RexTableInputRef>`. `RelTableRef` does not contain the index of this InputRef in the TS schema, which is required to identify whether the `RexTableInputRef` instance refers to a partition column or not. `RelOptHiveTable.getPartColInfoMap()` contains the indexes, not the instances. The `RexTableInputRef` index is also required by `HiveCardinalityPreservingJoinOptimization`, which is why I extracted `findRexTableInputRefs`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614404) Time Spent: 1h (was: 50m) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614405 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:23 Start Date: 24/Jun/21 09:23 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657781122 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregatePartitionIncrementalRewritingRule.java ## @@ -0,0 +1,152 @@ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import com.google.common.collect.ImmutableList; +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Aggregate; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Union; +import org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.rex.RexTableInputRef; +import org.apache.calcite.rex.RexUtil; +import org.apache.calcite.sql.SqlAggFunction; +import org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.calcite.tools.RelBuilder; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories; +import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +import static org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil.findRexTableInputRefs; + +/** + * Rule to prepare the plan for incremental view maintenance if the view is partitioned and insert only: + * Insert overwrite the partitions which are affected since the last rebuild only and leave the + * rest of the partitions intact. + * + * Assume that we have a materialized view partitioned on column a and writeId was 1 at the last rebuild: + * + * CREATE MATERIALIZED VIEW mat1 PARTITIONED ON (a) STORED AS ORC TBLPROPERTIES ("transactional"="true", "transactional_properties"="insert_only") AS + * SELECT a, b, sum(c) sumc FROM t1 GROUP BY b, a; + * + * 1. Query all rows from source tables since the last rebuild. + * 2. Query all rows from MV which are in any of the partitions queried in 1. + * 3. Take the union of rows from 1. and 2. 
and perform the same aggregations defined in the MV + * + * SELECT b, sum(sumc), a FROM ( Review comment: fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614405) Time Spent: 1h 10m (was: 1h) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
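The three-step rebuild described in the javadoc above — aggregate the delta rows since the last rebuild, pull in the existing MV rows for the affected partitions only, and re-aggregate their union — can be sketched without any Hive or Calcite dependency. Everything below (the partition names, the single `sum(c)` measure, the sample values) is made up for illustration; this is not the rule's actual code, only the arithmetic it arranges the plan to perform.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionIncrementalRebuild {
    public static void main(String[] args) {
        // Materialized view state at the last rebuild: partition a -> sum(c)
        Map<String, Long> mv = new HashMap<>(Map.of("p1", 10L, "p2", 20L));
        // Delta rows inserted into the source table since then: (a, c)
        List<Map.Entry<String, Long>> delta =
            List.of(Map.entry("p1", 5L), Map.entry("p3", 7L));

        // Step 1: aggregate the delta rows per partition
        Map<String, Long> rebuilt = new HashMap<>();
        for (Map.Entry<String, Long> row : delta) {
            rebuilt.merge(row.getKey(), row.getValue(), Long::sum);
        }
        // Steps 2-3: union in the existing MV rows of the affected
        // partitions and apply the same sum aggregate again
        for (String part : new ArrayList<>(rebuilt.keySet())) {
            rebuilt.merge(part, mv.getOrDefault(part, 0L), Long::sum);
        }
        // INSERT OVERWRITE only the affected partitions (p1, p3)
        mv.putAll(rebuilt);
        System.out.println(new TreeMap<>(mv)); // {p1=15, p2=20, p3=7}
    }
}
```

Partition p2 received no new rows, so it is never rewritten — which is the point of the partition-wise INSERT OVERWRITE over a full rebuild.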
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614408=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614408 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:24 Start Date: 24/Jun/21 09:24 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867484416 or the fix was in HIVE-25093? ...in case that's true I still don't see the connection between timestamps and the hs2 connection reset :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614408) Time Spent: 1h (was: 50m) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614407=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614407 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:24 Start Date: 24/Jun/21 09:24 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657781909 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregatePartitionIncrementalRewritingRule.java ## @@ -0,0 +1,152 @@ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import com.google.common.collect.ImmutableList; +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Aggregate; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Union; +import org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.rex.RexTableInputRef; +import org.apache.calcite.rex.RexUtil; +import org.apache.calcite.sql.SqlAggFunction; +import org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.calcite.tools.RelBuilder; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories; +import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +import static org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil.findRexTableInputRefs; Review comment: I explained why it is needed above. Please let me know If you know an alternative other than you mentioned earlier. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614407) Time Spent: 1h 20m (was: 1h 10m) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614406=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614406 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:23 Start Date: 24/Jun/21 09:23 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867483301 I'm a little bit amazed that changing the default `maxRetries` fixes the issue - could you please give some details about it; I'm really interested :) +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614406) Time Spent: 50m (was: 40m) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614421 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:51 Start Date: 24/Jun/21 09:51 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma edited a comment on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867501246 @kgyrtkirk Thank you for the quick reply. When I made some changes in the date_format UDF (HIVE-25093) and pushed the PR, the only test failing in the hive-precommit run was TestHS2ImpersonationWithRemoteMS.testImpersonation. After investigation I found that it had no relation to the UDF; the test itself was flaky. So instead of creating a new jira and raising a new PR for fixing TestHS2ImpersonationWithRemoteMS.testImpersonation, I fixed it as part of the date_format UDF change (HIVE-25093). Before HIVE-25093 got merged, you created a new jira (HIVE-25250), marked the test ignored, and merged that to master. Now coming to why the test failed: testImpersonation tries to connect to miniHS2 via DriverManager.getConnection(miniHS2.getJdbcURL(), "foo", null); Since this is a network call, there is a retry in place to avoid unwanted network failures. We have a variable ("maxRetries") which controls the number of retries; its default value was "1", and it can be overridden by passing "maxRetries={somevalue}" as part of the JDBC URL. Since the default "maxRetries=1" is very low, I changed it to "maxRetries=5" as part of HIVE-25093. But since you have marked the test ignored, I am raising this PR to uncomment the test, and as "maxRetries=5" seems to be on the higher side, I am making it "maxRetries=3" and uncommenting the test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614421) Time Spent: 1h 20m (was: 1h 10m) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614420 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:50 Start Date: 24/Jun/21 09:50 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867501246 @kgyrtkirk Thank you for the reply. When I made some changes in the date_format UDF (HIVE-25093) and pushed the PR, the only test failing in the hive-precommit run was TestHS2ImpersonationWithRemoteMS.testImpersonation. After investigation I found that it had no relation to the UDF; the test itself was flaky. So instead of creating a new jira and raising a new PR for fixing TestHS2ImpersonationWithRemoteMS.testImpersonation, I fixed it as part of the date_format UDF change (HIVE-25093). Before HIVE-25093 got merged, you created a new jira (HIVE-25250), marked the test ignored, and merged that to master. Now coming to why the test failed: testImpersonation tries to connect to miniHS2 via DriverManager.getConnection(miniHS2.getJdbcURL(), "foo", null); Since this is a network call, there is a retry in place to avoid unwanted network failures. We have a variable ("maxRetries") which controls the number of retries; its default value was "1", and it can be overridden by passing "maxRetries={somevalue}" as part of the JDBC URL. Since the default "maxRetries=1" is very low, I changed it to "maxRetries=5" as part of HIVE-25093. But since you have marked the test ignored, I am raising this PR to uncomment the test, and as "maxRetries=5" seems to be on the higher side, I am making it "maxRetries=3" and uncommenting the test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614420) Time Spent: 1h 10m (was: 1h) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
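The maxRetries behaviour described in the comment above amounts to a bounded-retry loop around the connection attempt. The sketch below is illustrative only: `openWithRetries` and the simulated failures are hypothetical helpers, not Hive JDBC driver code — in the real test the retry count comes from the `maxRetries` parameter in the JDBC URL.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryDemo {
    // Retry the operation up to maxRetries times; rethrow the last failure
    // if every attempt fails.
    static <T> T openWithRetries(Supplier<T> open, int maxRetries) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return open.get();
            } catch (RuntimeException e) {
                last = e; // transient failure: try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated connection that fails twice, then succeeds: it only
        // comes up if maxRetries is at least 3.
        String conn = openWithRetries(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new RuntimeException("connection reset");
            }
            return "connected";
        }, 3);
        System.out.println(conn + " after " + calls.get() + " attempts");
    }
}
```

With maxRetries=1 the same sequence would surface the "connection reset" failure instead of connecting, which matches the flakiness the test was showing.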
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614398=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614398 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:10 Start Date: 24/Jun/21 09:10 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657771007 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -353,6 +375,23 @@ private RelNode toJoinInsertIncremental( basePlan, mdProvider, executorProvider, HiveJoinInsertIncrementalRewritingRule.INSTANCE); } +private RelNode toPartitionInsertOverwrite( +RelNode basePlan, RelMetadataProvider mdProvider, RexExecutor executorProvider, +HiveRelOptMaterialization materialization, RelNode calcitePreMVRewritingPlan) { + + if (materialization.isSourceTablesUpdateDeleteModified()) { +return calcitePreMVRewritingPlan; Review comment: added TODO to implement a version of the HiveAggregatePartitionIncrementalRewritingRule which can handle delete operations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614398) Time Spent: 50m (was: 40m) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614409=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614409 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:25 Start Date: 24/Jun/21 09:25 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657782155 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregatePartitionIncrementalRewritingRule.java ## @@ -0,0 +1,152 @@ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import com.google.common.collect.ImmutableList; +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Aggregate; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Union; +import org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.rex.RexTableInputRef; +import org.apache.calcite.rex.RexUtil; +import org.apache.calcite.sql.SqlAggFunction; +import org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.calcite.tools.RelBuilder; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories; +import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +import static org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil.findRexTableInputRefs; + +/** + * Rule to prepare the plan for incremental view maintenance if the view is partitioned and insert only: + * Insert overwrite the partitions which are affected since the last rebuild only and leave the + * rest of the partitions intact. + * + * Assume that we have a materialized view partitioned on column a and writeId was 1 at the last rebuild: + * + * CREATE MATERIALIZED VIEW mat1 PARTITIONED ON (a) STORED AS ORC TBLPROPERTIES ("transactional"="true", "transactional_properties"="insert_only") AS + * SELECT a, b, sum(c) sumc FROM t1 GROUP BY b, a; + * + * 1. Query all rows from source tables since the last rebuild. + * 2. Query all rows from MV which are in any of the partitions queried in 1. + * 3. Take the union of rows from 1. and 2. 
and perform the same aggregations defined in the MV + * + * SELECT b, sum(sumc), a FROM ( + * SELECT b, sumc, a FROM mat1 + * LEFT SEMI JOIN (SELECT b, sum(c), a FROM t1 WHERE ROW__ID.writeId > 1 GROUP BY b, a) q ON (mat1.a <=> q.a) + * UNION ALL + * SELECT b, sum(c) sumc, a FROM t1 WHERE ROW__ID.writeId > 1 GROUP BY b, a + * ) sub + * GROUP BY a, b + */ +public class HiveAggregatePartitionIncrementalRewritingRule extends RelOptRule { + private static final Logger LOG = LoggerFactory.getLogger(HiveAggregatePartitionIncrementalRewritingRule.class); + + public static final HiveAggregatePartitionIncrementalRewritingRule INSTANCE = + new HiveAggregatePartitionIncrementalRewritingRule(); + + private HiveAggregatePartitionIncrementalRewritingRule() { +super(operand(Aggregate.class, operand(Union.class, any())), +HiveRelFactories.HIVE_BUILDER, "HiveJoinPartitionIncrementalRewritingRule"); Review comment: fixed.
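The three-step rewrite in the javadoc above (delta rows since the last writeId, MV rows from the affected partitions only, union then re-aggregate) can be modeled in plain Java. This is a hedged illustration of the query semantics, not Hive's implementation; the `Row` record with fields `a` (partition key), `b`, and `c` (measure) is invented for the sketch:

```java
import java.util.*;
import java.util.stream.*;

// Models the incremental MV rebuild described in the rule's javadoc:
// 1. aggregate-eligible new rows (writeId > last rebuild) arrive as newRows,
// 2. the MV contributes its stored rows for the affected partitions only
//    (the LEFT SEMI JOIN in the example query),
// 3. a final GROUP BY merges both inputs.
public class IncrementalMerge {
    record Row(String a, String b, long c) {}

    static Map<List<String>, Long> merge(List<Row> mvRows, List<Row> newRows) {
        // partitions touched by the delta
        Set<String> affected = newRows.stream().map(Row::a).collect(Collectors.toSet());
        return Stream.concat(
                mvRows.stream().filter(r -> affected.contains(r.a())), // step 2
                newRows.stream())                                      // step 1
            .collect(Collectors.groupingBy(r -> List.of(r.a(), r.b()), // step 3
                     Collectors.summingLong(Row::c)));
    }
}
```

Partitions absent from the delta never enter the merge, which is exactly why the rule can insert-overwrite only the affected partitions and leave the rest intact.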
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614410 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:25 Start Date: 24/Jun/21 09:25 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657782662 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregatePartitionIncrementalRewritingRule.java ## @@ -0,0 +1,152 @@ + @Override + public void onMatch(RelOptRuleCall call) { +RexBuilder rexBuilder = call.builder().getRexBuilder(); + +final Aggregate aggregate = call.rel(0); +final Union union = call.rel(1); +final RelNode queryBranch = union.getInput(0); +final RelNode mvBranch = union.getInput(1); + +// find Partition col indexes in mvBranch top operator row schema +// mvBranch can be more complex than just a TS on the MV and the partition columns indexes in the top Operator's +
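The comment in `onMatch` about finding partition column indexes in the MV branch's top operator row schema amounts to a name-to-position lookup. A minimal sketch of that lookup, where a `List<String>` of field names stands in for Calcite's `RelDataType` field list (illustrative only, not Hive's code):

```java
import java.util.*;

// Maps the MV's partition column names to their positions in the top
// operator's row schema, as the rule needs before building the semi-join keys.
public class PartitionColIndexFinder {
    static int[] partitionColIndexes(List<String> rowSchema, List<String> partitionCols) {
        return partitionCols.stream()
                .mapToInt(rowSchema::indexOf) // -1 would mean the column is not projected
                .toArray();
    }
}
```

For the javadoc's example MV projecting `(b, sumc, a)` with partition column `a`, the lookup yields index 2, which is where the join condition `mat1.a <=> q.a` is built from.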
[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG
[ https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=614341=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614341 ] ASF GitHub Bot logged work on HIVE-25272: - Author: ASF GitHub Bot Created on: 24/Jun/21 06:31 Start Date: 24/Jun/21 06:31 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2413: URL: https://github.com/apache/hive/pull/2413#discussion_r657662547 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -182,10 +183,20 @@ public boolean accept(Path path) { public static final int MAX_STATEMENTS_PER_TXN = 1; public static final Pattern LEGACY_BUCKET_DIGIT_PATTERN = Pattern.compile("^[0-9]{6}"); public static final Pattern BUCKET_PATTERN = Pattern.compile("bucket_([0-9]+)(_[0-9]+)?$"); + private static final Set<String> READ_TXN_TOKENS = new HashSet<>(); private static Cache dirCache; private static AtomicBoolean dirCacheInited = new AtomicBoolean(); + static { +READ_TXN_TOKENS.addAll(Arrays.asList( Review comment: Yes, they are covered -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614341) Time Spent: 2h 40m (was: 2.5h) > READ transactions are getting logged in NOTIFICATION LOG > > > Key: HIVE-25272 > URL: https://issues.apache.org/jira/browse/HIVE-25272 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > While READ transactions are already skipped from getting logged in > NOTIFICATION logs, a few are still getting logged. Need to skip those > transactions as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
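The `READ_TXN_TOKENS` set in the diff above implements a common filtering pattern: statements whose leading keyword marks a read-only transaction are skipped when writing notification-log entries. A hedged sketch of the pattern; the token values and method names here are examples, not Hive's actual list or API:

```java
import java.util.*;

// Illustrates the skip-set pattern from the AcidUtils diff: look up the
// statement's first token in a read-only set and suppress logging on a hit.
public class ReadTxnFilter {
    private static final Set<String> READ_TXN_TOKENS = new HashSet<>(
            Arrays.asList("select", "show", "describe")); // example tokens

    /** True when the statement should produce a notification-log entry. */
    static boolean shouldLog(String statement) {
        String first = statement.trim().toLowerCase(Locale.ROOT).split("\\s+")[0];
        return !READ_TXN_TOKENS.contains(first);
    }
}
```

The reviewer's question ("are they covered") is about making this token set exhaustive, since any read-only statement type missing from the set keeps leaking into the NOTIFICATION log.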
[jira] [Work logged] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
[ https://issues.apache.org/jira/browse/HIVE-25242?focusedWorklogId=614415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614415 ] ASF GitHub Bot logged work on HIVE-25242: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:38 Start Date: 24/Jun/21 09:38 Worklog Time Spent: 10m Work Description: abstractdog commented on pull request #2390: URL: https://github.com/apache/hive/pull/2390#issuecomment-867493230 merged, thanks @zeroflag for the patch and @pgaref for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614415) Time Spent: 1h (was: 50m) > Query performs extremely slow with hive.vectorized.adaptor.usage.mode = > chosen > --- > > Key: HIVE-25242 > URL: https://issues.apache.org/jira/browse/HIVE-25242 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are > vectorized through the vectorized adaptor. > Queries like this one perform very slowly because the concat is not chosen > to be vectorized. > {code:java} > select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) > between to_date('2018-12-01') and to_date('2021-03-01'); {code} > The patch whitelists the concat udf so that it uses the vectorized adaptor in > chosen mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
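The "chosen" usage mode described in HIVE-25242 gates the vectorized UDF adaptor on an allow-list, and the fix adds `concat` to that list. A minimal sketch of the mode dispatch under those assumptions; the constants and method are illustrative, not the vectorizer's real code:

```java
import java.util.*;

// Sketch of hive.vectorized.adaptor.usage.mode semantics: "all" always uses the
// adaptor, "none" never does, and "chosen" consults an allow-list of UDF names.
public class AdaptorUsageMode {
    private static final Set<String> CHOSEN_UDFS = new HashSet<>(
            Arrays.asList("to_date", "concat")); // concat newly whitelisted by the patch

    static boolean useVectorizedAdaptor(String mode, String udfName) {
        if (mode.equals("all")) return true;
        if (mode.equals("none")) return false;
        return CHOSEN_UDFS.contains(udfName); // "chosen" mode
    }
}
```

With `concat` missing from the list, the example query's filter falls back to row mode inside an otherwise vectorized pipeline, which matches the reported slowdown.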