[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614598 ]

ASF GitHub Bot logged work on HIVE-25276:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:11
Start Date: 24/Jun/21 16:11
Worklog Time Spent: 10m

Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658090073

## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
## @@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
   Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
 }

+  @Test
+  public void testStatWithInsert() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+    shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+    testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+    if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+      // If the location is set and we have to gather stats, then we have to update the table stats now
+      shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+    }
+
+    String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+    shell.executeStatement(insert);
+
+    checkColStat(identifier.name(), "customer_id");

Review comment: Ok, thanks, just wanted to double-check my understanding.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614598)
Time Spent: 1h (was: 50m)

> Enable automatic statistics generation for Iceberg tables
> ---------------------------------------------------------
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
> Issue Type: Improvement
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614638 ]

ASF GitHub Bot logged work on HIVE-25268:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 17:43
Start Date: 24/Jun/21 17:43
Worklog Time Spent: 10m

Work Description: sankarh commented on a change in pull request #2409:
URL: https://github.com/apache/hive/pull/2409#discussion_r658156023

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java
## @@ -51,18 +50,17 @@
  */
 @Description(name = "date_format", value = "_FUNC_(date/timestamp/string, fmt) - converts a date/timestamp/string "
     + "to a value of string in the format specified by the date format fmt.",
-    extended = "Supported formats are SimpleDateFormat formats - "
-    + "https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html. "
+    extended = "Supported formats are DateTimeFormatter formats - "
+    + "https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html. "

Review comment: @zabetak I think, if these 2 are the only changes, then we can go ahead with DateTimeFormatter. What is your opinion?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614638)
Time Spent: 8h 20m (was: 8h 10m)

> date_format udf doesn't work for dates prior to 1900 if the timezone is
> different from UTC
> -----------------------------------------------------------------------
>
> Key: HIVE-25268
> URL: https://issues.apache.org/jira/browse/HIVE-25268
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0
> Reporter: Nikhil Gupta
> Assignee: Nikhil Gupta
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 8h 20m
> Remaining Estimate: 0h
>
> *Hive 1.2.1*:
> {code:java}
> select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1400-01-14 01:00:00 ICT  |
> +--------------------------+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1800-01-14 01:00:00 ICT  |
> +--------------------------+
> {code}
> *Hive 3.1, Hive 4.0:*
> {code:java}
> select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1400-01-06 01:17:56 ICT  |
> +--------------------------+
> select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z');
> +--------------------------+
> |           _c0            |
> +--------------------------+
> | 1800-01-14 01:17:56 ICT  |
> +--------------------------+
> {code}
> VM timezone is set to 'Asia/Bangkok'

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
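[Editor's note] The odd 17:56 shift reported above is the difference between the modern ICT offset (UTC+7) and Bangkok's pre-1920 local mean time (UTC+6:42:04) in the tz database; `java.time` applies the historical offset, while wall-clock formatting with `DateTimeFormatter` keeps the original fields. A minimal illustrative sketch, not Hive's code:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class Pre1900Offsets {
    public static void main(String[] args) {
        ZoneId bangkok = ZoneId.of("Asia/Bangkok");

        // Before 1920 the tz database assigns Bangkok its local mean time,
        // UTC+6:42:04, not the modern UTC+7 (ICT). 7:00:00 - 6:42:04 = 17:56,
        // exactly the shift seen in the bug report.
        ZonedDateTime old = LocalDateTime.of(1800, 1, 14, 1, 0).atZone(bangkok);
        System.out.println(old.getOffset());    // +06:42:04

        ZonedDateTime modern = LocalDateTime.of(2000, 1, 14, 1, 0).atZone(bangkok);
        System.out.println(modern.getOffset()); // +07:00

        // Formatting the ZonedDateTime with DateTimeFormatter keeps the
        // original wall-clock fields rather than shifting them.
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        System.out.println(fmt.format(old));    // 1800-01-14 01:00:00
    }
}
```

The legacy path goes wrong when the value round-trips through an epoch-based type (e.g. `java.util.Date`) that re-resolves the instant against the historical offset.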
[jira] [Work logged] (HIVE-25213) Implement List<Table> getTables() for existing connectors.
[ https://issues.apache.org/jira/browse/HIVE-25213?focusedWorklogId=614665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614665 ]

ASF GitHub Bot logged work on HIVE-25213:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 18:43
Start Date: 24/Jun/21 18:43
Worklog Time Spent: 10m

Work Description: dantongdong commented on a change in pull request #2371:
URL: https://github.com/apache/hive/pull/2371#discussion_r658196705

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/AbstractJDBCConnectorProvider.java
## @@ -192,6 +215,7 @@ protected Connection getConnection() {
   }
   table = buildTableFromColsList(tableName, cols);
+  table.setDbName(scoped_db);

Review comment: Good catch. Will change!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614665)
Time Spent: 1.5h (was: 1h 20m)

> Implement List<Table> getTables() for existing connectors.
> ----------------------------------------------------------
>
> Key: HIVE-25213
> URL: https://issues.apache.org/jira/browse/HIVE-25213
> Project: Hive
> Issue Type: Sub-task
> Reporter: Naveen Gangam
> Assignee: Dantong Dong
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> In the initial implementation, connector providers do not implement the
> getTables(string pattern) spi. We had deferred it for later. Only
> getTableNames() and getTable() were implemented.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null
[ https://issues.apache.org/jira/browse/HIVE-25243?focusedWorklogId=614485&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614485 ]

ASF GitHub Bot logged work on HIVE-25243:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 12:54
Start Date: 24/Jun/21 12:54
Worklog Time Spent: 10m

Work Description: maheshk114 merged pull request #2391:
URL: https://github.com/apache/hive/pull/2391

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614485)
Time Spent: 1.5h (was: 1h 20m)

> Llap external client - Handle nested values when the parent struct is null
> --------------------------------------------------------------------------
>
> Key: HIVE-25243
> URL: https://issues.apache.org/jira/browse/HIVE-25243
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Shubham Chaurasia
> Assignee: Shubham Chaurasia
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Consider the following table in text format -
> {code}
> +-------------------------------+
> |              c8               |
> +-------------------------------+
> | NULL                          |
> | {"r":null,"s":null,"t":null}  |
> | {"r":"a","s":9,"t":2.2}       |
> +-------------------------------+
> {code}
> When we query above table via llap external client, it throws following
> exception -
> {code:java}
> Caused by: java.lang.NullPointerException: src
>   at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33)
>   at io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537)
>   at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199)
>   at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486)
>   at io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34)
>   at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933)
>   at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191)
>   at org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296)
>   at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213)
>   at org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135)
> {code}
> Created a test to repro it -
> {code:java}
> /**
>  * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while
>  * testing LLAP external client flow.
>  * The aim of turning off LLAP IO is -
>  * when we create table through this test, LLAP caches them and returns the same
>  * when we do a read query, due to this we miss some code paths which may
>  * have been hit otherwise.
>  */
> public class TestMiniLlapVectorArrowWithLlapIODisabled extends BaseJdbcWithMiniLlap {
>   @BeforeClass
>   public static void beforeTest() throws Exception {
>     HiveConf conf = defaultConf();
>     conf.setBoolVar(ConfVars.LLAP_OUTPUT_FORMAT_ARROW, true);
>     conf.setBoolVar(ConfVars.HIVE_VECTORIZATION_FILESINK_ARROW_NATIVE_ENABLED, true);
>     conf.set(ConfVars.LLAP_IO_ENABLED.varname, "false");
>     BaseJdbcWithMiniLlap.beforeTest(conf);
>   }
>   @Override
>   protected InputFormat getInputFormat() {
>     //For unit testing, no harm in hard-coding allocator ceiling to LONG.MAX_VALUE
>     return new LlapArrowRowInputFormat(Long.MAX_VALUE);
>   }
>   @Test
>   public void testNullsInStructFields() throws Exception {
>     createDataTypesTable("datatypes");
>     RowCollector2 rowCollector = new RowCollector2();
>     // c8 struct
>     String query = "select c8 from datatypes";
>     int rowCount = processQuery(query, 1, rowCollector);
>     assertEquals(3, rowCount);
>   }
> }
> {code}
> Cause - As we see in the table above, first row of the table is NULL, and
> correspondingly
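[Editor's note] The stack trace shows the serializer recursing into a struct's child fields even when the struct value itself is null, so a null byte array ("src") reaches the Arrow buffer. The guard pattern involved can be sketched generically; this is an illustration, not Hive's `Serializer`:

```java
import java.util.Arrays;
import java.util.List;

public class NullStructGuard {
    // A toy struct value: a list of child field values (any of which may be null).
    record Struct(List<Object> fields) {}

    // Writing a struct column: a null parent must emit a null marker and must
    // NOT recurse into child values, because there are none to dereference.
    static String write(Struct s) {
        if (s == null) {
            return "null"; // mark this row null in the struct vector and every child vector
        }
        StringBuilder out = new StringBuilder("{");
        for (int i = 0; i < s.fields().size(); i++) {
            Object f = s.fields().get(i);
            out.append(f == null ? "null" : f.toString());
            if (i < s.fields().size() - 1) {
                out.append(",");
            }
        }
        return out.append("}").toString();
    }

    public static void main(String[] args) {
        // The three rows from the repro table above:
        System.out.println(write(null));                                  // null
        System.out.println(write(new Struct(Arrays.asList(null, null, null)))); // {null,null,null}
        System.out.println(write(new Struct(List.of("a", 9, 2.2))));      // {a,9,2.2}
    }
}
```

Without the first check, the writer would try to serialize child bytes for the NULL row, which is the shape of the NPE in the report.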
[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG
[ https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=614500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614500 ]

ASF GitHub Bot logged work on HIVE-25272:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 13:16
Start Date: 24/Jun/21 13:16
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #2413:
URL: https://github.com/apache/hive/pull/2413#discussion_r657937252

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
## @@ -176,6 +177,41 @@ public void testReplOperationsNotCapturedInNotificationLog() throws Throwable {
   assert lastEventId == currentEventId;
 }

+  @Test
+  public void testReadOperationsNotCapturedInNotificationLog() throws Throwable {
+    //Perform empty bootstrap dump and load
+    String dbName = testName.getMethodName();
+    String replDbName = "replicated_" + testName.getMethodName();
+    try {
+      primary.run("CREATE DATABASE " + dbName + " WITH DBPROPERTIES ( '"
+          + SOURCE_OF_REPLICATION + "' = '1,2,3')");
+      primary.hiveConf.set("hive.txn.readonly.enabled", "true");
+      primary.run("CREATE TABLE " + dbName + ".t1 (id int)");
+      primary.dump(dbName);
+      replica.run("REPL LOAD " + dbName + " INTO " + replDbName);
+      //Perform empty incremental dump and load so that all db level properties are altered.
+      primary.dump(dbName);
+      replica.run("REPL LOAD " + dbName + " INTO " + replDbName);
+      primary.run("INSERT INTO " + dbName + ".t1 VALUES(1)");
+      long lastEventId = primary.getCurrentNotificationEventId().getEventId();
+      primary.run("USE " + dbName);
+      primary.run("DESCRIBE DATABASE " + dbName);
+      primary.run("DESCRIBE "+ dbName + ".t1");
+      primary.run("SELECT * FROM " + dbName + ".t1");
+      primary.run("SHOW TABLES " + dbName);
+      primary.run("SHOW TABLE EXTENDED LIKE 't1'");
+      primary.run("SHOW TBLPROPERTIES t1");
+      primary.run("EXPLAIN SELECT * from " + dbName + ".t1");
+      primary.run("SHOW LOCKS");
+      primary.run("EXPLAIN SHOW LOCKS");

Review comment: Could you please add a test case for 'EXPLAIN LOCKS' that is widely used?
```
EXPLAIN LOCKS UPDATE target SET b = 1 WHERE p IN (SELECT t.q1 FROM source t WHERE t.a1=5)
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614500)
Time Spent: 3h (was: 2h 50m)

> READ transactions are getting logged in NOTIFICATION LOG
> ---------------------------------------------------------
>
> Key: HIVE-25272
> URL: https://issues.apache.org/jira/browse/HIVE-25272
> Project: Hive
> Issue Type: Bug
> Reporter: Pravin Sinha
> Assignee: Pravin Sinha
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h
> Remaining Estimate: 0h
>
> While READ transactions are already skipped from getting logged in
> NOTIFICATION logs, a few are still getting logged. Need to skip those
> transactions as well.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=614531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614531 ]

ASF GitHub Bot logged work on HIVE-24484:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 14:11
Start Date: 24/Jun/21 14:11
Worklog Time Spent: 10m

Work Description: belugabehr opened a new pull request #1742:
URL: https://github.com/apache/hive/pull/1742

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614531)
Time Spent: 2h 20m (was: 2h 10m)

> Upgrade Hadoop to 3.3.1
> ------------------------
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614590 ]

ASF GitHub Bot logged work on HIVE-25276:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:04
Start Date: 24/Jun/21 16:04
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658084643

## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
## @@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
   Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
 }

+  @Test
+  public void testStatWithInsert() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+    shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+    testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+    if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+      // If the location is set and we have to gather stats, then we have to update the table stats now
+      shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+    }
+
+    String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+    shell.executeStatement(insert);
+
+    checkColStat(identifier.name(), "customer_id");

Review comment: They are basically the same, and generated as well. Only in the partitioned case did I go for checking the partition columns too. Just to be sure.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614590)
Time Spent: 40m (was: 0.5h)

> Enable automatic statistics generation for Iceberg tables
> ---------------------------------------------------------
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
> Issue Type: Improvement
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25213) Implement List<Table> getTables() for existing connectors.
[ https://issues.apache.org/jira/browse/HIVE-25213?focusedWorklogId=614663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614663 ]

ASF GitHub Bot logged work on HIVE-25213:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 18:42
Start Date: 24/Jun/21 18:42
Worklog Time Spent: 10m

Work Description: dantongdong commented on a change in pull request #2371:
URL: https://github.com/apache/hive/pull/2371#discussion_r658196105

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/AbstractJDBCConnectorProvider.java
## @@ -53,7 +53,6 @@
   private static final String JDBC_OUTPUTFORMAT_CLASS = "org.apache.hive.storage.jdbc.JdbcOutputFormat".intern();
   String type = null; // MYSQL, POSTGRES, ORACLE, DERBY, MSSQL, DB2 etc.
-  String driverClassName = null;

Review comment: The AbstractJDBCConnectorProvider class extends the AbstractDataConnectorProvider (super) class, and driverClassName is already a field in the super class. Having `driverClassName = null` here overwrites the driverClassName passed in by the specific provider, because the AbstractJDBCConnectorProvider constructor calls its super class constructor first. This line is the root cause of the "Could not find a provider for remote database" error; removing it stops the overwrite. All providers can set driverClassName by calling AbstractDataConnectorProvider's constructor, which is what AbstractJDBCConnectorProvider does for all the providers.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614663)
Time Spent: 1h 20m (was: 1h 10m)

> Implement List<Table> getTables() for existing connectors.
> ----------------------------------------------------------
>
> Key: HIVE-25213
> URL: https://issues.apache.org/jira/browse/HIVE-25213
> Project: Hive
> Issue Type: Sub-task
> Reporter: Naveen Gangam
> Assignee: Dantong Dong
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> In the initial implementation, connector providers do not implement the
> getTables(string pattern) spi. We had deferred it for later. Only
> getTableNames() and getTable() were implemented.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
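[Editor's note] The reviewer's explanation above is an instance of Java field shadowing: a subclass field with the same name hides the superclass field, so unqualified reads in subclass code see the subclass field (initialized to null after the super constructor runs). A minimal sketch with illustrative class names, not Hive's actual hierarchy:

```java
// Field shadowing: the subclass re-declares a field the superclass
// constructor already populates, so subclass reads always see null.
class BaseProvider {
    protected String driverClassName;

    BaseProvider(String driverClassName) {
        this.driverClassName = driverClassName; // set by super constructor
    }
}

class JdbcProvider extends BaseProvider {
    String driverClassName = null; // shadows BaseProvider.driverClassName

    JdbcProvider(String driverClassName) {
        super(driverClassName);
    }

    String loadedDriver() {
        // Unqualified access resolves to JdbcProvider's own field -> null.
        return driverClassName;
    }
}

public class ShadowDemo {
    public static void main(String[] args) {
        JdbcProvider p = new JdbcProvider("org.postgresql.Driver");
        System.out.println(p.loadedDriver());                   // null
        System.out.println(((BaseProvider) p).driverClassName); // org.postgresql.Driver
    }
}
```

Deleting the shadowing declaration, as the PR does, makes the subclass read the field the super constructor set.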
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=614615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614615 ]

ASF GitHub Bot logged work on HIVE-25277:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:51
Start Date: 24/Jun/21 16:51
Worklog Time Spent: 10m

Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r658120606

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
## @@ -5088,10 +5089,8 @@ private boolean isDatabaseRemote(String name) {
   private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle)

Review comment: I think it works if deleteDir fails without side effect when the dir is non-empty. But we need to make sure all filesystems that Hive supports work in this way. Also I noticed that deleteDir calls moveToTrash first, so it could be more complex: https://github.com/apache/hive/blob/f2de30ca8bc2b63887496775f9a0769057a17ee0/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreFsImpl.java#L41. Avoiding duplicated checks seems to be safer.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614615)
Time Spent: 1h (was: 50m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> ------------------------------------------------------------------------------
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: All Versions
> Reporter: Zhou Fang
> Assignee: Zhou Fang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store, for which
> ListFiles is expensive, as the warehouse. A root cause is that the recursive
> parent dir deletion is very inefficient: there are many duplicated calls to
> isEmpty (ListFiles is called at the end). This fix sorts the parents to
> delete according to path length, and always processes the longest one first
> (e.g., a/b/c always comes before a/b). As a result, each parent path only
> needs to be checked once.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
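[Editor's note] The ordering described in the issue (handle the deepest candidate parent directory first so each directory is listed at most once) can be sketched with plain path strings; the class and method names here are illustrative, not Hive's:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeepestFirst {
    // Order candidate parent dirs so "a/b/c" is processed before "a/b":
    // once a child has been handled, each ancestor needs only one
    // emptiness check (one ListFiles call) instead of repeated ones.
    static List<String> deepestFirst(Set<String> parents) {
        List<String> ordered = new ArrayList<>(parents);
        // The issue sorts by path length; longer paths are deeper.
        ordered.sort(Comparator.comparingInt(String::length).reversed());
        return ordered;
    }

    public static void main(String[] args) {
        Set<String> parents = new LinkedHashSet<>(
                List.of("a/b", "a/b/c", "a", "a/b/c/d"));
        System.out.println(deepestFirst(parents));
        // [a/b/c/d, a/b/c, a/b, a]
    }
}
```

With this order, deletion can walk the list once and stop ascending as soon as a directory turns out to be non-empty, which is where the duplicated isEmpty/ListFiles calls were coming from.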
[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614593 ]

ASF GitHub Bot logged work on HIVE-25276:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:05
Start Date: 24/Jun/21 16:05
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658085550

## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
## @@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
   Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
 }

+  @Test
+  public void testStatWithInsert() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+    shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+    testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+    if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+      // If the location is set and we have to gather stats, then we have to update the table stats now
+      shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+    }
+
+    String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+    shell.executeStatement(insert);
+
+    checkColStat(identifier.name(), "customer_id");
+  }
+
+  @Test
+  public void testStatWithInsertOverwrite() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+    shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+    testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,

Review comment: My thought process was the following:
- We need a basic test - so created unpartitioned insert
- We need to test partitioned tables (what happens with the partition columns) - so created partitioned test (not sure that this is strictly needed)
- We need to test IOW - so created IOW

I am open to discussions either way.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614593)
Time Spent: 50m (was: 40m)

> Enable automatic statistics generation for Iceberg tables
> ---------------------------------------------------------
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
> Issue Type: Improvement
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=614601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614601 ]

ASF GitHub Bot logged work on HIVE-25277:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:14
Start Date: 24/Jun/21 16:14
Worklog Time Spent: 10m

Work Description: medb commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r658092035

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
## @@ -5088,10 +5089,8 @@ private boolean isDatabaseRemote(String name) {
   private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle)

Review comment: I see. Did we consider just calling a non-recursive `deleteDir` on the parent instead of checking if it's empty, and based on the delete success/failure trying to delete its parent recursively?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614601)
Time Spent: 40m (was: 0.5h)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> ------------------------------------------------------------------------------
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: All Versions
> Reporter: Zhou Fang
> Assignee: Zhou Fang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store, for which
> ListFiles is expensive, as the warehouse. A root cause is that the recursive
> parent dir deletion is very inefficient: there are many duplicated calls to
> isEmpty (ListFiles is called at the end). This fix sorts the parents to
> delete according to path length, and always processes the longest one first
> (e.g., a/b/c always comes before a/b). As a result, each parent path only
> needs to be checked once.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=614614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614614 ]

ASF GitHub Bot logged work on HIVE-25277:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/21 16:49
Start Date: 24/Jun/21 16:49
Worklog Time Spent: 10m

Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r658120606

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
## @@ -5088,10 +5089,8 @@ private boolean isDatabaseRemote(String name) {
   private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle)

Review comment: I think it works if deleteDir fails when the dir is non-empty with no side effect. But we need to make sure all filesystems that Hive supports work in this way. Also I noticed that deleteDir calls moveToTrash first, so it could be more complex: https://github.com/apache/hive/blob/f2de30ca8bc2b63887496775f9a0769057a17ee0/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreFsImpl.java#L41. Avoiding duplicated checks seems to be safer.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 614614)
Time Spent: 50m (was: 40m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> ------------------------------------------------------------------------------
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: All Versions
> Reporter: Zhou Fang
> Assignee: Zhou Fang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store, for which
> ListFiles is expensive, as the warehouse. A root cause is that the recursive
> parent dir deletion is very inefficient: there are many duplicated calls to
> isEmpty (ListFiles is called at the end). This fix sorts the parents to
> delete according to path length, and always processes the longest one first
> (e.g., a/b/c always comes before a/b). As a result, each parent path only
> needs to be checked once.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-25137) getAllWriteEventInfo should go through the HMS client instead of using RawStore directly
[ https://issues.apache.org/jira/browse/HIVE-25137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369003#comment-17369003 ]

Yu-Wen Lai commented on HIVE-25137:
-----------------------------------
[~pmadhukar] There has been no update on the PR for several weeks. Will you follow up on the PR? If you won't, could I pick up this task?

> getAllWriteEventInfo should go through the HMS client instead of using
> RawStore directly
> ----------------------------------------------------------------------
>
> Key: HIVE-25137
> URL: https://issues.apache.org/jira/browse/HIVE-25137
> Project: Hive
> Issue Type: Improvement
> Reporter: Pratyush Madhukar
> Assignee: Pratyush Madhukar
> Priority: Major
>
> {code:java}
> private List<WriteEventInfo> getAllWriteEventInfo(Context withinContext) throws Exception {
>   String contextDbName = StringUtils.normalizeIdentifier(withinContext.replScope.getDbName());
>   RawStore rawStore = HiveMetaStore.HMSHandler.getMSForConf(withinContext.hiveConf);
>   List<WriteEventInfo> writeEventInfoList = rawStore.getAllWriteEventInfo(eventMessage.getTxnId(), contextDbName, null);
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614640 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 24/Jun/21 17:51 Start Date: 24/Jun/21 17:51 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658162341 ## File path: ql/src/test/queries/clientpositive/udf_date_format.q ## @@ -78,3 +78,16 @@ select date_format("2015-04-08 10:30:45","-MM-dd HH:mm:ss.SSS z"); --julian date set hive.local.time.zone=UTC; select date_format("1001-01-05","dd---MM--"); + +--dates prior to 1900 +set hive.local.time.zone=Asia/Bangkok; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Europe/Berlin; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Africa/Johannesburg; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); Review comment: @guptanikhil007 I think @zabetak's point is: what will be the interpretation of the timezone for a timestamp value stored in a table? Will it be treated as UTC or the local timezone? If the local timezone, will the output change when we change the local timezone config? From the code, it seems `yes`. But I guess the behavior is the same with `SimpleDateFormat` as well. Pls confirm. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614640) Time Spent: 8.5h (was: 8h 20m) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC > -- > > Key: HIVE-25268 > URL: https://issues.apache.org/jira/browse/HIVE-25268 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 8.5h > Remaining Estimate: 0h > > *Hive 1.2.1*: > {code:java} > select date_format('1400-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+--+ > | _c0| > +--+--+ > | 1400-01-14 01:00:00 ICT | > +--+--+ > select date_format('1800-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+--+ > | _c0| > +--+--+ > | 1800-01-14 01:00:00 ICT | > +--+--+ > {code} > *Hive 3.1, Hive 4.0:* > {code:java} > select date_format('1400-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1400-01-06 01:17:56 ICT | > +--+ > select date_format('1800-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1800-01-14 01:17:56 ICT | > +--+ > {code} > VM timezone is set to 'Asia/Bangkok' -- This message was sent by Atlassian Jira (v8.3.4#803005)
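The pre-1900 drift discussed in the HIVE-25268 thread above comes from historical timezone rules: many zones used local mean time before standardization (Asia/Bangkok was UTC+06:42:04 until 1920, and 07:00:00 minus 06:42:04 is exactly the 17:56 shift in the bug report). The sketch below is a minimal illustration of that failure mode using plain `java.time`, not Hive's actual UDF code path; the specific conversion through epoch millis is an assumption chosen to expose the offset.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class Pre1900Zones {
    public static void main(String[] args) {
        ZoneId bangkok = ZoneId.of("Asia/Bangkok");
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        LocalDateTime literal = LocalDateTime.parse("1800-01-14T01:00:00");

        // Attaching the session zone to the literal wall-clock value keeps it intact.
        String direct = literal.atZone(bangkok).format(fmt);

        // Round-tripping through epoch millis interpreted in a different zone
        // drags in the pre-1920 local-mean-time offset (+06:42:04 for Bangkok).
        long millis = literal.toInstant(ZoneOffset.UTC).toEpochMilli();
        String shifted = Instant.ofEpochMilli(millis).atZone(bangkok).format(fmt);

        System.out.println(direct);   // 1800-01-14 01:00:00
        System.out.println(shifted);  // 1800-01-14 07:42:04
    }
}
```

Any formatter that re-derives the wall-clock time from an epoch instant with modern tzdb rules will show this kind of sub-hour offset for dates before zone standardization.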
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614458 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 12:11 Start Date: 24/Jun/21 12:11 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657889399 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -353,6 +375,23 @@ private RelNode toJoinInsertIncremental( basePlan, mdProvider, executorProvider, HiveJoinInsertIncrementalRewritingRule.INSTANCE); } +private RelNode toPartitionInsertOverwrite( +RelNode basePlan, RelMetadataProvider mdProvider, RexExecutor executorProvider, +HiveRelOptMaterialization materialization, RelNode calcitePreMVRewritingPlan) { + + if (materialization.isSourceTablesUpdateDeleteModified()) { +return calcitePreMVRewritingPlan; + } + + RelOptHiveTable hiveTable = (RelOptHiveTable) materialization.tableRel.getTable(); + if (!AcidUtils.isInsertOnlyTable(hiveTable.getHiveTableMD())) { +return applyPreJoinOrderingTransforms(basePlan, mdProvider, executorProvider); + } + + return toIncrementalRebuild( + basePlan, mdProvider, executorProvider, HiveAggregatePartitionIncrementalRewritingRule.INSTANCE); Review comment: I have found that currently we can handle non-aggregate cases via the record-level incremental join rules. If those are not applicable, we do not even have a union-based plan to transform further into an incremental one. However, I'm happy to include the non-aggregate variant if a use case is found. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614458) Time Spent: 2h (was: 1h 50m) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614459 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 12:11 Start Date: 24/Jun/21 12:11 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657889592 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -353,6 +375,23 @@ private RelNode toJoinInsertIncremental( basePlan, mdProvider, executorProvider, HiveJoinInsertIncrementalRewritingRule.INSTANCE); } +private RelNode toPartitionInsertOverwrite( +RelNode basePlan, RelMetadataProvider mdProvider, RexExecutor executorProvider, +HiveRelOptMaterialization materialization, RelNode calcitePreMVRewritingPlan) { + + if (materialization.isSourceTablesUpdateDeleteModified()) { +return calcitePreMVRewritingPlan; + } + + RelOptHiveTable hiveTable = (RelOptHiveTable) materialization.tableRel.getTable(); + if (!AcidUtils.isInsertOnlyTable(hiveTable.getHiveTableMD())) { Review comment: I tested scenarios where the view definition has aggregate functions like `avg`, `std`, `variance`. These functions are represented in the Calcite plan by a formula whose inputs are usually `sum` and `count`. Example: ``` HiveProject(b=[$0], avgc=[/(CAST($2):DOUBLE, $3)], a=[$1]) HiveAggregate(group=[{0, 1}], agg#0=[sum($2)], agg#1=[count($2)]) HiveProject($f0=[$1], $f1=[$0], $f2=[$2]) HiveTableScan(table=[[default, t1]], table:alias=[t1]) ``` This type of plan is not converted to a Union-based MV rewrite, so execution doesn't even reach the incremental rewriting rules. It seems that the rules that generate the Union-based MV rewrite should be improved first. Adding TODO. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614459) Time Spent: 2h 10m (was: 2h) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-21614) Derby does not support CLOB comparisons
[ https://issues.apache.org/jira/browse/HIVE-21614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368778#comment-17368778 ] Piotr Findeisen commented on HIVE-21614: [~hankfanchiu], in Trino I worked around this limitation by issuing the filter with a `LIKE` predicate instead of `=`. The code is here: https://github.com/trinodb/trino/blob/d95eafe397fe4b476b2b1a73baeb7643349d4bdb/plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastore.java#L973-L981 > Derby does not support CLOB comparisons > --- > > Key: HIVE-21614 > URL: https://issues.apache.org/jira/browse/HIVE-21614 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.4, 3.0.0 >Reporter: Vlad Rozov >Priority: Major > > HiveMetaStoreClient.listTableNamesByFilter() with non empty filter causes > exception with Derby DB: > {noformat} > Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB > (UCS_BASIC)' are not supported. Types must be comparable. String types must > also have matching collation. If collation does not match, a possible > solution is to cast operands to force them to the default collation (e.g. 
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = > 'T1') > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown > Source) > at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown > Source) > at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown > Source) > at > org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown > Source) > at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown > Source) > at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown > Source) > at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source) > at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source) > at > org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown > Source) > ... 42 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
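The `LIKE` workaround mentioned in the HIVE-21614 comment above can be illustrated with a small predicate-rewriting helper. Derby rejects `=` between CLOB operands but accepts `LIKE` on them, so an equality filter can be expressed as a `LIKE` whose metacharacters are escaped. This is a hypothetical sketch: `toLikePredicate`, its escaping scheme, and the column name are assumptions, not Hive or Trino code, and a real implementation would also have to handle embedded quotes.

```java
public class ClobEqualityWorkaround {

    // Rewrite an equality test on a CLOB column into a LIKE predicate that
    // matches only the literal value, by escaping LIKE's metacharacters
    // (% and _) and the escape character itself.
    static String toLikePredicate(String column, String value) {
        String escaped = value
            .replace("\\", "\\\\")
            .replace("_", "\\_")
            .replace("%", "\\%");
        return column + " LIKE '" + escaped + "' ESCAPE '\\'";
    }

    public static void main(String[] args) {
        System.out.println(toLikePredicate("\"PARAM_VALUE\"", "100%_done"));
        // "PARAM_VALUE" LIKE '100\%\_done' ESCAPE '\'
    }
}
```

Without the escaping step, a stored value containing `%` or `_` would turn the rewritten filter into a wildcard match rather than an exact comparison.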
[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG
[ https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=614498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614498 ] ASF GitHub Bot logged work on HIVE-25272: - Author: ASF GitHub Bot Created on: 24/Jun/21 13:15 Start Date: 24/Jun/21 13:15 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2413: URL: https://github.com/apache/hive/pull/2413#discussion_r657937252 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java ## @@ -176,6 +177,41 @@ public void testReplOperationsNotCapturedInNotificationLog() throws Throwable { assert lastEventId == currentEventId; } + @Test + public void testReadOperationsNotCapturedInNotificationLog() throws Throwable { +//Perform empty bootstrap dump and load +String dbName = testName.getMethodName(); +String replDbName = "replicated_" + testName.getMethodName(); +try { + primary.run("CREATE DATABASE " + dbName + " WITH DBPROPERTIES ( '" + + SOURCE_OF_REPLICATION + "' = '1,2,3')"); + primary.hiveConf.set("hive.txn.readonly.enabled", "true"); + primary.run("CREATE TABLE " + dbName + ".t1 (id int)"); + primary.dump(dbName); + replica.run("REPL LOAD " + dbName + " INTO " + replDbName); + //Perform empty incremental dump and load so that all db level properties are altered. 
+ primary.dump(dbName); + replica.run("REPL LOAD " + dbName + " INTO " + replDbName); + primary.run("INSERT INTO " + dbName + ".t1 VALUES(1)"); + long lastEventId = primary.getCurrentNotificationEventId().getEventId(); + primary.run("USE " + dbName); + primary.run("DESCRIBE DATABASE " + dbName); + primary.run("DESCRIBE "+ dbName + ".t1"); + primary.run("SELECT * FROM " + dbName + ".t1"); + primary.run("SHOW TABLES " + dbName); + primary.run("SHOW TABLE EXTENDED LIKE 't1'"); + primary.run("SHOW TBLPROPERTIES t1"); + primary.run("EXPLAIN SELECT * from " + dbName + ".t1"); + primary.run("SHOW LOCKS"); + primary.run("EXPLAIN SHOW LOCKS"); Review comment: Could you please add a test case for 'EXPLAIN LOCKS ', which is widely used? ``` EXPLAIN LOCKS UPDATE target SET b = 1 WHERE p IN (SELECT t.q1 FROM source t WHERE t.a1=5)' ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614498) Time Spent: 2h 50m (was: 2h 40m) > READ transactions are getting logged in NOTIFICATION LOG > > > Key: HIVE-25272 > URL: https://issues.apache.org/jira/browse/HIVE-25272 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > While READ transactions are already skipped from getting logged in > NOTIFICATION logs, a few are still getting logged. Need to skip those > transactions as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=614550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614550 ] ASF GitHub Bot logged work on HIVE-25277: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:04 Start Date: 24/Jun/21 15:04 Worklog Time Spent: 10m Work Description: medb commented on a change in pull request #2421: URL: https://github.com/apache/hive/pull/2421#discussion_r658031926 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -5088,10 +5089,8 @@ private boolean isDatabaseRemote(String name) { private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle) Review comment: Is there a reason why it can not use [HCFS API to delete dir recursively](https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#delete-org.apache.hadoop.fs.Path-boolean-)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614550) Time Spent: 20m (was: 10m) > Slow Hive partition deletion for Cloud object stores with expensive ListFiles > - > > Key: HIVE-25277 > URL: https://issues.apache.org/jira/browse/HIVE-25277 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: All Versions >Reporter: Zhou Fang >Assignee: Zhou Fang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Deleting a Hive partition is slow when use a Cloud object store as the > warehouse for which ListFiles is expensive. A root cause is that the > recursive parent dir deletion is very inefficient: there are many duplicated > calls to isEmpty (ListFiles is called at the end). 
This fix sorts the parents > to delete according to the path size, and always processes the longest one > (e.g., a/b/c is always before a/b). As a result, each parent path is only > needed to be checked once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614553 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:13 Start Date: 24/Jun/21 15:13 Worklog Time Spent: 10m Work Description: guptanikhil007 commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658040342 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java ## @@ -106,25 +102,18 @@ public Object evaluate(DeferredObject[] arguments) throws HiveException { return null; } -ZoneId id = (SessionState.get() == null) ? new HiveConf().getLocalTimeZone() : SessionState.get().getConf() -.getLocalTimeZone(); // the function should support both short date and full timestamp format // time part of the timestamp should not be skipped Timestamp ts = getTimestampValue(arguments, 0, tsConverters); + if (ts == null) { - Date d = getDateValue(arguments, 0, dtInputTypes, dtConverters); - if (d == null) { -return null; - } - ts = Timestamp.ofEpochMilli(d.toEpochMilli(id), id); + return null; Review comment: It works with Timestamp converter ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java ## @@ -17,31 +17,30 @@ */ package org.apache.hadoop.hive.ql.udf.generic; -import static org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.PrimitiveGrouping.DATE_GROUP; -import static org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.PrimitiveGrouping.STRING_GROUP; - -import java.text.SimpleDateFormat; -import java.time.Instant; -import java.time.LocalDateTime; -import java.time.ZoneId; -import java.time.ZoneOffset; - -import org.apache.hadoop.hive.common.type.Date; import org.apache.hadoop.hive.common.type.Timestamp; +import org.apache.hadoop.hive.common.type.TimestampTZUtil; Review comment: ok -- This is an 
automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614553) Time Spent: 8h (was: 7h 50m) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC > -- > > Key: HIVE-25268 > URL: https://issues.apache.org/jira/browse/HIVE-25268 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 8h > Remaining Estimate: 0h > > *Hive 1.2.1*: > {code:java} > select date_format('1400-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+--+ > | _c0| > +--+--+ > | 1400-01-14 01:00:00 ICT | > +--+--+ > select date_format('1800-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+--+ > | _c0| > +--+--+ > | 1800-01-14 01:00:00 ICT | > +--+--+ > {code} > *Hive 3.1, Hive 4.0:* > {code:java} > select date_format('1400-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1400-01-06 01:17:56 ICT | > +--+ > select date_format('1800-01-14 01:00:00', '-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1800-01-14 01:17:56 ICT | > +--+ > {code} > VM timezone is set to 'Asia/Bangkok' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25286) Set stats to inaccurate when an Iceberg table is modified outside Hive
[ https://issues.apache.org/jira/browse/HIVE-25286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-25286: - > Set stats to inaccurate when an Iceberg table is modified outside Hive > -- > > Key: HIVE-25286 > URL: https://issues.apache.org/jira/browse/HIVE-25286 > Project: Hive > Issue Type: New Feature >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > > When an Iceberg table is modified outside of Hive then the stats should be > set to inaccurate since there is no way to ensure that the HMS stats are > updated correctly and this could cause incorrect query results. > The proposed solution is only working for HiveCatalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25285) Retire HiveProjectJoinTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-25285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368783#comment-17368783 ] Zoltan Haindrich commented on HIVE-25285: - suggested by Jesus [here|https://github.com/apache/hive/pull/2423#issuecomment-867355015] we could probably also remove some other rules which were copied > Retire HiveProjectJoinTransposeRule > --- > > Key: HIVE-25285 > URL: https://issues.apache.org/jira/browse/HIVE-25285 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Priority: Major > > we don't neccessary need our own rule anymore - a plain > ProjectJoinTransposeRule could probably work -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25246) Fix the clean up of open repl created transactions
[ https://issues.apache.org/jira/browse/HIVE-25246?focusedWorklogId=614460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614460 ] ASF GitHub Bot logged work on HIVE-25246: - Author: ASF GitHub Bot Created on: 24/Jun/21 12:15 Start Date: 24/Jun/21 12:15 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2396: URL: https://github.com/apache/hive/pull/2396#discussion_r657888141 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -1122,6 +1154,49 @@ public void abortTxns(AbortTxnsRequest rqst) throws MetaException { } } + private void markDbAsReplIncompatible(Connection dbConn, String database) throws SQLException, MetaException { Review comment: We needn't have almost a copy of updateReplId(). If you need similar code for both, externalize that part. I was also wondering if we would ever need an update. ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -124,6 +124,11 @@ public int execute() { try { long loadTaskStartTime = System.currentTimeMillis(); SecurityUtils.reloginExpiringKeytabUser(); + //Don't proceed if target db is replication incompatible. 
+ Database targetDb = getHive().getDatabase(work.dbNameToLoadIn); + if (targetDb != null && MetaStoreUtils.isDbReplIncompatible(targetDb)) { +throw new SemanticException(ErrorMsg.REPL_INCOMPATIBLE_EXCEPTION.getMsg()); Review comment: Add DB name as well ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java ## @@ -139,6 +142,40 @@ public void tearDown() throws Throwable { primary.run("drop database if exists " + primaryDbName + "_extra cascade"); } + @Test + public void testTargetDbReplIncompatible() throws Throwable { +HiveConf primaryConf = primary.getConf(); +TxnStore txnHandler = TxnUtils.getTxnStore(primary.getConf()); + +primary.run("use " + primaryDbName) +.run("CREATE TABLE t1(a string) STORED AS TEXTFILE") +.dump(primaryDbName); +replica.load(replicatedDbName, primaryDbName); + + assertFalse(MetaStoreUtils.isDbReplIncompatible(replica.getDatabase(replicatedDbName))); + +Long sourceTxnId = openTxns(1, txnHandler, primaryConf).get(0); +txnHandler.abortTxn(new AbortTxnRequest(sourceTxnId)); + +sourceTxnId = openTxns(1, txnHandler, primaryConf).get(0); + +primary.dump(primaryDbName); +replica.load(replicatedDbName, primaryDbName); + assertFalse(MetaStoreUtils.isDbReplIncompatible(replica.getDatabase(replicatedDbName))); + +Long targetTxnId = txnHandler.getTargetTxnId(HiveUtils.getReplPolicy(replicatedDbName), sourceTxnId); +txnHandler.abortTxn(new AbortTxnRequest(targetTxnId)); + assertTrue(MetaStoreUtils.isDbReplIncompatible(replica.getDatabase(replicatedDbName))); + +WarehouseInstance.Tuple dumpData = primary.dump(primaryDbName); + +assertFalse(ReplUtils.failedWithNonRecoverableError(new Path(dumpData.dumpLocation), conf)); +replica.loadFailure(replicatedDbName, primaryDbName); +assertTrue(ReplUtils.failedWithNonRecoverableError(new Path(dumpData.dumpLocation), conf)); + +primary.dumpFailure(primaryDbName); Review comment: Check for this : assertTrue(ReplUtils.failedWithNonRecoverableError(new 
Path(dumpData.dumpLocation), conf)); even after dump failure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614460) Time Spent: 2h 40m (was: 2.5h) > Fix the clean up of open repl created transactions > -- > > Key: HIVE-25246 > URL: https://issues.apache.org/jira/browse/HIVE-25246 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368871#comment-17368871 ] David Mollitor commented on HIVE-24484: --- [HADOOP-17367] is another biting issue for Hive. In 3.1.0, the return value from {{ProxyUsers#getDefaultImpersonationProvider}} changed. In 3.1.0, the method could return a {{null}} value and then it was up to the caller to create a new one and initialize it. It seems like in 3.3.1, it always returns a value, but it looks like the initialization isn't what Hive is expecting. The initialization {{refreshSuperUserGroupsConfiguration}} creates its own configuration whereas before Hive was passing in its own Configuration. > Upgrade Hadoop to 3.3.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368871#comment-17368871 ] David Mollitor edited comment on HIVE-24484 at 6/24/21, 2:05 PM: - [HADOOP-17367] is another biting issue for Hive. In 3.1.0, the return value from {{ProxyUsers#getDefaultImpersonationProvider}} changed. In 3.1.0, the method could return a {{null}} value and then it was up to the caller to create a new one and initialize it (providing the Configuration object). It seems like in 3.3.1, it now always returns a value, but it looks like the initialization isn't what Hive is expecting. The initialization {{refreshSuperUserGroupsConfiguration}} creates its own (empty) configuration whereas before Hive was passing in its own Configuration. was (Author: belugabehr): [HADOOP-17367] is another biting issue for Hive. In 3.1.0, the the return value from {{ProxyUsers# getDefaultImpersonationProvider}} changed. In 3.1.0, the method could return a {{null}} value and then it was up to the caller to create a new one and initialize it. It seems like in 3.3.1, it always returns a value, but it looks like the initialization isn't what Hive is expecting. The initialization {{refreshSuperUserGroupsConfiguration}} creates its own configuration whereas before Hive was passing in its own Configuration. > Upgrade Hadoop to 3.3.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=614530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614530 ] ASF GitHub Bot logged work on HIVE-24484: - Author: ASF GitHub Bot Created on: 24/Jun/21 14:08 Start Date: 24/Jun/21 14:08 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1742: URL: https://github.com/apache/hive/pull/1742 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614530) Time Spent: 2h 10m (was: 2h) > Upgrade Hadoop to 3.3.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=614561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614561 ] ASF GitHub Bot logged work on HIVE-25277: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:26 Start Date: 24/Jun/21 15:26 Worklog Time Spent: 10m Work Description: ranu010101 commented on a change in pull request #2421: URL: https://github.com/apache/hive/pull/2421#discussion_r658051894 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -5088,10 +5089,8 @@ private boolean isDatabaseRemote(String name) { private void deleteParentRecursive(Path parent, int depth, boolean mustPurge, boolean needRecycle) Review comment: HCFS Api deletes all children recursively while this method (deleteParentRecursive) deletes a file and keeps on deleting parent directories if they are empty. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614561) Time Spent: 0.5h (was: 20m) > Slow Hive partition deletion for Cloud object stores with expensive ListFiles > - > > Key: HIVE-25277 > URL: https://issues.apache.org/jira/browse/HIVE-25277 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: All Versions >Reporter: Zhou Fang >Assignee: Zhou Fang >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Deleting a Hive partition is slow when use a Cloud object store as the > warehouse for which ListFiles is expensive. A root cause is that the > recursive parent dir deletion is very inefficient: there are many duplicated > calls to isEmpty (ListFiles is called at the end). 
This fix sorts the parents > to delete according to the path size, and always processes the longest one > (e.g., a/b/c is always before a/b). As a result, each parent path only > needs to be checked once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
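The ordering idea in the fix above can be sketched with a small standalone helper (a hypothetical illustration; the real HMSHandler code operates on Hadoop Path objects, not strings):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class ParentDeletionOrder {
    // Hypothetical helper illustrating the fix: order candidate parent
    // directories so that deeper paths come first (a/b/c before a/b, a/b before a).
    static List<String> orderForDeletion(Collection<String> parents) {
        List<String> sorted = new ArrayList<>(parents);
        // A longer path string is deeper in the tree, so sort by length descending.
        sorted.sort((p1, p2) -> Integer.compare(p2.length(), p1.length()));
        return sorted;
    }

    public static void main(String[] args) {
        // a/b/c must be checked (and deleted if empty) before we even look at a/b.
        System.out.println(orderForDeletion(Arrays.asList("a/b", "a", "a/b/c")));
        // prints [a/b/c, a/b, a]
    }
}
```

Sorting by descending path length suffices here because every ancestor of a path is a strict prefix of it, so a descendant always sorts before its ancestors; each directory's emptiness then needs to be examined at most once.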
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614585 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:50 Start Date: 24/Jun/21 15:50 Worklog Time Spent: 10m Work Description: guptanikhil007 commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658073147 ## File path: ql/src/test/queries/clientpositive/udf_date_format.q ## @@ -78,3 +78,16 @@ select date_format("2015-04-08 10:30:45","yyyy-MM-dd HH:mm:ss.SSS z"); --julian date set hive.local.time.zone=UTC; select date_format("1001-01-05","dd---MM--"); + +--dates prior to 1900 +set hive.local.time.zone=Asia/Bangkok; +select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Europe/Berlin; +select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Africa/Johannesburg; +select date_format('1400-01-14 01:01:10.123', 'yyyy-MM-dd HH:mm:ss.SSS z'); Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614585) Time Spent: 8h 10m (was: 8h) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC > -- > > Key: HIVE-25268 > URL: https://issues.apache.org/jira/browse/HIVE-25268 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 8h 10m > Remaining Estimate: 0h > > *Hive 1.2.1*: > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1400-01-14 01:00:00 ICT | > +--------------------------+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1800-01-14 01:00:00 ICT | > +--------------------------+ > {code} > *Hive 3.1, Hive 4.0:* > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1400-01-06 01:17:56 ICT | > +--------------------------+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1800-01-14 01:17:56 ICT | > +--------------------------+ > {code} > VM timezone is set to 'Asia/Bangkok' -- This message was sent by Atlassian Jira (v8.3.4#803005)
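The 17 minute 56 second shift visible in the Hive 3.1/4.0 output above matches the gap between Bangkok's modern standard offset (+07:00) and its pre-standard local mean time in the tz database (assumed here to be +06:42:04, consistent with the 01:00:00 to 01:17:56 shift in the bug report). A self-contained java.time sketch (illustrative only, not Hive code) that surfaces the two offsets:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class Pre1900OffsetDemo {
    public static void main(String[] args) {
        ZoneId bangkok = ZoneId.of("Asia/Bangkok");
        // Modern instant: standard time applies, offset is +07:00.
        ZoneOffset modern = bangkok.getRules().getOffset(Instant.parse("2021-06-24T00:00:00Z"));
        // 1800s instant: java.time falls back to the zone's historical local
        // mean time, which differs from +07:00 by the 17m56s seen in the report.
        ZoneOffset old = bangkok.getRules().getOffset(Instant.parse("1800-01-14T01:00:00Z"));
        // prints the modern offset (+07:00) followed by the pre-1900 one
        System.out.println(modern + " vs " + old);
    }
}
```

Formatters built on java.time therefore render pre-1900 timestamps with the LMT-based offset, whereas the legacy SimpleDateFormat path in Hive 1.2.1 did not, which is the discrepancy this ticket addresses.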
[jira] [Work logged] (HIVE-25278) HiveProjectJoinTransposeRule may do invalid transformations with windowing expressions
[ https://issues.apache.org/jira/browse/HIVE-25278?focusedWorklogId=614450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614450 ] ASF GitHub Bot logged work on HIVE-25278: - Author: ASF GitHub Bot Created on: 24/Jun/21 11:19 Start Date: 24/Jun/21 11:19 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #2423: URL: https://github.com/apache/hive/pull/2423#issuecomment-867554905 definitely; the two rules seemed almost identical! opened: HIVE-25285 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614450) Time Spent: 0.5h (was: 20m) > HiveProjectJoinTransposeRule may do invalid transformations with windowing > expressions > --- > > Key: HIVE-25278 > URL: https://issues.apache.org/jira/browse/HIVE-25278 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > running > {code} > create table table1 (acct_num string, interest_rate decimal(10,7)) stored as > orc; > create table table2 (act_id string) stored as orc; > CREATE TABLE temp_output AS > SELECT act_nbr, row_num > FROM (SELECT t2.act_id as act_nbr, > row_number() over (PARTITION BY trim(acct_num) ORDER BY interest_rate DESC) > AS row_num > FROM table1 t1 > INNER JOIN table2 t2 > ON trim(acct_num) = t2.act_id) t > WHERE t.row_num = 1; > {code} > may result in error like: > {code} > Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 > Invalid column reference 'interest_rate': (possible column names are: > interest_rate, trim) (state=42000,code=4) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork
[ https://issues.apache.org/jira/browse/HIVE-25208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-25208. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the review [~kuczoram] and [~Marton Bod]! > Refactor Iceberg commit to the MoveTask/MoveWork > > > Key: HIVE-25208 > URL: https://issues.apache.org/jira/browse/HIVE-25208 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we > should commit in MoveWork so we are using the same flow as normal tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25080) Create metric about oldest entry in "ready for cleaning" state
[ https://issues.apache.org/jira/browse/HIVE-25080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-25080: - Parent: HIVE-24824 Issue Type: Sub-task (was: Bug) > Create metric about oldest entry in "ready for cleaning" state > -- > > Key: HIVE-25080 > URL: https://issues.apache.org/jira/browse/HIVE-25080 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > When a compaction txn commits, COMPACTION_QUEUE.CQ_COMMIT_TIME is updated > with the current time. Then the compaction state is set to "ready for > cleaning". (... and then the Cleaner runs and the state is set to "succeeded" > hopefully) > Based on this we know (roughly) how long a compaction has been in state > "ready for cleaning". > We should create a metric similar to compaction_oldest_enqueue_age_in_sec > that would show that the cleaner is blocked by something i.e. find the > compaction in "ready for cleaning" that has the oldest commit time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368895#comment-17368895 ] Nikhil Gupta edited comment on HIVE-25104 at 6/24/21, 2:44 PM: --- I have a list of issues which we can club together: (These are after hive 3.1 release): # HIVE-25268: date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC # HIVE-25093: date_format() UDF is returning output in UTC time zone only (Ashish Sharma, reviewed by Adesh Rao, Nikhil Gupta, Sankar Hariappan) # HIVE-25104: Backward incompatible timestamp serialization in Parquet for certain timezones (Stamatis Zampetakis, reviewed by Jesus Camacho Rodriguez) # HIVE-24113: NPE in GenericUDFToUnixTimeStamp (#1460) (Raj Kumar Singh, reviewed by Zoltan Haindrich and Laszlo Pinter) # HIVE-24074: Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x (Jesus Camacho Rodriguez, reviewed by Prasanth Jayachandran) # HIVE-22840: Race condition in formatters of TimestampColumnVector and DateColumnVector (Shubham Chaurasia, reviewed by Jesus Camacho Rodriguez) # HIVE-22589: Add storage support for ProlepticCalendar in ORC, Parquet, and Avro (Jesus Camacho Rodriguez, reviewed by David Lavati, László Bodor, Prasanth Jayachandran) # HIVE-22405: Add ColumnVector support for ProlepticCalendar (László Bodor via Owen O'Malley, Jesus Camacho Rodriguez) # HIVE-22331: unix_timestamp without argument returns timestamp in millisecond instead of second (Naresh P R, reviewed Jesus Camacho Rodriguez) # HIVE-22170: from_unixtime and unix_timestamp should use user session time zone (Jesus Camacho Rodriguez, reviewed by Vineet Garg) # HIVE-21729: Arrow serializer sometimes shifts timestamp by one second (Shubham Chaurasia, reviewed by Sankar Hariappan) # HIVE-21291: Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time (Karen Coppage, reviewed by Jesus
Camacho Rodriguez) was (Author: gupta.nikhil0007): I have a list of issues which we can club together: (These are after hive 3.1 release): # HIVE-25093: date_format() UDF is returning output in UTC time zone only (Ashish Sharma, reviewed by Adesh Rao, Nikhil Gupta, Sankar Hariappan) # HIVE-25104: Backward incompatible timestamp serialization in Parquet for certain timezones (Stamatis Zampetakis, reviewed by Jesus Camacho Rodriguez) # HIVE-24113: NPE in GenericUDFToUnixTimeStamp (#1460) (Raj Kumar Singh, reviewed by Zoltan Haindrich and Laszlo Pinter) # HIVE-24074: Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x (Jesus Camacho Rodriguez, reviewed by Prasanth Jayachandran) # HIVE-22840: Race condition in formatters of TimestampColumnVector and DateColumnVector (Shubham Chaurasia, reviewed by Jesus Camacho Rodriguez) # HIVE-22589: Add storage support for ProlepticCalendar in ORC, Parquet, and Avro (Jesus Camacho Rodriguez, reviewed by David Lavati, László Bodor, Prasanth Jayachandran) # HIVE-22405: Add ColumnVector support for ProlepticCalendar (László Bodor via Owen O'Malley, Jesus Camacho Rodriguez) # HIVE-22331: unix_timestamp without argument returns timestamp in millisecond instead of second (Naresh P R, reviewed Jesus Camacho Rodriguez) # HIVE-22170: from_unixtime and unix_timestamp should use user session time zone (Jesus Camacho Rodriguez, reviewed by Vineet Garg) # HIVE-21729: Arrow serializer sometimes shifts timestamp by one second (Shubham Chaurasia, reviewed by Sankar Hariappan) # HIVE-21291: Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time (Karen Coppage, reviewed by Jesus Camacho Rodriguez) > Backward incompatible timestamp serialization in Parquet for certain timezones > -- > > Key: HIVE-25104 > URL: https://issues.apache.org/jira/browse/HIVE-25104 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions:
3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extent how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in > Parquet files is not backwards compatible. In other words writing timestamps > with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them > with another (not including the previous issues) may lead to different > results depending on the default timezone of the system.
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614548=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614548 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 24/Jun/21 14:59 Start Date: 24/Jun/21 14:59 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658025667 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java ## @@ -106,25 +102,18 @@ public Object evaluate(DeferredObject[] arguments) throws HiveException { return null; } -ZoneId id = (SessionState.get() == null) ? new HiveConf().getLocalTimeZone() : SessionState.get().getConf() -.getLocalTimeZone(); // the function should support both short date and full timestamp format // time part of the timestamp should not be skipped Timestamp ts = getTimestampValue(arguments, 0, tsConverters); + if (ts == null) { - Date d = getDateValue(arguments, 0, dtInputTypes, dtConverters); - if (d == null) { -return null; - } - ts = Timestamp.ofEpochMilli(d.toEpochMilli(id), id); + return null; Review comment: Why is DateConverter removed? Isn't it needed to convert input like "2021-06-24" or does it work with TimestampConverter?
## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java ## @@ -17,31 +17,30 @@ */ package org.apache.hadoop.hive.ql.udf.generic; -import static org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.PrimitiveGrouping.DATE_GROUP; -import static org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.PrimitiveGrouping.STRING_GROUP; - -import java.text.SimpleDateFormat; -import java.time.Instant; -import java.time.LocalDateTime; -import java.time.ZoneId; -import java.time.ZoneOffset; - -import org.apache.hadoop.hive.common.type.Date; import org.apache.hadoop.hive.common.type.Timestamp; +import org.apache.hadoop.hive.common.type.TimestampTZUtil; Review comment: Unused class. Can be removed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614548) Time Spent: 7h 50m (was: 7h 40m) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC > -- > > Key: HIVE-25268 > URL: https://issues.apache.org/jira/browse/HIVE-25268 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > *Hive 1.2.1*: > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1400-01-14 01:00:00 ICT | > +--------------------------+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1800-01-14 01:00:00 ICT | > +--------------------------+ > {code} > *Hive 3.1, Hive 4.0:* > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 |
+--------------------------+ > | 1400-01-06 01:17:56 ICT | > +--------------------------+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--------------------------+ > | _c0 | > +--------------------------+ > | 1800-01-14 01:17:56 ICT | > +--------------------------+ > {code} > VM timezone is set to 'Asia/Bangkok' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614577 ] ASF GitHub Bot logged work on HIVE-25276: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:38 Start Date: 24/Jun/21 15:38 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2419: URL: https://github.com/apache/hive/pull/2419#discussion_r658062987 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java ## @@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException { Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1)); } + @Test + public void testStatWithInsert() { +TableIdentifier identifier = TableIdentifier.of("default", "customers"); + +shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true); +testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of()); + +if (testTableType != TestTables.TestTableType.HIVE_CATALOG) { + // If the location is set and we have to gather stats, then we have to update the table stats now + shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS"); +} + +String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false); +shell.executeStatement(insert); + +checkColStat(identifier.name(), "customer_id"); Review comment: Are the columns stats gathered for `first_name` and `last_name` as well, we're just saving on the number of describe calls? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614577) Time Spent: 0.5h (was: 20m) > Enable automatic statistics generation for Iceberg tables > - > > Key: HIVE-25276 > URL: https://issues.apache.org/jira/browse/HIVE-25276 > Project: Hive > Issue Type: Improvement >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > During inserts we should calculate the column statistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25284) There was an issue in the software
[ https://issues.apache.org/jira/browse/HIVE-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-25284. --- Resolution: Information Provided Apache Hive project developers can not help with TSplus purchases > There was an issue in the software > --- > > Key: HIVE-25284 > URL: https://issues.apache.org/jira/browse/HIVE-25284 > Project: Hive > Issue Type: Bug > Components: Security >Reporter: GillAdam >Priority: Minor > > i had purchased a software of TSplus using the [TSplus coupon > code|https://discountshelp.com/coupon-store/tsplus-coupon-code/] and then the > code was not applied to my purchase so tell me what is the issue -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614462 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 12:24 Start Date: 24/Jun/21 12:24 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867594564 you have modified the default value of retries inside `jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java` in an unrelated changeset - and this changeset follows that pattern; I think going from 5 to 3 would be okayish but going from 1 to anything else is not the same - because you will enable the retries by default. I don't think we should change things like that...that change should have been done in a separate patch - because it touched production code; and it changed the default behaviour as well. It seems like `maxRetries` is not documented here https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients I think the correct approach is to set it back to 1 and change the retry number for the tests thru the jdbc url. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614462) Time Spent: 1h 40m (was: 1.5h) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23456) Upgrade Calcite version to 1.25.0
[ https://issues.apache.org/jira/browse/HIVE-23456?focusedWorklogId=614541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614541 ] ASF GitHub Bot logged work on HIVE-23456: - Author: ASF GitHub Bot Created on: 24/Jun/21 14:32 Start Date: 24/Jun/21 14:32 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #2203: URL: https://github.com/apache/hive/pull/2203 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614541) Time Spent: 40m (was: 0.5h) > Upgrade Calcite version to 1.25.0 > - > > Key: HIVE-23456 > URL: https://issues.apache.org/jira/browse/HIVE-23456 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Soumyakanti Das >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23456.01.patch, HIVE-23456.02.patch > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368871#comment-17368871 ] David Mollitor edited comment on HIVE-24484 at 6/24/21, 1:55 PM: - [HADOOP-17367] is another biting issue for Hive. In 3.1.0, the return value from {{ProxyUsers#getDefaultImpersonationProvider}} changed. In 3.1.0, the method could return a {{null}} value and then it was up to the caller to create a new one and initialize it. It seems like in 3.3.1, it always returns a value, but it looks like the initialization isn't what Hive is expecting. The initialization {{refreshSuperUserGroupsConfiguration}} creates its own configuration whereas before Hive was passing in its own Configuration. was (Author: belugabehr): [HADOOP-17367] is another biting issue for Hive. In 3.1.0, the return value from {{ProxyUsers#getDefaultImpersonationProvider}} changed. In 3.1.0, the method could return a {{null}} value and then it was up to the caller to create a new one and initialize it. It seems like in 3.3.1, it always returns a value, but it looks like the initialization isn't what Hive is expecting. The initialization {{refreshSuperUserGroupsConfiguration}} creates its own configuration whereas before Hive was passing in its own Configuration. > Upgrade Hadoop to 3.3.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
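The contract change described above can be illustrated with a toy model (stand-in classes only, not Hadoop's actual ProxyUsers API): under the old contract the caller sees null and initializes a provider with its own configuration, while under the new contract a provider always comes back, but built from a fresh default configuration that never sees the caller's settings.

```java
import java.util.Properties;

public class ProxyUsersSketch {
    // Toy stand-in for an impersonation provider holding its configuration.
    static class Provider {
        final Properties conf;
        Provider(Properties conf) { this.conf = conf; }
    }

    private static Provider defaultProvider;

    // Old contract: may return null; the caller then builds and configures
    // a provider itself, using its own configuration.
    static Provider getProviderOldContract() {
        return defaultProvider;
    }

    // New contract: never null, but initialized from a fresh default
    // configuration rather than one the caller passes in.
    static Provider getProviderNewContract() {
        if (defaultProvider == null) {
            defaultProvider = new Provider(new Properties()); // caller's conf ignored
        }
        return defaultProvider;
    }

    public static void main(String[] args) {
        Properties hiveConf = new Properties();
        hiveConf.setProperty("proxyuser.hive.hosts", "*");

        // Old flow: the caller notices the null and injects its own settings.
        Provider p = getProviderOldContract();
        if (p == null) {
            p = new Provider(hiveConf);
        }
        System.out.println(p.conf.containsKey("proxyuser.hive.hosts")); // true

        // New flow: the provider exists, but the caller's settings never reach it.
        Provider q = getProviderNewContract();
        System.out.println(q.conf.containsKey("proxyuser.hive.hosts")); // false
    }
}
```

This is the mismatch the comment describes: code written against the old contract silently loses its chance to supply a Configuration once the provider starts self-initializing.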
[jira] [Updated] (HIVE-23456) Upgrade Calcite version to 1.25.0
[ https://issues.apache.org/jira/browse/HIVE-23456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-23456: --- Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master, thanks [~soumyakanti.das], [~zabetak]! > Upgrade Calcite version to 1.25.0 > - > > Key: HIVE-23456 > URL: https://issues.apache.org/jira/browse/HIVE-23456 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Soumyakanti Das >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23456.01.patch, HIVE-23456.02.patch > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-22254) Mappings.NoElementException: no target in mapping, in `MaterializedViewAggregateRule
[ https://issues.apache.org/jira/browse/HIVE-22254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-22254. Fix Version/s: 4.0.0 Resolution: Fixed > Mappings.NoElementException: no target in mapping, in > `MaterializedViewAggregateRule > > > Key: HIVE-22254 > URL: https://issues.apache.org/jira/browse/HIVE-22254 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Minor > Fix For: 4.0.0 > > Attachments: ojoin_full.sql > > > A Mappings.NoElementException happens on an edge condition for a query using > a materialized view. > The query contains a "group by" clause which contains fields from both sides > of a join. There is no real reason to group by this same field twice, but > there is also no reason that this shouldn't succeed. > Attached is a script which causes this failure. The query causing the > problem looks like this: > explain extended select sum(1) > from fact inner join dim1 > on fact.f1 = dim1.pk1 > group by f1, pk1; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368895#comment-17368895 ] Nikhil Gupta commented on HIVE-25104: - I have a list of issues which we can club together: (These are after hive 3.1 release): # HIVE-25093: date_format() UDF is returning output in UTC time zone only (Ashish Sharma, reviewed by Adesh Rao, Nikhil Gupta, Sankar Hariappan) # HIVE-25104: Backward incompatible timestamp serialization in Parquet for certain timezones (Stamatis Zampetakis, reviewed by Jesus Camacho Rodriguez) # HIVE-24113: NPE in GenericUDFToUnixTimeStamp (#1460) (Raj Kumar Singh, reviewed by Zoltan Haindrich and Laszlo Pinter) # HIVE-24074: Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x (Jesus Camacho Rodriguez, reviewed by Prasanth Jayachandran) # HIVE-22840: Race condition in formatters of TimestampColumnVector and DateColumnVector (Shubham Chaurasia, reviewed by Jesus Camacho Rodriguez) # HIVE-22589: Add storage support for ProlepticCalendar in ORC, Parquet, and Avro (Jesus Camacho Rodriguez, reviewed by David Lavati, László Bodor, Prasanth Jayachandran) # HIVE-22405: Add ColumnVector support for ProlepticCalendar (László Bodor via Owen O'Malley, Jesus Camacho Rodriguez) # HIVE-22331: unix_timestamp without argument returns timestamp in millisecond instead of second (Naresh P R, reviewed Jesus Camacho Rodriguez) # HIVE-22170: from_unixtime and unix_timestamp should use user session time zone (Jesus Camacho Rodriguez, reviewed by Vineet Garg) # HIVE-21729: Arrow serializer sometimes shifts timestamp by one second (Shubham Chaurasia, reviewed by Sankar Hariappan) # HIVE-21291: Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time (Karen Coppage, reviewed by Jesus Camacho Rodriguez) > Backward incompatible timestamp serialization in Parquet for certain timezones > -- > > Key: HIVE-25104 > URL:
https://issues.apache.org/jira/browse/HIVE-25104 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extent how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in > Parquet files is not backwards compatible. In other words writing timestamps > with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them > with another (not including the previous issues) may lead to different > results depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. > At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET > LOCATION '/tmp/hiveexttbl/employee'; > SELECT * FROM employee; > {code} > |1|1879-12-31 23:52:58| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614574=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614574 ] ASF GitHub Bot logged work on HIVE-25276: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:36 Start Date: 24/Jun/21 15:36 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2419: URL: https://github.com/apache/hive/pull/2419#discussion_r658061851 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java ## @@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException { Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1)); } + @Test + public void testStatWithInsert() { +TableIdentifier identifier = TableIdentifier.of("default", "customers"); + +shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true); +testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of()); + +if (testTableType != TestTables.TestTableType.HIVE_CATALOG) { + // If the location is set and we have to gather stats, then we have to update the table stats now + shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS"); +} + +String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false); +shell.executeStatement(insert); + +checkColStat(identifier.name(), "customer_id"); + } + + @Test + public void testStatWithInsertOverwrite() { +TableIdentifier identifier = TableIdentifier.of("default", "customers"); + +shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true); +testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, Review comment: If we have test cases for unpartitioned insert, partitioned insert, unpartitioned IOW, should we 
have a test case for partitioned IOW as well? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614574) Time Spent: 20m (was: 10m) > Enable automatic statistics generation for Iceberg tables > - > > Key: HIVE-25276 > URL: https://issues.apache.org/jira/browse/HIVE-25276 > Project: Hive > Issue Type: Improvement >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > During inserts we should have calculate the column statistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
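The partitioned insert-overwrite case the reviewer asks about would exercise roughly the following statements. This is a sketch only: the storage-handler clause and the table/column names are illustrative assumptions, not the test's actual code.

```sql
-- Enable automatic stats gathering (HiveConf.ConfVars.HIVESTATSAUTOGATHER)
SET hive.stats.autogather=true;

-- Illustrative partitioned Iceberg table (handler class per the iceberg-mr module)
CREATE TABLE customers (customer_id BIGINT, first_name STRING)
PARTITIONED BY (last_name STRING)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

-- INSERT OVERWRITE on the partitioned table should refresh column stats
INSERT OVERWRITE TABLE customers SELECT * FROM customers_src;

-- Inspect the gathered stats for a single column
DESCRIBE FORMATTED customers customer_id;
```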
[jira] [Work logged] (HIVE-25286) Set stats to inaccurate when an Iceberg table is modified outside Hive
[ https://issues.apache.org/jira/browse/HIVE-25286?focusedWorklogId=614575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614575 ] ASF GitHub Bot logged work on HIVE-25286: - Author: ASF GitHub Bot Created on: 24/Jun/21 15:36 Start Date: 24/Jun/21 15:36 Worklog Time Spent: 10m Work Description: pvary opened a new pull request #2427: URL: https://github.com/apache/hive/pull/2427 ### What changes were proposed in this pull request? - Introduce a new configuration property: `iceberg.hive.keep.stats`. If this property is set, we keep the statistics at a new Iceberg commit. Otherwise we invalidate the stats, as we cannot make sure that they are correct. - Fix a NullPointerException in HiveIcebergMetaHook which happens when we are storing stats. - Add new unit tests and enhance the check that the stat values are accurate. Also contains HIVE-25276 as a base. ### Why are the changes needed? When someone modifies the Iceberg table outside of Hive then we should make sure that the stats are marked invalid ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Extra unit test -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614575) Remaining Estimate: 0h Time Spent: 10m > Set stats to inaccurate when an Iceberg table is modified outside Hive > -- > > Key: HIVE-25286 > URL: https://issues.apache.org/jira/browse/HIVE-25286 > Project: Hive > Issue Type: New Feature >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When an Iceberg table is modified outside of Hive then the stats should be > set to inaccurate since there is no way to ensure that the HMS stats are > updated correctly and this could cause incorrect query results. > The proposed solution is only working for HiveCatalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
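In configuration terms, the behaviour proposed in the PR above would look roughly like this. A sketch based only on the property name in the description; the default of invalidating stats is stated there, the session-level usage is an assumption.

```sql
-- Keep HMS column statistics across Iceberg commits; when unset, Hive
-- invalidates the stats because it cannot verify externally written data
SET iceberg.hive.keep.stats=true;

-- After an external commit has invalidated the stats, they can be rebuilt:
ANALYZE TABLE default.customers COMPUTE STATISTICS FOR COLUMNS;
```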
[jira] [Updated] (HIVE-25286) Set stats to inaccurate when an Iceberg table is modified outside Hive
[ https://issues.apache.org/jira/browse/HIVE-25286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25286: -- Labels: pull-request-available (was: ) > Set stats to inaccurate when an Iceberg table is modified outside Hive > -- > > Key: HIVE-25286 > URL: https://issues.apache.org/jira/browse/HIVE-25286 > Project: Hive > Issue Type: New Feature >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When an Iceberg table is modified outside of Hive then the stats should be > set to inaccurate since there is no way to ensure that the HMS stats are > updated correctly and this could cause incorrect query results. > The proposed solution is only working for HiveCatalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614833 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 25/Jun/21 04:32 Start Date: 25/Jun/21 04:32 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658156023 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateFormat.java ## @@ -51,18 +50,17 @@ */ @Description(name = "date_format", value = "_FUNC_(date/timestamp/string, fmt) - converts a date/timestamp/string " + "to a value of string in the format specified by the date format fmt.", -extended = "Supported formats are SimpleDateFormat formats - " -+ "https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html. " +extended = "Supported formats are DateTimeFormatter formats - " ++ "https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html. " Review comment: @zabetak I think, if these 2 are the only difference, then we can go ahead with DateTimeFormatter. What is your opinion? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614833) Time Spent: 8h 40m (was: 8.5h) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC > -- > > Key: HIVE-25268 > URL: https://issues.apache.org/jira/browse/HIVE-25268 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > *Hive 1.2.1*: > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--+--+ | _c0| +--+--+ | 1400-01-14 01:00:00 ICT | +--+--+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--+--+ | _c0| +--+--+ | 1800-01-14 01:00:00 ICT | +--+--+ > {code} > *Hive 3.1, Hive 4.0:* > {code:java} > select date_format('1400-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1400-01-06 01:17:56 ICT | > +--+ > select date_format('1800-01-14 01:00:00', 'yyyy-MM-dd HH:mm:ss z'); > +--+ > | _c0| > +--+ > | 1800-01-14 01:17:56 ICT | > +--+ > {code} > VM timezone is set to 'Asia/Bangkok' -- This message was sent by Atlassian Jira (v8.3.4#803005)
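The 17m56s drift in the Hive 3.1/4.0 output above matches the gap between Asia/Bangkok's pre-1880 local mean time (+06:42:04) and the modern ICT offset (+07:00): a formatter that consults historical zone rules and one that assumes the fixed offset disagree for old dates. A standalone java.time sketch (not Hive code) showing the two offsets:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Sketch: the offset java.time's historical zone rules report for
// Asia/Bangkok at a given wall-clock time.
class BangkokOffsets {
    static ZoneOffset offsetAt(String wallClock) {
        return ZoneId.of("Asia/Bangkok").getRules()
                .getOffset(LocalDateTime.parse(wallClock));
    }

    public static void main(String[] args) {
        // Pre-1880 Bangkok is recorded as local mean time, +06:42:04 --
        // exactly 17m56s short of the modern +07:00, which is why the broken
        // output above shows 01:17:56 instead of 01:00:00.
        System.out.println(offsetAt("1800-01-14T01:00:00")); // +06:42:04
        System.out.println(offsetAt("2021-06-24T00:00:00")); // +07:00
    }
}
```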
[jira] [Work logged] (HIVE-25268) date_format udf doesn't work for dates prior to 1900 if the timezone is different from UTC
[ https://issues.apache.org/jira/browse/HIVE-25268?focusedWorklogId=614835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614835 ] ASF GitHub Bot logged work on HIVE-25268: - Author: ASF GitHub Bot Created on: 25/Jun/21 05:05 Start Date: 25/Jun/21 05:05 Worklog Time Spent: 10m Work Description: guptanikhil007 commented on a change in pull request #2409: URL: https://github.com/apache/hive/pull/2409#discussion_r658476821 ## File path: ql/src/test/queries/clientpositive/udf_date_format.q ## @@ -78,3 +78,16 @@ select date_format("2015-04-08 10:30:45","-MM-dd HH:mm:ss.SSS z"); --julian date set hive.local.time.zone=UTC; select date_format("1001-01-05","dd---MM--"); + +--dates prior to 1900 +set hive.local.time.zone=Asia/Bangkok; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Europe/Berlin; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); +select date_format('1800-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); + +set hive.local.time.zone=Africa/Johannesburg; +select date_format('1400-01-14 01:01:10.123', '-MM-dd HH:mm:ss.SSS z'); Review comment: All the timestamps are considered to be in local time zone in both Hive 1.2 and Hive master (patched). Both Hive 1.2 and 3.1 have similar results for the date_format function. 
E.g.: Hive 1.2.1 (with timezone set to Asia/Bangkok) ``` 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select * from test_tbl_orc; +--+--+--+ | test_tbl_orc.clusterversion |test_tbl_orc.col2 | +--+--+--+ | Hive 4.0 | 1800-01-14 01:01:10.123 | +--+--+--+ 1 row selected (0.372 seconds) 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> desc test_tbl_orc; +-++--+--+ |col_name | data_type | comment | +-++--+--+ | clusterversion | string | | | col2| timestamp | | +-++--+--+ 2 rows selected (0.65 seconds) 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select date_format(col2, "-MM-dd HH:mm:ss.SSS z") from test_tbl_orc; +--+--+ | _c0 | +--+--+ | 1800-01-14 01:01:10.123 ICT | +--+--+ ``` Hive master (using MiniHS2 and same orc file from Hive 1.2) ``` 0: jdbc:hive2://localhost:1/> select * from test_tbl_orc; +--+--+ | test_tbl_orc.clusterversion |test_tbl_orc.col2 | +--+--+ | Hive 4.0 | 1800-01-14 01:01:10.123 | +--+--+ 1 row selected (0.23 seconds) 0: jdbc:hive2://localhost:1/> desc test_tbl_orc; +-++--+ |col_name | data_type | comment | +-++--+ | clusterversion | string | | | col2| timestamp | | +-++--+ 2 rows selected (0.24 seconds) 0: jdbc:hive2://localhost:1/> set hive.local.time.zone=Asia/Bangkok; No rows affected (0.102 seconds) 0: jdbc:hive2://localhost:1/> select date_format(col2, "-MM-dd HH:mm:ss.SSS z") from test_tbl_orc; +--+ | _c0 | +--+ | 1800-01-14 01:01:10.123 ICT | +--+ 1 row selected (0.261 seconds) 0: jdbc:hive2://localhost:1/> set hive.local.time.zone=LOCAL; No rows affected (0.036 seconds) 0: jdbc:hive2://localhost:1/> select date_format(col2, "-MM-dd HH:mm:ss.SSS z") from test_tbl_orc; +--+ | _c0 | +--+ | 1800-01-14 01:01:10.123 PST | +--+ 1 row selected (0.492 seconds) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614835) Time Spent: 8h 50m (was: 8h 40m) > date_format udf doesn't work for dates prior to 1900 if the timezone is > different from UTC >
[jira] [Work logged] (HIVE-25283) Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup conversion fails on output mismatch after alter table
[ https://issues.apache.org/jira/browse/HIVE-25283?focusedWorklogId=614794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614794 ] ASF GitHub Bot logged work on HIVE-25283: - Author: ASF GitHub Bot Created on: 25/Jun/21 00:53 Start Date: 25/Jun/21 00:53 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #2426: URL: https://github.com/apache/hive/pull/2426 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614794) Time Spent: 0.5h (was: 20m) > Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup > conversion fails on output mismatch after alter table > --- > > Key: HIVE-25283 > URL: https://issues.apache.org/jira/browse/HIVE-25283 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25283) Schema evolution fails on output mismatch after alter table
[ https://issues.apache.org/jira/browse/HIVE-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-25283. Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master, thanks [~scarlin]! > Schema evolution fails on output mismatch after alter table > --- > > Key: HIVE-25283 > URL: https://issues.apache.org/jira/browse/HIVE-25283 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25283) Schema evolution fails on output mismatch after alter table
[ https://issues.apache.org/jira/browse/HIVE-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-25283: --- Summary: Schema evolution fails on output mismatch after alter table (was: Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup conversion fails on output mismatch after alter table) > Schema evolution fails on output mismatch after alter table > --- > > Key: HIVE-25283 > URL: https://issues.apache.org/jira/browse/HIVE-25283 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25283) Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup conversion fails on output mismatch after alter table
[ https://issues.apache.org/jira/browse/HIVE-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-25283: -- Assignee: Steve Carlin > Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup > conversion fails on output mismatch after alter table > --- > > Key: HIVE-25283 > URL: https://issues.apache.org/jira/browse/HIVE-25283 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission
[ https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=614773=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614773 ] ASF GitHub Bot logged work on HIVE-25026: - Author: ASF GitHub Bot Created on: 25/Jun/21 00:07 Start Date: 25/Jun/21 00:07 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #2189: URL: https://github.com/apache/hive/pull/2189#issuecomment-868086290 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614773) Time Spent: 1h 20m (was: 1h 10m) > hive sql result is duplicate data cause of same task resubmission > - > > Key: HIVE-25026 > URL: https://issues.apache.org/jira/browse/HIVE-25026 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.1 >Reporter: hezhang >Assignee: hezhang >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-25026.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > This issue is the same with hive-24577 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-287) support count(*) and count distinct on multiple columns
[ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368683#comment-17368683 ] GillAdam commented on HIVE-287: --- > support count(*) and count distinct on multiple columns > --- > > Key: HIVE-287 > URL: https://issues.apache.org/jira/browse/HIVE-287 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Namit Jain >Assignee: Arvind Prabhakar >Priority: Major > Fix For: 0.6.0 > > Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, > HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, > HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch > > > The following query does not work: > select count(distinct col1, col2) from Tbl -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614411 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:31 Start Date: 24/Jun/21 09:31 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657787170 ## File path: ql/src/test/queries/clientpositive/materialized_view_partitioned_create_rewrite_agg.q ## @@ -0,0 +1,44 @@ +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; + +CREATE TABLE t1(a int, b int,c int) STORED AS ORC TBLPROPERTIES ('transactional' = 'true'); + +INSERT INTO t1(a, b, c) VALUES +(1, 1, 1), +(1, 1, 4), +(2, 1, 2), +(1, 2, 10), +(2, 2, 11), +(1, 3, 100), +(null, 4, 200); + +CREATE MATERIALIZED VIEW mat1 PARTITIONED ON (a) STORED AS ORC TBLPROPERTIES ("transactional"="true", "transactional_properties"="insert_only") AS Review comment: added test for ii) since it also includes i) ## File path: ql/src/test/results/clientpositive/llap/masking_mv_by_text_2.q.out ## @@ -25,6 +25,7 @@ POSTHOOK: type: CREATE_MATERIALIZED_VIEW POSTHOOK: Input: default@masking_test_n_mv POSTHOOK: Output: database:default POSTHOOK: Output: default@masking_test_view_n_mv +POSTHOOK: Lineage: masking_test_view_n_mv.col0 EXPRESSION [(masking_test_n_mv)masking_test_n_mv.FieldSchema(name:key, type:int, comment:null), (masking_test_n_mv)masking_test_n_mv.FieldSchema(name:value, type:string, comment:null), ] Review comment: reverted -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614411) Time Spent: 1h 50m (was: 1h 40m) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25279) Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229
[ https://issues.apache.org/jira/browse/HIVE-25279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25279. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you [~pvary]! > Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229 > > > Key: HIVE-25279 > URL: https://issues.apache.org/jira/browse/HIVE-25279 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > HIVE-25240 added new q.out files, HIVE-25229 modified the lineage output. > Both test were successful without the other, but when they were committed > query tests are failing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25279) Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229
[ https://issues.apache.org/jira/browse/HIVE-25279?focusedWorklogId=614395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614395 ] ASF GitHub Bot logged work on HIVE-25279: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:08 Start Date: 24/Jun/21 09:08 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #2424: URL: https://github.com/apache/hive/pull/2424 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614395) Time Spent: 0.5h (was: 20m) > Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229 > > > Key: HIVE-25279 > URL: https://issues.apache.org/jira/browse/HIVE-25279 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > HIVE-25240 added new q.out files, HIVE-25229 modified the lineage output. > Both test were successful without the other, but when they were committed > query tests are failing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614397 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:08 Start Date: 24/Jun/21 09:08 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657770038 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -353,6 +375,23 @@ private RelNode toJoinInsertIncremental( basePlan, mdProvider, executorProvider, HiveJoinInsertIncrementalRewritingRule.INSTANCE); } +private RelNode toPartitionInsertOverwrite( +RelNode basePlan, RelMetadataProvider mdProvider, RexExecutor executorProvider, +HiveRelOptMaterialization materialization, RelNode calcitePreMVRewritingPlan) { + + if (materialization.isSourceTablesUpdateDeleteModified()) { +return calcitePreMVRewritingPlan; + } + + RelOptHiveTable hiveTable = (RelOptHiveTable) materialization.tableRel.getTable(); + if (!AcidUtils.isInsertOnlyTable(hiveTable.getHiveTableMD())) { +return applyPreJoinOrderingTransforms(basePlan, mdProvider, executorProvider); + } + + return toIncrementalRebuild( + basePlan, mdProvider, executorProvider, HiveAggregatePartitionIncrementalRewritingRule.INSTANCE); +} + private RelNode toIncrementalRebuild( Review comment: renamed all to* methods to apply* in this class -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614397) Time Spent: 40m (was: 0.5h) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614394 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:08 Start Date: 24/Jun/21 09:08 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657769522 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -261,41 +263,61 @@ protected RelNode applyMaterializedViewRewriting(RelOptPlanner planner, RelNode if (materialization.isSourceTablesCompacted()) { return calcitePreMVRewritingPlan; } -// First we need to check if it is valid to convert to MERGE/INSERT INTO. -// If we succeed, we modify the plan and afterwards the AST. -// MV should be an acid table. -MaterializedViewRewritingRelVisitor visitor = new MaterializedViewRewritingRelVisitor(); -visitor.go(basePlan); -if (visitor.isRewritingAllowed()) { - if (materialization.isSourceTablesUpdateDeleteModified()) { -if (visitor.isContainsAggregate()) { - if (visitor.getCountIndex() < 0) { -// count(*) is necessary for determine which rows should be deleted from the view -// if view definition does not have it incremental rebuild can not be performed, bail out -return calcitePreMVRewritingPlan; - } - return toAggregateInsertDeleteIncremental(basePlan, mdProvider, executorProvider); -} else { - return toJoinInsertDeleteIncremental( - basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} - } else { -// Trigger rewriting to remove UNION branch with MV -if (visitor.isContainsAggregate()) { - return toAggregateInsertIncremental(basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} else { - return toJoinInsertIncremental(basePlan, mdProvider, executorProvider); -} - } -} else if 
(materialization.isSourceTablesUpdateDeleteModified()) { - return calcitePreMVRewritingPlan; + +RelNode incrementalRebuildPlan = toIncrementalRebuild( +basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan, materialization); +if (mvRebuildMode != MaterializationRebuildMode.INSERT_OVERWRITE_REBUILD) { + return incrementalRebuildPlan; } + +return toPartitionInsertOverwrite( +basePlan, mdProvider, executorProvider, materialization, calcitePreMVRewritingPlan); } // Now we trigger some needed optimization rules again return applyPreJoinOrderingTransforms(basePlan, mdProvider, executorProvider); } +private RelNode toIncrementalRebuild( +RelNode basePlan, +RelMetadataProvider mdProvider, +RexExecutor executorProvider, +RelOptCluster optCluster, +RelNode calcitePreMVRewritingPlan, +HiveRelOptMaterialization materialization) { + // First we need to check if it is valid to convert to MERGE/INSERT INTO. + // If we succeed, we modify the plan and afterwards the AST. + // MV should be an acid table. 
+ MaterializedViewRewritingRelVisitor visitor = new MaterializedViewRewritingRelVisitor(); + visitor.go(basePlan); + if (visitor.isRewritingAllowed()) { +if (materialization.isSourceTablesUpdateDeleteModified()) { + if (visitor.isContainsAggregate()) { +if (visitor.getCountIndex() < 0) { + // count(*) is necessary for determine which rows should be deleted from the view + // if view definition does not have it incremental rebuild can not be performed, bail out + return calcitePreMVRewritingPlan; +} +return toAggregateInsertDeleteIncremental(basePlan, mdProvider, executorProvider); + } else { +return toJoinInsertDeleteIncremental( +basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); + } +} else { + // Trigger rewriting to remove UNION branch with MV + if (visitor.isContainsAggregate()) { +return toAggregateInsertIncremental(basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); + } else { +return toJoinInsertIncremental(basePlan, mdProvider, executorProvider); + } +} + } else if
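The dispatch logic in the diff above boils down to the following decision tree. This is a standalone sketch with simplified types and illustrative names, not the actual Calcite-based code in AlterMaterializedViewRebuildAnalyzer.

```java
// Simplified model of rebuild-plan selection for a materialized view.
enum Plan { FULL_REBUILD, AGG_INSERT_DELETE, JOIN_INSERT_DELETE, AGG_INSERT, JOIN_INSERT }

final class RebuildChooser {
    static Plan choose(boolean rewritingAllowed, boolean updateDeleteModified,
                       boolean containsAggregate, int countIndex) {
        if (!rewritingAllowed) {
            // Without a valid MERGE/INSERT rewrite, fall back to a full rebuild
            return Plan.FULL_REBUILD;
        }
        if (updateDeleteModified) {
            if (containsAggregate) {
                // count(*) is needed to determine which rows to delete from
                // the view; without it, incremental rebuild cannot be done
                return countIndex < 0 ? Plan.FULL_REBUILD : Plan.AGG_INSERT_DELETE;
            }
            return Plan.JOIN_INSERT_DELETE;
        }
        // Insert-only source changes: trigger rewriting to drop the UNION
        // branch that re-reads the MV
        return containsAggregate ? Plan.AGG_INSERT : Plan.JOIN_INSERT;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, true, -1)); // AGG_INSERT
    }
}
```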
[jira] [Commented] (HIVE-21614) Derby does not support CLOB comparisons
[ https://issues.apache.org/jira/browse/HIVE-21614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368642#comment-17368642 ] Hank Fanchiu commented on HIVE-21614: - I've run into this issue, in an attempt to push the filtering -- for the Iceberg table type – to the Metastore: https://github.com/apache/iceberg/pull/2722. The Iceberg tests using Derby failed for the same reason as described above: https://github.com/apache/iceberg/pull/2722#issuecomment-867363019. > Derby does not support CLOB comparisons > --- > > Key: HIVE-21614 > URL: https://issues.apache.org/jira/browse/HIVE-21614 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.4, 3.0.0 >Reporter: Vlad Rozov >Priority: Major > > HiveMetaStoreClient.listTableNamesByFilter() with non empty filter causes > exception with Derby DB: > {noformat} > Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB > (UCS_BASIC)' are not supported. Types must be comparable. String types must > also have matching collation. If collation does not match, a possible > solution is to cast operands to force them to the default collation (e.g. 
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = > 'T1') > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown > Source) > at > org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown > Source) > at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown > Source) > at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown > Source) > at > org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown > Source) > at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown > Source) > at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown > Source) > at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source) > at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source) > at > org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown > Source) > ... 42 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
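The error text's own suggestion, spelled out as a runnable Derby statement, is the usual workaround for such filters (the 128-character bound matches Derby's identifier length; apply the same CAST to whichever CLOB column the Metastore filter compares):

```sql
-- Fails on Derby: CLOB = CLOB comparisons are unsupported
-- SELECT tablename FROM sys.systables WHERE tablename = 'T1';

-- Works: casting the CLOB operand to VARCHAR forces the default
-- collation, making the comparison legal
SELECT tablename
FROM sys.systables
WHERE CAST(tablename AS VARCHAR(128)) = 'T1';
```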
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614383 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:01 Start Date: 24/Jun/21 09:01 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867467938 @kgyrtkirk Thanks for pointing to flaky check pipeline. I ran and got the green build there http://ci.hive.apache.org/job/hive-flaky-check/269/ Please review and merge the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614383) Time Spent: 40m (was: 0.5h) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
[ https://issues.apache.org/jira/browse/HIVE-25242?focusedWorklogId=614414&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614414 ] ASF GitHub Bot logged work on HIVE-25242: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:37 Start Date: 24/Jun/21 09:37 Worklog Time Spent: 10m Work Description: abstractdog merged pull request #2390: URL: https://github.com/apache/hive/pull/2390 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614414) Time Spent: 50m (was: 40m) > Query performs extremely slow with hive.vectorized.adaptor.usage.mode = > chosen > --- > > Key: HIVE-25242 > URL: https://issues.apache.org/jira/browse/HIVE-25242 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are > vectorized through the vectorized adaptor. > Queries like this one perform very slowly because concat is not chosen > to be vectorized. > {code:java} > select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) > between to_date('2018-12-01') and to_date('2021-03-01'); {code} > The patch whitelists the concat UDF so that it uses the vectorized adaptor in > chosen mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614427 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:59 Start Date: 24/Jun/21 09:59 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867506954 @kgyrtkirk Could you please merge the PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614427) Time Spent: 1.5h (was: 1h 20m) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614391 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:07 Start Date: 24/Jun/21 09:07 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657768898 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -261,41 +263,61 @@ protected RelNode applyMaterializedViewRewriting(RelOptPlanner planner, RelNode if (materialization.isSourceTablesCompacted()) { return calcitePreMVRewritingPlan; } -// First we need to check if it is valid to convert to MERGE/INSERT INTO. -// If we succeed, we modify the plan and afterwards the AST. -// MV should be an acid table. -MaterializedViewRewritingRelVisitor visitor = new MaterializedViewRewritingRelVisitor(); -visitor.go(basePlan); -if (visitor.isRewritingAllowed()) { - if (materialization.isSourceTablesUpdateDeleteModified()) { -if (visitor.isContainsAggregate()) { - if (visitor.getCountIndex() < 0) { -// count(*) is necessary for determine which rows should be deleted from the view -// if view definition does not have it incremental rebuild can not be performed, bail out -return calcitePreMVRewritingPlan; - } - return toAggregateInsertDeleteIncremental(basePlan, mdProvider, executorProvider); -} else { - return toJoinInsertDeleteIncremental( - basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} - } else { -// Trigger rewriting to remove UNION branch with MV -if (visitor.isContainsAggregate()) { - return toAggregateInsertIncremental(basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} else { - return toJoinInsertIncremental(basePlan, mdProvider, executorProvider); -} - } -} else if 
(materialization.isSourceTablesUpdateDeleteModified()) { - return calcitePreMVRewritingPlan; + +RelNode incrementalRebuildPlan = toIncrementalRebuild( Review comment: renamed ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -261,41 +263,61 @@ protected RelNode applyMaterializedViewRewriting(RelOptPlanner planner, RelNode if (materialization.isSourceTablesCompacted()) { return calcitePreMVRewritingPlan; } -// First we need to check if it is valid to convert to MERGE/INSERT INTO. -// If we succeed, we modify the plan and afterwards the AST. -// MV should be an acid table. -MaterializedViewRewritingRelVisitor visitor = new MaterializedViewRewritingRelVisitor(); -visitor.go(basePlan); -if (visitor.isRewritingAllowed()) { - if (materialization.isSourceTablesUpdateDeleteModified()) { -if (visitor.isContainsAggregate()) { - if (visitor.getCountIndex() < 0) { -// count(*) is necessary for determine which rows should be deleted from the view -// if view definition does not have it incremental rebuild can not be performed, bail out -return calcitePreMVRewritingPlan; - } - return toAggregateInsertDeleteIncremental(basePlan, mdProvider, executorProvider); -} else { - return toJoinInsertDeleteIncremental( - basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} - } else { -// Trigger rewriting to remove UNION branch with MV -if (visitor.isContainsAggregate()) { - return toAggregateInsertIncremental(basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan); -} else { - return toJoinInsertIncremental(basePlan, mdProvider, executorProvider); -} - } -} else if (materialization.isSourceTablesUpdateDeleteModified()) { - return calcitePreMVRewritingPlan; + +RelNode incrementalRebuildPlan = toIncrementalRebuild( +basePlan, mdProvider, executorProvider, optCluster, calcitePreMVRewritingPlan, materialization); +if (mvRebuildMode != 
MaterializationRebuildMode.INSERT_OVERWRITE_REBUILD) { + return incrementalRebuildPlan; } + +return
[jira] [Resolved] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
[ https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-25242. -- Resolution: Fixed > Query performs extremely slow with hive.vectorized.adaptor.usage.mode = > chosen > --- > > Key: HIVE-25242 > URL: https://issues.apache.org/jira/browse/HIVE-25242 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are > vectorized through the vectorized adaptor. > Queries like this one perform very slowly because concat is not chosen > to be vectorized. > {code:java} > select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) > between to_date('2018-12-01') and to_date('2021-03-01'); {code} > The patch whitelists the concat UDF so that it uses the vectorized adaptor in > chosen mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25279) Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229
[ https://issues.apache.org/jira/browse/HIVE-25279?focusedWorklogId=614388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614388 ] ASF GitHub Bot logged work on HIVE-25279: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:03 Start Date: 24/Jun/21 09:03 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #2424: URL: https://github.com/apache/hive/pull/2424#issuecomment-867469912 I think we should merge this sooner rather than later - it will just cause test failures in innocent PRs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614388) Time Spent: 20m (was: 10m) > Fix q.outs caused by concurrent commits of HIVE-25240 and HIVE-25229 > > > Key: HIVE-25279 > URL: https://issues.apache.org/jira/browse/HIVE-25279 > Project: Hive > Issue Type: Test > Components: Test >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-25240 added new q.out files, HIVE-25229 modified the lineage output. > Both tests were successful without the other, but when they were committed > query tests are failing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614404 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:23 Start Date: 24/Jun/21 09:23 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657780988 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java ## @@ -1252,4 +1252,18 @@ public static ImmutableBitSet extractRefs(Aggregate aggregate) { } return refs.build(); } + + public static Set<RexTableInputRef> findRexTableInputRefs(RexNode rexNode) { Review comment: `RexUtil.gatherTableReferences` returns `Set<RelTableRef>` and not `Set<RexTableInputRef>`. `RelTableRef` does not contain the index of this InputRef in the TS schema, which is required to identify whether the `RexTableInputRef` instance refers to a partition column or not. `RelOptHiveTable.getPartColInfoMap()` contains the indexes, not the instances. The `RexTableInputRef` index is also required by `HiveCardinalityPreservingJoinOptimization`, which is why I extracted `findRexTableInputRefs`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614404) Time Spent: 1h (was: 50m) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614405 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:23 Start Date: 24/Jun/21 09:23 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657781122 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregatePartitionIncrementalRewritingRule.java ## @@ -0,0 +1,152 @@ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import com.google.common.collect.ImmutableList; +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Aggregate; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Union; +import org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.rex.RexTableInputRef; +import org.apache.calcite.rex.RexUtil; +import org.apache.calcite.sql.SqlAggFunction; +import org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.calcite.tools.RelBuilder; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories; +import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +import static org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil.findRexTableInputRefs; + +/** + * Rule to prepare the plan for incremental view maintenance if the view is partitioned and insert only: + * Insert overwrite the partitions which are affected since the last rebuild only and leave the + * rest of the partitions intact. + * + * Assume that we have a materialized view partitioned on column a and writeId was 1 at the last rebuild: + * + * CREATE MATERIALIZED VIEW mat1 PARTITIONED ON (a) STORED AS ORC TBLPROPERTIES ("transactional"="true", "transactional_properties"="insert_only") AS + * SELECT a, b, sum(c) sumc FROM t1 GROUP BY b, a; + * + * 1. Query all rows from source tables since the last rebuild. + * 2. Query all rows from MV which are in any of the partitions queried in 1. + * 3. Take the union of rows from 1. and 2. 
and perform the same aggregations defined in the MV + * + * SELECT b, sum(sumc), a FROM ( Review comment: fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614405) Time Spent: 1h 10m (was: 1h) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
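The three-step rebuild described in the javadoc above — aggregate the delta rows since the last rebuild, pull in the existing MV rows for the affected partitions only, and re-aggregate their union — can be sketched without any Hive or Calcite dependency. Everything below (the partition names, the single `sum(c)` measure, the sample values) is made up for illustration; this is not the rule's actual code, only the arithmetic it arranges the plan to perform.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionIncrementalRebuild {
    public static void main(String[] args) {
        // Materialized view state at the last rebuild: partition a -> sum(c)
        Map<String, Long> mv = new HashMap<>(Map.of("p1", 10L, "p2", 20L));
        // Delta rows inserted into the source table since then: (a, c)
        List<Map.Entry<String, Long>> delta =
            List.of(Map.entry("p1", 5L), Map.entry("p3", 7L));

        // Step 1: aggregate the delta rows per partition
        Map<String, Long> rebuilt = new HashMap<>();
        for (Map.Entry<String, Long> row : delta) {
            rebuilt.merge(row.getKey(), row.getValue(), Long::sum);
        }
        // Steps 2-3: union in the existing MV rows of the affected
        // partitions and apply the same sum aggregate again
        for (String part : new ArrayList<>(rebuilt.keySet())) {
            rebuilt.merge(part, mv.getOrDefault(part, 0L), Long::sum);
        }
        // INSERT OVERWRITE only the affected partitions (p1, p3)
        mv.putAll(rebuilt);
        System.out.println(new TreeMap<>(mv)); // {p1=15, p2=20, p3=7}
    }
}
```

Partition p2 received no new rows, so it is never rewritten — which is the point of the partition-wise INSERT OVERWRITE over a full rebuild.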
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614408=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614408 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:24 Start Date: 24/Jun/21 09:24 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867484416 or the fix was in HIVE-25093? ...in case that's true I still don't see the connection between timestamps and the hs2 connection reset :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614408) Time Spent: 1h (was: 50m) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614407=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614407 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:24 Start Date: 24/Jun/21 09:24 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657781909 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregatePartitionIncrementalRewritingRule.java ## @@ -0,0 +1,152 @@ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import com.google.common.collect.ImmutableList; +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Aggregate; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Union; +import org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.rex.RexTableInputRef; +import org.apache.calcite.rex.RexUtil; +import org.apache.calcite.sql.SqlAggFunction; +import org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.calcite.tools.RelBuilder; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories; +import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +import static org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil.findRexTableInputRefs; Review comment: I explained why it is needed above. Please let me know If you know an alternative other than you mentioned earlier. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614407) Time Spent: 1h 20m (was: 1h 10m) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614406=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614406 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:23 Start Date: 24/Jun/21 09:23 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867483301 I'm a little bit amazed that changing the default `maxRetries` fixes the issue - could you please give some details about it; I'm really interested :) +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614406) Time Spent: 50m (was: 40m) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614421 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:51 Start Date: 24/Jun/21 09:51 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma edited a comment on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867501246 @kgyrtkirk Thank you for the quick reply. When I made some changes in the date_format UDF (HIVE-25093) and pushed the PR, the only test failing in the hive-precommit run was TestHS2ImpersonationWithRemoteMS.testImpersonation. After investigation I found that it had no relation to the UDF; the test itself was flaky. So instead of creating a new jira and raising a new PR for fixing TestHS2ImpersonationWithRemoteMS.testImpersonation, I fixed it as part of the date_format UDF change (HIVE-25093). Before HIVE-25093 got merged, you created a new jira (HIVE-25250), marked the test ignored, and merged that to master. Now coming to why the test failed: testImpersonation tries to connect to miniHS2 via DriverManager.getConnection(miniHS2.getJdbcURL(), "foo", null); Since this is a network call, there is a retry in place to avoid unwanted network failures. We have a variable ("maxRetries") which controls the number of retries; its default value was "1", and it can be overridden by passing "maxRetries={somevalue}" as part of the JDBC URL. Since the default "maxRetries=1" is very low, I changed it to "maxRetries=5" as part of HIVE-25093. But since you have marked the test ignored, I am raising this PR to uncomment the test, and as "maxRetries=5" seems to be on the higher side, I am making it "maxRetries=3" and uncommenting the test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614421) Time Spent: 1h 20m (was: 1h 10m) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
[ https://issues.apache.org/jira/browse/HIVE-25250?focusedWorklogId=614420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614420 ] ASF GitHub Bot logged work on HIVE-25250: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:50 Start Date: 24/Jun/21 09:50 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on pull request #2404: URL: https://github.com/apache/hive/pull/2404#issuecomment-867501246 @kgyrtkirk Thank you for the reply. When I made some changes in the date_format UDF (HIVE-25093) and pushed the PR, the only test failing in the hive-precommit run was TestHS2ImpersonationWithRemoteMS.testImpersonation. After investigation I found that it had no relation to the UDF; the test itself was flaky. So instead of creating a new jira and raising a new PR for fixing TestHS2ImpersonationWithRemoteMS.testImpersonation, I fixed it as part of the date_format UDF change (HIVE-25093). Before HIVE-25093 got merged, you created a new jira (HIVE-25250), marked the test ignored, and merged that to master. Now coming to why the test failed: testImpersonation tries to connect to miniHS2 via DriverManager.getConnection(miniHS2.getJdbcURL(), "foo", null); Since this is a network call, there is a retry in place to avoid unwanted network failures. We have a variable ("maxRetries") which controls the number of retries; its default value was "1", and it can be overridden by passing "maxRetries={somevalue}" as part of the JDBC URL. Since the default "maxRetries=1" is very low, I changed it to "maxRetries=5" as part of HIVE-25093. But since you have marked the test ignored, I am raising this PR to uncomment the test, and as "maxRetries=5" seems to be on the higher side, I am making it "maxRetries=3" and uncommenting the test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614420) Time Spent: 1h 10m (was: 1h) > Fix TestHS2ImpersonationWithRemoteMS.testImpersonation > -- > > Key: HIVE-25250 > URL: https://issues.apache.org/jira/browse/HIVE-25250 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
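The maxRetries behaviour described in the comment above amounts to a bounded-retry loop around the connection attempt. The sketch below is illustrative only: `openWithRetries` and the simulated failures are hypothetical helpers, not Hive JDBC driver code — in the real test the retry count comes from the `maxRetries` parameter in the JDBC URL.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryDemo {
    // Retry the operation up to maxRetries times; rethrow the last failure
    // if every attempt fails.
    static <T> T openWithRetries(Supplier<T> open, int maxRetries) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return open.get();
            } catch (RuntimeException e) {
                last = e; // transient failure: try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated connection that fails twice, then succeeds: it only
        // comes up if maxRetries is at least 3.
        String conn = openWithRetries(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new RuntimeException("connection reset");
            }
            return "connected";
        }, 3);
        System.out.println(conn + " after " + calls.get() + " attempts");
    }
}
```

With maxRetries=1 the same sequence would surface the "connection reset" failure instead of connecting, which matches the flakiness the test was showing.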
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614398=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614398 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:10 Start Date: 24/Jun/21 09:10 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657771007 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -353,6 +375,23 @@ private RelNode toJoinInsertIncremental( basePlan, mdProvider, executorProvider, HiveJoinInsertIncrementalRewritingRule.INSTANCE); } +private RelNode toPartitionInsertOverwrite( +RelNode basePlan, RelMetadataProvider mdProvider, RexExecutor executorProvider, +HiveRelOptMaterialization materialization, RelNode calcitePreMVRewritingPlan) { + + if (materialization.isSourceTablesUpdateDeleteModified()) { +return calcitePreMVRewritingPlan; Review comment: added TODO to implement a version of the HiveAggregatePartitionIncrementalRewritingRule which can handle delete operations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614398) Time Spent: 50m (was: 40m) > Incremental rebuild of partitioned insert only materialized views > - > > Key: HIVE-25253 > URL: https://issues.apache.org/jira/browse/HIVE-25253 > Project: Hive > Issue Type: Improvement > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614409=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614409 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:25 Start Date: 24/Jun/21 09:25 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657782155 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregatePartitionIncrementalRewritingRule.java ## @@ -0,0 +1,152 @@ +package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import com.google.common.collect.ImmutableList; +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Aggregate; +import org.apache.calcite.rel.core.JoinRelType; +import org.apache.calcite.rel.core.Union; +import org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.calcite.rel.type.RelDataTypeField; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.rex.RexInputRef; +import org.apache.calcite.rex.RexNode; +import org.apache.calcite.rex.RexTableInputRef; +import org.apache.calcite.rex.RexUtil; +import org.apache.calcite.sql.SqlAggFunction; +import org.apache.calcite.sql.fun.SqlStdOperatorTable; +import org.apache.calcite.tools.RelBuilder; +import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories; +import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +import static org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil.findRexTableInputRefs; + +/** + * Rule to prepare the plan for incremental view maintenance if the view is partitioned and insert only: + * Insert overwrite the partitions which are affected since the last rebuild only and leave the + * rest of the partitions intact. + * + * Assume that we have a materialized view partitioned on column a and writeId was 1 at the last rebuild: + * + * CREATE MATERIALIZED VIEW mat1 PARTITIONED ON (a) STORED AS ORC TBLPROPERTIES ("transactional"="true", "transactional_properties"="insert_only") AS + * SELECT a, b, sum(c) sumc FROM t1 GROUP BY b, a; + * + * 1. Query all rows from source tables since the last rebuild. + * 2. Query all rows from MV which are in any of the partitions queried in 1. + * 3. Take the union of rows from 1. and 2. 
and perform the same aggregations defined in the MV + * + * SELECT b, sum(sumc), a FROM ( + * SELECT b, sumc, a FROM mat1 + * LEFT SEMI JOIN (SELECT b, sum(c), a FROM t1 WHERE ROW__ID.writeId > 1 GROUP BY b, a) q ON (mat1.a <=> q.a) + * UNION ALL + * SELECT b, sum(c) sumc, a FROM t1 WHERE ROW__ID.writeId > 1 GROUP BY b, a + * ) sub + * GROUP BY a, b + */ +public class HiveAggregatePartitionIncrementalRewritingRule extends RelOptRule { + private static final Logger LOG = LoggerFactory.getLogger(HiveAggregatePartitionIncrementalRewritingRule.class); + + public static final HiveAggregatePartitionIncrementalRewritingRule INSTANCE = + new HiveAggregatePartitionIncrementalRewritingRule(); + + private HiveAggregatePartitionIncrementalRewritingRule() { +super(operand(Aggregate.class, operand(Union.class, any())), +HiveRelFactories.HIVE_BUILDER, "HiveJoinPartitionIncrementalRewritingRule"); Review comment: fixed.
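The three-step rewrite in the javadoc above (delta rows since the last writeId, MV rows from the affected partitions only, union then re-aggregate) can be modeled in plain Java. This is a hedged illustration of the query semantics, not Hive's implementation; the `Row` record with fields `a` (partition key), `b`, and `c` (measure) is invented for the sketch:

```java
import java.util.*;
import java.util.stream.*;

// Models the incremental MV rebuild described in the rule's javadoc:
// 1. aggregate-eligible new rows (writeId > last rebuild) arrive as newRows,
// 2. the MV contributes its stored rows for the affected partitions only
//    (the LEFT SEMI JOIN in the example query),
// 3. a final GROUP BY merges both inputs.
public class IncrementalMerge {
    record Row(String a, String b, long c) {}

    static Map<List<String>, Long> merge(List<Row> mvRows, List<Row> newRows) {
        // partitions touched by the delta
        Set<String> affected = newRows.stream().map(Row::a).collect(Collectors.toSet());
        return Stream.concat(
                mvRows.stream().filter(r -> affected.contains(r.a())), // step 2
                newRows.stream())                                      // step 1
            .collect(Collectors.groupingBy(r -> List.of(r.a(), r.b()), // step 3
                     Collectors.summingLong(Row::c)));
    }
}
```

Partitions absent from the delta never enter the merge, which is exactly why the rule can insert-overwrite only the affected partitions and leave the rest intact.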
[jira] [Work logged] (HIVE-25253) Incremental rebuild of partitioned insert only materialized views
[ https://issues.apache.org/jira/browse/HIVE-25253?focusedWorklogId=614410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614410 ] ASF GitHub Bot logged work on HIVE-25253: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:25 Start Date: 24/Jun/21 09:25 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2401: URL: https://github.com/apache/hive/pull/2401#discussion_r657782662 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregatePartitionIncrementalRewritingRule.java ## @@ -0,0 +1,152 @@ + @Override + public void onMatch(RelOptRuleCall call) { +RexBuilder rexBuilder = call.builder().getRexBuilder(); + +final Aggregate aggregate = call.rel(0); +final Union union = call.rel(1); +final RelNode queryBranch = union.getInput(0); +final RelNode mvBranch = union.getInput(1); + +// find Partition col indexes in mvBranch top operator row schema +// mvBranch can be more complex than just a TS on the MV and the partition columns indexes in the top Operator's +
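The comment in `onMatch` about finding partition column indexes in the MV branch's top operator row schema amounts to a name-to-position lookup. A minimal sketch of that lookup, where a `List<String>` of field names stands in for Calcite's `RelDataType` field list (illustrative only, not Hive's code):

```java
import java.util.*;

// Maps the MV's partition column names to their positions in the top
// operator's row schema, as the rule needs before building the semi-join keys.
public class PartitionColIndexFinder {
    static int[] partitionColIndexes(List<String> rowSchema, List<String> partitionCols) {
        return partitionCols.stream()
                .mapToInt(rowSchema::indexOf) // -1 would mean the column is not projected
                .toArray();
    }
}
```

For the javadoc's example MV projecting `(b, sumc, a)` with partition column `a`, the lookup yields index 2, which is where the join condition `mat1.a <=> q.a` is built from.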
[jira] [Work logged] (HIVE-25272) READ transactions are getting logged in NOTIFICATION LOG
[ https://issues.apache.org/jira/browse/HIVE-25272?focusedWorklogId=614341=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614341 ] ASF GitHub Bot logged work on HIVE-25272: - Author: ASF GitHub Bot Created on: 24/Jun/21 06:31 Start Date: 24/Jun/21 06:31 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2413: URL: https://github.com/apache/hive/pull/2413#discussion_r657662547 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -182,10 +183,20 @@ public boolean accept(Path path) { public static final int MAX_STATEMENTS_PER_TXN = 1; public static final Pattern LEGACY_BUCKET_DIGIT_PATTERN = Pattern.compile("^[0-9]{6}"); public static final Pattern BUCKET_PATTERN = Pattern.compile("bucket_([0-9]+)(_[0-9]+)?$"); + private static final Set<String> READ_TXN_TOKENS = new HashSet<>(); private static Cache dirCache; private static AtomicBoolean dirCacheInited = new AtomicBoolean(); + static { +READ_TXN_TOKENS.addAll(Arrays.asList( Review comment: Yes, they are covered -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614341) Time Spent: 2h 40m (was: 2.5h) > READ transactions are getting logged in NOTIFICATION LOG > > > Key: HIVE-25272 > URL: https://issues.apache.org/jira/browse/HIVE-25272 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > While READ transactions are already skipped from getting logged in > NOTIFICATION logs, a few are still getting logged. Need to skip those > transactions as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
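The `READ_TXN_TOKENS` set in the diff above implements a common filtering pattern: statements whose leading keyword marks a read-only transaction are skipped when writing notification-log entries. A hedged sketch of the pattern; the token values and method names here are examples, not Hive's actual list or API:

```java
import java.util.*;

// Illustrates the skip-set pattern from the AcidUtils diff: look up the
// statement's first token in a read-only set and suppress logging on a hit.
public class ReadTxnFilter {
    private static final Set<String> READ_TXN_TOKENS = new HashSet<>(
            Arrays.asList("select", "show", "describe")); // example tokens

    /** True when the statement should produce a notification-log entry. */
    static boolean shouldLog(String statement) {
        String first = statement.trim().toLowerCase(Locale.ROOT).split("\\s+")[0];
        return !READ_TXN_TOKENS.contains(first);
    }
}
```

The reviewer's question ("are they covered") is about making this token set exhaustive, since any read-only statement type missing from the set keeps leaking into the NOTIFICATION log.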
[jira] [Work logged] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
[ https://issues.apache.org/jira/browse/HIVE-25242?focusedWorklogId=614415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614415 ] ASF GitHub Bot logged work on HIVE-25242: - Author: ASF GitHub Bot Created on: 24/Jun/21 09:38 Start Date: 24/Jun/21 09:38 Worklog Time Spent: 10m Work Description: abstractdog commented on pull request #2390: URL: https://github.com/apache/hive/pull/2390#issuecomment-867493230 merged, thanks @zeroflag for the patch and @pgaref for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 614415) Time Spent: 1h (was: 50m) > Query performs extremely slow with hive.vectorized.adaptor.usage.mode = > chosen > --- > > Key: HIVE-25242 > URL: https://issues.apache.org/jira/browse/HIVE-25242 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are > vectorized through the vectorized adaptor. > Queries like this one perform very slowly because the concat is not chosen > to be vectorized. > {code:java} > select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) > between to_date('2018-12-01') and to_date('2021-03-01'); {code} > The patch whitelists the concat udf so that it uses the vectorized adaptor in > chosen mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
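The "chosen" usage mode described in HIVE-25242 gates the vectorized UDF adaptor on an allow-list, and the fix adds `concat` to that list. A minimal sketch of the mode dispatch under those assumptions; the constants and method are illustrative, not the vectorizer's real code:

```java
import java.util.*;

// Sketch of hive.vectorized.adaptor.usage.mode semantics: "all" always uses the
// adaptor, "none" never does, and "chosen" consults an allow-list of UDF names.
public class AdaptorUsageMode {
    private static final Set<String> CHOSEN_UDFS = new HashSet<>(
            Arrays.asList("to_date", "concat")); // concat newly whitelisted by the patch

    static boolean useVectorizedAdaptor(String mode, String udfName) {
        if (mode.equals("all")) return true;
        if (mode.equals("none")) return false;
        return CHOSEN_UDFS.contains(udfName); // "chosen" mode
    }
}
```

With `concat` missing from the list, the example query's filter falls back to row mode inside an otherwise vectorized pipeline, which matches the reported slowdown.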