[jira] [Created] (HIVE-27187) Incremental rebuild of materialized view stored by iceberg
Krisztian Kasa created HIVE-27187:
----------------------------------

Summary: Incremental rebuild of materialized view stored by iceberg
Key: HIVE-27187
URL: https://issues.apache.org/jira/browse/HIVE-27187
Project: Hive
Issue Type: Improvement
Components: Iceberg integration, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

Currently, the incremental rebuild of a materialized view stored by Iceberg whose definition query contains an aggregate operator is transformed into an insert overwrite statement containing a union operator, provided the source tables contain insert operations only. One branch of the union scans the view, the other produces the delta. This can be improved further: transform the statement into a multi-insert statement representing a merge statement, to insert new aggregations and update existing ones.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
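The merge-style rebuild described above could be sketched roughly as follows. This is an illustrative sketch only, with a hypothetical aggregate view definition; the actual statement Hive generates internally may differ:

{code}
-- hypothetical aggregate materialized view over an Iceberg source table
create materialized view mat_agg stored by iceberg as
select b, sum(c) as sum_c from tbl_ice group by b;

-- merge-style semantics instead of 'insert overwrite mat_agg
-- select * from mat_agg union all <delta>': existing groups are
-- updated, new groups are inserted
merge into mat_agg mv
using (select b, sum(c) as sum_c
       from tbl_ice  -- the delta branch would scan only the new snapshots
       group by b) d
on mv.b = d.b
when matched then update set sum_c = mv.sum_c + d.sum_c
when not matched then insert values (d.b, d.sum_c);
{code}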
[jira] [Created] (HIVE-27101) Support incremental materialized view rebuild when Iceberg source tables have insert operations only.
Krisztian Kasa created HIVE-27101:
----------------------------------

Summary: Support incremental materialized view rebuild when Iceberg source tables have insert operations only.
Key: HIVE-27101
URL: https://issues.apache.org/jira/browse/HIVE-27101
Project: Hive
Issue Type: Improvement
Components: Iceberg integration, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
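A minimal scenario of the kind this improvement targets might look as follows. This is an assumed repro sketch modeled on the examples in related issues (e.g. HIVE-26497), not taken from this ticket:

{code}
create table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='1');
insert into tbl_ice values (1, 'one', 50), (2, 'two', 51);
create materialized view mat1 stored by iceberg stored as orc tblproperties ('format-version'='1')
as select b, c from tbl_ice where c > 50;
-- insert-only change on the Iceberg source: the rebuild below should be incremental
insert into tbl_ice values (3, 'three', 52);
alter materialized view mat1 rebuild;
{code}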
[jira] [Created] (HIVE-27073) Apply SerDe properties when creating materialized view
Krisztian Kasa created HIVE-27073:
----------------------------------

Summary: Apply SerDe properties when creating materialized view
Key: HIVE-27073
URL: https://issues.apache.org/jira/browse/HIVE-27073
Project: Hive
Issue Type: Improvement
Components: Iceberg integration, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
create table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='1');

create materialized view mat1 stored by iceberg stored as orc tblproperties ('format-version'='1') as
select tbl_ice.b, tbl_ice.c from tbl_ice where tbl_ice.c > 52;
{code}

Materialized view {{mat1}} should use the {{ORC}} file format.
[jira] [Created] (HIVE-26967) Deadlock when enabling/disabling Materialized view stored by Iceberg
Krisztian Kasa created HIVE-26967:
----------------------------------

Summary: Deadlock when enabling/disabling Materialized view stored by Iceberg
Key: HIVE-26967
URL: https://issues.apache.org/jira/browse/HIVE-26967
Project: Hive
Issue Type: Bug
Components: Iceberg integration
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table all100k(
  t int,
  si int,
  i int,
  b bigint,
  f float,
  d double,
  s string,
  dc decimal(38,18),
  bo boolean,
  v string,
  c string,
  ts timestamp,
  dt date)
partitioned by spec (BUCKET(16, t))
stored by iceberg stored as parquet;

create materialized view mv_rewrite stored by iceberg as
select t, si from all100k where t>115;

explain select si,t from all100k where t>116 and t<120;

alter materialized view mv_rewrite disable rewrite;
{code}
[jira] [Created] (HIVE-26922) Deadlock when rebuilding Materialized view stored by Iceberg
Krisztian Kasa created HIVE-26922:
----------------------------------

Summary: Deadlock when rebuilding Materialized view stored by Iceberg
Key: HIVE-26922
URL: https://issues.apache.org/jira/browse/HIVE-26922
Project: Hive
Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
create table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='1');
insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54);

create materialized view mat1 stored by iceberg stored as orc tblproperties ('format-version'='1') as
select tbl_ice.b, tbl_ice.c from tbl_ice where tbl_ice.c > 52;

insert into tbl_ice values (10, 'ten', 60);

alter materialized view mat1 rebuild;
{code}
[jira] [Created] (HIVE-26864) Incremental rebuild of non-transactional materialized view fails
Krisztian Kasa created HIVE-26864:
----------------------------------

Summary: Incremental rebuild of non-transactional materialized view fails
Key: HIVE-26864
URL: https://issues.apache.org/jira/browse/HIVE-26864
Project: Hive
Issue Type: Bug
Components: CBO, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
create table t1 (a int, b int) stored as orc TBLPROPERTIES ('transactional'='true');
insert into t1 values (1,1), (2,1), (3,3);

create materialized view mv1 as select a, b from t1 where b = 1;

delete from t1 where a = 2;

explain alter materialized view mv1 rebuild;
{code}
{code}
org.apache.hadoop.hive.ql.parse.SemanticException: Attempt to do update or delete on table mv1 that is not transactional
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2400)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2176)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2168)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:630)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12790)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:464)
	at org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:132)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:474)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:439)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:433)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:727)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:697)
	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114)
	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
	at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
{code}
[jira] [Created] (HIVE-26817) Set column names in result schema when plan has Values root
Krisztian Kasa created HIVE-26817:
----------------------------------

Summary: Set column names in result schema when plan has Values root
Key: HIVE-26817
URL: https://issues.apache.org/jira/browse/HIVE-26817
Project: Hive
Issue Type: Improvement
Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

The query
{code}
select b1, count(a1) count1 from (select a1, b1 from t1) s where 1=0 group by b1;
{code}
should have a result with the column names
{code}
b1	count1
{code}
but instead they are
{code}
$f0	$f1
{code}
[jira] [Created] (HIVE-26795) Iceberg integration: clean up temporary files in case of statement cancel
Krisztian Kasa created HIVE-26795:
----------------------------------

Summary: Iceberg integration: clean up temporary files in case of statement cancel
Key: HIVE-26795
URL: https://issues.apache.org/jira/browse/HIVE-26795
Project: Hive
Issue Type: Bug
Components: Iceberg integration
Reporter: Krisztian Kasa

Iceberg write operations are performed in the Tez task, but the Iceberg commit of these writes happens in the move task. To inform the MoveTask which writes have to be committed, temp files are created containing the paths of the actual data files. Also, in the case of CTAS statements, the table created by the DDL task is serialized into a temp file so that it is available to the Tez task, which performs the writes into the newly created table. Normally the cleanup of these temp files happens in the move task, but this task is not executed in case of a cancel or an error in the Tez task.
[jira] [Created] (HIVE-26771) Use DDLTask to create Iceberg table when running ctas statement
Krisztian Kasa created HIVE-26771:
----------------------------------

Summary: Use DDLTask to create Iceberg table when running ctas statement
Key: HIVE-26771
URL: https://issues.apache.org/jira/browse/HIVE-26771
Project: Hive
Issue Type: Improvement
Components: Iceberg integration
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

When an Iceberg table is created via a CTAS statement, the table is created in HiveIcebergSerDe and no DDL task is executed. Negative effects of this workflow:
* Default privileges of the new table are not granted.
* The new Iceberg table can be seen by other transactions at compile time of the CTAS.
* Table creation and table properties are not shown in the explain CTAS output.
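For comparison, a typical CTAS where the effects above can be observed (sketched from the examples in the related issue HIVE-26628):

{code}
create table source(a int, b string, c int);

explain create table tbl_ice stored by iceberg stored as orc tblproperties ('format-version'='2')
as select a, b, c from source;
-- once a DDL task is used, the table creation step and its properties
-- would be expected to appear in this explain output
{code}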
[jira] [Created] (HIVE-26747) Remove implementor from HiveRelNode
Krisztian Kasa created HIVE-26747:
----------------------------------

Summary: Remove implementor from HiveRelNode
Key: HIVE-26747
URL: https://issues.apache.org/jira/browse/HIVE-26747
Project: Hive
Issue Type: Task
Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

Calcite's VolcanoPlanner [1] relies on calling convention [2]. In Hive this is represented by the [HiveRelNode|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveRelNode.java] interface's {{CONVENTION}} field. This interface has to be implemented by all Hive operators to get the Hive calling convention behavior. The interface also defines the
{code:java}
void implement(Implementor implementor);
{code}
method, but none of the operators provides an implementation and the method is never called.

[1] [https://15721.courses.cs.cmu.edu/spring2017/papers/14-optimizer1/graefe-icde1993.pdf]
[2] [https://arxiv.org/pdf/1802.10233.pdf] (Section 4, traits)
[jira] [Created] (HIVE-26628) Iceberg table is created when running explain ctas command
Krisztian Kasa created HIVE-26628:
----------------------------------

Summary: Iceberg table is created when running explain ctas command
Key: HIVE-26628
URL: https://issues.apache.org/jira/browse/HIVE-26628
Project: Hive
Issue Type: Bug
Components: StorageHandler
Reporter: Krisztian Kasa
Fix For: 4.0.0

{code}
create table source(a int, b string, c int);

explain create table tbl_ice stored by iceberg stored as orc tblproperties ('format-version'='2') as
select a, b, c from source;

create table tbl_ice stored by iceberg stored as orc tblproperties ('format-version'='2') as
select a, b, c from source;
{code}
{code}
org.apache.hadoop.hive.ql.parse.SemanticException: org.apache.hadoop.hive.ql.parse.SemanticException: Table already exists: default.tbl_ice
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:13963)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12528)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12693)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:474)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:439)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:433)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:200)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:126)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:421)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:352)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:727)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:697)
	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114)
	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
	at org.apache.hadoop.hive.cli.TestIcebergLlapLocalCliDriver.testCliDriver(TestIcebergLlapLocalCliDriver.java:60)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.
{code}
[jira] [Created] (HIVE-26618) Add setting to turn on/off removing sections of a query plan known to never produce rows
Krisztian Kasa created HIVE-26618:
----------------------------------

Summary: Add setting to turn on/off removing sections of a query plan known to never produce rows
Key: HIVE-26618
URL: https://issues.apache.org/jira/browse/HIVE-26618
Project: Hive
Issue Type: Improvement
Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

HIVE-26524 introduced an optimization to remove sections of a query plan known to never produce rows. Add a setting to HiveConf to turn this optimization on/off. When the optimization is turned off, restore the legacy behavior:
* represent the empty result operator with {{HiveSortLimit}} 0
* disable {{HiveRemoveEmptySingleRules}}
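A sketch of how such a toggle might be used. The property name below is hypothetical, for illustration only; the actual name would be defined in HiveConf by this change:

{code}
-- hypothetical property name, for illustration only
set hive.optimize.remove.empty.plan.sections=false;

-- with the optimization off, the legacy plan shape
-- (HiveSortLimit with limit 0) would be expected here
explain cbo select a1 from t1 where 1=0;
{code}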
[jira] [Created] (HIVE-26578) Enable Iceberg storage format for materialized views
Krisztian Kasa created HIVE-26578:
----------------------------------

Summary: Enable Iceberg storage format for materialized views
Key: HIVE-26578
URL: https://issues.apache.org/jira/browse/HIVE-26578
Project: Hive
Issue Type: Improvement
Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

{code}
create materialized view mat1 stored by iceberg stored as orc tblproperties ('format-version'='1') as
select tbl_ice.b, tbl_ice.c from tbl_ice where tbl_ice.c > 52;
{code}
[jira] [Created] (HIVE-26524) Use Calcite to remove sections of a query plan known never produces rows
Krisztian Kasa created HIVE-26524:
----------------------------------

Summary: Use Calcite to remove sections of a query plan known to never produce rows
Key: HIVE-26524
URL: https://issues.apache.org/jira/browse/HIVE-26524
Project: Hive
Issue Type: Improvement
Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

Calcite has a set of rules to remove sections of a query plan known to never produce any rows. In some cases the whole plan can be removed; such plans are represented by a single {{Values}} operator with no tuples, e.g.:
{code}
select y + 1 from (select a1 y, b1 z from t1 where b1 > 10) q WHERE 1=0
{code}
{code}
HiveValues(tuples=[[]])
{code}
In other cases, when the plan has outer join or set operators, some branches can be replaced with empty values, and moving forward the join/set operator can be removed:
{code}
select a2, b2 from t2 where 1=0
union
select a1, b1 from t1
{code}
{code}
HiveAggregate(group=[{0, 1}])
  HiveTableScan(table=[[default, t1]], table:alias=[t1])
{code}
[jira] [Created] (HIVE-26498) Implement MV maintenance with Iceberg sources using full rebuild
Krisztian Kasa created HIVE-26498:
----------------------------------

Summary: Implement MV maintenance with Iceberg sources using full rebuild
Key: HIVE-26498
URL: https://issues.apache.org/jira/browse/HIVE-26498
Project: Hive
Issue Type: Sub-task
Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create external table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='2');
insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54);

create materialized view mat1 as select b, c from tbl_ice where c > 52;

insert into tbl_ice values (111, 'one', 55), (333, 'two', 56);

explain cbo alter materialized view mat1 rebuild;
alter materialized view mat1 rebuild;
{code}

MV full rebuild plan:
{code}
CBO PLAN:
HiveProject(b=[$1], c=[$2])
  HiveFilter(condition=[>($2, 52)])
    HiveTableScan(table=[[default, tbl_ice]], table:alias=[tbl_ice])
{code}
[jira] [Created] (HIVE-26497) Support materialized views on Iceberg source tables
Krisztian Kasa created HIVE-26497:
----------------------------------

Summary: Support materialized views on Iceberg source tables
Key: HIVE-26497
URL: https://issues.apache.org/jira/browse/HIVE-26497
Project: Hive
Issue Type: New Feature
Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create external table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='2');
insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54);

create materialized view mat1 as select b, c from tbl_ice where c > 52;
{code}
[jira] [Created] (HIVE-26452) NPE when converting join to mapjoin and join column referenced more than once
Krisztian Kasa created HIVE-26452:
----------------------------------

Summary: NPE when converting join to mapjoin and join column referenced more than once
Key: HIVE-26452
URL: https://issues.apache.org/jira/browse/HIVE-26452
Project: Hive
Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
explain
select count(*)
from LU_CUSTOMER pa11
join ORDER_FACT a15 on (pa11.CUSTOMER_ID = a15.CUSTOMER_ID)
join LU_CUSTOMER a16 on (a15.CUSTOMER_ID = a16.CUSTOMER_ID and pa11.CUSTOMER_ID = a16.CUSTOMER_ID);
{code}

{{a16.CUSTOMER_ID}} is referenced more than once in the join condition. Hive generates ReduceSink operators for the join's children, and one of the RS row schemas contains only one instance of the join keys (customer_id).
{code}
RS[13]
result = {HashMap@16092} size = 2
  "KEY.reducesinkkey0" -> {ExprNodeColumnDesc@16083} "Column[_col0]"
  "KEY.reducesinkkey1" -> {ExprNodeColumnDesc@16102} "Column[_col0]"
result = {RowSchema@16104} "(KEY.reducesinkkey0: int|{$hdt$_2}customer_id)"
  signature = {ArrayList@16110} size = 1
    0 = {ColumnInfo@16087} "KEY.reducesinkkey0: int"
{code}
{{KEY.reducesinkkey1}} is missing from the schema. When converting the join to a mapjoin, the converter algorithm fails while looking up both join key column instances:
https://github.com/apache/hive/blob/2aaba3c79e740ef27fc263b5a8ff33ad679c5a12/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java#L538
[jira] [Created] (HIVE-26417) Iceberg integration: disable update and merge iceberg table when split update is off
Krisztian Kasa created HIVE-26417:
----------------------------------

Summary: Iceberg integration: disable update and merge iceberg table when split update is off
Key: HIVE-26417
URL: https://issues.apache.org/jira/browse/HIVE-26417
Project: Hive
Issue Type: Improvement
Components: File Formats
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

Iceberg table update and merge are implemented using split update early by HIVE-26319 and HIVE-26385. Without split update early, deleted records have to be buffered in memory when updating Iceberg tables. With split update early, deleted records are processed by a separate reducer and no buffering is required. The ReduceSink operator also sorts the records.
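As an illustration, assuming split update is controlled by the {{hive.split.update}} property introduced with the work referenced above (property name taken as an assumption from HIVE-21160), the statements that this change would reject look like:

{code}
-- assumed property name; the sketch shows the intent, not the final behavior
set hive.split.update=false;

-- update/merge on an Iceberg table should now be disallowed instead of
-- falling back to buffering deleted records in memory
update target_ice set b = 'x' where a = 1;
{code}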
[jira] [Created] (HIVE-26385) Iceberg integration: Implement merge into iceberg table
Krisztian Kasa created HIVE-26385:
----------------------------------

Summary: Iceberg integration: Implement merge into iceberg table
Key: HIVE-26385
URL: https://issues.apache.org/jira/browse/HIVE-26385
Project: Hive
Issue Type: Improvement
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

{code}
create external table target_ice(a int, b string, c int) partitioned by spec (bucket(16, a), truncate(3, b)) stored by iceberg stored as orc tblproperties ('format-version'='2');
create table source(a int, b string, c int);
...
merge into target_ice as t using source src ON t.a = src.a
when matched and t.a > 100 THEN DELETE
when matched then update set b = 'Merged', c = t.c + 10
when not matched then insert values (src.a, src.b, src.c);
{code}
[jira] [Created] (HIVE-26375) Invalid materialized view after rebuild if source table was compacted
Krisztian Kasa created HIVE-26375:
----------------------------------

Summary: Invalid materialized view after rebuild if source table was compacted
Key: HIVE-26375
URL: https://issues.apache.org/jira/browse/HIVE-26375
Project: Hive
Issue Type: Bug
Components: Materialized views, Transactions
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

After HIVE-25656, the MV state depends on the number of rows deleted/updated in the source tables of the view. However, if one of the source tables is major compacted, the delete delta files are no longer available, and reproducing the rows that should be deleted from the MV is no longer possible.
{code}
create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES ('transactional'='true');
insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, NULL);

create materialized view mv1 stored as orc TBLPROPERTIES ('transactional'='true') as
select a,b,c from t1 where a > 0 or a is null;

update t1 set b = 'Changed' where a = 1;
alter table t1 compact 'major';

alter materialized view mv1 rebuild;
select * from mv1;
{code}
The select should return
{code}
"1\tChanged\t1.1",
"2\ttwo\t2.2",
"NULL\tNULL\tNULL"
{code}
but was
{code}
"1\tone\t1.1",
"2\ttwo\t2.2",
"NULL\tNULL\tNULL",
"1\tChanged\t1.1"
{code}
[jira] [Created] (HIVE-26372) QTests that depend on the mysql docker image fail
Krisztian Kasa created HIVE-26372:
----------------------------------

Summary: QTests that depend on the mysql docker image fail
Key: HIVE-26372
URL: https://issues.apache.org/jira/browse/HIVE-26372
Project: Hive
Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

When the QTest framework launches a mysql docker container, it checks whether the mysql instance is ready to receive connections. It searches for the text
{code}
ready for connections
{code}
in stderr:
https://github.com/apache/hive/blob/2f619988f69a569bfcdc2bef5d35a9ecabb2ef13/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/MySQLExternalDB.java#L56
It seems this behavior has changed on the MySql side, so the QTest framework enters an infinite loop and then times out after 300 sec.
[jira] [Created] (HIVE-26371) Constant propagation does not evaluate constraint expressions at merge when CBO is enabled
Krisztian Kasa created HIVE-26371:
----------------------------------

Summary: Constant propagation does not evaluate constraint expressions at merge when CBO is enabled
Key: HIVE-26371
URL: https://issues.apache.org/jira/browse/HIVE-26371
Project: Hive
Issue Type: Bug
Components: CBO, Logical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

{code}
CREATE TABLE t_target(
  name string CHECK (length(name)<=20),
  age int,
  gpa double CHECK (gpa BETWEEN 0.0 AND 4.0))
stored as orc TBLPROPERTIES ('transactional'='true');

CREATE TABLE t_source(
  name string,
  age int,
  gpa double);

insert into t_source(name, age, gpa) values ('student1', 16, null);
insert into t_target(name, age, gpa) values ('student1', 16, 2.0);

merge into t_target using t_source source on source.age=t_target.age
when matched then update set gpa=6;
{code}
Currently, CBO cannot handle constraint checks when merging, so the filter operator with the {{enforce_constraint}} call is added to the Hive operator plan after CBO has succeeded, and the {{ConstantPropagate}} optimization is called only from TezCompiler with {{ConstantPropagateOption.SHORTCUT}}. With this option, {{ConstantPropagate}} does not evaluate deterministic functions.
[jira] [Created] (HIVE-26370) Check stats are up-to-date when getting materialized view state
Krisztian Kasa created HIVE-26370:
----------------------------------

Summary: Check stats are up-to-date when getting materialized view state
Key: HIVE-26370
URL: https://issues.apache.org/jira/browse/HIVE-26370
Project: Hive
Issue Type: Bug
Components: Materialized views, Statistics, Transactions
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

Since HIVE-25656, the materialized view state depends on the number of affected rows of transactions made on the source tables. If
{code}
hive.stats.autogather=false;
{code}
the number of affected rows of transactions is not collected, which can cause invalid stats on the source tables and lead to false indications about the MV status.
[jira] [Created] (HIVE-26340) Vectorized PTF operator fails if query has upper case window function
Krisztian Kasa created HIVE-26340:
----------------------------------

Summary: Vectorized PTF operator fails if query has upper case window function
Key: HIVE-26340
URL: https://issues.apache.org/jira/browse/HIVE-26340
Project: Hive
Issue Type: Bug
Components: Vectorization
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

{code}
SELECT ROW_NUMBER() OVER(order by age) AS rn FROM studentnull100;
{code}
{code}
2022-06-16T14:18:57,728 ERROR [pool-4-thread-1] jdbc.TestDriver: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 7, vertexId=vertex_1655217967697_0062_1_01, diagnostics=[Task failed, taskId=task_1655217967697_0062_1_01_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1655217967697_0062_1_01_00_0:java.lang.RuntimeException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:298)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:252)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.plan.VectorPTFDesc.getEvaluator(VectorPTFDesc.java:165)
	at org.apache.hadoop.hive.ql.plan.VectorPTFDesc.getEvaluators(VectorPTFDesc.java:381)
	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.initializeOp(VectorPTFOperator.java:317)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:374)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:571)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:523)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:384)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:211)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:268)
	... 16 more
{code}

-- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26319) Iceberg integration: Perform update split early
Krisztian Kasa created HIVE-26319: - Summary: Iceberg integration: Perform update split early Key: HIVE-26319 URL: https://issues.apache.org/jira/browse/HIVE-26319 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa Extend the update-split-early optimization to Iceberg tables, as HIVE-21160 did for native ACID tables. -- This message was sent by Atlassian Jira (v8.20.7#820007)
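"Update split" rewrites an UPDATE into a delete of the old row version plus an insert of the new one. A minimal sketch of the equivalence using Python's sqlite3 (toy table, purely illustrative; this is not how Hive implements it internally):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id int, v text)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

# UPDATE t SET v = 'x' WHERE id = 1, expressed as delete + insert:
# first capture the matched rows, then delete them, then re-insert
# the rewritten versions.
cur.execute("CREATE TEMP TABLE hit AS SELECT * FROM t WHERE id = 1")
cur.execute("DELETE FROM t WHERE id = 1")
cur.execute("INSERT INTO t SELECT id, 'x' FROM hit")

rows = sorted(cur.execute("SELECT * FROM t").fetchall())
print(rows)  # [(1, 'x'), (2, 'b')] sorted -> [(1, 'x'), (2, 'b')]
```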
[jira] [Created] (HIVE-26274) No vectorization if query has upper case window function
Krisztian Kasa created HIVE-26274: - Summary: No vectorization if query has upper case window function Key: HIVE-26274 URL: https://issues.apache.org/jira/browse/HIVE-26274 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} CREATE TABLE t1 (a int, b int); EXPLAIN VECTORIZATION ONLY SELECT ROW_NUMBER() OVER(order by a) AS rn FROM t1; {code} {code} PLAN VECTORIZATION: enabled: true enabledConditionsMet: [hive.vectorized.execution.enabled IS true] STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE) Vertices: Map 1 Execution mode: vectorized, llap LLAP IO: all inputs Map Vectorization: enabled: true enabledConditionsMet: hive.vectorized.use.vector.serde.deserialize IS true inputFormatFeatureSupport: [DECIMAL_64] featureSupportInUse: [DECIMAL_64] inputFileFormats: org.apache.hadoop.mapred.TextInputFormat allNative: true usesVectorUDFAdaptor: false vectorized: true Reducer 2 Execution mode: llap Reduce Vectorization: enabled: true enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez] IS true notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum] vectorized: false Stage: Stage-0 Fetch Operator {code} {code} notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum] {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
Krisztian Kasa created HIVE-26264: - Summary: Iceberg integration: Fetch virtual columns on demand Key: HIVE-26264 URL: https://issues.apache.org/jira/browse/HIVE-26264 Project: Hive Issue Type: Bug Components: File Formats Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 Currently, virtual columns are fetched from Iceberg tables when the statement being executed is a delete or update, and the setting is global: it affects every table touched by the statement. The read and write schemas also depend on this operation-level setting, so some statements fail due to an invalid schema: {code} create external table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='2'); insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56); update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4); {code} {code} See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test case logs. 
org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:574) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) ... 18 more Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaIntObjectInspector.get(JavaIntObjectInspector.java:40) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPLessThan.evaluate(GenericUDFOPLessThan.java:127) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:235) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:92) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:70) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEva
[jira] [Created] (HIVE-26160) Materialized View rewrite does not check tables scanned in sub-query expressions
Krisztian Kasa created HIVE-26160: - Summary: Materialized View rewrite does not check tables scanned in sub-query expressions Key: HIVE-26160 URL: https://issues.apache.org/jira/browse/HIVE-26160 Project: Hive Issue Type: Bug Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Materialized view rewrite based on exact SQL text match uses the initial CBO plan to explore possibilities to replace the query plan, or part of it, with an MV scan. This algorithm requires the list of tables scanned by the original query plan. If the query contains sub-query expressions, the tables scanned by the sub-queries are not listed, which can lead to rewriting the original plan to scan an outdated MV. -- This message was sent by Atlassian Jira (v8.20.7#820007)
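The fix amounts to collecting scanned tables recursively, descending into sub-query expressions as well as ordinary plan children. A minimal sketch over a toy plan structure (hypothetical dict-based nodes, not Hive's actual RelNode classes):

```python
def scanned_tables(node):
    """Collect every table scanned by a toy plan node, descending into
    sub-query expressions as well as ordinary children."""
    tables = set()
    if node.get("table"):
        tables.add(node["table"])
    for child in node.get("children", []) + node.get("subqueries", []):
        tables |= scanned_tables(child)
    return tables

plan = {
    "children": [{"table": "web_sales"}],
    # a filter carrying an IN (select ... from item) sub-query expression
    "subqueries": [{"table": "item"}],
}
print(sorted(scanned_tables(plan)))  # ['item', 'web_sales']
```

Missing the `subqueries` branch is exactly the described bug: `item` would not be reported as scanned, so staleness checks against an MV over `item` would be skipped.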
[jira] [Created] (HIVE-26043) Use constraint info when creating RexNodes
Krisztian Kasa created HIVE-26043: - Summary: Use constraint info when creating RexNodes Key: HIVE-26043 URL: https://issues.apache.org/jira/browse/HIVE-26043 Project: Hive Issue Type: Improvement Components: CBO Reporter: Krisztian Kasa Assignee: Krisztian Kasa Prior HIVE-23100 Not null constraints affected newly created RexNode type nullability. Nullability enables the subquery rewrite algorithm to generate more optimal plan. [https://github.com/apache/hive/blob/1213ad3f0ae0e21e7519dc28b8b6d1401cdd1441/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java#L324] Example: {code:java} explain cbo select ws_sales_price from web_sales, customer, item where ws_bill_customer_sk = c_customer_sk and ws_item_sk = i_item_sk and ( c_customer_sk = 1 or i_item_id in (select i_item_id from item where i_item_sk in (2, 3) ) ); {code} Without not null constraints {code:java} HiveProject(ws_sales_price=[$2]) HiveFilter(condition=[OR(AND(<>($6, 0), IS NOT NULL($8)), =($3, 1))]) HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], ws_sales_price=[$2], c_customer_sk=[$8], i_item_sk=[$3], i_item_id=[$4], c=[$5], i_item_id0=[$6], literalTrue=[$7]) HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], ws_sales_price=[$20]) HiveFilter(condition=[IS NOT NULL($3)]) HiveTableScan(table=[[default, web_sales]], table:alias=[web_sales]) HiveJoin(condition=[=($1, $3)], joinType=[left], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c=[$0]) HiveAggregate(group=[{}], c=[COUNT()]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) 
HiveProject(i_item_id=[$0], literalTrue=[true]) HiveAggregate(group=[{1}]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c_customer_sk=[$0]) HiveTableScan(table=[[default, customer]], table:alias=[customer]) {code} With not null constraints {code:java} HiveProject(ws_sales_price=[$2]) HiveFilter(condition=[OR(IS NOT NULL($7), =($3, 1))]) HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], ws_sales_price=[$2], c_customer_sk=[$7], i_item_sk=[$3], i_item_id=[$4], i_item_id0=[$5], literalTrue=[$6]) HiveJoin(condition=[=($1, $7)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], ws_sales_price=[$20]) HiveFilter(condition=[IS NOT NULL($3)]) HiveTableScan(table=[[default, web_sales]], table:alias=[web_sales]) HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], cost=[not available]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(i_item_id=[$0], literalTrue=[true]) HiveAggregate(group=[{1}]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c_customer_sk=[$0]) HiveTableScan(table=[[default, customer]], table:alias=[customer]) {code} In the first plan when not null constraints was ignored there is an extra {{item}} table join without join condition: {code:java} HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c=[$0]) HiveAggregate(group=[{}], c=[COUNT()]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) {code} The planner is not aware that the {{i_item_id}} column has {{not null}} 
defined, so it expects null values, which requires the extra join. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25979) Order of Lineage is flaky in qtest output
Krisztian Kasa created HIVE-25979: - Summary: Order of Lineage is flaky in qtest output Key: HIVE-25979 URL: https://issues.apache.org/jira/browse/HIVE-25979 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa When running {code:java} mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=stats_part_multi_insert_acid.q -pl itests/qtest -Pitests {code} The lineage output of statement: {code:java} from source insert into stats_part select key, value, p insert into stats_part select key, value, p {code} is expected to be {code:java} POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ] {code} but sometimes it is {code:java} POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ] {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
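One way to make such qtest output deterministic is to sort the lineage entries before printing instead of relying on hook emission order. A small sketch (assumed approach, not necessarily the fix chosen in the patch):

```python
# The four lineage entries from the multi-insert above; the two branches
# may emit them in either interleaving.
lineage = [
    "stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]",
    "stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]",
    "stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]",
    "stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]",
]

# Sorting groups the .key entries before the .value entries, matching the
# expected golden-file order regardless of the hooks' emission order.
for entry in sorted(lineage):
    print("POSTHOOK: Lineage:", entry)
```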
[jira] [Created] (HIVE-25969) Unable to reference table column named default
Krisztian Kasa created HIVE-25969: - Summary: Unable to reference table column named default Key: HIVE-25969 URL: https://issues.apache.org/jira/browse/HIVE-25969 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} CREATE TABLE t1 (a int, `default` int) stored as orc TBLPROPERTIES ('transactional'='true'); insert into t1 values (1, 2), (10, 11); update t1 set a = `default`; select * from t1; {code} result is {code} NULL NULL NULL NULL {code} but it should be {code} 11 11 2 2 {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
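The expected semantics can be checked outside Hive. A minimal sketch using Python's sqlite3, where double quotes play the role of Hive's backticks for quoting the reserved word (illustration of the intended behavior, not of Hive's code path):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# "default" is a reserved word, so the column name must be quoted
cur.execute('CREATE TABLE t1 (a int, "default" int)')
cur.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, 2), (10, 11)])
# The update should copy each row's "default" value into a, not NULL it out
cur.execute('UPDATE t1 SET a = "default"')
rows = cur.execute('SELECT a, "default" FROM t1 ORDER BY a').fetchall()
print(rows)  # [(2, 2), (11, 11)]
```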
[jira] [Created] (HIVE-25941) Long compilation time of complex query due to analysis for materialized view rewrite
Krisztian Kasa created HIVE-25941: - Summary: Long compilation time of complex query due to analysis for materialized view rewrite Key: HIVE-25941 URL: https://issues.apache.org/jira/browse/HIVE-25941 Project: Hive Issue Type: Bug Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa When compiling a query, the optimizer tries to rewrite the query plan, or subtrees of it, to use materialized view scans. With {code} set hive.materializedview.rewriting.sql.subquery=false; {code} the compilation succeeds in less than 10 seconds; otherwise it takes several minutes (~5 min) depending on the hardware. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25937) Create view fails when definition contains a materialized view definition
Krisztian Kasa created HIVE-25937: - Summary: Create view fails when definition contains a materialized view definition Key: HIVE-25937 URL: https://issues.apache.org/jira/browse/HIVE-25937 Project: Hive Issue Type: Bug Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa View definition contains the materialized view definition as subquery: {code} create materialized view mv1 as select * from t1 where col0 > 2 union select * from t1 where col0 = 0; explain cbo create view v1 as select sub.* from (select * from t1 where col0 > 2 union select * from t1 where col0 = 0) sub where sub.col0 = 10 {code} {code} See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs. org.apache.hadoop.hive.ql.parse.SemanticException: View definition references materialized view default@mv1 at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211) at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:501) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:727) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:697) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:27) at org.junit.runners.ParentRunner$4.run(Parent
[jira] [Created] (HIVE-25918) Invalid stats after multi inserting into the same partition
Krisztian Kasa created HIVE-25918: - Summary: Invalid stats after multi inserting into the same partition Key: HIVE-25918 URL: https://issues.apache.org/jira/browse/HIVE-25918 Project: Hive Issue Type: Bug Components: Statistics Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create table source(p int, key int, value string); insert into source(p, key, value) values (101,42,'string42'); create table stats_part(key int, value string) partitioned by (p int); from source insert into stats_part select key, value, p insert into stats_part select key, value, p; select count(*) from stats_part; {code} In this case {{StatsOptimizer}} serves this query from statistics, because the result should be the {{rowNum}} of partition {{p=101}}. The result is {code} 1 {code} however it should be {code} 2 {code} because each insert branch inserts one record. -- This message was sent by Atlassian Jira (v8.20.1#820001)
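The correct count is easy to verify with plain SQL semantics. A small sketch in Python's sqlite3, which always scans rather than answering from stats, so it shows the answer the StatsOptimizer's cached row count should match (the two separate inserts stand in for Hive's two multi-insert branches):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE source (p int, key int, value text)")
cur.execute("INSERT INTO source VALUES (101, 42, 'string42')")
cur.execute("CREATE TABLE stats_part (key int, value text, p int)")

# Hive's multi-insert writes the same source row through two branches;
# simulate it with two separate inserts from the same source.
for _ in range(2):
    cur.execute("INSERT INTO stats_part SELECT key, value, p FROM source")

n = cur.execute("SELECT count(*) FROM stats_part").fetchone()[0]
print(n)  # 2
```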
[jira] [Created] (HIVE-25906) Clean MaterializedViewCache after q test run
Krisztian Kasa created HIVE-25906: - Summary: Clean MaterializedViewCache after q test run Key: HIVE-25906 URL: https://issues.apache.org/jira/browse/HIVE-25906 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25900) Materialized view registry does not clean non existing views at refresh
Krisztian Kasa created HIVE-25900: - Summary: Materialized view registry does not clean non existing views at refresh Key: HIVE-25900 URL: https://issues.apache.org/jira/browse/HIVE-25900 Project: Hive Issue Type: Bug Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa CBO plans of materialized views which are enabled for query rewrite are cached in HS2 (MaterializedViewsCache, HiveMaterializedViewsRegistry). The registry is refreshed periodically from HMS: {code:java} set hive.server2.materializedviews.registry.refresh.period=1500s; {code} This functionality is required when multiple HS2 instances are used in a cluster: an MV drop operation is served by one of the HS2 instances, and the registry is updated at that time in that instance. However, the other HS2 instances still cache the non-existent view and need to be refreshed by the updater thread. Currently the updater thread adds new entries and refreshes existing ones, but does not remove outdated entries. -- This message was sent by Atlassian Jira (v8.20.1#820001)
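A refresh pass that closes the described gap would diff the cached view names against the set fetched from HMS and drop the ones no longer present. A minimal sketch (hypothetical names and a plain dict; not the actual HiveMaterializedViewsRegistry API):

```python
def refresh_registry(cache, hms_views):
    """Sync an MV cache (name -> plan) with the views currently in HMS.

    Adds new entries, refreshes existing ones, and - the missing step
    described in this issue - removes entries whose view no longer exists.
    """
    for name, plan in hms_views.items():
        cache[name] = plan          # add new / refresh existing
    for name in list(cache):        # iterate over a copy: we mutate cache
        if name not in hms_views:
            del cache[name]         # drop views dropped via another HS2
    return cache

cache = {"mv1": "plan1", "mv_dropped": "stale_plan"}
hms = {"mv1": "plan1_refreshed", "mv2": "plan2"}
refresh_registry(cache, hms)
print(sorted(cache))  # ['mv1', 'mv2']
```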
[jira] [Created] (HIVE-25899) Materialized view registry does not clean dropped views
Krisztian Kasa created HIVE-25899: - Summary: Materialized view registry does not clean dropped views Key: HIVE-25899 URL: https://issues.apache.org/jira/browse/HIVE-25899 Project: Hive Issue Type: Bug Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa CBO plans of materialized views which are enabled for query rewrite are cached in HS2 (MaterializedViewsCache) Dropping a materialized views should remove the entry from the cache however the entry keys are not removed. Cache state after running a whole PTest split: {code} this = {HiveMaterializedViewsRegistry@20858} materializedViewsCache = {MaterializedViewsCache@20913} materializedViews = {ConcurrentHashMap@67654} size = 3 "default" -> {ConcurrentHashMap@28568} size = 8 key = "default" value = {ConcurrentHashMap@28568} size = 8 "cluster_mv_2" -> {HiveRelOptMaterialization@67786} "cluster_mv_1" -> {HiveRelOptMaterialization@67788} "cluster_mv_4" -> {HiveRelOptMaterialization@67790} "cluster_mv_3" -> {HiveRelOptMaterialization@67792} "cmv_mat_view_n10" -> {HiveRelOptMaterialization@67794} "distribute_mv_1" -> {HiveRelOptMaterialization@67796} "distribute_mv_3" -> {HiveRelOptMaterialization@67798} "distribute_mv_2" -> {HiveRelOptMaterialization@67800} "db2" -> {ConcurrentHashMap@67772} size = 2 key = "db2" value = {ConcurrentHashMap@67772} size = 2 "cmv_mat_view_n7" -> {HiveRelOptMaterialization@67806} "cmv_mat_view2_n2" -> {HiveRelOptMaterialization@67808} "count_distinct" -> {ConcurrentHashMap@67774} size = 0 key = "count_distinct" value = {ConcurrentHashMap@67774} size = 0 sqlToMaterializedView = {ConcurrentHashMap@20915} size = 36 "SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n FROM `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`" -> {ArrayList@67694} size = 0 key = "SELECT 
`cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n FROM `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`" value = {ArrayList@67694} size = 0 "select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno`" -> {ArrayList@67696} size = 0 key = "select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno`" value = {ArrayList@67696} size = 0 "select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" -> {ArrayList@67698} size = 1 key = "select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" value = {ArrayList@67698} size = 1 "SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + 100 as `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM `default`.`src_txn`, `default`.`src_txn_2`\nWHERE `src_txn`.`key` = `src_txn_2`.`key`\n AND `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) `cluster_mv_3`" -> {ArrayList@67700} size = 1 key = "SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + 100 as `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM `default`.`src_txn`, `default`.`src_txn_2`\nWHERE `src_txn`.`key` = `src_txn_2`.`key`\n AND `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) `cluster_mv_3`" value = {ArrayList@67700} size = 1 "SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM `default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE `cmv_basetable_2_n3`.`c` > 10.0" -> {ArrayList@67702} size = 0 key = "SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM 
`default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE `cmv_basetable_2_n3`.`c` > 10.0" value = {ArrayList@67702} size = 0 "SELECT `src_txn`.`key`, `src_txn`.`value` FROM `default`.`src_txn` where `src_txn`.`key` > 200 and `src_txn`.`key` < 250" -> {ArrayList@67704} size = 1 key = "SELECT `src_txn`.`key`, `src_txn`.`value` FROM `default`.`src_txn` where `src_txn`.`key` > 200 and `src_txn`.`key` < 250" value = {ArrayList@67704} size = 1 "select `cmv_basetable_n9`.`a`, `cmv_basetable_2_n4`.`c`\n from `default`.`cmv_basetable_n9` join `default`.`cmv_basetable_2_n4` on
[jira] [Created] (HIVE-25878) Unable to compile cpp metastore thrift client
Krisztian Kasa created HIVE-25878: - Summary: Unable to compile cpp metastore thrift client Key: HIVE-25878 URL: https://issues.apache.org/jira/browse/HIVE-25878 Project: Hive Issue Type: Bug Components: Thrift API Reporter: Krisztian Kasa Assignee: Krisztian Kasa The following struct definitions contain a circular dependency: {code:java} struct SourceTable { 1: required Table table, ... } struct CreationMetadata { ... 7: optional set<SourceTable> sourceTables } struct Table { ... 16: optional CreationMetadata creationMetadata, // only for MVs, it stores table names used and txn list at MV creation ... } {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25858) DISTINCT with ORDER BY on ordinals fails with NPE
Krisztian Kasa created HIVE-25858: - Summary: DISTINCT with ORDER BY on ordinals fails with NPE Key: HIVE-25858 URL: https://issues.apache.org/jira/browse/HIVE-25858 Project: Hive Issue Type: Bug Components: CBO Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25818) Values query with order by position clause fails
Krisztian Kasa created HIVE-25818: - Summary: Values query with order by position clause fails Key: HIVE-25818 URL: https://issues.apache.org/jira/browse/HIVE-25818 Project: Hive Issue Type: Bug Components: CBO, Query Planning Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} values(1+1, 2, 5.0, 'a') order by 1 limit 2; {code} {code} java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.getFieldIndexFromColumnNumber(CalcitePlanner.java:4146) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.beginGenOBLogicalPlan(CalcitePlanner.java:4028) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genOBLogicalPlan(CalcitePlanner.java:3933) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5148) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1651) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1593) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1345) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:563) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12565) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:726) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:696) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runn
[jira] [Created] (HIVE-25805) Wrong result when rebuilding MV with count(col) incremental
Krisztian Kasa created HIVE-25805: - Summary: Wrong result when rebuilding MV with count(col) incremental Key: HIVE-25805 URL: https://issues.apache.org/jira/browse/HIVE-25805 Project: Hive Issue Type: Bug Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code:java} create table t1(a char(15), b int) stored as orc TBLPROPERTIES ('transactional'='true'); insert into t1(a, b) values ('old', 1); create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as select t1.a, count(t1.b), count(*) from t1 group by t1.a; delete from t1 where b = 1; insert into t1(a,b) values ('new', null); alter materialized view mat1 rebuild; select * from mat1; {code} returns {code:java} new 1 1 {code} but it should be {code:java} new 0 1 {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
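The heart of the bug above is the semantic gap between count(col) and count(*): count(col) must skip NULLs, while count(*) counts every row. A minimal illustration with SQLite (used here purely as a stand-in SQL engine, not Hive) shows the result the rebuilt view should contain:

```python
import sqlite3

# count(b) ignores NULLs while count(*) counts every row -- the
# distinction the incremental rebuild above gets wrong.
conn = sqlite3.connect(":memory:")
conn.execute("create table t1(a text, b int)")
conn.execute("insert into t1(a, b) values ('new', null)")
row = conn.execute(
    "select a, count(b), count(*) from t1 group by a").fetchone()
print(row)  # ('new', 0, 1) -- the expected MV contents
```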
[jira] [Created] (HIVE-25771) Stats may be incorrect under concurrent inserts if direct-insert is Off
Krisztian Kasa created HIVE-25771: - Summary: Stats may be incorrect under concurrent inserts if direct-insert is Off Key: HIVE-25771 URL: https://issues.apache.org/jira/browse/HIVE-25771 Project: Hive Issue Type: Bug Components: Statistics Reporter: Krisztian Kasa Assignee: Krisztian Kasa The table statistics value Number of rows may be invalid after inserting into the same partition concurrently from multiple user sessions. This can also lead to invalid query results because count(*) may be served from stats. -- This message was sent by Atlassian Jira (v8.20.1#820001)
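A concurrent-insert stats bug of this shape is typically a lost update: each session reads the current numRows, adds its own row count, and writes back, so the second write clobbers the first. A hypothetical sketch of that race (the variable names and flow are illustrative, not Hive's actual stats code):

```python
# Hypothetical lost-update sketch: two sessions each read numRows,
# add their own inserted-row count, and write back; the second write
# silently discards the first session's increment.
stats = {"numRows": 100}

def finish_insert(snapshot, rows_inserted):
    # each session computes the new stat from the value it read earlier
    return snapshot + rows_inserted

s1 = stats["numRows"]                      # session 1 reads 100
s2 = stats["numRows"]                      # session 2 reads 100 concurrently
stats["numRows"] = finish_insert(s1, 10)   # session 1 writes 110
stats["numRows"] = finish_insert(s2, 20)   # session 2 writes 120, not 130
print(stats["numRows"])  # 120 -- 10 rows unaccounted for
```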
[jira] [Created] (HIVE-25747) Make a cost base decision when rebuilding materialized views
Krisztian Kasa created HIVE-25747: - Summary: Make a cost base decision when rebuilding materialized views Key: HIVE-25747 URL: https://issues.apache.org/jira/browse/HIVE-25747 Project: Hive Issue Type: Improvement Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Choose between a full insert-overwrite plan and a partition-based incremental rebuild plan when rebuilding partitioned materialized views. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25745) Print transactional stats of materialized view source tables
Krisztian Kasa created HIVE-25745: - Summary: Print transactional stats of materialized view source tables Key: HIVE-25745 URL: https://issues.apache.org/jira/browse/HIVE-25745 Project: Hive Issue Type: Improvement Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Print the number of rows affected by transactions of materialized view source tables since the last rebuild of the view when using the command {code:java} DESCRIBE FORMATTED ; {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25744) Support backward compatibility of thrift struct CreationMetadata
Krisztian Kasa created HIVE-25744: - Summary: Support backward compatibility of thrift struct CreationMetadata Key: HIVE-25744 URL: https://issues.apache.org/jira/browse/HIVE-25744 Project: Hive Issue Type: Task Components: Materialized views, Thrift API Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 HIVE-25656 introduced a breaking change in the HiveServer2 <-> Metastore thrift api: Old {code} struct CreationMetadata { 1: required string catName 2: required string dbName, 3: required string tblName, 4: required set<string> tablesUsed, 5: optional string validTxnList, 6: optional i64 materializationTime } {code} New {code} struct CreationMetadata { 1: required string catName 2: required string dbName, 3: required string tblName, 4: required set<SourceTable> tablesUsed, 5: optional string validTxnList, 6: optional i64 materializationTime } {code} The type of the 4th field changed. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25656) Get materialized view state based on number of affected rows by transactions
Krisztian Kasa created HIVE-25656: - Summary: Get materialized view state based on number of affected rows by transactions Key: HIVE-25656 URL: https://issues.apache.org/jira/browse/HIVE-25656 Project: Hive Issue Type: Improvement Components: Materialized views, Transactions Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 To enable the faster incremental rebuild of materialized views, the presence of update/delete operations on the source tables of the view since the last rebuild must be checked. Based on the outcome, a different plan is generated for the update/delete and the insert-only scenarios. Currently this is done by querying the COMPLETED_TXN_COMPONENTS table; however, the records of this table are cleaned when the MV source tables are compacted, which reduces the chances of an incremental MV rebuild. The goal of this patch is to find an alternative way to store and retrieve this information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25654) Stats of transactional table updated when transaction is rolled back
Krisztian Kasa created HIVE-25654: - Summary: Stats of transactional table updated when transaction is rolled back Key: HIVE-25654 URL: https://issues.apache.org/jira/browse/HIVE-25654 Project: Hive Issue Type: Bug Components: Statistics Reporter: Krisztian Kasa {code:java} set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; create table t1(a int) stored as orc TBLPROPERTIES ('transactional'='true'); describe formatted t1; -- simulate rollback set hive.test.rollbacktxn=true; insert into t1(a) values (1),(2),(3); describe formatted t1; select count(1) from t1; {code} {code} POSTHOOK: query: describe formatted t1 ... numFiles1 numRows 3 rawDataSize 0 totalSize 632 transactional true ... POSTHOOK: query: select count(1) from t1 POSTHOOK: type: QUERY POSTHOOK: Input: default@t1 0 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25590) Able to create views referencing temporary tables and materialized views
Krisztian Kasa created HIVE-25590: - Summary: Able to create views referencing temporary tables and materialized views Key: HIVE-25590 URL: https://issues.apache.org/jira/browse/HIVE-25590 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 Creating views/materialized views referencing temporary tables and materialized views is disabled in Hive. However, the verification algorithm fails to recognize temporary tables and materialized views in subqueries. The verification also fails when the view definition contains joins, because CBO transforms join branches into subqueries. Example 1: {code} create temporary table tmp1 (c1 string, c2 string); create view tmp1_view as select subq.c1 from (select c1, c2 from tmp1) subq; {code} Example 2: {code} create table t1 (a int) stored as orc tblproperties ('transactional'='true'); create table t2 (a int) stored as orc tblproperties ('transactional'='true'); create materialized view mv1 as select a from t1 where a = 10; create materialized view mv2 as select t2.a from mv1 join t2 on (mv1.a = t2.a); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25574) Replace clob with varchar in JDO
Krisztian Kasa created HIVE-25574: - Summary: Replace clob with varchar in JDO Key: HIVE-25574 URL: https://issues.apache.org/jira/browse/HIVE-25574 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Krisztian Kasa Assignee: Krisztian Kasa Follow up of HIVE-21940. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25572) Exception while querying materialized view invalidation info
Krisztian Kasa created HIVE-25572: - Summary: Exception while querying materialized view invalidation info Key: HIVE-25572 URL: https://issues.apache.org/jira/browse/HIVE-25572 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code:java} 2021-09-29T00:33:02,971 WARN [main] txn.TxnHandler: Unable to retrieve materialization invalidation information: completed transaction components. java.sql.SQLSyntaxErrorException: Syntax error: Encountered "" at line 1, column 234. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.jdbc.Driver42.newEmbedPreparedStatement(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source) ~[derby-10.14.1.0.jar:?] at com.zaxxer.hikari.pool.ProxyConnection.prepareStatement(ProxyConnection.java:311) ~[HikariCP-2.6.1.jar:?] at com.zaxxer.hikari.pool.HikariProxyConnection.prepareStatement(HikariProxyConnection.java) ~[HikariCP-2.6.1.jar:?] at org.apache.hadoop.hive.metastore.tools.SQLGenerator.prepareStmtWithParameters(SQLGenerator.java:169) ~[classes/:?] 
at org.apache.hadoop.hive.metastore.txn.TxnHandler.executeBoolean(TxnHandler.java:2598) [classes/:?] at org.apache.hadoop.hive.metastore.txn.TxnHandler.getMaterializationInvalidationInfo(TxnHandler.java:2575) [classes/:?] at org.apache.hadoop.hive.metastore.txn.TestTxnHandler.testGetMaterializationInvalidationInfo(TestTxnHandler.java:1910) [test-classes/:?] at org.apache.hadoop.hive.metastore.txn.TestTxnHandler.testGetMaterializationInvalidationInfo(TestTxnHandler.java:1875) [test-classes/:?] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) [junit-4.13.jar:4.13] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.13.jar:4.13] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) [junit-4.13.jar:4.13] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.13.jar:4.13] at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) [junit-4.13.jar:4.13] at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) [junit-4.13.jar:4.13] at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) [junit-4.13.jar:4.13] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) [junit-4.13.jar:4.13] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) [junit-4.13.jar:4.13] at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) [junit-4.13.
[jira] [Created] (HIVE-25568) Estimate TopNKey operator statistics.
Krisztian Kasa created HIVE-25568: - Summary: Estimate TopNKey operator statistics. Key: HIVE-25568 URL: https://issues.apache.org/jira/browse/HIVE-25568 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Currently the TopNKey operator has the same statistics as its parent operator: {code} TableScan alias: src Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE Top N Key Operator sort order: + keys: key (type: string) null sort order: z Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE top n: 5 {code} This operator filters out rows, and that should be reflected in its statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
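One plausible estimate follows directly from the operator's contract: a TopNKey emits at most top n rows per key partition, capped by the input row count. The function below is a sketch of that heuristic (the formula is an assumption for illustration, not Hive's actual implementation):

```python
def estimate_topnkey_rows(input_rows, top_n, num_partitions=1):
    """Assumed heuristic: each partition emits at most top_n rows,
    and the estimate can never exceed the input row count."""
    return min(input_rows, top_n * num_partitions)

# For the plan above: 500 input rows, top n: 5 -> estimate 5, not 500.
print(estimate_topnkey_rows(500, 5))
```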
[jira] [Created] (HIVE-25546) Enable incremental rebuild of Materialized view with insert only source tables
Krisztian Kasa created HIVE-25546: - Summary: Enable incremental rebuild of Materialized view with insert only source tables Key: HIVE-25546 URL: https://issues.apache.org/jira/browse/HIVE-25546 Project: Hive Issue Type: Improvement Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only'); create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as select a, b, c from t1 where a > 10; {code} Currently materialized view *mat1* cannot be rebuilt incrementally because it has an insert-only source table (t1). Such tables do not have ROW_ID.write_id, which is required to identify records newly inserted since the last rebuild. HIVE-25406 adds the ability to query write_id. -- This message was sent by Atlassian Jira (v8.3.4#803005)
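Conceptually, once write_id is available, the incremental rebuild only scans source rows whose write_id is newer than the snapshot recorded at the last rebuild, then applies the view's own predicate. A hypothetical sketch of that filter (the row layout and snapshot value are invented for illustration):

```python
# Hypothetical model of the incremental-rebuild delta scan: keep only
# rows written after the last rebuild's snapshot that also satisfy the
# view predicate (a > 10 from the mat1 definition above).
last_rebuild_write_id = 2
t1 = [
    {"write_id": 1, "a": 20, "b": 1, "c": 1},  # already reflected in mat1
    {"write_id": 3, "a": 30, "b": 2, "c": 2},  # inserted since the rebuild
    {"write_id": 4, "a": 5,  "b": 3, "c": 3},  # new, but fails a > 10
]
delta = [r for r in t1
         if r["write_id"] > last_rebuild_write_id and r["a"] > 10]
print(delta)  # only the write_id=3 row is appended to the view
```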
[jira] [Created] (HIVE-25512) Merge statement does not enforce check constraints
Krisztian Kasa created HIVE-25512: - Summary: Merge statement does not enforce check constraints Key: HIVE-25512 URL: https://issues.apache.org/jira/browse/HIVE-25512 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; CREATE TABLE table_check_merge( name string CHECK (length(name)<=20), age int, gpa double CHECK (gpa BETWEEN 0.0 AND 4.0) ) stored as orc TBLPROPERTIES ('transactional'='true'); CREATE TABLE table_source( name string, age int, gpa double); insert into table_source(name, age, gpa) values ('student1', 16, null), (null, 20, 4.0); insert into table_check_merge(name, age, gpa) values ('student1', 16, 2.0); merge into table_check_merge using (select age from table_source)source on source.age=table_check_merge.age when matched then update set gpa=6; {code} The merge statement tries to update gpa to 6, which is not between 0.0 and 4.0. However, the update succeeds. -- This message was sent by Atlassian Jira (v8.3.4#803005)
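For contrast, this is what an enforced CHECK constraint should do on the matched-update path. SQLite is used here purely as an illustrative engine (Hive itself is not involved); it rejects the same out-of-range gpa that the Hive merge lets through:

```python
import sqlite3

# The same schema as the report, on an engine that enforces CHECK on
# updates: setting gpa = 6 must raise a constraint violation.
conn = sqlite3.connect(":memory:")
conn.execute("""create table table_check_merge(
    name text check (length(name) <= 20),
    age int,
    gpa real check (gpa between 0.0 and 4.0))""")
conn.execute("insert into table_check_merge values ('student1', 16, 2.0)")

violation = None
try:
    conn.execute("update table_check_merge set gpa = 6 where age = 16")
except sqlite3.IntegrityError as exc:
    violation = exc          # expected path: gpa = 6 violates the CHECK
print("rejected" if violation else "update succeeded")
```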
[jira] [Created] (HIVE-25475) TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable
Krisztian Kasa created HIVE-25475: - Summary: TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable Key: HIVE-25475 URL: https://issues.apache.org/jira/browse/HIVE-25475 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa http://ci.hive.apache.org/job/hive-flaky-check/389/ {code} 16:19:18 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 141.73 s <<< FAILURE! - in org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios 16:19:18 [ERROR] org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad Time elapsed: 122.979 s <<< ERROR! 16:19:18 org.apache.hadoop.hive.ql.metadata.HiveException 16:19:18at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:5032) 16:19:18at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3348) 16:19:18at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:429) 16:19:18at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) 16:19:18at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 16:19:18at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361) 16:19:18at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334) 16:19:18at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245) 16:19:18at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108) 16:19:18at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348) 16:19:18at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204) 16:19:18at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153) 16:19:18at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148) 16:19:18at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164) 16:19:18at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230) 16:19:18at org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:235) 16:19:18at org.apache.hadoop.hive.ql.parse.WarehouseInstance.load(WarehouseInstance.java:309) 
16:19:18at org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.dumpLoadVerify(TestStatsReplicationScenarios.java:359) 16:19:18at org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testStatsReplicationCommon(TestStatsReplicationScenarios.java:663) 16:19:18at org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad(TestStatsReplicationScenarios.java:688) 16:19:18at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 16:19:18at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 16:19:18at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 16:19:18at java.lang.reflect.Method.invoke(Method.java:498) 16:19:18at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) 16:19:18at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) 16:19:18at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) 16:19:18at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) 16:19:18at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 16:19:18at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 16:19:18at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) 16:19:18at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 16:19:18at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) 16:19:18at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 16:19:18at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) 16:19:18at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) 16:19:18at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 16:19:18at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 16:19:18at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 16:19:18at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 16:19:18at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 16:19:18at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 16:19:18at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 16:19:18at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 16:19:18at org.junit.runners.ParentRunner.run(ParentRunner.java:413) 1
[jira] [Created] (HIVE-25406) Fetch writeId from insert-only tables
Krisztian Kasa created HIVE-25406: - Summary: Fetch writeId from insert-only tables Key: HIVE-25406 URL: https://issues.apache.org/jira/browse/HIVE-25406 Project: Hive Issue Type: Improvement Components: ORC, Parquet, Reader, Vectorization Reporter: Krisztian Kasa Assignee: Krisztian Kasa When generating the plan for an incremental materialized view rebuild, a filter operator is inserted on top of each source table scan. The predicates contain a filter on writeId, since we want to get only the rows inserted/deleted in the source tables since the last rebuild. WriteId is part of the ROW_ID virtual column and is only available for fully-ACID ORC tables. The goal of this jira is to populate a writeId when fetching from insert-only transactional tables. {code:java} create table t1(a int, b int) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only'); ... SELECT t1.ROW__ID.writeId, a, b FROM t1; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25388) Fix TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
Krisztian Kasa created HIVE-25388: - Summary: Fix TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites Key: HIVE-25388 URL: https://issues.apache.org/jira/browse/HIVE-25388 Project: Hive Issue Type: Test Components: repl, Test Reporter: Krisztian Kasa http://ci.hive.apache.org/job/hive-flaky-check/339/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25369) Handle Sum0 when rebuilding materialized view incrementally
Krisztian Kasa created HIVE-25369: - Summary: Handle Sum0 when rebuilding materialized view incrementally Key: HIVE-25369 URL: https://issues.apache.org/jira/browse/HIVE-25369 Project: Hive Issue Type: Improvement Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa When rewriting the MV insert overwrite plan to an incremental rebuild plan, a Sum0 aggregate function is used to aggregate the count function subresults coming from the existing MV data and from the newly inserted/deleted records aggregated since the last rebuild: {code} create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as select t1.a, count(*) from t1 group by t1.a {code} Insert overwrite plan: {code} HiveAggregate(group=[{0}], agg#0=[$SUM0($1)]) HiveUnion(all=[true]) HiveAggregate(group=[{0}], agg#0=[count()]) HiveProject($f0=[$0]) HiveFilter(condition=[<(2, $5.writeid)]) HiveTableScan(table=[[default, t1]], table:alias=[t1]) HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
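SUM0 behaves like SUM except that it returns 0 instead of NULL on empty input, which makes it safe for combining partial counts: a group that exists only in the view, or only in the delta, still gets a well-defined total. A simplified model of that combination step (the dictionaries stand in for the MV contents and the delta aggregate, not Hive's actual data structures):

```python
# sum0: like SUM, but an empty (or all-NULL) input yields 0, not NULL.
def sum0(values):
    return sum(v for v in values if v is not None)

existing_mv = {"x": 4}          # count(*) per group already in mat1
delta = {"x": 2, "y": 1}        # count(*) per group of rows added since
# The top-level $SUM0 adds the view's count and the delta's count per
# group; groups missing from one side contribute 0 instead of NULL.
rebuilt = {g: sum0([existing_mv.get(g), delta.get(g)])
           for g in sorted(set(existing_mv) | set(delta))}
print(rebuilt)  # {'x': 6, 'y': 1}
```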
[jira] [Created] (HIVE-25353) Incremental rebuild of partitioned insert only MV in presence of delete operations
Krisztian Kasa created HIVE-25353: - Summary: Incremental rebuild of partitioned insert only MV in presence of delete operations Key: HIVE-25353 URL: https://issues.apache.org/jira/browse/HIVE-25353 Project: Hive Issue Type: Improvement Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25253) Incremental rewrite of partitioned insert only materialized views
Krisztian Kasa created HIVE-25253: - Summary: Incremental rewrite of partitioned insert only materialized views Key: HIVE-25253 URL: https://issues.apache.org/jira/browse/HIVE-25253 Project: Hive Issue Type: Improvement Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25240) Query Text based MaterializedView rewrite if subqueries
Krisztian Kasa created HIVE-25240: - Summary: Query Text based MaterializedView rewrite if subqueries Key: HIVE-25240 URL: https://issues.apache.org/jira/browse/HIVE-25240 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create materialized view mat1 as select col0 from t1 where col0 > 1; explain cbo select col0 from (select col0 from t1 where col0 > 1) sub where col0 = 10; {code} {code} HiveProject(col0=[CAST(10):INTEGER]) HiveFilter(condition=[=($0, 10)]) HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25220) Query with union fails CBO with OOM
Krisztian Kasa created HIVE-25220: - Summary: Query with union fails CBO with OOM Key: HIVE-25220 URL: https://issues.apache.org/jira/browse/HIVE-25220 Project: Hive Issue Type: Bug Components: CBO Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25166) Query with multiple count(distinct) fails
Krisztian Kasa created HIVE-25166: - Summary: Query with multiple count(distinct) fails Key: HIVE-25166 URL: https://issues.apache.org/jira/browse/HIVE-25166 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} select count(distinct 0), count(distinct null) from alltypes; {code} {code} org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not in GROUP BY key 'TOK_NULL' at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(Parent
[jira] [Created] (HIVE-25109) CBO fails when updating table has constraints defined
Krisztian Kasa created HIVE-25109: - Summary: CBO fails when updating table has constraints defined Key: HIVE-25109 URL: https://issues.apache.org/jira/browse/HIVE-25109 Project: Hive Issue Type: Bug Components: CBO, Logical Optimizer Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create table acid_uami_n0(i int, de decimal(5,2) constraint nn1 not null enforced, vc varchar(128) constraint ch2 CHECK (de >= cast(i as decimal(5,2))) enforced) clustered by (i) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); -- update explain cbo update acid_uami_n0 set de = 893.14 where de = 103.00; {code} hive.log {code} 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result Schema didn't match Optimized Op Tree Schema at org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:208) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeUpdate(UpdateDeleteSemanticAnalyzer.java:63) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyze(UpdateDeleteSemanticAnalyzer.java:53) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.proce
[jira] [Created] (HIVE-25089) Move Materialized View rebuild code to AlterMaterializedViewRebuildAnalyzer
Krisztian Kasa created HIVE-25089: - Summary: Move Materialized View rebuild code to AlterMaterializedViewRebuildAnalyzer Key: HIVE-25089 URL: https://issues.apache.org/jira/browse/HIVE-25089 Project: Hive Issue Type: Task Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25071) Number of reducers limited to fixed 1 when updating/deleting
Krisztian Kasa created HIVE-25071: - Summary: Number of reducers limited to fixed 1 when updating/deleting Key: HIVE-25071 URL: https://issues.apache.org/jira/browse/HIVE-25071 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa When updating/deleting bucketed tables, an extra ReduceSink operator is created to enforce bucketing. Since HIVE-22538 the number of reducers is limited to a fixed 1 in these RS operators, which can lead to performance degradation; prior to HIVE-22538 multiple reducers were available in such cases. The reason for limiting the number of reducers is to ensure ascending RowId order in the delete delta files produced by update/delete statements. This is the plan of a delete statement like: {code} DELETE FROM t1 WHERE a = 1; {code} {code} TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7] {code} RowId order is ensured by RS[3] and bucketing is enforced by RS[5]: the number of reducers was limited to the number of buckets in the table or hive.exec.reducers.max. However RS[5] does not provide any ordering, so the above plan may generate unsorted delete deltas, which leads to corrupted data reads. Prior to HIVE-22538 these RS operators were merged by ReduceSinkDeduplication and the resulting RS kept the ordering and enabled multiple reducers; it could do so because ReduceSinkDeduplication was prepared for ACID writes. This was removed by HIVE-22538 to get a more generic ReduceSinkDeduplication. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25066) Show whether a materialized view supports incremental rebuild or not
Krisztian Kasa created HIVE-25066: - Summary: Show whether a materialized view supports incremental rebuild or not Key: HIVE-25066 URL: https://issues.apache.org/jira/browse/HIVE-25066 Project: Hive Issue Type: Improvement Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 Add information about whether a materialized view supports incremental rebuild or not in an additional column of the {code:java} SHOW MATERIALIZED VIEWS {code} statement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25063) Enforce hive.default.nulls.last when enforcing bucketing
Krisztian Kasa created HIVE-25063: - Summary: Enforce hive.default.nulls.last when enforcing bucketing Key: HIVE-25063 URL: https://issues.apache.org/jira/browse/HIVE-25063 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa When creating the ReduceSink operator for bucketing, the null sort order of the sort key is hardcoded: {code} for (int sortOrder : sortOrders) { order.append(DirectionUtils.codeToSign(sortOrder)); nullOrder.append(sortOrder == DirectionUtils.ASCENDING_CODE ? 'a' : 'z'); } {code} It should depend on both the hive.default.nulls.last setting and the sort direction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
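A minimal sketch of the intended null-order logic (illustrative Python, not Hive's implementation; the constant values are assumptions). With hive.default.nulls.last=true, the common convention is NULLS LAST for ascending keys and NULLS FIRST for descending keys, mirrored when the setting is false:

```python
# Illustrative codes, not Hive's actual DirectionUtils constants.
ASCENDING_CODE, DESCENDING_CODE = 1, 0

def null_order_char(sort_order: int, nulls_last: bool) -> str:
    # 'a' = nulls sort first, 'z' = nulls sort last (Hive's serialized convention).
    if sort_order == ASCENDING_CODE:
        return 'z' if nulls_last else 'a'
    # For descending order the default null position is mirrored.
    return 'a' if nulls_last else 'z'

# hive.default.nulls.last=true: NULLs last for ASC, first for DESC.
assert null_order_char(ASCENDING_CODE, True) == 'z'
assert null_order_char(DESCENDING_CODE, True) == 'a'
```

The point of the fix is that both inputs matter, instead of deriving the null-order character from the direction alone.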
[jira] [Created] (HIVE-25012) Parsing table alias is failing if query has table properties specified
Krisztian Kasa created HIVE-25012: - Summary: Parsing table alias is failing if query has table properties specified Key: HIVE-25012 URL: https://issues.apache.org/jira/browse/HIVE-25012 Project: Hive Issue Type: Bug Components: CBO, Parser Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from t1('acid.fetch.deleted.rows'='true') join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a; {code} When creating the Join RelNode, the aliases are used to look up the left and right input RelNodes. Aliases are extracted from the AST subtrees of the left and right inputs of the join AST node. In case of a table reference: {code} TOK_TABREF TOK_TABNAME t1 TOK_TABLEPROPERTIES TOK_TABLEPROPLIST TOK_TABLEPROPERTY 'acid.fetch.deleted.rows' 'true' {code} Prior to HIVE-24854 the queries mentioned above failed because the existing solution did not expect TOK_TABLEPROPERTIES. The goal of this patch is to parse TOK_TABREF properly using the existing solution also used in SemanticAnalyzer.doPhase1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24993) AssertionError when referencing ROW__ID.writeId
Krisztian Kasa created HIVE-24993: - Summary: AssertionError when referencing ROW__ID.writeId Key: HIVE-24993 URL: https://issues.apache.org/jira/browse/HIVE-24993 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} SELECT t1.ROW__ID FROM t1 WHERE t1.ROW__ID.writeid > 1 {code} {code} java.lang.AssertionError at org.apache.hadoop.hive.ql.parse.UnparseTranslator.addTranslation(UnparseTranslator.java:123) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5680) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5570) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5530) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3385) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3706) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3717) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5281) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1839) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1785) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1546) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:563) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12582) at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.j
[jira] [Created] (HIVE-24992) Incremental rebuild of MV having aggregate in presence of delete operation
Krisztian Kasa created HIVE-24992: - Summary: Incremental rebuild of MV having aggregate in presence of delete operation Key: HIVE-24992 URL: https://issues.apache.org/jira/browse/HIVE-24992 Project: Hive Issue Type: Improvement Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Extension of HIVE-24854: handle cases when the Materialized view definition has an aggregation, like {code} CREATE MATERIALIZED VIEW cmv_mat_view_n5 DISABLE REWRITE TBLPROPERTIES ('transactional'='true') AS SELECT cmv_basetable_n5.a, cmv_basetable_2_n2.c, sum(cmv_basetable_2_n2.d) FROM cmv_basetable_n5 JOIN cmv_basetable_2_n2 ON (cmv_basetable_n5.a = cmv_basetable_2_n2.a) WHERE cmv_basetable_2_n2.c > 10.0 GROUP BY cmv_basetable_n5.a, cmv_basetable_2_n2.c; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24991) Enable fetching deleted rows in vectorized mode
Krisztian Kasa created HIVE-24991: - Summary: Enable fetching deleted rows in vectorized mode Key: HIVE-24991 URL: https://issues.apache.org/jira/browse/HIVE-24991 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa HIVE-24855 enables loading deleted rows from ORC tables when the table property *acid.fetch.deleted.rows* is true. The goal of this jira is to enable this feature in the vectorized ORC batch reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24990) Support distinct in window aggregation in vectorized mode
Krisztian Kasa created HIVE-24990: - Summary: Support distinct in window aggregation in vectorized mode Key: HIVE-24990 URL: https://issues.apache.org/jira/browse/HIVE-24990 Project: Hive Issue Type: Improvement Components: UDF, Vectorization Reporter: Krisztian Kasa The PTF operator can not be vectorized if the query has a windowing function with *distinct*, because the distinct versions of these aggregate functions are not implemented yet. {code} SELECT sum(DISTINCT a) OVER (PARTITION BY b) FROM t1; {code} The only exception is *count*. Functions that have a vectorized version but no vectorized distinct version: {code} row_number rank dense_rank min max sum avg first_value last_value {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
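For reference, a minimal model of what `sum(DISTINCT a) OVER (PARTITION BY b)` computes — within each partition, duplicate values of `a` contribute once (illustrative Python, not Hive code):

```python
from collections import defaultdict

def sum_distinct_over_partition(rows):
    """rows: list of (a, b) pairs; returns one result per input row, in order."""
    per_partition = defaultdict(set)
    for a, b in rows:
        if a is not None:          # SQL aggregates skip NULL inputs
            per_partition[b].add(a)
    # Every row in a partition sees the same distinct-sum (no ORDER BY frame).
    return [sum(per_partition[b]) for _, b in rows]

rows = [(1, 'x'), (1, 'x'), (2, 'x'), (5, 'y')]
assert sum_distinct_over_partition(rows) == [3, 3, 3, 5]
```

Vectorizing this requires the UDAF to deduplicate per partition, which is why each function needs a dedicated distinct implementation.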
[jira] [Created] (HIVE-24935) Remove outdated check for correlated exists subqueries with full aggregate
Krisztian Kasa created HIVE-24935: - Summary: Remove outdated check for correlated exists subqueries with full aggregate Key: HIVE-24935 URL: https://issues.apache.org/jira/browse/HIVE-24935 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa Since HIVE-24929 QBSubQuery.subqueryRestrictionsCheck is no longer called. The check for exists subqueries with a full aggregate has moved to QBSubQueryParseInfo.hasFullAggregate() and QBSubQueryParseInfo.getOperator() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24929) Allow correlated exists subqueries with windowing clause
Krisztian Kasa created HIVE-24929: - Summary: Allow correlated exists subqueries with windowing clause Key: HIVE-24929 URL: https://issues.apache.org/jira/browse/HIVE-24929 Project: Hive Issue Type: Improvement Components: Query Planning Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 Currently, queries which have a windowing clause within a subquery are not supported by Hive: Hive rewrites subqueries to joins, and the rewritten plan would lead to incorrect results in such cases. However, this restriction can be lifted for EXISTS/NOT EXISTS subqueries, since in those cases we are not interested in the result of the window function call but only in the existence of any record. {code} select id, int_col from alltypesagg a where exists (select sum(int_col) over (partition by bool_col) from alltypestiny b where a.id = b.id); {code} {code} select id, int_col from alltypestiny t where not exists (select sum(int_col) over (partition by bool_col) from alltypesagg a where t.id = a.int_col); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24925) Query materialized view invalidation info can cause ORA-01795
Krisztian Kasa created HIVE-24925: - Summary: Query materialized view invalidation info can cause ORA-01795 Key: HIVE-24925 URL: https://issues.apache.org/jira/browse/HIVE-24925 Project: Hive Issue Type: Bug Components: Metastore Reporter: Krisztian Kasa Fix For: 4.0.0 Querying materialized view invalidation info assembles a direct SQL query to pull update/delete completed transactions on the source tables since the last rebuild of the materialized view. Invalid writeIds are also used to filter the result. These writeIds are passed using an *in* operator. Depending on the size of the invalid writeId list, the operand list of the *in* operator or the overall query text can exceed backend limitations; for example, with an Oracle backend db the maximum number of expressions in a list is 1000. {code} SELECT "CTC_UPDATE_DELETE" FROM "COMPLETED_TXN_COMPONENTS" WHERE "CTC_UPDATE_DELETE" ='Y' AND ( ("CTC_DATABASE"=? AND "CTC_TABLE"=? AND ("CTC_WRITEID" > 1 OR "CTC_WRITEID" IN (, ... )) ) OR ("CTC_DATABASE"=? AND "CTC_TABLE"=? AND ("CTC_WRITEID" > 1) ) ) AND "CTC_TXNID" <= 16 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
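One common workaround for ORA-01795 is to split the IN-list into OR-ed chunks of at most 1000 elements each. A hypothetical sketch of that query-text generation (not necessarily the fix Hive ships):

```python
def chunked_in_clause(column, values, chunk_size=1000):
    """Render `"column" IN (...)` split into OR-ed chunks of <= chunk_size values."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    parts = ['"%s" IN (%s)' % (column, ", ".join(map(str, chunk))) for chunk in chunks]
    return "(" + " OR ".join(parts) + ")"

clause = chunked_in_clause("CTC_WRITEID", list(range(2500)))
assert clause.count(" IN (") == 3          # 2500 writeIds -> 3 chunks
# No chunk exceeds Oracle's 1000-expression list limit:
assert max(len(c.split(",")) for c in clause.split(" OR ")) <= 1000
```

The overall query-text length would still need a separate guard, since ORA-01795 is only one of the limits mentioned above.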
[jira] [Created] (HIVE-24908) Adding Respect/Ignore nulls as a UDAF parameter is ambiguous
Krisztian Kasa created HIVE-24908: - Summary: Adding Respect/Ignore nulls as a UDAF parameter is ambiguous Key: HIVE-24908 URL: https://issues.apache.org/jira/browse/HIVE-24908 Project: Hive Issue Type: Bug Components: UDF Reporter: Krisztian Kasa Assignee: Krisztian Kasa Both function calls are translated to the same UDAF call: {code} SELECT lead(a, 2, true) ... SELECT lead(a, 2) IGNORE NULLS ... {code} IGNORE NULLS is passed as an extra constant boolean parameter to the UDAF https://github.com/apache/hive/blob/eed78dfdcb6dfc2de400397a60de12e6f62b96e2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L743 However, the two function calls have different semantics: * *lead(a, 2, true)* - 'true' is the default value: "The value of DEFAULT is returned as the result if there is no row corresponding to the OFFSET number of rows before R within P (for the lag function) or after R within P (for the lead function)" * *lead(a, 2) IGNORE NULLS* - For each row in the current window, find the 2nd non-NULL value starting directly after the current row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
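The two semantics above can be contrasted with a small sketch (hypothetical helpers in Python, not Hive's implementation):

```python
def lead_default(col, offset, default):
    # lead(a, offset, default): the value `offset` rows ahead,
    # or `default` when no such row exists.
    return [col[i + offset] if i + offset < len(col) else default
            for i in range(len(col))]

def lead_ignore_nulls(col, offset):
    # lead(a, offset) IGNORE NULLS: the offset-th non-NULL value
    # strictly after the current row, or NULL when there are fewer.
    out = []
    for i in range(len(col)):
        later = [v for v in col[i + 1:] if v is not None]
        out.append(later[offset - 1] if len(later) >= offset else None)
    return out

col = [10, None, 20, None, 30]
assert lead_default(col, 2, True) == [20, None, 30, True, True]
assert lead_ignore_nulls(col, 2) == [30, 30, None, None, None]
```

The outputs differ on every row where NULLs intervene, which is why collapsing both calls onto the same boolean UDAF parameter is ambiguous.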
[jira] [Created] (HIVE-24894) transform_acid is unstable
Krisztian Kasa created HIVE-24894: - Summary: transform_acid is unstable Key: HIVE-24894 URL: https://issues.apache.org/jira/browse/HIVE-24894 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa [http://ci.hive.apache.org/job/hive-flaky-check/217] {code} Client execution failed with error code = 2 running SELECT transform(*) USING 'transform_acid_grep.sh' AS (col string) FROM transform_acid fname=transform_acid.q {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24869) Implement Respect/Ignore Nulls in lag/lead
Krisztian Kasa created HIVE-24869: - Summary: Implement Respect/Ignore Nulls in lag/lead Key: HIVE-24869 URL: https://issues.apache.org/jira/browse/HIVE-24869 Project: Hive Issue Type: Improvement Components: UDF Reporter: Krisztian Kasa {code} <lead or lag function> ::= <lead or lag> <left paren> <lead or lag extent> [ <comma> <offset> [ <comma> <default expression> ] ] <right paren> [ <null treatment> ] <lead or lag> ::= LEAD | LAG <lead or lag extent> ::= <value expression> <offset> ::= <exact numeric literal> <default expression> ::= <value expression> <null treatment> ::= RESPECT NULLS | IGNORE NULLS {code} Example: get the a column value from the previous and the next row, or return 0 if there is no previous/next row corresponding to the current row. RESPECT/IGNORE NULLS controls whether null values should be preserved or eliminated. {code} SELECT a, LAG(a, 1, 0) OVER (ORDER BY a) IGNORE NULLS, LEAD(a, 1, 0) OVER (ORDER BY a) RESPECT NULLS FROM ... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24868) Support specifying Respect/Ignore Nulls in function parameter list
Krisztian Kasa created HIVE-24868: - Summary: Support specifying Respect/Ignore Nulls in function parameter list Key: HIVE-24868 URL: https://issues.apache.org/jira/browse/HIVE-24868 Project: Hive Issue Type: Improvement Components: Parser Reporter: Krisztian Kasa Assignee: Krisztian Kasa Example: {code} select last_value(b, ignore nulls) over(partition by a order by b) from t1; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24865) Implement Respect/Ignore Nulls in first/last_value
Krisztian Kasa created HIVE-24865: - Summary: Implement Respect/Ignore Nulls in first/last_value Key: HIVE-24865 URL: https://issues.apache.org/jira/browse/HIVE-24865 Project: Hive Issue Type: New Feature Components: Parser, UDF Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code:java} <null treatment> ::= RESPECT NULLS | IGNORE NULLS <first or last value function> ::= <first or last value> <left paren> <value expression> <right paren> [ <null treatment> ] <first or last value> ::= FIRST_VALUE | LAST_VALUE {code} Example: {code:java} select last_value(b) ignore nulls over(partition by a order by b) from t1; {code} Existing non-standard implementation: {code:java} select last_value(b, true) over(partition by a order by b) from t1; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
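For an ordered window with the default running frame (UNBOUNDED PRECEDING to CURRENT ROW), `last_value(b) IGNORE NULLS` yields the most recent non-NULL value seen so far. A minimal sketch of that semantics (illustrative Python, not Hive code):

```python
def last_value_ignore_nulls(col):
    """last_value(...) IGNORE NULLS over a running frame: for each row,
    the latest non-NULL value at or before it, else None."""
    out, last = [], None
    for v in col:
        if v is not None:
            last = v
        out.append(last)
    return out

assert last_value_ignore_nulls([None, 1, None, 2, None]) == [None, 1, 1, 2, 2]
```

Without IGNORE NULLS (i.e. RESPECT NULLS), `last_value` would simply return the current row's value under this frame, NULL or not.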
[jira] [Created] (HIVE-24863) Wrong property value in UDAF percentile_cont/disc description
Krisztian Kasa created HIVE-24863: - Summary: Wrong property value in UDAF percentile_cont/disc description Key: HIVE-24863 URL: https://issues.apache.org/jira/browse/HIVE-24863 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24859) TestZookeeperLockManager#testMetrics fails intermittently
Krisztian Kasa created HIVE-24859: - Summary: TestZookeeperLockManager#testMetrics fails intermittently Key: HIVE-24859 URL: https://issues.apache.org/jira/browse/HIVE-24859 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa http://ci.hive.apache.org/job/hive-flaky-check/198/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24855) Introduce virtual column ROW__IS__DELETED
Krisztian Kasa created HIVE-24855: - Summary: Introduce virtual column ROW__IS__DELETED Key: HIVE-24855 URL: https://issues.apache.org/jira/browse/HIVE-24855 Project: Hive Issue Type: New Feature Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24854) Incremental Materialized view refresh in presence of update/delete operations
Krisztian Kasa created HIVE-24854: - Summary: Incremental Materialized view refresh in presence of update/delete operations Key: HIVE-24854 URL: https://issues.apache.org/jira/browse/HIVE-24854 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa The current implementation of incremental Materialized view rebuild can not be used if any of the Materialized view source tables has update or delete operations since the last rebuild. In such cases a full rebuild must be performed. Steps to enable incremental rebuild: 1. Introduce a new virtual column to mark a row deleted. 2. Execute the query in the view definition. 2.a. Add a filter to each table scan in order to pull only the rows from each source table which have a higher writeId than the writeId of the last rebuild - this is already implemented by the current incremental rebuild. 2.b. Add the row-is-deleted virtual column to each table scan. In join nodes, if any of the branches has a deleted row, the result row is also deleted. We should distinguish two types of view definition queries: with and without an Aggregate. 3.a. No-aggregate path: rewrite the plan of the full rebuild to a multi insert statement with two insert branches: one branch inserts new rows into the materialized view table, and the second inserts deleted rows into the materialized view delete delta. 3.b. Aggregate path: TBD -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24840) Materialized View incremental rebuild produces wrong result set after compaction
Krisztian Kasa created HIVE-24840: - Summary: Materialized View incremental rebuild produces wrong result set after compaction Key: HIVE-24840 URL: https://issues.apache.org/jira/browse/HIVE-24840 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES ('transactional'='true'); insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, NULL); create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as select a,b,c from t1 where a > 0 or a is null; delete from t1 where a = 1; alter table t1 compact 'major'; -- Wait until compaction finished. alter materialized view mat1 rebuild; {code} Expected result of query {code} select * from mat1; {code} {code} 2 two 2 NULL NULL NULL {code} but if incremental rebuild is enabled the result is {code} 1 one 1 2 two 2 NULL NULL NULL {code} Cause: incremental rebuild queries the metastore's COMPLETED_TXN_COMPONENTS table to determine whether the source tables of a materialized view have had a delete or update transaction since the last rebuild. However, when a major compaction is performed on the source tables, the records related to these tables are deleted from COMPLETED_TXN_COMPONENTS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24822) Materialized View rebuild loses materializationTime property value
Krisztian Kasa created HIVE-24822: - Summary: Materialized View rebuild loses materializationTime property value Key: HIVE-24822 URL: https://issues.apache.org/jira/browse/HIVE-24822 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa A Materialized View rebuild like {code} alter materialized view mat1 rebuild; {code} updates the CreationMetadata of the org.apache.hadoop.hive.ql.metadata.Table object of the materialized view, but it does not copy the materializationTime property value from the original CreationMetadata object when it updates the entry in the MaterializedViewCache: {code} } else if (desc.isUpdateCreationMetadata()) { // We need to update the status of the creation signature Table mvTable = context.getDb().getTable(desc.getName()); CreationMetadata cm = new CreationMetadata(MetaStoreUtils.getDefaultCatalog(context.getConf()), mvTable.getDbName(), mvTable.getTableName(), ImmutableSet.copyOf(mvTable.getCreationMetadata().getTablesUsed())); cm.setValidTxnList(context.getConf().get(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY)); context.getDb().updateCreationMetadata(mvTable.getDbName(), mvTable.getTableName(), cm); mvTable.setCreationMetadata(cm); HiveMaterializedViewsRegistry.get().createMaterializedView(context.getDb().getConf(), mvTable); } {code} Later, when loading materializations using {code} Hive.getValidMaterializedViews(List materializedViewTables ...) {code} the materialization stored in the cache and the one in the metastore will not be the same because of the lost materializationTime. The cache is then refreshed via {code} HiveMaterializedViewsRegistry.get().refreshMaterializedView(conf, null, materializedViewTable); {code} passing null as oldMaterializedViewTable, which leads to a NullPointerException and CBO failure because the registry expects a non-null oldMaterializedViewTable value: it may drop the old MV in the Metastore and also tries to update the cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24820) MaterializeViewCache enables adding multiple entries of the same Materialization instance
Krisztian Kasa created HIVE-24820: - Summary: MaterializeViewCache enables adding multiple entries of the same Materialization instance Key: HIVE-24820 URL: https://issues.apache.org/jira/browse/HIVE-24820 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24775) Incorrect null handling when rebuilding Materialized view incrementally
Krisztian Kasa created HIVE-24775: - Summary: Incorrect null handling when rebuilding Materialized view incrementally Key: HIVE-24775 URL: https://issues.apache.org/jira/browse/HIVE-24775 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} CREATE TABLE t1 (a int, b varchar(256), c decimal(10,2), d int) STORED AS orc TBLPROPERTIES ('transactional'='true'); INSERT INTO t1 VALUES (NULL, 'null_value', 100.77, 7), (1, 'calvin', 978.76, 3), (1, 'charlie', 9.8, 1); CREATE MATERIALIZED VIEW mat1 TBLPROPERTIES ('transactional'='true') AS SELECT a, b, sum(d) FROM t1 WHERE c > 10.0 GROUP BY a, b; INSERT INTO t1 VALUES (NULL, 'null_value', 100.88, 8), (1, 'charlie', 15.8, 1); ALTER MATERIALIZED VIEW mat1 REBUILD; SELECT * FROM mat1 ORDER BY a, b; {code} View contains: {code} 1 calvin 3 1 charlie 1 NULL null_value 8 NULL null_value 7 {code} but it should contain: {code} 1 calvin 3 1 charlie 1 NULL null_value 15 {code} Rows whose aggregate key columns have NULL values are not aggregated, because the incremental materialized view rebuild plan is altered by [applyPreJoinOrderingTransforms|https://github.com/apache/hive/blob/76732ad27e139fbdef25b820a07cf35934771083/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1975]: an IS NOT NULL filter is added for each of these columns on top of the view scan when joining with the branch that pulls the rows inserted after the last rebuild: {code} HiveProject($f0=[$3], $f1=[$4], $f2=[CASE(AND(IS NULL($0), IS NULL($1)), $5, +($5, $2))]) HiveFilter(condition=[OR(AND(IS NULL($0), IS NULL($1)), AND(=($0, $3), =($1, $4)))]) HiveJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[right], algorithm=[none], cost=[not available]) HiveProject(a=[$0], b=[$1], _c2=[$2]) HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($1))]) HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1]) HiveProject(a=[$0], b=[$1], $f2=[$2]) HiveAggregate(group=[{0, 1}], agg#0=[sum($3)]) HiveFilter(condition=[AND(<(1, 
$6.writeid), >($2, 10))]) HiveTableScan(table=[[default, t1]], table:alias=[t1]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
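A minimal model of the intended incremental merge: delta aggregates must be combined with the existing view rows by group key, treating a NULL key as matching a NULL key — exactly what the injected IS NOT NULL filter breaks (illustrative Python, not Hive's plan):

```python
def merge_aggregates(view_rows, delta_rows):
    """view_rows/delta_rows: {(a, b): sum_d}. Python dict keys compare
    (None, x) == (None, x), mirroring the NULL-safe grouping the rebuild needs."""
    merged = dict(view_rows)
    for key, s in delta_rows.items():
        merged[key] = merged.get(key, 0) + s
    return merged

view  = {(1, 'calvin'): 3, (1, 'charlie'): 1, (None, 'null_value'): 7}
delta = {(None, 'null_value'): 8}
assert merge_aggregates(view, delta) == {
    (1, 'calvin'): 3, (1, 'charlie'): 1, (None, 'null_value'): 15}
```

With the IS NOT NULL filter in place, the `(None, 'null_value')` view row never reaches the join, so the delta row lands as a separate group — producing the duplicate 8/7 rows shown above instead of the merged 15.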
[jira] [Created] (HIVE-24763) Incremental rebuild of Materialized view fails
Krisztian Kasa created HIVE-24763: - Summary: Incremental rebuild of Materialized view fails Key: HIVE-24763 URL: https://issues.apache.org/jira/browse/HIVE-24763 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa An exception is thrown when the Materialized view definition contains an aggregate operator with only one key: {code} CREATE MATERIALIZED VIEW cmv_mat_view_n5 TBLPROPERTIES ('transactional'='true') AS SELECT cmv_basetable_n5.a, sum(cmv_basetable_2_n2.d) FROM cmv_basetable_n5 JOIN cmv_basetable_2_n2 ON (cmv_basetable_n5.a = cmv_basetable_2_n2.a) WHERE cmv_basetable_2_n2.c > 10.0 GROUP BY cmv_basetable_n5.a; ... ALTER MATERIALIZED VIEW cmv_mat_view_n5 REBUILD; {code} {code} java.lang.AssertionError: wrong operand count 1 for AND at org.apache.calcite.util.Litmus$1.fail(Litmus.java:31) at org.apache.calcite.sql.SqlBinaryOperator.validRexOperands(SqlBinaryOperator.java:219) at org.apache.calcite.rex.RexCall.<init>(RexCall.java:86) at org.apache.calcite.rex.RexBuilder.makeCall(RexBuilder.java:251) at org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveAggregateIncrementalRewritingRule.onMatch(HiveAggregateIncrementalRewritingRule.java:124) at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2715) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2681) at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyMaterializedViewRewriting(CalcitePlanner.java:2318) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1934) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1810) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1571) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:562) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12538) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456) at org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:89) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:315) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:315) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) at org.apache.hadoop.hi
[jira] [Created] (HIVE-24664) Support column aliases in Values clause
Krisztian Kasa created HIVE-24664: - Summary: Support column aliases in Values clause Key: HIVE-24664 URL: https://issues.apache.org/jira/browse/HIVE-24664 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa Enable explicitly specifying column aliases in the first row of a Values clause. If not all columns have an alias specified, generate one. {code:java} values(1, 2 b, 3 c),(4, 5, 6); {code} {code:java} _col1 b c 1 2 3 4 5 6 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
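The fallback described above can be sketched as follows. This is an illustrative standalone snippet, not Hive's actual implementation; the class and method names are hypothetical, and the 1-based {{_colN}} numbering simply mirrors the {{_col1 b c}} header in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: generate a default alias for every Values column
// that has no explicit alias, as in the example output above.
public class ValuesAliases {
    public static List<String> resolveAliases(List<String> explicitAliases) {
        List<String> resolved = new ArrayList<>();
        for (int i = 0; i < explicitAliases.size(); i++) {
            String alias = explicitAliases.get(i);
            // Fall back to a generated _colN name when no alias was given
            // (null stands in for "no alias" in this sketch).
            resolved.add(alias != null ? alias : "_col" + (i + 1));
        }
        return resolved;
    }
}
```

For the example row {{values(1, 2 b, 3 c)}}, only the first column lacks an alias, so the resolved header becomes {{_col1 b c}}.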
[jira] [Created] (HIVE-24644) QueryResultCache parses the query twice
Krisztian Kasa created HIVE-24644: - Summary: QueryResultCache parses the query twice Key: HIVE-24644 URL: https://issues.apache.org/jira/browse/HIVE-24644 Project: Hive Issue Type: Improvement Components: HiveServer2, Parser Reporter: Krisztian Kasa Assignee: Krisztian Kasa The query result cache looks up results by a query text in which all table references are fully resolved. To generate this query text, the current implementation * transforms the AST back to a String * parses the String generated in the previous step * traverses the new AST and replaces the table references with fully qualified ones * transforms the new AST back to a String -> this becomes the cache key -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24635) Support Values clause as operand of set operation
Krisztian Kasa created HIVE-24635: - Summary: Support Values clause as operand of set operation Key: HIVE-24635 URL: https://issues.apache.org/jira/browse/HIVE-24635 Project: Hive Issue Type: Improvement Components: Parser Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} VALUES (1,2),(3,4) UNION ALL VALUES (1,2),(7,8); {code} {code} 1 2 3 4 1 2 7 8 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24633) Support CTE with column labels
Krisztian Kasa created HIVE-24633: - Summary: Support CTE with column labels Key: HIVE-24633 URL: https://issues.apache.org/jira/browse/HIVE-24633 Project: Hive Issue Type: Improvement Components: Parser Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} with cte1(a, b) as (select int_col x, bigint_col y from t1) select a, b from cte1{code} {code} a b 1 2 3 4 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24613) Support Values clause without Insert
Krisztian Kasa created HIVE-24613: - Summary: Support Values clause without Insert Key: HIVE-24613 URL: https://issues.apache.org/jira/browse/HIVE-24613 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa Standalone: {code} VALUES(1,2,3),(4,5,6); {code} {code} 1 2 3 4 5 6 {code} In subquery: {code} SELECT * FROM (VALUES(1,2,3),(4,5,6)) as FOO; {code} {code} 1 2 3 4 5 6 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24599) Add support for vectorized two-parameter trim functions
Krisztian Kasa created HIVE-24599: - Summary: Add support for vectorized two-parameter trim functions Key: HIVE-24599 URL: https://issues.apache.org/jira/browse/HIVE-24599 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa HIVE-24565 introduces a version of the trim/ltrim/rtrim functions where the characters to trim can be specified as a second parameter. The two-parameter version of these functions has several vectorization scenarios: * source: COLUMN - trim characters: SCALAR - already supported by HIVE-24565 {code} SELECT trim(col0, 'a'); {code} * source: COLUMN - trim characters: COLUMN {code} SELECT trim(col0, col1); {code} * source: SCALAR - trim characters: COLUMN {code} SELECT trim('string to trim', col0); {code} The scope of this jira is to support the last two scenarios. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24598) Trim function should return null if any of its parameters is null
Krisztian Kasa created HIVE-24598: - Summary: Trim function should return null if any of its parameters is null Key: HIVE-24598 URL: https://issues.apache.org/jira/browse/HIVE-24598 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Hive throws an exception when null is passed as a parameter of trim/rtrim/ltrim; however, null should be returned. From the SQL:2011 standard: a) Let S be the value of the <trim source>. b) If <trim character> is specified, then let SC be the value of <trim character>; otherwise, let SC be <space>. c) If at least one of S and SC is the null value, then the result of the <trim function> is the null value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
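The required behavior can be sketched with a standalone two-parameter trim that follows the standard's null rule. This is an illustration of the semantics, not Hive's UDF code; the class and method names are hypothetical.

```java
// Hypothetical sketch of a two-parameter trim with SQL-standard null
// semantics: if either operand is null, the result is null.
public class TrimStd {
    public static String trimStd(String src, String trimChars) {
        if (src == null || trimChars == null) {
            return null; // rule (c): a null operand yields a null result
        }
        int begin = 0;
        int end = src.length();
        // Strip leading characters that appear in trimChars.
        while (begin < end && trimChars.indexOf(src.charAt(begin)) >= 0) {
            begin++;
        }
        // Strip trailing characters that appear in trimChars.
        while (end > begin && trimChars.indexOf(src.charAt(end - 1)) >= 0) {
            end--;
        }
        return src.substring(begin, end);
    }
}
```

With this rule, {{trim(null, 'a')}} and {{trim('abc', null)}} both return null instead of raising an exception.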
[jira] [Created] (HIVE-24565) Implement standard trim function
Krisztian Kasa created HIVE-24565: - Summary: Implement standard trim function Key: HIVE-24565 URL: https://issues.apache.org/jira/browse/HIVE-24565 Project: Hive Issue Type: Improvement Components: Parser, UDF Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24564) Extend PPD filter transitivity to be able to discover new opportunities
Krisztian Kasa created HIVE-24564: - Summary: Extend PPD filter transitivity to be able to discover new opportunities Key: HIVE-24564 URL: https://issues.apache.org/jira/browse/HIVE-24564 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24479) Introduce setting to set lower bound of hash aggregation reduction.
Krisztian Kasa created HIVE-24479: - Summary: Introduce setting to set lower bound of hash aggregation reduction. Key: HIVE-24479 URL: https://issues.apache.org/jira/browse/HIVE-24479 Project: Hive Issue Type: Improvement Components: Physical Optimizer Affects Versions: 4.0.0 Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 * The default setting of the hash group-by min reduction % is 0.99. * During compilation, we check its effectiveness and adjust it accordingly in {{SetHashGroupByMinReduction}}: {code} float defaultMinReductionHashAggrFactor = desc.getMinReductionHashAggr(); float minReductionHashAggrFactor = 1f - ((float) ndvProduct / numRows); if (minReductionHashAggrFactor < defaultMinReductionHashAggrFactor) { desc.setMinReductionHashAggr(minReductionHashAggrFactor); } {code} For certain queries, this computation turns out to be "0". This forces the operator to skip hash aggregation completely and always ends up choosing streaming mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
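The adjustment above, plus the lower bound this jira proposes, can be sketched as follows. This is an illustrative standalone snippet under the assumption that the bound is applied as a simple clamp; the class, method, and parameter names are hypothetical, not Hive's actual API.

```java
// Hypothetical sketch of the min-reduction adjustment from
// SetHashGroupByMinReduction, extended with a configurable lower bound so
// the factor can no longer collapse to 0 and force streaming mode.
public class MinReductionSketch {
    public static float adjustMinReduction(long ndvProduct, long numRows,
                                           float defaultFactor, float lowerBound) {
        float computed = 1f - ((float) ndvProduct / numRows);
        // Existing behavior: take the computed factor when it is below the
        // default (0.99)...
        float adjusted = Math.min(computed, defaultFactor);
        // ...proposed change: but never drop below the configured lower bound.
        return Math.max(adjusted, lowerBound);
    }
}
```

For example, when ndvProduct equals numRows the computed factor is 0, and without the clamp hash aggregation would always be skipped; with a lower bound of 0.3 the operator still keeps a hash-aggregation budget.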
[jira] [Created] (HIVE-24446) Materialized View plan removes explicit cast from query
Krisztian Kasa created HIVE-24446: - Summary: Materialized View plan removes explicit cast from query Key: HIVE-24446 URL: https://issues.apache.org/jira/browse/HIVE-24446 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES ('transactional'='true') as select total_views `total_views`, sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile, program from tv_view_data; {code} {code} LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1]) HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], table:alias=[mv_tv_view_data_av1]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)