[jira] [Created] (HIVE-27187) Incremental rebuild of materialized view stored by iceberg
Krisztian Kasa created HIVE-27187:
----------------------------------

Summary: Incremental rebuild of materialized view stored by iceberg
Key: HIVE-27187
URL: https://issues.apache.org/jira/browse/HIVE-27187
Project: Hive
Issue Type: Improvement
Components: Iceberg integration, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

Currently, the incremental rebuild of a materialized view stored by Iceberg whose definition query contains an aggregate operator is transformed into an insert overwrite statement containing a union operator, provided the source tables contain insert operations only. One branch of the union scans the view, the other produces the delta. This can be improved further: transform the statement into a multi-insert statement representing a merge statement, to insert new aggregations and update existing ones.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
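The merge-style rebuild described above could be sketched roughly as follows. This is an illustrative sketch only, with a hypothetical aggregate view definition; the actual statement Hive generates internally may differ:

{code}
-- hypothetical aggregate materialized view over an Iceberg source table
create materialized view mat_agg stored by iceberg as
select b, sum(c) as sum_c from tbl_ice group by b;

-- merge-style semantics instead of 'insert overwrite mat_agg
-- select * from mat_agg union all <delta>': existing groups are
-- updated, new groups are inserted
merge into mat_agg mv
using (select b, sum(c) as sum_c
       from tbl_ice  -- the delta branch would scan only the new snapshots
       group by b) d
on mv.b = d.b
when matched then update set sum_c = mv.sum_c + d.sum_c
when not matched then insert values (d.b, d.sum_c);
{code}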
[jira] [Created] (HIVE-27101) Support incremental materialized view rebuild when Iceberg source tables have insert operations only.
Krisztian Kasa created HIVE-27101:
----------------------------------

Summary: Support incremental materialized view rebuild when Iceberg source tables have insert operations only.
Key: HIVE-27101
URL: https://issues.apache.org/jira/browse/HIVE-27101
Project: Hive
Issue Type: Improvement
Components: Iceberg integration, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
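A minimal scenario of the kind this improvement targets might look as follows. This is an assumed repro sketch modeled on the examples in related issues (e.g. HIVE-26497), not taken from this ticket:

{code}
create table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='1');
insert into tbl_ice values (1, 'one', 50), (2, 'two', 51);
create materialized view mat1 stored by iceberg stored as orc tblproperties ('format-version'='1')
as select b, c from tbl_ice where c > 50;
-- insert-only change on the Iceberg source: the rebuild below should be incremental
insert into tbl_ice values (3, 'three', 52);
alter materialized view mat1 rebuild;
{code}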
[jira] [Created] (HIVE-27073) Apply SerDe properties when creating materialized view
Krisztian Kasa created HIVE-27073:
----------------------------------

Summary: Apply SerDe properties when creating materialized view
Key: HIVE-27073
URL: https://issues.apache.org/jira/browse/HIVE-27073
Project: Hive
Issue Type: Improvement
Components: Iceberg integration, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
create table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='1');

create materialized view mat1 stored by iceberg stored as orc tblproperties ('format-version'='1') as
select tbl_ice.b, tbl_ice.c from tbl_ice where tbl_ice.c > 52;
{code}

Materialized view {{mat1}} should use the {{ORC}} file format.
[jira] [Created] (HIVE-26967) Deadlock when enabling/disabling Materialized view stored by Iceberg
Krisztian Kasa created HIVE-26967:
----------------------------------

Summary: Deadlock when enabling/disabling Materialized view stored by Iceberg
Key: HIVE-26967
URL: https://issues.apache.org/jira/browse/HIVE-26967
Project: Hive
Issue Type: Bug
Components: Iceberg integration
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table all100k(
  t int,
  si int,
  i int,
  b bigint,
  f float,
  d double,
  s string,
  dc decimal(38,18),
  bo boolean,
  v string,
  c string,
  ts timestamp,
  dt date)
partitioned by spec (BUCKET(16, t))
stored by iceberg stored as parquet;

create materialized view mv_rewrite stored by iceberg as
select t, si from all100k where t>115;

explain select si,t from all100k where t>116 and t<120;

alter materialized view mv_rewrite disable rewrite;
{code}
[jira] [Created] (HIVE-26922) Deadlock when rebuilding Materialized view stored by Iceberg
Krisztian Kasa created HIVE-26922:
----------------------------------

Summary: Deadlock when rebuilding Materialized view stored by Iceberg
Key: HIVE-26922
URL: https://issues.apache.org/jira/browse/HIVE-26922
Project: Hive
Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
create table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='1');
insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54);

create materialized view mat1 stored by iceberg stored as orc tblproperties ('format-version'='1') as
select tbl_ice.b, tbl_ice.c from tbl_ice where tbl_ice.c > 52;

insert into tbl_ice values (10, 'ten', 60);

alter materialized view mat1 rebuild;
{code}
[jira] [Created] (HIVE-26864) Incremental rebuild of non-transactional materialized view fails
Krisztian Kasa created HIVE-26864:
----------------------------------

Summary: Incremental rebuild of non-transactional materialized view fails
Key: HIVE-26864
URL: https://issues.apache.org/jira/browse/HIVE-26864
Project: Hive
Issue Type: Bug
Components: CBO, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
create table t1 (a int, b int) stored as orc TBLPROPERTIES ('transactional'='true');
insert into t1 values (1,1), (2,1), (3,3);

create materialized view mv1 as select a, b from t1 where b = 1;

delete from t1 where a = 2;

explain alter materialized view mv1 rebuild;
{code}
{code}
org.apache.hadoop.hive.ql.parse.SemanticException: Attempt to do update or delete on table mv1 that is not transactional
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2400)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2176)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2168)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:630)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12790)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:464)
	at org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:132)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:474)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:439)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:433)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:727)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:697)
	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114)
	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
	at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
{code}
[jira] [Created] (HIVE-26817) Set column names in result schema when plan has Values root
Krisztian Kasa created HIVE-26817:
----------------------------------

Summary: Set column names in result schema when plan has Values root
Key: HIVE-26817
URL: https://issues.apache.org/jira/browse/HIVE-26817
Project: Hive
Issue Type: Improvement
Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

The query
{code}
select b1, count(a1) count1 from (select a1, b1 from t1) s where 1=0 group by b1;
{code}
should have a result with the column names
{code}
b1	count1
{code}
but instead they are
{code}
$f0	$f1
{code}
[jira] [Created] (HIVE-26795) Iceberg integration: clean up temporary files in case of statement cancel
Krisztian Kasa created HIVE-26795:
----------------------------------

Summary: Iceberg integration: clean up temporary files in case of statement cancel
Key: HIVE-26795
URL: https://issues.apache.org/jira/browse/HIVE-26795
Project: Hive
Issue Type: Bug
Components: Iceberg integration
Reporter: Krisztian Kasa

Iceberg write operations are performed in the Tez task, but the Iceberg commit of these writes happens in the move task. To inform the MoveTask which writes have to be committed, temp files are created containing the paths of the actual data files. Also, in the case of CTAS statements, the table created by the DDL task is serialized into a temp file so that it is available to the Tez task, which performs the writes into the newly created table. Normally the cleanup of these temp files happens in the move task, but this task is not executed in case of a cancel or an error in the Tez task.
[jira] [Created] (HIVE-26771) Use DDLTask to create Iceberg table when running ctas statement
Krisztian Kasa created HIVE-26771:
----------------------------------

Summary: Use DDLTask to create Iceberg table when running ctas statement
Key: HIVE-26771
URL: https://issues.apache.org/jira/browse/HIVE-26771
Project: Hive
Issue Type: Improvement
Components: Iceberg integration
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

When an Iceberg table is created via a CTAS statement, the table is created in HiveIcebergSerDe and no DDL task is executed. Negative effects of this workflow:
* Default privileges of the new table are not granted.
* The new Iceberg table can be seen by other transactions at compile time of the CTAS.
* Table creation and table properties are not shown in the explain CTAS output.
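For comparison, a typical CTAS where the effects above can be observed (sketched from the examples in the related issue HIVE-26628):

{code}
create table source(a int, b string, c int);

explain create table tbl_ice stored by iceberg stored as orc tblproperties ('format-version'='2')
as select a, b, c from source;
-- once a DDL task is used, the table creation step and its properties
-- would be expected to appear in this explain output
{code}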
[jira] [Created] (HIVE-26747) Remove implementor from HiveRelNode
Krisztian Kasa created HIVE-26747:
----------------------------------

Summary: Remove implementor from HiveRelNode
Key: HIVE-26747
URL: https://issues.apache.org/jira/browse/HIVE-26747
Project: Hive
Issue Type: Task
Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

Calcite's VolcanoPlanner [1] relies on calling convention [2]. In Hive this is represented by the [HiveRelNode|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveRelNode.java] interface's {{CONVENTION}} field. This interface has to be implemented by all Hive operators to get the Hive calling convention behavior. The interface also defines the
{code:java}
void implement(Implementor implementor);
{code}
method, but none of the operators provides an implementation and the method is never called.

[1] [https://15721.courses.cs.cmu.edu/spring2017/papers/14-optimizer1/graefe-icde1993.pdf]
[2] [https://arxiv.org/pdf/1802.10233.pdf] (Section 4, traits)
[jira] [Created] (HIVE-26628) Iceberg table is created when running explain ctas command
Krisztian Kasa created HIVE-26628:
----------------------------------

Summary: Iceberg table is created when running explain ctas command
Key: HIVE-26628
URL: https://issues.apache.org/jira/browse/HIVE-26628
Project: Hive
Issue Type: Bug
Components: StorageHandler
Reporter: Krisztian Kasa
Fix For: 4.0.0

{code}
create table source(a int, b string, c int);

explain create table tbl_ice stored by iceberg stored as orc tblproperties ('format-version'='2') as
select a, b, c from source;

create table tbl_ice stored by iceberg stored as orc tblproperties ('format-version'='2') as
select a, b, c from source;
{code}
{code}
org.apache.hadoop.hive.ql.parse.SemanticException: org.apache.hadoop.hive.ql.parse.SemanticException: Table already exists: default.tbl_ice
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:13963)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12528)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12693)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:474)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:439)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:433)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:200)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:126)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:421)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:352)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:727)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:697)
	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114)
	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
	at org.apache.hadoop.hive.cli.TestIcebergLlapLocalCliDriver.testCliDriver(TestIcebergLlapLocalCliDriver.java:60)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.
{code}
[jira] [Created] (HIVE-26618) Add setting to turn on/off removing sections of a query plan known to never produce rows
Krisztian Kasa created HIVE-26618:
----------------------------------

Summary: Add setting to turn on/off removing sections of a query plan known to never produce rows
Key: HIVE-26618
URL: https://issues.apache.org/jira/browse/HIVE-26618
Project: Hive
Issue Type: Improvement
Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

HIVE-26524 introduced an optimization to remove sections of a query plan known to never produce rows. Add a setting to HiveConf to turn this optimization on/off. When the optimization is turned off, restore the legacy behavior:
* represent the empty result operator with {{HiveSortLimit}} 0
* disable {{HiveRemoveEmptySingleRules}}
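A sketch of how such a toggle might be used. The property name below is hypothetical, for illustration only; the actual name would be defined in HiveConf by this change:

{code}
-- hypothetical property name, for illustration only
set hive.optimize.remove.empty.plan.sections=false;

-- with the optimization off, the legacy plan shape
-- (HiveSortLimit with limit 0) would be expected here
explain cbo select a1 from t1 where 1=0;
{code}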
[jira] [Created] (HIVE-26578) Enable Iceberg storage format for materialized views
Krisztian Kasa created HIVE-26578:
----------------------------------

Summary: Enable Iceberg storage format for materialized views
Key: HIVE-26578
URL: https://issues.apache.org/jira/browse/HIVE-26578
Project: Hive
Issue Type: Improvement
Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

{code}
create materialized view mat1 stored by iceberg stored as orc tblproperties ('format-version'='1') as
select tbl_ice.b, tbl_ice.c from tbl_ice where tbl_ice.c > 52;
{code}
[jira] [Created] (HIVE-26524) Use Calcite to remove sections of a query plan known never produces rows
Krisztian Kasa created HIVE-26524:
----------------------------------

Summary: Use Calcite to remove sections of a query plan known to never produce rows
Key: HIVE-26524
URL: https://issues.apache.org/jira/browse/HIVE-26524
Project: Hive
Issue Type: Improvement
Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

Calcite has a set of rules to remove sections of a query plan known to never produce any rows. In some cases the whole plan can be removed; such plans are represented by a single {{Values}} operator with no tuples, e.g.:
{code}
select y + 1 from (select a1 y, b1 z from t1 where b1 > 10) q WHERE 1=0
{code}
{code}
HiveValues(tuples=[[]])
{code}
In other cases, when the plan has outer join or set operators, some branches can be replaced with empty values, and moving forward the join/set operator can be removed:
{code}
select a2, b2 from t2 where 1=0
union
select a1, b1 from t1
{code}
{code}
HiveAggregate(group=[{0, 1}])
  HiveTableScan(table=[[default, t1]], table:alias=[t1])
{code}
[jira] [Created] (HIVE-26498) Implement MV maintenance with Iceberg sources using full rebuild
Krisztian Kasa created HIVE-26498:
----------------------------------

Summary: Implement MV maintenance with Iceberg sources using full rebuild
Key: HIVE-26498
URL: https://issues.apache.org/jira/browse/HIVE-26498
Project: Hive
Issue Type: Sub-task
Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create external table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='2');
insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54);

create materialized view mat1 as select b, c from tbl_ice where c > 52;

insert into tbl_ice values (111, 'one', 55), (333, 'two', 56);

explain cbo alter materialized view mat1 rebuild;
alter materialized view mat1 rebuild;
{code}

MV full rebuild plan:
{code}
CBO PLAN:
HiveProject(b=[$1], c=[$2])
  HiveFilter(condition=[>($2, 52)])
    HiveTableScan(table=[[default, tbl_ice]], table:alias=[tbl_ice])
{code}
[jira] [Created] (HIVE-26497) Support materialized views on Iceberg source tables
Krisztian Kasa created HIVE-26497:
----------------------------------

Summary: Support materialized views on Iceberg source tables
Key: HIVE-26497
URL: https://issues.apache.org/jira/browse/HIVE-26497
Project: Hive
Issue Type: New Feature
Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create external table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='2');
insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54);

create materialized view mat1 as select b, c from tbl_ice where c > 52;
{code}
[jira] [Created] (HIVE-26452) NPE when converting join to mapjoin and join column referenced more than once
Krisztian Kasa created HIVE-26452:
----------------------------------

Summary: NPE when converting join to mapjoin and join column referenced more than once
Key: HIVE-26452
URL: https://issues.apache.org/jira/browse/HIVE-26452
Project: Hive
Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
explain
select count(*)
from LU_CUSTOMER pa11
join ORDER_FACT a15 on (pa11.CUSTOMER_ID = a15.CUSTOMER_ID)
join LU_CUSTOMER a16 on (a15.CUSTOMER_ID = a16.CUSTOMER_ID and pa11.CUSTOMER_ID = a16.CUSTOMER_ID);
{code}

{{a16.CUSTOMER_ID}} is referenced more than once in the join condition. Hive generates ReduceSink operators for the join's children, and one of the RS row schemas contains only one instance of the join keys (customer_id).
{code}
RS[13]
result = {HashMap@16092} size = 2
  "KEY.reducesinkkey0" -> {ExprNodeColumnDesc@16083} "Column[_col0]"
  "KEY.reducesinkkey1" -> {ExprNodeColumnDesc@16102} "Column[_col0]"
result = {RowSchema@16104} "(KEY.reducesinkkey0: int|{$hdt$_2}customer_id)"
  signature = {ArrayList@16110} size = 1
    0 = {ColumnInfo@16087} "KEY.reducesinkkey0: int"
{code}
{{KEY.reducesinkkey1}} is missing from the schema. When converting the join to a mapjoin, the converter algorithm fails while looking up both join key column instances:
https://github.com/apache/hive/blob/2aaba3c79e740ef27fc263b5a8ff33ad679c5a12/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java#L538
[jira] [Created] (HIVE-26417) Iceberg integration: disable update and merge iceberg table when split update is off
Krisztian Kasa created HIVE-26417:
----------------------------------

Summary: Iceberg integration: disable update and merge iceberg table when split update is off
Key: HIVE-26417
URL: https://issues.apache.org/jira/browse/HIVE-26417
Project: Hive
Issue Type: Improvement
Components: File Formats
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

Iceberg table update and merge are implemented using split update early by HIVE-26319 and HIVE-26385. Without split update early, deleted records have to be buffered in memory when updating Iceberg tables. With split update early, deleted records are processed by a separate reducer and no buffering is required. The ReduceSink operator also sorts the records.
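As an illustration, assuming split update is controlled by the {{hive.split.update}} property introduced with the work referenced above (property name taken as an assumption from HIVE-21160), the statements that this change would reject look like:

{code}
-- assumed property name; the sketch shows the intent, not the final behavior
set hive.split.update=false;

-- update/merge on an Iceberg table should now be disallowed instead of
-- falling back to buffering deleted records in memory
update target_ice set b = 'x' where a = 1;
{code}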
[jira] [Created] (HIVE-26385) Iceberg integration: Implement merge into iceberg table
Krisztian Kasa created HIVE-26385:
----------------------------------

Summary: Iceberg integration: Implement merge into iceberg table
Key: HIVE-26385
URL: https://issues.apache.org/jira/browse/HIVE-26385
Project: Hive
Issue Type: Improvement
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

{code}
create external table target_ice(a int, b string, c int) partitioned by spec (bucket(16, a), truncate(3, b)) stored by iceberg stored as orc tblproperties ('format-version'='2');
create table source(a int, b string, c int);
...
merge into target_ice as t using source src ON t.a = src.a
when matched and t.a > 100 THEN DELETE
when matched then update set b = 'Merged', c = t.c + 10
when not matched then insert values (src.a, src.b, src.c);
{code}
[jira] [Created] (HIVE-26375) Invalid materialized view after rebuild if source table was compacted
Krisztian Kasa created HIVE-26375:
----------------------------------

Summary: Invalid materialized view after rebuild if source table was compacted
Key: HIVE-26375
URL: https://issues.apache.org/jira/browse/HIVE-26375
Project: Hive
Issue Type: Bug
Components: Materialized views, Transactions
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

After HIVE-25656, the MV state depends on the number of rows deleted/updated in the source tables of the view. However, if one of the source tables is major compacted, the delete delta files are no longer available, and reproducing the rows that should be deleted from the MV is no longer possible.
{code}
create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES ('transactional'='true');
insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, NULL);

create materialized view mv1 stored as orc TBLPROPERTIES ('transactional'='true') as
select a,b,c from t1 where a > 0 or a is null;

update t1 set b = 'Changed' where a = 1;
alter table t1 compact 'major';

alter materialized view mv1 rebuild;
select * from mv1;
{code}
The select should return
{code}
"1\tChanged\t1.1",
"2\ttwo\t2.2",
"NULL\tNULL\tNULL"
{code}
but was
{code}
"1\tone\t1.1",
"2\ttwo\t2.2",
"NULL\tNULL\tNULL",
"1\tChanged\t1.1"
{code}
[jira] [Created] (HIVE-26372) QTests that depend on the mysql docker image fail
Krisztian Kasa created HIVE-26372:
----------------------------------

Summary: QTests that depend on the mysql docker image fail
Key: HIVE-26372
URL: https://issues.apache.org/jira/browse/HIVE-26372
Project: Hive
Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

When the QTest framework launches a mysql docker container, it checks whether the mysql instance is ready to receive connections. It searches for the text
{code}
ready for connections
{code}
in stderr:
https://github.com/apache/hive/blob/2f619988f69a569bfcdc2bef5d35a9ecabb2ef13/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/MySQLExternalDB.java#L56
It seems this behavior has changed on the MySql side, so the QTest framework enters an infinite loop and then times out after 300 sec.
[jira] [Created] (HIVE-26371) Constant propagation does not evaluate constraint expressions at merge when CBO is enabled
Krisztian Kasa created HIVE-26371:
----------------------------------

Summary: Constant propagation does not evaluate constraint expressions at merge when CBO is enabled
Key: HIVE-26371
URL: https://issues.apache.org/jira/browse/HIVE-26371
Project: Hive
Issue Type: Bug
Components: CBO, Logical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

{code}
CREATE TABLE t_target(
  name string CHECK (length(name)<=20),
  age int,
  gpa double CHECK (gpa BETWEEN 0.0 AND 4.0))
stored as orc TBLPROPERTIES ('transactional'='true');

CREATE TABLE t_source(
  name string,
  age int,
  gpa double);

insert into t_source(name, age, gpa) values ('student1', 16, null);
insert into t_target(name, age, gpa) values ('student1', 16, 2.0);

merge into t_target using t_source source on source.age=t_target.age
when matched then update set gpa=6;
{code}
Currently, CBO cannot handle constraint checks when merging, so the filter operator with the {{enforce_constraint}} call is added to the Hive operator plan after CBO has succeeded, and the {{ConstantPropagate}} optimization is called only from TezCompiler with {{ConstantPropagateOption.SHORTCUT}}. With this option, {{ConstantPropagate}} does not evaluate deterministic functions.
[jira] [Created] (HIVE-26370) Check stats are up-to-date when getting materialized view state
Krisztian Kasa created HIVE-26370:
----------------------------------

Summary: Check stats are up-to-date when getting materialized view state
Key: HIVE-26370
URL: https://issues.apache.org/jira/browse/HIVE-26370
Project: Hive
Issue Type: Bug
Components: Materialized views, Statistics, Transactions
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

Since HIVE-25656, the materialized view state depends on the number of affected rows of transactions made on the source tables. If
{code}
hive.stats.autogather=false;
{code}
the number of affected rows of transactions is not collected, which can cause invalid stats on the source tables and lead to false indications about the MV status.
[jira] [Created] (HIVE-26340) Vectorized PTF operator fails if query has upper case window function
Krisztian Kasa created HIVE-26340:
----------------------------------

Summary: Vectorized PTF operator fails if query has upper case window function
Key: HIVE-26340
URL: https://issues.apache.org/jira/browse/HIVE-26340
Project: Hive
Issue Type: Bug
Components: Vectorization
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
Fix For: 4.0.0

{code}
SELECT ROW_NUMBER() OVER(order by age) AS rn FROM studentnull100;
{code}
{code}
2022-06-16T14:18:57,728 ERROR [pool-4-thread-1] jdbc.TestDriver: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 7, vertexId=vertex_1655217967697_0062_1_01, diagnostics=[Task failed, taskId=task_1655217967697_0062_1_01_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1655217967697_0062_1_01_00_0:java.lang.RuntimeException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:298)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:252)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.plan.VectorPTFDesc.getEvaluator(VectorPTFDesc.java:165)
	at org.apache.hadoop.hive.ql.plan.VectorPTFDesc.getEvaluators(VectorPTFDesc.java:381)
	at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.initializeOp(VectorPTFOperator.java:317)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:374)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:571)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:523)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:384)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:211)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:268)
	... 16 more
{code}

-- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26319) Iceberg integration: Perform update split early
Krisztian Kasa created HIVE-26319: - Summary: Iceberg integration: Perform update split early Key: HIVE-26319 URL: https://issues.apache.org/jira/browse/HIVE-26319 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa Extend the update-split-early optimization to Iceberg tables, as HIVE-21160 did for native ACID tables. -- This message was sent by Atlassian Jira (v8.20.7#820007)
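"Update split" rewrites an UPDATE into a delete of the old row version plus an insert of the new one. A minimal sketch of the equivalence using Python's sqlite3 (toy table, purely illustrative; this is not how Hive implements it internally):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id int, v text)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

# UPDATE t SET v = 'x' WHERE id = 1, expressed as delete + insert:
# first capture the matched rows, then delete them, then re-insert
# the rewritten versions.
cur.execute("CREATE TEMP TABLE hit AS SELECT * FROM t WHERE id = 1")
cur.execute("DELETE FROM t WHERE id = 1")
cur.execute("INSERT INTO t SELECT id, 'x' FROM hit")

rows = sorted(cur.execute("SELECT * FROM t").fetchall())
print(rows)  # [(1, 'x'), (2, 'b')] sorted -> [(1, 'x'), (2, 'b')]
```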
[jira] [Created] (HIVE-26274) No vectorization if query has upper case window function
Krisztian Kasa created HIVE-26274: - Summary: No vectorization if query has upper case window function Key: HIVE-26274 URL: https://issues.apache.org/jira/browse/HIVE-26274 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} CREATE TABLE t1 (a int, b int); EXPLAIN VECTORIZATION ONLY SELECT ROW_NUMBER() OVER(order by a) AS rn FROM t1; {code} {code} PLAN VECTORIZATION: enabled: true enabledConditionsMet: [hive.vectorized.execution.enabled IS true] STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE) Vertices: Map 1 Execution mode: vectorized, llap LLAP IO: all inputs Map Vectorization: enabled: true enabledConditionsMet: hive.vectorized.use.vector.serde.deserialize IS true inputFormatFeatureSupport: [DECIMAL_64] featureSupportInUse: [DECIMAL_64] inputFileFormats: org.apache.hadoop.mapred.TextInputFormat allNative: true usesVectorUDFAdaptor: false vectorized: true Reducer 2 Execution mode: llap Reduce Vectorization: enabled: true enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez] IS true notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum] vectorized: false Stage: Stage-0 Fetch Operator {code} {code} notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum] {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
Krisztian Kasa created HIVE-26264: - Summary: Iceberg integration: Fetch virtual columns on demand Key: HIVE-26264 URL: https://issues.apache.org/jira/browse/HIVE-26264 Project: Hive Issue Type: Bug Components: File Formats Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 Currently, virtual columns are fetched from Iceberg tables when the statement being executed is a delete or update, and the setting is global: it affects every table touched by the statement. The read and write schemas also depend on this operation-level setting, so some statements fail due to an invalid schema: {code} create external table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='2'); insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56); update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4); {code} {code} See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test case logs. 
org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:574) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) ... 18 more Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaIntObjectInspector.get(JavaIntObjectInspector.java:40) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPLessThan.evaluate(GenericUDFOPLessThan.java:127) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:235) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:92) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:70) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEva
[jira] [Created] (HIVE-26160) Materialized View rewrite does not check tables scanned in sub-query expressions
Krisztian Kasa created HIVE-26160: - Summary: Materialized View rewrite does not check tables scanned in sub-query expressions Key: HIVE-26160 URL: https://issues.apache.org/jira/browse/HIVE-26160 Project: Hive Issue Type: Bug Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Materialized view rewrite based on exact SQL text match uses the initial CBO plan to explore possibilities to replace the query plan, or part of it, with an MV scan. This algorithm requires the list of tables scanned by the original query plan. If the query contains sub-query expressions, the tables scanned by the sub-queries are not listed, which can lead to rewriting the original plan to scan an outdated MV. -- This message was sent by Atlassian Jira (v8.20.7#820007)
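The fix amounts to collecting scanned tables recursively, descending into sub-query expressions as well as ordinary plan children. A minimal sketch over a toy plan structure (hypothetical dict-based nodes, not Hive's actual RelNode classes):

```python
def scanned_tables(node):
    """Collect every table scanned by a toy plan node, descending into
    sub-query expressions as well as ordinary children."""
    tables = set()
    if node.get("table"):
        tables.add(node["table"])
    for child in node.get("children", []) + node.get("subqueries", []):
        tables |= scanned_tables(child)
    return tables

plan = {
    "children": [{"table": "web_sales"}],
    # a filter carrying an IN (select ... from item) sub-query expression
    "subqueries": [{"table": "item"}],
}
print(sorted(scanned_tables(plan)))  # ['item', 'web_sales']
```

Missing the `subqueries` branch is exactly the described bug: `item` would not be reported as scanned, so staleness checks against an MV over `item` would be skipped.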
[jira] [Created] (HIVE-26043) Use constraint info when creating RexNodes
Krisztian Kasa created HIVE-26043: - Summary: Use constraint info when creating RexNodes Key: HIVE-26043 URL: https://issues.apache.org/jira/browse/HIVE-26043 Project: Hive Issue Type: Improvement Components: CBO Reporter: Krisztian Kasa Assignee: Krisztian Kasa Prior HIVE-23100 Not null constraints affected newly created RexNode type nullability. Nullability enables the subquery rewrite algorithm to generate more optimal plan. [https://github.com/apache/hive/blob/1213ad3f0ae0e21e7519dc28b8b6d1401cdd1441/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java#L324] Example: {code:java} explain cbo select ws_sales_price from web_sales, customer, item where ws_bill_customer_sk = c_customer_sk and ws_item_sk = i_item_sk and ( c_customer_sk = 1 or i_item_id in (select i_item_id from item where i_item_sk in (2, 3) ) ); {code} Without not null constraints {code:java} HiveProject(ws_sales_price=[$2]) HiveFilter(condition=[OR(AND(<>($6, 0), IS NOT NULL($8)), =($3, 1))]) HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], ws_sales_price=[$2], c_customer_sk=[$8], i_item_sk=[$3], i_item_id=[$4], c=[$5], i_item_id0=[$6], literalTrue=[$7]) HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], ws_sales_price=[$20]) HiveFilter(condition=[IS NOT NULL($3)]) HiveTableScan(table=[[default, web_sales]], table:alias=[web_sales]) HiveJoin(condition=[=($1, $3)], joinType=[left], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c=[$0]) HiveAggregate(group=[{}], c=[COUNT()]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) 
HiveProject(i_item_id=[$0], literalTrue=[true]) HiveAggregate(group=[{1}]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c_customer_sk=[$0]) HiveTableScan(table=[[default, customer]], table:alias=[customer]) {code} With not null constraints {code:java} HiveProject(ws_sales_price=[$2]) HiveFilter(condition=[OR(IS NOT NULL($7), =($3, 1))]) HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], ws_sales_price=[$2], c_customer_sk=[$7], i_item_sk=[$3], i_item_id=[$4], i_item_id0=[$5], literalTrue=[$6]) HiveJoin(condition=[=($1, $7)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], ws_sales_price=[$20]) HiveFilter(condition=[IS NOT NULL($3)]) HiveTableScan(table=[[default, web_sales]], table:alias=[web_sales]) HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], cost=[not available]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(i_item_id=[$0], literalTrue=[true]) HiveAggregate(group=[{1}]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c_customer_sk=[$0]) HiveTableScan(table=[[default, customer]], table:alias=[customer]) {code} In the first plan when not null constraints was ignored there is an extra {{item}} table join without join condition: {code:java} HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c=[$0]) HiveAggregate(group=[{}], c=[COUNT()]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) {code} The planner is not aware that the {{i_item_id}} column has {{not null}} 
defined, so it expects null values, which requires the extra join. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25979) Order of Lineage is flaky in qtest output
Krisztian Kasa created HIVE-25979: - Summary: Order of Lineage is flaky in qtest output Key: HIVE-25979 URL: https://issues.apache.org/jira/browse/HIVE-25979 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa When running {code:java} mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=stats_part_multi_insert_acid.q -pl itests/qtest -Pitests {code} The lineage output of statement: {code:java} from source insert into stats_part select key, value, p insert into stats_part select key, value, p {code} is expected to be {code:java} POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ] {code} but sometimes it is {code:java} POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ] POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ] {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
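One way to make such qtest output deterministic is to sort the lineage entries before printing instead of relying on hook emission order. A small sketch (assumed approach, not necessarily the fix chosen in the patch):

```python
# The four lineage entries from the multi-insert above; the two branches
# may emit them in either interleaving.
lineage = [
    "stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]",
    "stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]",
    "stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]",
    "stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]",
]

# Sorting groups the .key entries before the .value entries, matching the
# expected golden-file order regardless of the hooks' emission order.
for entry in sorted(lineage):
    print("POSTHOOK: Lineage:", entry)
```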
[jira] [Created] (HIVE-25969) Unable to reference table column named default
Krisztian Kasa created HIVE-25969: - Summary: Unable to reference table column named default Key: HIVE-25969 URL: https://issues.apache.org/jira/browse/HIVE-25969 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} CREATE TABLE t1 (a int, `default` int) stored as orc TBLPROPERTIES ('transactional'='true'); insert into t1 values (1, 2), (10, 11); update t1 set a = `default`; select * from t1; {code} result is {code} NULL NULL NULL NULL {code} but it should be {code} 11 11 2 2 {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
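The expected semantics can be checked outside Hive. A minimal sketch using Python's sqlite3, where double quotes play the role of Hive's backticks for quoting the reserved word (illustration of the intended behavior, not of Hive's code path):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# "default" is a reserved word, so the column name must be quoted
cur.execute('CREATE TABLE t1 (a int, "default" int)')
cur.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, 2), (10, 11)])
# The update should copy each row's "default" value into a, not NULL it out
cur.execute('UPDATE t1 SET a = "default"')
rows = cur.execute('SELECT a, "default" FROM t1 ORDER BY a').fetchall()
print(rows)  # [(2, 2), (11, 11)]
```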
[jira] [Created] (HIVE-25941) Long compilation time of complex query due to analysis for materialized view rewrite
Krisztian Kasa created HIVE-25941: - Summary: Long compilation time of complex query due to analysis for materialized view rewrite Key: HIVE-25941 URL: https://issues.apache.org/jira/browse/HIVE-25941 Project: Hive Issue Type: Bug Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa When compiling a query, the optimizer tries to rewrite the query plan, or subtrees of it, to use materialized view scans. With {code} set hive.materializedview.rewriting.sql.subquery=false; {code} the compilation succeeds in less than 10 seconds; otherwise it takes several minutes (~5 min) depending on the hardware. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25937) Create view fails when definition contains a materialized view definition
Krisztian Kasa created HIVE-25937: - Summary: Create view fails when definition contains a materialized view definition Key: HIVE-25937 URL: https://issues.apache.org/jira/browse/HIVE-25937 Project: Hive Issue Type: Bug Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa View definition contains the materialized view definition as subquery: {code} create materialized view mv1 as select * from t1 where col0 > 2 union select * from t1 where col0 = 0; explain cbo create view v1 as select sub.* from (select * from t1 where col0 > 2 union select * from t1 where col0 = 0) sub where sub.col0 = 10 {code} {code} See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs. org.apache.hadoop.hive.ql.parse.SemanticException: View definition references materialized view default@mv1 at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211) at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:501) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:727) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:697) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:27) at org.junit.runners.ParentRunner$4.run(Parent
[jira] [Created] (HIVE-25918) Invalid stats after multi inserting into the same partition
Krisztian Kasa created HIVE-25918: - Summary: Invalid stats after multi inserting into the same partition Key: HIVE-25918 URL: https://issues.apache.org/jira/browse/HIVE-25918 Project: Hive Issue Type: Bug Components: Statistics Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create table source(p int, key int, value string); insert into source(p, key, value) values (101,42,'string42'); create table stats_part(key int, value string) partitioned by (p int); from source insert into stats_part select key, value, p insert into stats_part select key, value, p; select count(*) from stats_part; {code} In this case {{StatsOptimizer}} serves this query from statistics, because the result should be the {{rowNum}} of partition {{p=101}}. The result is {code} 1 {code} however it should be {code} 2 {code} because each insert branch inserts one record. -- This message was sent by Atlassian Jira (v8.20.1#820001)
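The correct count is easy to verify with plain SQL semantics. A small sketch in Python's sqlite3, which always scans rather than answering from stats, so it shows the answer the StatsOptimizer's cached row count should match (the two separate inserts stand in for Hive's two multi-insert branches):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE source (p int, key int, value text)")
cur.execute("INSERT INTO source VALUES (101, 42, 'string42')")
cur.execute("CREATE TABLE stats_part (key int, value text, p int)")

# Hive's multi-insert writes the same source row through two branches;
# simulate it with two separate inserts from the same source.
for _ in range(2):
    cur.execute("INSERT INTO stats_part SELECT key, value, p FROM source")

n = cur.execute("SELECT count(*) FROM stats_part").fetchone()[0]
print(n)  # 2
```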
[jira] [Created] (HIVE-25906) Clean MaterializedViewCache after q test run
Krisztian Kasa created HIVE-25906: - Summary: Clean MaterializedViewCache after q test run Key: HIVE-25906 URL: https://issues.apache.org/jira/browse/HIVE-25906 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25900) Materialized view registry does not clean non existing views at refresh
Krisztian Kasa created HIVE-25900: - Summary: Materialized view registry does not clean non existing views at refresh Key: HIVE-25900 URL: https://issues.apache.org/jira/browse/HIVE-25900 Project: Hive Issue Type: Bug Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa CBO plans of materialized views which are enabled for query rewrite are cached in HS2 (MaterializedViewsCache, HiveMaterializedViewsRegistry). The registry is refreshed periodically from HMS: {code:java} set hive.server2.materializedviews.registry.refresh.period=1500s; {code} This functionality is required when multiple HS2 instances are used in a cluster: an MV drop operation is served by one of the HS2 instances, and the registry is updated at that time in that instance. However, the other HS2 instances still cache the non-existent view and need to be refreshed by the updater thread. Currently the updater thread adds new entries and refreshes existing ones, but does not remove outdated entries. -- This message was sent by Atlassian Jira (v8.20.1#820001)
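A refresh pass that closes the described gap would diff the cached view names against the set fetched from HMS and drop the ones no longer present. A minimal sketch (hypothetical names and a plain dict; not the actual HiveMaterializedViewsRegistry API):

```python
def refresh_registry(cache, hms_views):
    """Sync an MV cache (name -> plan) with the views currently in HMS.

    Adds new entries, refreshes existing ones, and - the missing step
    described in this issue - removes entries whose view no longer exists.
    """
    for name, plan in hms_views.items():
        cache[name] = plan          # add new / refresh existing
    for name in list(cache):        # iterate over a copy: we mutate cache
        if name not in hms_views:
            del cache[name]         # drop views dropped via another HS2
    return cache

cache = {"mv1": "plan1", "mv_dropped": "stale_plan"}
hms = {"mv1": "plan1_refreshed", "mv2": "plan2"}
refresh_registry(cache, hms)
print(sorted(cache))  # ['mv1', 'mv2']
```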
[jira] [Created] (HIVE-25899) Materialized view registry does not clean dropped views
Krisztian Kasa created HIVE-25899: - Summary: Materialized view registry does not clean dropped views Key: HIVE-25899 URL: https://issues.apache.org/jira/browse/HIVE-25899 Project: Hive Issue Type: Bug Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa CBO plans of materialized views which are enabled for query rewrite are cached in HS2 (MaterializedViewsCache) Dropping a materialized views should remove the entry from the cache however the entry keys are not removed. Cache state after running a whole PTest split: {code} this = {HiveMaterializedViewsRegistry@20858} materializedViewsCache = {MaterializedViewsCache@20913} materializedViews = {ConcurrentHashMap@67654} size = 3 "default" -> {ConcurrentHashMap@28568} size = 8 key = "default" value = {ConcurrentHashMap@28568} size = 8 "cluster_mv_2" -> {HiveRelOptMaterialization@67786} "cluster_mv_1" -> {HiveRelOptMaterialization@67788} "cluster_mv_4" -> {HiveRelOptMaterialization@67790} "cluster_mv_3" -> {HiveRelOptMaterialization@67792} "cmv_mat_view_n10" -> {HiveRelOptMaterialization@67794} "distribute_mv_1" -> {HiveRelOptMaterialization@67796} "distribute_mv_3" -> {HiveRelOptMaterialization@67798} "distribute_mv_2" -> {HiveRelOptMaterialization@67800} "db2" -> {ConcurrentHashMap@67772} size = 2 key = "db2" value = {ConcurrentHashMap@67772} size = 2 "cmv_mat_view_n7" -> {HiveRelOptMaterialization@67806} "cmv_mat_view2_n2" -> {HiveRelOptMaterialization@67808} "count_distinct" -> {ConcurrentHashMap@67774} size = 0 key = "count_distinct" value = {ConcurrentHashMap@67774} size = 0 sqlToMaterializedView = {ConcurrentHashMap@20915} size = 36 "SELECT `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n FROM `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`" -> {ArrayList@67694} size = 0 key = "SELECT 
`cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`\n FROM `default`.`cmv_basetable_n100` JOIN `default`.`cmv_basetable_2_n100` ON (`cmv_basetable_n100`.`a` = `cmv_basetable_2_n100`.`a`)\n WHERE `cmv_basetable_2_n100`.`c` > 10.0\n GROUP BY `cmv_basetable_n100`.`a`, `cmv_basetable_2_n100`.`c`" value = {ArrayList@67694} size = 0 "select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno`" -> {ArrayList@67696} size = 0 key = "select `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno` from `default`.`emps_parquet_n3` group by `emps_parquet_n3`.`empid`, `emps_parquet_n3`.`deptno`" value = {ArrayList@67696} size = 0 "select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" -> {ArrayList@67698} size = 1 key = "select `cmv_basetable_n7`.`a`, `cmv_basetable_n7`.`c` from `db1`.`cmv_basetable_n7` where `cmv_basetable_n7`.`a` = 3" value = {ArrayList@67698} size = 1 "SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + 100 as `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM `default`.`src_txn`, `default`.`src_txn_2`\nWHERE `src_txn`.`key` = `src_txn_2`.`key`\n AND `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) `cluster_mv_3`" -> {ArrayList@67700} size = 1 key = "SELECT `value`, `key`, `partkey` FROM (SELECT `src_txn`.`key` + 100 as `partkey`, `src_txn`.`value`, `src_txn`.`key` FROM `default`.`src_txn`, `default`.`src_txn_2`\nWHERE `src_txn`.`key` = `src_txn_2`.`key`\n AND `src_txn`.`key` > 200 AND `src_txn`.`key` < 250) `cluster_mv_3`" value = {ArrayList@67700} size = 1 "SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM `default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE `cmv_basetable_2_n3`.`c` > 10.0" -> {ArrayList@67702} size = 0 key = "SELECT `cmv_basetable_n6`.`a`, `cmv_basetable_2_n3`.`c`\n FROM 
`default`.`cmv_basetable_n6` JOIN `default`.`cmv_basetable_2_n3` ON (`cmv_basetable_n6`.`a` = `cmv_basetable_2_n3`.`a`)\n WHERE `cmv_basetable_2_n3`.`c` > 10.0" value = {ArrayList@67702} size = 0 "SELECT `src_txn`.`key`, `src_txn`.`value` FROM `default`.`src_txn` where `src_txn`.`key` > 200 and `src_txn`.`key` < 250" -> {ArrayList@67704} size = 1 key = "SELECT `src_txn`.`key`, `src_txn`.`value` FROM `default`.`src_txn` where `src_txn`.`key` > 200 and `src_txn`.`key` < 250" value = {ArrayList@67704} size = 1 "select `cmv_basetable_n9`.`a`, `cmv_basetable_2_n4`.`c`\n from `default`.`cmv_basetable_n9` join `default`.`cmv_basetable_2_n4` on
[jira] [Created] (HIVE-25878) Unable to compile cpp metastore thrift client
Krisztian Kasa created HIVE-25878: - Summary: Unable to compile cpp metastore thrift client Key: HIVE-25878 URL: https://issues.apache.org/jira/browse/HIVE-25878 Project: Hive Issue Type: Bug Components: Thrift API Reporter: Krisztian Kasa Assignee: Krisztian Kasa The following struct definitions contain a circular dependency: {code:java} struct SourceTable { 1: required Table table, ... } struct CreationMetadata { ... 7: optional set<SourceTable> sourceTables } struct Table { ... 16: optional CreationMetadata creationMetadata, // only for MVs, it stores table names used and txn list at MV creation ... } {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25858) DISTINCT with ORDER BY on ordinals fails with NPE
Krisztian Kasa created HIVE-25858: - Summary: DISTINCT with ORDER BY on ordinals fails with NPE Key: HIVE-25858 URL: https://issues.apache.org/jira/browse/HIVE-25858 Project: Hive Issue Type: Bug Components: CBO Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25818) Values query with order by position clause fails
Krisztian Kasa created HIVE-25818: - Summary: Values query with order by position clause fails Key: HIVE-25818 URL: https://issues.apache.org/jira/browse/HIVE-25818 Project: Hive Issue Type: Bug Components: CBO, Query Planning Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} values(1+1, 2, 5.0, 'a') order by 1 limit 2; {code} {code} java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.getFieldIndexFromColumnNumber(CalcitePlanner.java:4146) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.beginGenOBLogicalPlan(CalcitePlanner.java:4028) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genOBLogicalPlan(CalcitePlanner.java:3933) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5148) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1651) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1593) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1345) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:563) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12565) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:726) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:696) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runn
[jira] [Created] (HIVE-25805) Wrong result when rebuilding MV with count(col) incremental
Krisztian Kasa created HIVE-25805: - Summary: Wrong result when rebuilding MV with count(col) incremental Key: HIVE-25805 URL: https://issues.apache.org/jira/browse/HIVE-25805 Project: Hive Issue Type: Bug Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code:java} create table t1(a char(15), b int) stored as orc TBLPROPERTIES ('transactional'='true'); insert into t1(a, b) values ('old', 1); create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as select t1.a, count(t1.b), count(*) from t1 group by t1.a; delete from t1 where b = 1; insert into t1(a,b) values ('new', null); alter materialized view mat1 rebuild; select * from mat1; {code} returns {code:java} new 1 1 {code} but it should be {code:java} new 0 1 {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
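The heart of the bug above is the semantic gap between count(col) and count(*): count(col) must skip NULLs, while count(*) counts every row. A minimal illustration with SQLite (used here purely as a stand-in SQL engine, not Hive) shows the result the rebuilt view should contain:

```python
import sqlite3

# count(b) ignores NULLs while count(*) counts every row -- the
# distinction the incremental rebuild above gets wrong.
conn = sqlite3.connect(":memory:")
conn.execute("create table t1(a text, b int)")
conn.execute("insert into t1(a, b) values ('new', null)")
row = conn.execute(
    "select a, count(b), count(*) from t1 group by a").fetchone()
print(row)  # ('new', 0, 1) -- the expected MV contents
```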
[jira] [Created] (HIVE-25771) Stats may be incorrect under concurrent inserts if direct-insert is Off
Krisztian Kasa created HIVE-25771: - Summary: Stats may be incorrect under concurrent inserts if direct-insert is Off Key: HIVE-25771 URL: https://issues.apache.org/jira/browse/HIVE-25771 Project: Hive Issue Type: Bug Components: Statistics Reporter: Krisztian Kasa Assignee: Krisztian Kasa The table statistics value Number of rows may be invalid after inserting into the same partition concurrently from multiple user sessions. This can also lead to invalid query results because count(*) may be served from stats. -- This message was sent by Atlassian Jira (v8.20.1#820001)
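A concurrent-insert stats bug of this shape is typically a lost update: each session reads the current numRows, adds its own row count, and writes back, so the second write clobbers the first. A hypothetical sketch of that race (the variable names and flow are illustrative, not Hive's actual stats code):

```python
# Hypothetical lost-update sketch: two sessions each read numRows,
# add their own inserted-row count, and write back; the second write
# silently discards the first session's increment.
stats = {"numRows": 100}

def finish_insert(snapshot, rows_inserted):
    # each session computes the new stat from the value it read earlier
    return snapshot + rows_inserted

s1 = stats["numRows"]                      # session 1 reads 100
s2 = stats["numRows"]                      # session 2 reads 100 concurrently
stats["numRows"] = finish_insert(s1, 10)   # session 1 writes 110
stats["numRows"] = finish_insert(s2, 20)   # session 2 writes 120, not 130
print(stats["numRows"])  # 120 -- 10 rows unaccounted for
```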
[jira] [Created] (HIVE-25747) Make a cost base decision when rebuilding materialized views
Krisztian Kasa created HIVE-25747: - Summary: Make a cost base decision when rebuilding materialized views Key: HIVE-25747 URL: https://issues.apache.org/jira/browse/HIVE-25747 Project: Hive Issue Type: Improvement Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Choose between a full insert-overwrite plan and a partition-based incremental rebuild plan when rebuilding partitioned materialized views. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25745) Print transactional stats of materialized view source tables
Krisztian Kasa created HIVE-25745: - Summary: Print transactional stats of materialized view source tables Key: HIVE-25745 URL: https://issues.apache.org/jira/browse/HIVE-25745 Project: Hive Issue Type: Improvement Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Print the number of rows affected by transactions of materialized view source tables since the last rebuild of the view when using the command {code:java} DESCRIBE FORMATTED ; {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25744) Support backward compatibility of thrift struct CreationMetadata
Krisztian Kasa created HIVE-25744: - Summary: Support backward compatibility of thrift struct CreationMetadata Key: HIVE-25744 URL: https://issues.apache.org/jira/browse/HIVE-25744 Project: Hive Issue Type: Task Components: Materialized views, Thrift API Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 HIVE-25656 introduced a breaking change in the HiveServer2 <-> Metastore thrift api: Old {code} struct CreationMetadata { 1: required string catName 2: required string dbName, 3: required string tblName, 4: required set<string> tablesUsed, 5: optional string validTxnList, 6: optional i64 materializationTime } {code} New {code} struct CreationMetadata { 1: required string catName 2: required string dbName, 3: required string tblName, 4: required set<SourceTable> tablesUsed, 5: optional string validTxnList, 6: optional i64 materializationTime } {code} The type of the 4th field changed. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25656) Get materialized view state based on number of affected rows by transactions
Krisztian Kasa created HIVE-25656: - Summary: Get materialized view state based on number of affected rows by transactions Key: HIVE-25656 URL: https://issues.apache.org/jira/browse/HIVE-25656 Project: Hive Issue Type: Improvement Components: Materialized views, Transactions Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 To enable the faster incremental rebuild of materialized views, the presence of update/delete operations on the source tables of the view since the last rebuild must be checked. Based on the outcome, a different plan is generated for the update/delete and the insert-only scenarios. Currently this is done by querying the COMPLETED_TXN_COMPONENTS table; however, the records of this table are cleaned when the MV source tables are compacted, which reduces the chances of an incremental MV rebuild. The goal of this patch is to find an alternative way to store and retrieve this information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25654) Stats of transactional table updated when transaction is rolled back
Krisztian Kasa created HIVE-25654: - Summary: Stats of transactional table updated when transaction is rolled back Key: HIVE-25654 URL: https://issues.apache.org/jira/browse/HIVE-25654 Project: Hive Issue Type: Bug Components: Statistics Reporter: Krisztian Kasa {code:java} set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; create table t1(a int) stored as orc TBLPROPERTIES ('transactional'='true'); describe formatted t1; -- simulate rollback set hive.test.rollbacktxn=true; insert into t1(a) values (1),(2),(3); describe formatted t1; select count(1) from t1; {code} {code} POSTHOOK: query: describe formatted t1 ... numFiles1 numRows 3 rawDataSize 0 totalSize 632 transactional true ... POSTHOOK: query: select count(1) from t1 POSTHOOK: type: QUERY POSTHOOK: Input: default@t1 0 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25590) Able to create views referencing temporary tables and materialized views
Krisztian Kasa created HIVE-25590: - Summary: Able to create views referencing temporary tables and materialized views Key: HIVE-25590 URL: https://issues.apache.org/jira/browse/HIVE-25590 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 Creating views/materialized views referencing temporary tables and materialized views is disabled in Hive. However, the verification algorithm fails to recognize temporary tables and materialized views in subqueries. The verification also fails when the view definition contains joins, because CBO transforms join branches into subqueries. Example 1: {code} create temporary table tmp1 (c1 string, c2 string); create view tmp1_view as select subq.c1 from (select c1, c2 from tmp1) subq; {code} Example 2: {code} create table t1 (a int) stored as orc tblproperties ('transactional'='true'); create table t2 (a int) stored as orc tblproperties ('transactional'='true'); create materialized view mv1 as select a from t1 where a = 10; create materialized view mv2 as select t2.a from mv1 join t2 on (mv1.a = t2.a); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25574) Replace clob with varchar in JDO
Krisztian Kasa created HIVE-25574: - Summary: Replace clob with varchar in JDO Key: HIVE-25574 URL: https://issues.apache.org/jira/browse/HIVE-25574 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Krisztian Kasa Assignee: Krisztian Kasa Follow up of HIVE-21940. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25572) Exception while querying materialized view invalidation info
Krisztian Kasa created HIVE-25572: - Summary: Exception while querying materialized view invalidation info Key: HIVE-25572 URL: https://issues.apache.org/jira/browse/HIVE-25572 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code:java} 2021-09-29T00:33:02,971 WARN [main] txn.TxnHandler: Unable to retrieve materialization invalidation information: completed transaction components. java.sql.SQLSyntaxErrorException: Syntax error: Encountered "" at line 1, column 234. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.jdbc.Driver42.newEmbedPreparedStatement(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source) ~[derby-10.14.1.0.jar:?] at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source) ~[derby-10.14.1.0.jar:?] at com.zaxxer.hikari.pool.ProxyConnection.prepareStatement(ProxyConnection.java:311) ~[HikariCP-2.6.1.jar:?] at com.zaxxer.hikari.pool.HikariProxyConnection.prepareStatement(HikariProxyConnection.java) ~[HikariCP-2.6.1.jar:?] at org.apache.hadoop.hive.metastore.tools.SQLGenerator.prepareStmtWithParameters(SQLGenerator.java:169) ~[classes/:?] 
at org.apache.hadoop.hive.metastore.txn.TxnHandler.executeBoolean(TxnHandler.java:2598) [classes/:?] at org.apache.hadoop.hive.metastore.txn.TxnHandler.getMaterializationInvalidationInfo(TxnHandler.java:2575) [classes/:?] at org.apache.hadoop.hive.metastore.txn.TestTxnHandler.testGetMaterializationInvalidationInfo(TestTxnHandler.java:1910) [test-classes/:?] at org.apache.hadoop.hive.metastore.txn.TestTxnHandler.testGetMaterializationInvalidationInfo(TestTxnHandler.java:1875) [test-classes/:?] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) [junit-4.13.jar:4.13] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.13.jar:4.13] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) [junit-4.13.jar:4.13] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.13.jar:4.13] at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) [junit-4.13.jar:4.13] at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) [junit-4.13.jar:4.13] at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) [junit-4.13.jar:4.13] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) [junit-4.13.jar:4.13] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) [junit-4.13.jar:4.13] at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) [junit-4.13.jar:4.13] at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) [junit-4.13.
[jira] [Created] (HIVE-25568) Estimate TopNKey operator statistics.
Krisztian Kasa created HIVE-25568: - Summary: Estimate TopNKey operator statistics. Key: HIVE-25568 URL: https://issues.apache.org/jira/browse/HIVE-25568 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Currently the TopNKey operator has the same statistics as its parent operator: {code} TableScan alias: src Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE Top N Key Operator sort order: + keys: key (type: string) null sort order: z Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE top n: 5 {code} This operator filters out rows, and that should be reflected in its statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
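One plausible estimate follows directly from the operator's contract: a TopNKey emits at most top n rows per key partition, capped by the input row count. The function below is a sketch of that heuristic (the formula is an assumption for illustration, not Hive's actual implementation):

```python
def estimate_topnkey_rows(input_rows, top_n, num_partitions=1):
    """Assumed heuristic: each partition emits at most top_n rows,
    and the estimate can never exceed the input row count."""
    return min(input_rows, top_n * num_partitions)

# For the plan above: 500 input rows, top n: 5 -> estimate 5, not 500.
print(estimate_topnkey_rows(500, 5))
```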
[jira] [Created] (HIVE-25546) Enable incremental rebuild of Materialized view with insert only source tables
Krisztian Kasa created HIVE-25546: - Summary: Enable incremental rebuild of Materialized view with insert only source tables Key: HIVE-25546 URL: https://issues.apache.org/jira/browse/HIVE-25546 Project: Hive Issue Type: Improvement Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only'); create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as select a, b, c from t1 where a > 10; {code} Currently materialized view *mat1* cannot be rebuilt incrementally because it has an insert-only source table (t1). Such tables do not have ROW_ID.write_id, which is required to identify records newly inserted since the last rebuild. HIVE-25406 adds the ability to query write_id. -- This message was sent by Atlassian Jira (v8.3.4#803005)
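Conceptually, once write_id is available, the incremental rebuild only scans source rows whose write_id is newer than the snapshot recorded at the last rebuild, then applies the view's own predicate. A hypothetical sketch of that filter (the row layout and snapshot value are invented for illustration):

```python
# Hypothetical model of the incremental-rebuild delta scan: keep only
# rows written after the last rebuild's snapshot that also satisfy the
# view predicate (a > 10 from the mat1 definition above).
last_rebuild_write_id = 2
t1 = [
    {"write_id": 1, "a": 20, "b": 1, "c": 1},  # already reflected in mat1
    {"write_id": 3, "a": 30, "b": 2, "c": 2},  # inserted since the rebuild
    {"write_id": 4, "a": 5,  "b": 3, "c": 3},  # new, but fails a > 10
]
delta = [r for r in t1
         if r["write_id"] > last_rebuild_write_id and r["a"] > 10]
print(delta)  # only the write_id=3 row is appended to the view
```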
[jira] [Created] (HIVE-25512) Merge statement does not enforce check constraints
Krisztian Kasa created HIVE-25512: - Summary: Merge statement does not enforce check constraints Key: HIVE-25512 URL: https://issues.apache.org/jira/browse/HIVE-25512 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; CREATE TABLE table_check_merge( name string CHECK (length(name)<=20), age int, gpa double CHECK (gpa BETWEEN 0.0 AND 4.0) ) stored as orc TBLPROPERTIES ('transactional'='true'); CREATE TABLE table_source( name string, age int, gpa double); insert into table_source(name, age, gpa) values ('student1', 16, null), (null, 20, 4.0); insert into table_check_merge(name, age, gpa) values ('student1', 16, 2.0); merge into table_check_merge using (select age from table_source)source on source.age=table_check_merge.age when matched then update set gpa=6; {code} The merge statement tries to update gpa to 6, which is not between 0.0 and 4.0. However, the update succeeds. -- This message was sent by Atlassian Jira (v8.3.4#803005)
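For contrast, this is what an enforced CHECK constraint should do on the matched-update path. SQLite is used here purely as an illustrative engine (Hive itself is not involved); it rejects the same out-of-range gpa that the Hive merge lets through:

```python
import sqlite3

# The same schema as the report, on an engine that enforces CHECK on
# updates: setting gpa = 6 must raise a constraint violation.
conn = sqlite3.connect(":memory:")
conn.execute("""create table table_check_merge(
    name text check (length(name) <= 20),
    age int,
    gpa real check (gpa between 0.0 and 4.0))""")
conn.execute("insert into table_check_merge values ('student1', 16, 2.0)")

violation = None
try:
    conn.execute("update table_check_merge set gpa = 6 where age = 16")
except sqlite3.IntegrityError as exc:
    violation = exc          # expected path: gpa = 6 violates the CHECK
print("rejected" if violation else "update succeeded")
```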
[jira] [Created] (HIVE-25475) TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable
Krisztian Kasa created HIVE-25475: - Summary: TestStatsReplicationScenarios.testForParallelBootstrapLoad is unstable Key: HIVE-25475 URL: https://issues.apache.org/jira/browse/HIVE-25475 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa http://ci.hive.apache.org/job/hive-flaky-check/389/ {code} 16:19:18 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 141.73 s <<< FAILURE! - in org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios 16:19:18 [ERROR] org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad Time elapsed: 122.979 s <<< ERROR! 16:19:18 org.apache.hadoop.hive.ql.metadata.HiveException 16:19:18at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:5032) 16:19:18at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3348) 16:19:18at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:429) 16:19:18at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) 16:19:18at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 16:19:18at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361) 16:19:18at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334) 16:19:18at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245) 16:19:18at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:108) 16:19:18at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348) 16:19:18at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204) 16:19:18at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153) 16:19:18at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148) 16:19:18at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164) 16:19:18at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230) 16:19:18at org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:235) 16:19:18at org.apache.hadoop.hive.ql.parse.WarehouseInstance.load(WarehouseInstance.java:309) 
16:19:18at org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.dumpLoadVerify(TestStatsReplicationScenarios.java:359) 16:19:18at org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testStatsReplicationCommon(TestStatsReplicationScenarios.java:663) 16:19:18at org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testForParallelBootstrapLoad(TestStatsReplicationScenarios.java:688) 16:19:18at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 16:19:18at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 16:19:18at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 16:19:18at java.lang.reflect.Method.invoke(Method.java:498) 16:19:18at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) 16:19:18at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) 16:19:18at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) 16:19:18at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) 16:19:18at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 16:19:18at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 16:19:18at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) 16:19:18at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 16:19:18at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) 16:19:18at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 16:19:18at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) 16:19:18at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) 16:19:18at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 16:19:18at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 16:19:18at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 16:19:18at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 16:19:18at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 16:19:18at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 16:19:18at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 16:19:18at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 16:19:18at org.junit.runners.ParentRunner.run(ParentRunner.java:413) 1
[jira] [Created] (HIVE-25406) Fetch writeId from insert-only tables
Krisztian Kasa created HIVE-25406: - Summary: Fetch writeId from insert-only tables Key: HIVE-25406 URL: https://issues.apache.org/jira/browse/HIVE-25406 Project: Hive Issue Type: Improvement Components: ORC, Parquet, Reader, Vectorization Reporter: Krisztian Kasa Assignee: Krisztian Kasa When generating the plan for an incremental materialized view rebuild, a filter operator is inserted on top of each source table scan. The predicates contain a filter on writeId, since we want to get only the rows inserted/deleted in the source tables since the last rebuild. WriteId is part of the ROW_ID virtual column and is only available for fully-ACID ORC tables. The goal of this jira is to populate a writeId when fetching from insert-only transactional tables. {code:java} create table t1(a int, b int) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only'); ... SELECT t1.ROW__ID.writeId, a, b FROM t1; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25388) Fix TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
Krisztian Kasa created HIVE-25388: - Summary: Fix TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites Key: HIVE-25388 URL: https://issues.apache.org/jira/browse/HIVE-25388 Project: Hive Issue Type: Test Components: repl, Test Reporter: Krisztian Kasa http://ci.hive.apache.org/job/hive-flaky-check/339/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25369) Handle Sum0 when rebuilding materialized view incrementally
Krisztian Kasa created HIVE-25369: - Summary: Handle Sum0 when rebuilding materialized view incrementally Key: HIVE-25369 URL: https://issues.apache.org/jira/browse/HIVE-25369 Project: Hive Issue Type: Improvement Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa When rewriting the MV insert overwrite plan to an incremental rebuild plan, a Sum0 aggregate function is used to aggregate the count function subresults coming from the existing MV data and from the newly inserted/deleted records aggregated since the last rebuild: {code} create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as select t1.a, count(*) from t1 group by t1.a {code} Insert overwrite plan: {code} HiveAggregate(group=[{0}], agg#0=[$SUM0($1)]) HiveUnion(all=[true]) HiveAggregate(group=[{0}], agg#0=[count()]) HiveProject($f0=[$0]) HiveFilter(condition=[<(2, $5.writeid)]) HiveTableScan(table=[[default, t1]], table:alias=[t1]) HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
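SUM0 behaves like SUM except that it returns 0 instead of NULL on empty input, which makes it safe for combining partial counts: a group that exists only in the view, or only in the delta, still gets a well-defined total. A simplified model of that combination step (the dictionaries stand in for the MV contents and the delta aggregate, not Hive's actual data structures):

```python
# sum0: like SUM, but an empty (or all-NULL) input yields 0, not NULL.
def sum0(values):
    return sum(v for v in values if v is not None)

existing_mv = {"x": 4}          # count(*) per group already in mat1
delta = {"x": 2, "y": 1}        # count(*) per group of rows added since
# The top-level $SUM0 adds the view's count and the delta's count per
# group; groups missing from one side contribute 0 instead of NULL.
rebuilt = {g: sum0([existing_mv.get(g), delta.get(g)])
           for g in sorted(set(existing_mv) | set(delta))}
print(rebuilt)  # {'x': 6, 'y': 1}
```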
[jira] [Created] (HIVE-25353) Incremental rebuild of partitioned insert only MV in presence of delete operations
Krisztian Kasa created HIVE-25353: - Summary: Incremental rebuild of partitioned insert only MV in presence of delete operations Key: HIVE-25353 URL: https://issues.apache.org/jira/browse/HIVE-25353 Project: Hive Issue Type: Improvement Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25253) Incremental rewrite of partitioned insert only materialized views
Krisztian Kasa created HIVE-25253: - Summary: Incremental rewrite of partitioned insert only materialized views Key: HIVE-25253 URL: https://issues.apache.org/jira/browse/HIVE-25253 Project: Hive Issue Type: Improvement Components: CBO, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25240) Query Text based MaterializedView rewrite if subqueries
Krisztian Kasa created HIVE-25240: - Summary: Query Text based MaterializedView rewrite if subqueries Key: HIVE-25240 URL: https://issues.apache.org/jira/browse/HIVE-25240 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create materialized view mat1 as select col0 from t1 where col0 > 1; explain cbo select col0 from (select col0 from t1 where col0 > 1) sub where col0 = 10; {code} {code} HiveProject(col0=[CAST(10):INTEGER]) HiveFilter(condition=[=($0, 10)]) HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25220) Query with union fails CBO with OOM
Krisztian Kasa created HIVE-25220: - Summary: Query with union fails CBO with OOM Key: HIVE-25220 URL: https://issues.apache.org/jira/browse/HIVE-25220 Project: Hive Issue Type: Bug Components: CBO Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25166) Query with multiple count(distinct) fails
Krisztian Kasa created HIVE-25166: - Summary: Query with multiple count(distinct) fails Key: HIVE-25166 URL: https://issues.apache.org/jira/browse/HIVE-25166 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} select count(distinct 0), count(distinct null) from alltypes; {code} {code} org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not in GROUP BY key 'TOK_NULL' at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(Parent
[jira] [Created] (HIVE-25109) CBO fails when updating table has constraints defined
Krisztian Kasa created HIVE-25109: - Summary: CBO fails when updating table has constraints defined Key: HIVE-25109 URL: https://issues.apache.org/jira/browse/HIVE-25109 Project: Hive Issue Type: Bug Components: CBO, Logical Optimizer Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create table acid_uami_n0(i int, de decimal(5,2) constraint nn1 not null enforced, vc varchar(128) constraint ch2 CHECK (de >= cast(i as decimal(5,2))) enforced) clustered by (i) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); -- update explain cbo update acid_uami_n0 set de = 893.14 where de = 103.00; {code} hive.log {code} 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result Schema didn't match Optimized Op Tree Schema at org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:208) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeUpdate(UpdateDeleteSemanticAnalyzer.java:63) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyze(UpdateDeleteSemanticAnalyzer.java:53) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) [hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.proce
[jira] [Created] (HIVE-25089) Move Materialized View rebuild code to AlterMaterializedViewRebuildAnalyzer
Krisztian Kasa created HIVE-25089: - Summary: Move Materialized View rebuild code to AlterMaterializedViewRebuildAnalyzer Key: HIVE-25089 URL: https://issues.apache.org/jira/browse/HIVE-25089 Project: Hive Issue Type: Task Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25071) Number of reducers limited to fixed 1 when updating/deleting
Krisztian Kasa created HIVE-25071: - Summary: Number of reducers limited to fixed 1 when updating/deleting Key: HIVE-25071 URL: https://issues.apache.org/jira/browse/HIVE-25071 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa When updating/deleting bucketed tables, an extra ReduceSink operator is created to enforce bucketing. Since HIVE-22538 the number of reducers is limited to a fixed 1 in these RS operators, which can lead to performance degradation; prior to HIVE-22538 multiple reducers were available in such cases. The reason for limiting the number of reducers is to ensure ascending RowId order in the delete delta files produced by update/delete statements. This is the plan of a delete statement like: {code} DELETE FROM t1 WHERE a = 1; {code} {code} TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7] {code} RowId order is ensured by RS[3] and bucketing is enforced by RS[5]: the number of reducers was limited to the number of buckets in the table or hive.exec.reducers.max. However RS[5] does not provide any ordering, so the above plan may generate unsorted delete deltas, which leads to corrupted data reads. Prior to HIVE-22538 these RS operators were merged by ReduceSinkDeduplication and the resulting RS kept the ordering and enabled multiple reducers; it could do so because ReduceSinkDeduplication was prepared for ACID writes. This was removed by HIVE-22538 to get a more generic ReduceSinkDeduplication. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25066) Show whether a materialized view supports incremental rebuild or not
Krisztian Kasa created HIVE-25066: - Summary: Show whether a materialized view supports incremental rebuild or not Key: HIVE-25066 URL: https://issues.apache.org/jira/browse/HIVE-25066 Project: Hive Issue Type: Improvement Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 Add information about whether a materialized view supports incremental rebuild or not in an additional column of the {code:java} SHOW MATERIALIZED VIEWS {code} statement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25063) Enforce hive.default.nulls.last when enforcing bucketing
Krisztian Kasa created HIVE-25063: - Summary: Enforce hive.default.nulls.last when enforcing bucketing Key: HIVE-25063 URL: https://issues.apache.org/jira/browse/HIVE-25063 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa When creating the ReduceSink operator for bucketing, the null sort order of the sort key is hardcoded: {code} for (int sortOrder : sortOrders) { order.append(DirectionUtils.codeToSign(sortOrder)); nullOrder.append(sortOrder == DirectionUtils.ASCENDING_CODE ? 'a' : 'z'); } {code} It should depend on both the hive.default.nulls.last setting and the sort direction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
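A minimal sketch of the intended null-order logic (illustrative Python, not Hive's implementation; the constant values are assumptions). With hive.default.nulls.last=true, the common convention is NULLS LAST for ascending keys and NULLS FIRST for descending keys, mirrored when the setting is false:

```python
# Illustrative codes, not Hive's actual DirectionUtils constants.
ASCENDING_CODE, DESCENDING_CODE = 1, 0

def null_order_char(sort_order: int, nulls_last: bool) -> str:
    # 'a' = nulls sort first, 'z' = nulls sort last (Hive's serialized convention).
    if sort_order == ASCENDING_CODE:
        return 'z' if nulls_last else 'a'
    # For descending order the default null position is mirrored.
    return 'a' if nulls_last else 'z'

# hive.default.nulls.last=true: NULLs last for ASC, first for DESC.
assert null_order_char(ASCENDING_CODE, True) == 'z'
assert null_order_char(DESCENDING_CODE, True) == 'a'
```

The point of the fix is that both inputs matter, instead of deriving the null-order character from the direction alone.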
[jira] [Created] (HIVE-25012) Parsing table alias is failing if query has table properties specified
Krisztian Kasa created HIVE-25012: - Summary: Parsing table alias is failing if query has table properties specified Key: HIVE-25012 URL: https://issues.apache.org/jira/browse/HIVE-25012 Project: Hive Issue Type: Bug Components: CBO, Parser Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from t1('acid.fetch.deleted.rows'='true') join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a; {code} When creating the Join RelNode, the aliases are used to look up the left and right input RelNodes. Aliases are extracted from the AST subtrees of the left and right inputs of the join AST node. In case of a table reference: {code} TOK_TABREF TOK_TABNAME t1 TOK_TABLEPROPERTIES TOK_TABLEPROPLIST TOK_TABLEPROPERTY 'acid.fetch.deleted.rows' 'true' {code} Prior to HIVE-24854 the queries mentioned above failed because the existing solution did not expect TOK_TABLEPROPERTIES. The goal of this patch is to parse TOK_TABREF properly using the existing solution also used in SemanticAnalyzer.doPhase1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24993) AssertionError when referencing ROW__ID.writeId
Krisztian Kasa created HIVE-24993: - Summary: AssertionError when referencing ROW__ID.writeId Key: HIVE-24993 URL: https://issues.apache.org/jira/browse/HIVE-24993 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} SELECT t1.ROW__ID FROM t1 WHERE t1.ROW__ID.writeid > 1 {code} {code} java.lang.AssertionError at org.apache.hadoop.hive.ql.parse.UnparseTranslator.addTranslation(UnparseTranslator.java:123) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5680) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5570) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5530) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3385) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3706) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3717) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5281) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1839) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1785) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1546) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:563) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12582) at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.j
[jira] [Created] (HIVE-24992) Incremental rebuild of MV having aggregate in presence of delete operation
Krisztian Kasa created HIVE-24992: - Summary: Incremental rebuild of MV having aggregate in presence of delete operation Key: HIVE-24992 URL: https://issues.apache.org/jira/browse/HIVE-24992 Project: Hive Issue Type: Improvement Components: Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Extension of HIVE-24854: handle cases when the Materialized view definition has an aggregation, like {code} CREATE MATERIALIZED VIEW cmv_mat_view_n5 DISABLE REWRITE TBLPROPERTIES ('transactional'='true') AS SELECT cmv_basetable_n5.a, cmv_basetable_2_n2.c, sum(cmv_basetable_2_n2.d) FROM cmv_basetable_n5 JOIN cmv_basetable_2_n2 ON (cmv_basetable_n5.a = cmv_basetable_2_n2.a) WHERE cmv_basetable_2_n2.c > 10.0 GROUP BY cmv_basetable_n5.a, cmv_basetable_2_n2.c; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24991) Enable fetching deleted rows in vectorized mode
Krisztian Kasa created HIVE-24991: - Summary: Enable fetching deleted rows in vectorized mode Key: HIVE-24991 URL: https://issues.apache.org/jira/browse/HIVE-24991 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa HIVE-24855 enables loading deleted rows from ORC tables when the table property *acid.fetch.deleted.rows* is true. The goal of this jira is to enable this feature in the vectorized ORC batch reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24990) Support distinct in window aggregation in vectorized mode
Krisztian Kasa created HIVE-24990: - Summary: Support distinct in window aggregation in vectorized mode Key: HIVE-24990 URL: https://issues.apache.org/jira/browse/HIVE-24990 Project: Hive Issue Type: Improvement Components: UDF, Vectorization Reporter: Krisztian Kasa The PTF operator can not be vectorized if the query has a windowing function with *distinct*, because the distinct versions of these aggregate functions are not implemented yet. {code} SELECT sum(DISTINCT a) OVER (PARTITION BY b) FROM t1; {code} The only exception is *count*. Functions that have a vectorized version but no vectorized distinct version: {code} row_number rank dense_rank min max sum avg first_value last_value {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
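For reference, a minimal model of what `sum(DISTINCT a) OVER (PARTITION BY b)` computes — within each partition, duplicate values of `a` contribute once (illustrative Python, not Hive code):

```python
from collections import defaultdict

def sum_distinct_over_partition(rows):
    """rows: list of (a, b) pairs; returns one result per input row, in order."""
    per_partition = defaultdict(set)
    for a, b in rows:
        if a is not None:          # SQL aggregates skip NULL inputs
            per_partition[b].add(a)
    # Every row in a partition sees the same distinct-sum (no ORDER BY frame).
    return [sum(per_partition[b]) for _, b in rows]

rows = [(1, 'x'), (1, 'x'), (2, 'x'), (5, 'y')]
assert sum_distinct_over_partition(rows) == [3, 3, 3, 5]
```

Vectorizing this requires the UDAF to deduplicate per partition, which is why each function needs a dedicated distinct implementation.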
[jira] [Created] (HIVE-24935) Remove outdated check for correlated exists subqueries with full aggregate
Krisztian Kasa created HIVE-24935: - Summary: Remove outdated check for correlated exists subqueries with full aggregate Key: HIVE-24935 URL: https://issues.apache.org/jira/browse/HIVE-24935 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa Since HIVE-24929 QBSubQuery.subqueryRestrictionsCheck is no longer called. The check for exists subqueries with a full aggregate has moved to QBSubQueryParseInfo.hasFullAggregate() and QBSubQueryParseInfo.getOperator() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24929) Allow correlated exists subqueries with windowing clause
Krisztian Kasa created HIVE-24929: - Summary: Allow correlated exists subqueries with windowing clause Key: HIVE-24929 URL: https://issues.apache.org/jira/browse/HIVE-24929 Project: Hive Issue Type: Improvement Components: Query Planning Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 Currently, queries which have a windowing clause within a subquery are not supported by Hive: Hive rewrites subqueries to joins, and the rewritten plan would lead to incorrect results in such cases. However, this restriction can be lifted for EXISTS/NOT EXISTS subqueries, since in those cases we are not interested in the result of the window function call but only in the existence of any record. {code} select id, int_col from alltypesagg a where exists (select sum(int_col) over (partition by bool_col) from alltypestiny b where a.id = b.id); {code} {code} select id, int_col from alltypestiny t where not exists (select sum(int_col) over (partition by bool_col) from alltypesagg a where t.id = a.int_col); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24925) Query materialized view invalidation info can cause ORA-01795
Krisztian Kasa created HIVE-24925: - Summary: Query materialized view invalidation info can cause ORA-01795 Key: HIVE-24925 URL: https://issues.apache.org/jira/browse/HIVE-24925 Project: Hive Issue Type: Bug Components: Metastore Reporter: Krisztian Kasa Fix For: 4.0.0 Querying materialized view invalidation info assembles a direct SQL query to pull update/delete completed transactions on the source tables since the last rebuild of the materialized view. Invalid writeIds are also used to filter the result. These writeIds are passed using an *in* operator. Depending on the size of the invalid writeId list, the operand list of the *in* operator or the overall query text can exceed backend limitations; for example, with an Oracle backend db the maximum number of expressions in a list is 1000. {code} SELECT "CTC_UPDATE_DELETE" FROM "COMPLETED_TXN_COMPONENTS" WHERE "CTC_UPDATE_DELETE" ='Y' AND ( ("CTC_DATABASE"=? AND "CTC_TABLE"=? AND ("CTC_WRITEID" > 1 OR "CTC_WRITEID" IN (, ... )) ) OR ("CTC_DATABASE"=? AND "CTC_TABLE"=? AND ("CTC_WRITEID" > 1) ) ) AND "CTC_TXNID" <= 16 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
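One common workaround for ORA-01795 is to split the IN-list into OR-ed chunks of at most 1000 elements each. A hypothetical sketch of that query-text generation (not necessarily the fix Hive ships):

```python
def chunked_in_clause(column, values, chunk_size=1000):
    """Render `"column" IN (...)` split into OR-ed chunks of <= chunk_size values."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    parts = ['"%s" IN (%s)' % (column, ", ".join(map(str, chunk))) for chunk in chunks]
    return "(" + " OR ".join(parts) + ")"

clause = chunked_in_clause("CTC_WRITEID", list(range(2500)))
assert clause.count(" IN (") == 3          # 2500 writeIds -> 3 chunks
# No chunk exceeds Oracle's 1000-expression list limit:
assert max(len(c.split(",")) for c in clause.split(" OR ")) <= 1000
```

The overall query-text length would still need a separate guard, since ORA-01795 is only one of the limits mentioned above.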
[jira] [Created] (HIVE-24908) Adding Respect/Ignore nulls as a UDAF parameter is ambiguous
Krisztian Kasa created HIVE-24908: - Summary: Adding Respect/Ignore nulls as a UDAF parameter is ambiguous Key: HIVE-24908 URL: https://issues.apache.org/jira/browse/HIVE-24908 Project: Hive Issue Type: Bug Components: UDF Reporter: Krisztian Kasa Assignee: Krisztian Kasa Both function calls are translated to the same UDAF call: {code} SELECT lead(a, 2, true) ... SELECT lead(a, 2) IGNORE NULLS ... {code} IGNORE NULLS is passed as an extra constant boolean parameter to the UDAF https://github.com/apache/hive/blob/eed78dfdcb6dfc2de400397a60de12e6f62b96e2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L743 However, the two function calls have different semantics: * *lead(a, 2, true)* - 'true' is the default value: "The value of DEFAULT is returned as the result if there is no row corresponding to the OFFSET number of rows before R within P (for the lag function) or after R within P (for the lead function)" * *lead(a, 2) IGNORE NULLS* - For each row in the current window, find the 2nd non-NULL value starting directly after the current row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
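The two semantics above can be contrasted with a small sketch (hypothetical helpers in Python, not Hive's implementation):

```python
def lead_default(col, offset, default):
    # lead(a, offset, default): the value `offset` rows ahead,
    # or `default` when no such row exists.
    return [col[i + offset] if i + offset < len(col) else default
            for i in range(len(col))]

def lead_ignore_nulls(col, offset):
    # lead(a, offset) IGNORE NULLS: the offset-th non-NULL value
    # strictly after the current row, or NULL when there are fewer.
    out = []
    for i in range(len(col)):
        later = [v for v in col[i + 1:] if v is not None]
        out.append(later[offset - 1] if len(later) >= offset else None)
    return out

col = [10, None, 20, None, 30]
assert lead_default(col, 2, True) == [20, None, 30, True, True]
assert lead_ignore_nulls(col, 2) == [30, 30, None, None, None]
```

The outputs differ on every row where NULLs intervene, which is why collapsing both calls onto the same boolean UDAF parameter is ambiguous.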
[jira] [Created] (HIVE-24894) transform_acid is unstable
Krisztian Kasa created HIVE-24894: - Summary: transform_acid is unstable Key: HIVE-24894 URL: https://issues.apache.org/jira/browse/HIVE-24894 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa [http://ci.hive.apache.org/job/hive-flaky-check/217] {code} Client execution failed with error code = 2 running SELECT transform(*) USING 'transform_acid_grep.sh' AS (col string) FROM transform_acid fname=transform_acid.q {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24869) Implement Respect/Ignore Nulls in lag/lead
Krisztian Kasa created HIVE-24869: - Summary: Implement Respect/Ignore Nulls in lag/lead Key: HIVE-24869 URL: https://issues.apache.org/jira/browse/HIVE-24869 Project: Hive Issue Type: Improvement Components: UDF Reporter: Krisztian Kasa {code} <lead or lag function> ::= <lead or lag> <left paren> <lead or lag extent> [ <comma> <offset> [ <comma> <default expression> ] ] <right paren> [ <null treatment> ] <lead or lag> ::= LEAD | LAG <lead or lag extent> ::= <value expression> <offset> ::= <exact numeric literal> <default expression> ::= <value expression> <null treatment> ::= RESPECT NULLS | IGNORE NULLS {code} Example: get the a column value from the previous and the next row, or return 0 if there is no previous/next row corresponding to the current row. RESPECT/IGNORE NULLS controls whether null values should be preserved or eliminated. {code} SELECT a, LAG(a, 1, 0) OVER (ORDER BY a) IGNORE NULLS, LEAD(a, 1, 0) OVER (ORDER BY a) RESPECT NULLS FROM ... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24868) Support specifying Respect/Ignore Nulls in function parameter list
Krisztian Kasa created HIVE-24868: - Summary: Support specifying Respect/Ignore Nulls in function parameter list Key: HIVE-24868 URL: https://issues.apache.org/jira/browse/HIVE-24868 Project: Hive Issue Type: Improvement Components: Parser Reporter: Krisztian Kasa Assignee: Krisztian Kasa Example: {code} select last_value(b, ignore nulls) over(partition by a order by b) from t1; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24865) Implement Respect/Ignore Nulls in first/last_value
Krisztian Kasa created HIVE-24865: - Summary: Implement Respect/Ignore Nulls in first/last_value Key: HIVE-24865 URL: https://issues.apache.org/jira/browse/HIVE-24865 Project: Hive Issue Type: New Feature Components: Parser, UDF Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code:java} <null treatment> ::= RESPECT NULLS | IGNORE NULLS <first or last value function> ::= <first or last value> <left paren> <value expression> <right paren> [ <null treatment> ] <first or last value> ::= FIRST_VALUE | LAST_VALUE {code} Example: {code:java} select last_value(b) ignore nulls over(partition by a order by b) from t1; {code} Existing non-standard implementation: {code:java} select last_value(b, true) over(partition by a order by b) from t1; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
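For an ordered window with the default running frame (UNBOUNDED PRECEDING to CURRENT ROW), `last_value(b) IGNORE NULLS` yields the most recent non-NULL value seen so far. A minimal sketch of that semantics (illustrative Python, not Hive code):

```python
def last_value_ignore_nulls(col):
    """last_value(...) IGNORE NULLS over a running frame: for each row,
    the latest non-NULL value at or before it, else None."""
    out, last = [], None
    for v in col:
        if v is not None:
            last = v
        out.append(last)
    return out

assert last_value_ignore_nulls([None, 1, None, 2, None]) == [None, 1, 1, 2, 2]
```

Without IGNORE NULLS (i.e. RESPECT NULLS), `last_value` would simply return the current row's value under this frame, NULL or not.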
[jira] [Created] (HIVE-24863) Wrong property value in UDAF percentile_cont/disc description
Krisztian Kasa created HIVE-24863: - Summary: Wrong property value in UDAF percentile_cont/disc description Key: HIVE-24863 URL: https://issues.apache.org/jira/browse/HIVE-24863 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24859) TestZookeeperLockManager#testMetrics fails intermittently
Krisztian Kasa created HIVE-24859: - Summary: TestZookeeperLockManager#testMetrics fails intermittently Key: HIVE-24859 URL: https://issues.apache.org/jira/browse/HIVE-24859 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa http://ci.hive.apache.org/job/hive-flaky-check/198/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24855) Introduce virtual column ROW__IS__DELETED
Krisztian Kasa created HIVE-24855: - Summary: Introduce virtual column ROW__IS__DELETED Key: HIVE-24855 URL: https://issues.apache.org/jira/browse/HIVE-24855 Project: Hive Issue Type: New Feature Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24854) Incremental Materialized view refresh in presence of update/delete operations
Krisztian Kasa created HIVE-24854: - Summary: Incremental Materialized view refresh in presence of update/delete operations Key: HIVE-24854 URL: https://issues.apache.org/jira/browse/HIVE-24854 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa The current implementation of incremental Materialized view rebuild can not be used if any of the Materialized view source tables has update or delete operations since the last rebuild. In such cases a full rebuild must be performed. Steps to enable incremental rebuild: 1. Introduce a new virtual column to mark a row deleted. 2. Execute the query in the view definition. 2.a. Add a filter to each table scan in order to pull only the rows from each source table which have a higher writeId than the writeId of the last rebuild - this is already implemented by the current incremental rebuild. 2.b. Add the row-is-deleted virtual column to each table scan. In join nodes, if any of the branches has a deleted row, the result row is also deleted. We should distinguish two types of view definition queries: with and without an Aggregate. 3.a. No-aggregate path: rewrite the plan of the full rebuild to a multi insert statement with two insert branches: one branch inserts new rows into the materialized view table, and the second inserts deleted rows into the materialized view delete delta. 3.b. Aggregate path: TBD -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24840) Materialized View incremental rebuild produces wrong result set after compaction
Krisztian Kasa created HIVE-24840: - Summary: Materialized View incremental rebuild produces wrong result set after compaction Key: HIVE-24840 URL: https://issues.apache.org/jira/browse/HIVE-24840 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES ('transactional'='true'); insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, NULL); create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as select a,b,c from t1 where a > 0 or a is null; delete from t1 where a = 1; alter table t1 compact 'major'; -- Wait until compaction finished. alter materialized view mat1 rebuild; {code} Expected result of query {code} select * from mat1; {code} {code} 2 two 2 NULL NULL NULL {code} but if incremental rebuild is enabled the result is {code} 1 one 1 2 two 2 NULL NULL NULL {code} Cause: incremental rebuild queries the metastore's COMPLETED_TXN_COMPONENTS table to determine whether the source tables of a materialized view have had a delete or update transaction since the last rebuild. However, when a major compaction is performed on the source tables, the records related to these tables are deleted from COMPLETED_TXN_COMPONENTS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24822) Materialized View rebuild loses materializationTime property value
Krisztian Kasa created HIVE-24822: - Summary: Materialized View rebuild loses materializationTime property value Key: HIVE-24822 URL: https://issues.apache.org/jira/browse/HIVE-24822 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa A Materialized View rebuild like {code} alter materialized view mat1 rebuild; {code} updates the CreationMetadata of the org.apache.hadoop.hive.ql.metadata.Table object of the materialized view, but it does not copy the materializationTime property value from the original CreationMetadata object when it updates the entry in the MaterializedViewCache: {code} } else if (desc.isUpdateCreationMetadata()) { // We need to update the status of the creation signature Table mvTable = context.getDb().getTable(desc.getName()); CreationMetadata cm = new CreationMetadata(MetaStoreUtils.getDefaultCatalog(context.getConf()), mvTable.getDbName(), mvTable.getTableName(), ImmutableSet.copyOf(mvTable.getCreationMetadata().getTablesUsed())); cm.setValidTxnList(context.getConf().get(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY)); context.getDb().updateCreationMetadata(mvTable.getDbName(), mvTable.getTableName(), cm); mvTable.setCreationMetadata(cm); HiveMaterializedViewsRegistry.get().createMaterializedView(context.getDb().getConf(), mvTable); } {code} Later, when loading materializations using {code} Hive.getValidMaterializedViews(List materializedViewTables ...) {code} the materialization stored in the cache and the one in the metastore will not be the same because of the lost materializationTime. The cache is then refreshed via {code} HiveMaterializedViewsRegistry.get().refreshMaterializedView(conf, null, materializedViewTable); {code} passing null as oldMaterializedViewTable, which leads to a NullPointerException and CBO failure because the registry expects a non-null oldMaterializedViewTable value: it may drop the old MV in the Metastore and also tries to update the cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24820) MaterializeViewCache enables adding multiple entries of the same Materialization instance
Krisztian Kasa created HIVE-24820: - Summary: MaterializeViewCache enables adding multiple entries of the same Materialization instance Key: HIVE-24820 URL: https://issues.apache.org/jira/browse/HIVE-24820 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24775) Incorrect null handling when rebuilding Materialized view incrementally
Krisztian Kasa created HIVE-24775: - Summary: Incorrect null handling when rebuilding Materialized view incrementally Key: HIVE-24775 URL: https://issues.apache.org/jira/browse/HIVE-24775 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} CREATE TABLE t1 (a int, b varchar(256), c decimal(10,2), d int) STORED AS orc TBLPROPERTIES ('transactional'='true'); INSERT INTO t1 VALUES (NULL, 'null_value', 100.77, 7), (1, 'calvin', 978.76, 3), (1, 'charlie', 9.8, 1); CREATE MATERIALIZED VIEW mat1 TBLPROPERTIES ('transactional'='true') AS SELECT a, b, sum(d) FROM t1 WHERE c > 10.0 GROUP BY a, b; INSERT INTO t1 VALUES (NULL, 'null_value', 100.88, 8), (1, 'charlie', 15.8, 1); ALTER MATERIALIZED VIEW mat1 REBUILD; SELECT * FROM mat1 ORDER BY a, b; {code} View contains: {code} 1 calvin 3 1 charlie 1 NULL null_value 8 NULL null_value 7 {code} but it should contain: {code} 1 calvin 3 1 charlie 1 NULL null_value 15 {code} Rows whose aggregate key columns have NULL values are not aggregated, because the incremental materialized view rebuild plan is altered by [applyPreJoinOrderingTransforms|https://github.com/apache/hive/blob/76732ad27e139fbdef25b820a07cf35934771083/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1975]: an IS NOT NULL filter is added for each of these columns on top of the view scan when joining with the branch that pulls the rows inserted after the last rebuild: {code} HiveProject($f0=[$3], $f1=[$4], $f2=[CASE(AND(IS NULL($0), IS NULL($1)), $5, +($5, $2))]) HiveFilter(condition=[OR(AND(IS NULL($0), IS NULL($1)), AND(=($0, $3), =($1, $4)))]) HiveJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[right], algorithm=[none], cost=[not available]) HiveProject(a=[$0], b=[$1], _c2=[$2]) HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($1))]) HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1]) HiveProject(a=[$0], b=[$1], $f2=[$2]) HiveAggregate(group=[{0, 1}], agg#0=[sum($3)]) HiveFilter(condition=[AND(<(1, 
$6.writeid), >($2, 10))]) HiveTableScan(table=[[default, t1]], table:alias=[t1]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
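A minimal model of the intended incremental merge: delta aggregates must be combined with the existing view rows by group key, treating a NULL key as matching a NULL key — exactly what the injected IS NOT NULL filter breaks (illustrative Python, not Hive's plan):

```python
def merge_aggregates(view_rows, delta_rows):
    """view_rows/delta_rows: {(a, b): sum_d}. Python dict keys compare
    (None, x) == (None, x), mirroring the NULL-safe grouping the rebuild needs."""
    merged = dict(view_rows)
    for key, s in delta_rows.items():
        merged[key] = merged.get(key, 0) + s
    return merged

view  = {(1, 'calvin'): 3, (1, 'charlie'): 1, (None, 'null_value'): 7}
delta = {(None, 'null_value'): 8}
assert merge_aggregates(view, delta) == {
    (1, 'calvin'): 3, (1, 'charlie'): 1, (None, 'null_value'): 15}
```

With the IS NOT NULL filter in place, the `(None, 'null_value')` view row never reaches the join, so the delta row lands as a separate group — producing the duplicate 8/7 rows shown above instead of the merged 15.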
[jira] [Created] (HIVE-24763) Incremental rebuild of Materialized view fails
Krisztian Kasa created HIVE-24763: - Summary: Incremental rebuild of Materialized view fails Key: HIVE-24763 URL: https://issues.apache.org/jira/browse/HIVE-24763 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa An exception is thrown when the Materialized view definition contains an aggregate operator with only one key: {code} CREATE MATERIALIZED VIEW cmv_mat_view_n5 TBLPROPERTIES ('transactional'='true') AS SELECT cmv_basetable_n5.a, sum(cmv_basetable_2_n2.d) FROM cmv_basetable_n5 JOIN cmv_basetable_2_n2 ON (cmv_basetable_n5.a = cmv_basetable_2_n2.a) WHERE cmv_basetable_2_n2.c > 10.0 GROUP BY cmv_basetable_n5.a; ... ALTER MATERIALIZED VIEW cmv_mat_view_n5 REBUILD; {code} {code} java.lang.AssertionError: wrong operand count 1 for AND at org.apache.calcite.util.Litmus$1.fail(Litmus.java:31) at org.apache.calcite.sql.SqlBinaryOperator.validRexOperands(SqlBinaryOperator.java:219) at org.apache.calcite.rex.RexCall.<init>(RexCall.java:86) at org.apache.calcite.rex.RexBuilder.makeCall(RexBuilder.java:251) at org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveAggregateIncrementalRewritingRule.onMatch(HiveAggregateIncrementalRewritingRule.java:124) at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319) at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560) at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419) at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256) at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215) at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2715) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2681) at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyMaterializedViewRewriting(CalcitePlanner.java:2318) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1934) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1810) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1571) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:562) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12538) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456) at org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:89) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:315) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:315) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) at org.apache.hadoop.hi
[jira] [Created] (HIVE-24664) Support column aliases in Values clause
Krisztian Kasa created HIVE-24664: - Summary: Support column aliases in Values clause Key: HIVE-24664 URL: https://issues.apache.org/jira/browse/HIVE-24664 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa Enable explicitly specifying column aliases in the first row of a Values clause. If not all columns have an alias specified, generate one. {code:java} values(1, 2 b, 3 c),(4, 5, 6); {code} {code:java} _col1 b c 1 2 3 4 5 6 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
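The fallback described above can be sketched as follows. This is an illustrative standalone snippet, not Hive's actual implementation; the class and method names are hypothetical, and the 1-based {{_colN}} numbering simply mirrors the {{_col1 b c}} header in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: generate a default alias for every Values column
// that has no explicit alias, as in the example output above.
public class ValuesAliases {
    public static List<String> resolveAliases(List<String> explicitAliases) {
        List<String> resolved = new ArrayList<>();
        for (int i = 0; i < explicitAliases.size(); i++) {
            String alias = explicitAliases.get(i);
            // Fall back to a generated _colN name when no alias was given
            // (null stands in for "no alias" in this sketch).
            resolved.add(alias != null ? alias : "_col" + (i + 1));
        }
        return resolved;
    }
}
```

For the example row {{values(1, 2 b, 3 c)}}, only the first column lacks an alias, so the resolved header becomes {{_col1 b c}}.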
[jira] [Created] (HIVE-24644) QueryResultCache parses the query twice
Krisztian Kasa created HIVE-24644: - Summary: QueryResultCache parses the query twice Key: HIVE-24644 URL: https://issues.apache.org/jira/browse/HIVE-24644 Project: Hive Issue Type: Improvement Components: HiveServer2, Parser Reporter: Krisztian Kasa Assignee: Krisztian Kasa The query result cache looks up results by a query text in which all table references are fully resolved. To generate this query text, the current implementation * transforms the AST back to a String * parses the String generated in the previous step * traverses the new AST and replaces the table references with fully qualified ones * transforms the new AST back to a String -> this becomes the cache key -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24635) Support Values clause as operand of set operation
Krisztian Kasa created HIVE-24635: - Summary: Support Values clause as operand of set operation Key: HIVE-24635 URL: https://issues.apache.org/jira/browse/HIVE-24635 Project: Hive Issue Type: Improvement Components: Parser Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} VALUES (1,2),(3,4) UNION ALL VALUES (1,2),(7,8); {code} {code} 1 2 3 4 1 2 7 8 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24633) Support CTE with column labels
Krisztian Kasa created HIVE-24633: - Summary: Support CTE with column labels Key: HIVE-24633 URL: https://issues.apache.org/jira/browse/HIVE-24633 Project: Hive Issue Type: Improvement Components: Parser Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} with cte1(a, b) as (select int_col x, bigint_col y from t1) select a, b from cte1{code} {code} a b 1 2 3 4 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24613) Support Values clause without Insert
Krisztian Kasa created HIVE-24613: - Summary: Support Values clause without Insert Key: HIVE-24613 URL: https://issues.apache.org/jira/browse/HIVE-24613 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Assignee: Krisztian Kasa Standalone: {code} VALUES(1,2,3),(4,5,6); {code} {code} 1 2 3 4 5 6 {code} In subquery: {code} SELECT * FROM (VALUES(1,2,3),(4,5,6)) as FOO; {code} {code} 1 2 3 4 5 6 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24599) Add support for vectorized two-parameter trim functions
Krisztian Kasa created HIVE-24599: - Summary: Add support for vectorized two-parameter trim functions Key: HIVE-24599 URL: https://issues.apache.org/jira/browse/HIVE-24599 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa HIVE-24565 introduces a version of the trim/ltrim/rtrim functions where the characters to trim can be specified as a second parameter. The two-parameter version of these functions has several vectorization scenarios: * source: COLUMN - trim characters: SCALAR - already supported by HIVE-24565 {code} SELECT trim(col0, 'a'); {code} * source: COLUMN - trim characters: COLUMN {code} SELECT trim(col0, col1); {code} * source: SCALAR - trim characters: COLUMN {code} SELECT trim('string to trim', col0); {code} The scope of this jira is to support the last two scenarios. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24598) Trim function should return null if any of its parameters is null
Krisztian Kasa created HIVE-24598: - Summary: Trim function should return null if any of its parameters is null Key: HIVE-24598 URL: https://issues.apache.org/jira/browse/HIVE-24598 Project: Hive Issue Type: Improvement Reporter: Krisztian Kasa Hive throws an exception when null is passed as a parameter of trim/rtrim/ltrim; however, null should be returned. From the SQL:2011 standard: a) Let S be the value of the <trim source>. b) If <trim character> is specified, then let SC be the value of <trim character>; otherwise, let SC be <space>. c) If at least one of S and SC is the null value, then the result of the <trim function> is the null value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
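The required behavior can be sketched with a standalone two-parameter trim that follows the standard's null rule. This is an illustration of the semantics, not Hive's UDF code; the class and method names are hypothetical.

```java
// Hypothetical sketch of a two-parameter trim with SQL-standard null
// semantics: if either operand is null, the result is null.
public class TrimStd {
    public static String trimStd(String src, String trimChars) {
        if (src == null || trimChars == null) {
            return null; // rule (c): a null operand yields a null result
        }
        int begin = 0;
        int end = src.length();
        // Strip leading characters that appear in trimChars.
        while (begin < end && trimChars.indexOf(src.charAt(begin)) >= 0) {
            begin++;
        }
        // Strip trailing characters that appear in trimChars.
        while (end > begin && trimChars.indexOf(src.charAt(end - 1)) >= 0) {
            end--;
        }
        return src.substring(begin, end);
    }
}
```

With this rule, {{trim(null, 'a')}} and {{trim('abc', null)}} both return null instead of raising an exception.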
[jira] [Created] (HIVE-24565) Implement standard trim function
Krisztian Kasa created HIVE-24565: - Summary: Implement standard trim function Key: HIVE-24565 URL: https://issues.apache.org/jira/browse/HIVE-24565 Project: Hive Issue Type: Improvement Components: Parser, UDF Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24564) Extend PPD filter transitivity to be able to discover new opportunities
Krisztian Kasa created HIVE-24564: - Summary: Extend PPD filter transitivity to be able to discover new opportunities Key: HIVE-24564 URL: https://issues.apache.org/jira/browse/HIVE-24564 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Krisztian Kasa Assignee: Krisztian Kasa -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24479) Introduce setting to set lower bound of hash aggregation reduction.
Krisztian Kasa created HIVE-24479: - Summary: Introduce setting to set lower bound of hash aggregation reduction. Key: HIVE-24479 URL: https://issues.apache.org/jira/browse/HIVE-24479 Project: Hive Issue Type: Improvement Components: Physical Optimizer Affects Versions: 4.0.0 Reporter: Krisztian Kasa Assignee: Krisztian Kasa Fix For: 4.0.0 * The default setting of the hash group-by min reduction % is 0.99. * During compilation, we check its effectiveness and adjust it accordingly in {{SetHashGroupByMinReduction}}: {code} float defaultMinReductionHashAggrFactor = desc.getMinReductionHashAggr(); float minReductionHashAggrFactor = 1f - ((float) ndvProduct / numRows); if (minReductionHashAggrFactor < defaultMinReductionHashAggrFactor) { desc.setMinReductionHashAggr(minReductionHashAggrFactor); } {code} For certain queries, this computation turns out to be "0". This forces the operator to skip hash aggregation completely and always ends up choosing streaming mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
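The adjustment above, plus the lower bound this jira proposes, can be sketched as follows. This is an illustrative standalone snippet under the assumption that the bound is applied as a simple clamp; the class, method, and parameter names are hypothetical, not Hive's actual API.

```java
// Hypothetical sketch of the min-reduction adjustment from
// SetHashGroupByMinReduction, extended with a configurable lower bound so
// the factor can no longer collapse to 0 and force streaming mode.
public class MinReductionSketch {
    public static float adjustMinReduction(long ndvProduct, long numRows,
                                           float defaultFactor, float lowerBound) {
        float computed = 1f - ((float) ndvProduct / numRows);
        // Existing behavior: take the computed factor when it is below the
        // default (0.99)...
        float adjusted = Math.min(computed, defaultFactor);
        // ...proposed change: but never drop below the configured lower bound.
        return Math.max(adjusted, lowerBound);
    }
}
```

For example, when ndvProduct equals numRows the computed factor is 0, and without the clamp hash aggregation would always be skipped; with a lower bound of 0.3 the operator still keeps a hash-aggregation budget.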
[jira] [Created] (HIVE-24446) Materialized View plan removes explicit cast from query
Krisztian Kasa created HIVE-24446: - Summary: Materialized View plan removes explicit cast from query Key: HIVE-24446 URL: https://issues.apache.org/jira/browse/HIVE-24446 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa {code} create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES ('transactional'='true') as select total_views `total_views`, sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile, program from tv_view_data; {code} {code} LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1]) HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], table:alias=[mv_tv_view_data_av1]) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)