[jira] [Resolved] (HIVE-28253) Unable to set the value for hplsql.onerror in hplsql mode.

2024-05-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28253.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the patch.

> Unable to set the value for hplsql.onerror in hplsql mode.
> --
>
> Key: HIVE-28253
> URL: https://issues.apache.org/jira/browse/HIVE-28253
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Unable to set the value for hplsql.onerror in hplsql mode.
>  
> Steps to reproduce:
> {noformat}
> 0: jdbc:hive2://localhost> set hplsql.onerror='stop';
> . . . . . . . . . . . . . . . . . . . . . . .> /
> ERROR : Syntax error at line 1:18 no viable alternative at input 
> 'hplsql.onerror='
> ERROR : Ln:1 identifier 'SET' must be declared.
> No rows affected (0.534 seconds)
> 0: jdbc:hive2://localhost> {noformat}
>  





[jira] [Resolved] (HIVE-28247) Execute immediate 'select count(*) from tbl' throwing ClassCastException in hplsql mode.

2024-05-07 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28247.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the patch.

> Execute immediate 'select count(*) from tbl' throwing ClassCastException in 
> hplsql mode.
> 
>
> Key: HIVE-28247
> URL: https://issues.apache.org/jira/browse/HIVE-28247
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Execute immediate 'select count(*) from tbl' throwing ClassCastException in 
> hplsql mode.
>  
> Steps to reproduce:
> {noformat}
> execute immediate 'SELECT count(*) from result';"{noformat}
> StackTrace in HS2 logs:
> {noformat}
> 2024-05-06T08:45:42,730 ERROR [HiveServer2-Background-Pool: Thread-850] 
> hplsql.HplSqlOperation: Error running hive query
> org.apache.hive.service.cli.HiveSQLException: Error running HPL/SQL operation
>at 
> org.apache.hive.service.cli.operation.hplsql.HplSqlOperation.interpret(HplSqlOperation.java:111)
>  ~[hive-service-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.apache.hive.service.cli.operation.hplsql.HplSqlOperation.access$500(HplSqlOperation.java:54)
>  ~[hive-service-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.apache.hive.service.cli.operation.hplsql.HplSqlOperation$BackgroundWork.lambda$run$0(HplSqlOperation.java:207)
>  ~[hive-service-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_292]
>at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_292]
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>  ~[hadoop-common-3.3.6.jar:?]
>at 
> org.apache.hive.service.cli.operation.hplsql.HplSqlOperation$BackgroundWork.run(HplSqlOperation.java:219)
>  ~[hive-service-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_292]
>at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_292]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_292]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_292]
>at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
> Caused by: java.lang.ClassCastException: class java.lang.Long cannot be 
> cast to class java.lang.String
>at 
> org.apache.hive.service.cli.operation.hplsql.HplSqlQueryExecutor$OperationRowResult.get(HplSqlQueryExecutor.java:147)
>  ~[hive-service-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.executor.QueryResult.column(QueryResult.java:49) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.Stmt.exec(Stmt.java:1095) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.Exec.visitExec_stmt(Exec.java:2061) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.Exec.visitExec_stmt(Exec.java:96) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.apache.hive.hplsql.HplsqlParser$Exec_stmtContext.accept(HplsqlParser.java:10369)
>  ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:46)
>  ~[antlr4-runtime-4.9.3.jar:4.9.3]
>at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:1103) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:96) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.apache.hive.hplsql.HplsqlParser$StmtContext.accept(HplsqlParser.java:1054)
>  ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:46)
>  ~[antlr4-runtime-4.9.3.jar:4.9.3]
>at 
> org.apache.hive.hplsql.HplsqlBaseVisitor.visitBlock(HplsqlBaseVisitor.java:27)
>  ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.apache.hive.hplsql.HplsqlParser$BlockContext.accept(HplsqlParser.java:473)
>  ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:46)
>  ~[antlr4-runtime-4.9.3.jar:4.9.3]
>at org.apache.hive.hplsql.Exec.visitProgram(Exec.java:999) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.Exec.visitProgram(Exec.java:96) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> 

[jira] [Resolved] (HIVE-28215) Signalling CONDITION HANDLER is not working in HPLSQL.

2024-05-06 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28215.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the patch.

> Signalling CONDITION HANDLER is not working in HPLSQL.
> --
>
> Key: HIVE-28215
> URL: https://issues.apache.org/jira/browse/HIVE-28215
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Signalling CONDITION HANDLER is not working in HPLSQL.
> Refer [http://www.hplsql.org/declare-condition] and 
> [http://www.hplsql.org/declare-handler] for more details about this feature.
>  
> +Steps to Reproduce:+
> {noformat}
> jdbc:hive2://ccycloud-1.nightly-71x-oq.roo> DECLARE cnt INT DEFAULT 0; 
> . . . . . . . . . . . . . . . . . . . . . . .> DECLARE wrong_cnt_condition 
> CONDITION;
> . . . . . . . . . . . . . . . . . . . . . . .> 
> . . . . . . . . . . . . . . . . . . . . . . .> DECLARE EXIT HANDLER FOR 
> wrong_cnt_condition
> . . . . . . . . . . . . . . . . . . . . . . .>   PRINT 'Wrong number of 
> rows'; 
> . . . . . . . . . . . . . . . . . . . . . . .> 
> . . . . . . . . . . . . . . . . . . . . . . .> EXECUTE IMMEDIATE 'SELECT 
> COUNT(*) FROM sys.tbls' INTO cnt;
> . . . . . . . . . . . . . . . . . . . . . . .> 
> . . . . . . . . . . . . . . . . . . . . . . .> IF cnt <> 0 THEN
> . . . . . . . . . . . . . . . . . . . . . . .>   SIGNAL wrong_cnt_condition;
> . . . . . . . . . . . . . . . . . . . . . . .> END IF;
> . . . . . . . . . . . . . . . . . . . . . . .> /
> INFO  : Compiling 
> command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b): 
> SELECT COUNT(*) FROM sys.tbls
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:bigint, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b); 
> Time taken: 0.995 seconds 
> INFO  : Completed executing 
> command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b); 
> Time taken: 8.479 seconds
> INFO  : OK
> ERROR : wrong_cnt_condition
> No rows affected (9.559 seconds)
> 0: jdbc:hive2://localhost>{noformat}
>  
> Here, when the _SIGNAL wrong_cnt_condition;_ statement is executed, it has to 
> invoke the corresponding continue/exit handler and execute the statements 
> present in the handler block. But currently this is not happening.
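
A hedged sketch of the behavior described above: a declared condition maps to a 
handler, and SIGNAL should run the handler body (an EXIT handler additionally 
terminates the enclosing block). The class, method names, and dispatch logic below 
are illustrative only, not HPL/SQL's actual Exec/Signal implementation.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class ConditionHandlerSketch {
  enum HandlerType { CONTINUE, EXIT }

  static final class Handler {
    final HandlerType type;
    final Runnable body;
    Handler(HandlerType type, Runnable body) { this.type = type; this.body = body; }
  }

  private final Map<String, Handler> handlers = new HashMap<>();

  // DECLARE CONTINUE/EXIT HANDLER FOR <condition> <body>
  void declareHandler(String condition, HandlerType type, Runnable body) {
    handlers.put(condition, new Handler(type, body));
  }

  // SIGNAL <condition>; returns true if the enclosing block should be exited.
  boolean signal(String condition) {
    Handler h = handlers.get(condition);
    if (h == null) {
      return true;                       // unhandled condition propagates as an error
    }
    h.body.run();                        // e.g. PRINT 'Wrong number of rows'
    return h.type == HandlerType.EXIT;   // EXIT handler leaves the block, CONTINUE does not
  }
}
{code}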





[jira] [Commented] (HIVE-28213) Incorrect results after insert-select from similar bucketed source & target table

2024-05-02 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842956#comment-17842956
 ] 

Krisztian Kasa commented on HIVE-28213:
---

IMHO {{hive.tez.bucket.pruning}} shouldn't be allowed while scanning external 
tables. It relies on the filenames:
[https://github.com/apache/hive/blob/636b0d3abf00afbe2cf71dc89f762acca48867ca/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L339]
[https://github.com/apache/hive/blob/636b0d3abf00afbe2cf71dc89f762acca48867ca/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2077]
Data files belonging to external tables are allowed to change without Hive, hence 
this optimization can lead to data correctness issues.
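
To make the dependency concrete, here is a rough, hedged sketch of the two pieces 
involved: assigning a row to a bucket (non-negative hash modulo bucket count, 
Murmur3-based for bucketing version 2) and pruning files by parsing the bucket id 
back out of the file name. The file-name pattern and method names are illustrative 
stand-ins, not the actual code in HiveSplitGenerator/Utilities; if files were 
written outside Hive, the name-to-bucket mapping no longer holds and rows are 
silently skipped.

{code:java}
import java.util.BitSet;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BucketPruningSketch {

  // How a row is assigned to a bucket: non-negative hash of the bucketing key modulo
  // the bucket count. For the report's example, murmur_hash(1) % 32 = 29, so id=1 is
  // expected in the file written for bucket 29.
  static int bucketFor(int keyHash, int numBuckets) {
    return (keyHash & Integer.MAX_VALUE) % numBuckets;
  }

  // Bucket pruning goes the other way: it recovers the bucket id from the file name
  // (e.g. "000029_0" -> 29) and drops files whose bucket cannot match the predicate.
  private static final Pattern BUCKET_FILE = Pattern.compile("^(\\d+)_\\d+.*");

  static int bucketIdFromFileName(String fileName) {
    Matcher m = BUCKET_FILE.matcher(fileName);
    return m.matches() ? Integer.parseInt(m.group(1)) : -1;
  }

  static List<String> prune(List<String> files, BitSet includedBuckets) {
    return files.stream()
        .filter(f -> {
          int b = bucketIdFromFileName(f);
          return b >= 0 && includedBuckets.get(b);  // unparseable or excluded files are skipped
        })
        .collect(Collectors.toList());
  }
}
{code}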
 

> Incorrect results after insert-select from similar bucketed source & target 
> table
> -
>
> Key: HIVE-28213
> URL: https://issues.apache.org/jira/browse/HIVE-28213
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Zoltán Rátkai
>Priority: Major
> Attachments: test.q
>
>
> Insert-select is not honoring bucketing if both source & target are bucketed 
> on the same column.
> e.g.,
> {code:java}
> CREATE EXTERNAL TABLE bucketing_table1 (id INT)
> CLUSTERED BY (id)
> SORTED BY (id ASC)
> INTO 32 BUCKETS stored as textfile;
> INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);
> CREATE EXTERNAL TABLE bucketing_table2 like bucketing_table1;
> INSERT INTO TABLE bucketing_table2 select * from bucketing_table1;{code}
> id=1 => murmur_hash(1) % 32 should go to the 29th bucket file.
> bucketing_table1 has id=1 in the 29th file,
> but bucketing_table2 doesn't have a 29th file because the insert-select didn't 
> honor the bucketing.
> {code:java}
> SELECT count(*) FROM bucketing_table1 WHERE id = 1;
> ===
> 1 //correct result
> SELECT count(*) FROM bucketing_table2 WHERE id = 1;   
> === 
> 0 // incorrect result
> select *, INPUT__FILE__NAME from bucketing_table1;
> +----------------------+------------------------+
> | bucketing_table1.id  | input__file__name      |
> +----------------------+------------------------+
> | 2                    | /bucketing_table1/04_0 |
> | 3                    | /bucketing_table1/06_0 |
> | 5                    | /bucketing_table1/15_0 |
> | 4                    | /bucketing_table1/21_0 |
> | 1                    | /bucketing_table1/29_0 |
> +----------------------+------------------------+
> select *, INPUT__FILE__NAME from bucketing_table2;
> +----------------------+------------------------+
> | bucketing_table2.id  | input__file__name      |
> +----------------------+------------------------+
> | 2                    | /bucketing_table2/00_0 |
> | 3                    | /bucketing_table2/01_0 |
> | 5                    | /bucketing_table2/02_0 |
> | 4                    | /bucketing_table2/03_0 |
> | 1                    | /bucketing_table2/04_0 |
> +----------------------+------------------------+{code}
> Workaround for read: hive.tez.bucket.pruning=false;
> PS: Attaching repro file [^test.q]





[jira] [Resolved] (HIVE-28173) Issues with staging dirs with materialized views on HDFS encrypted table

2024-05-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28173.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~scarlin] for the patch and [~dkuzmenko] for review.

> Issues with staging dirs with materialized views on HDFS encrypted table
> 
>
> Key: HIVE-28173
> URL: https://issues.apache.org/jira/browse/HIVE-28173
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In the materialized view registry thread, which runs in the background, there 
> are 2 issues involving staging directories on HDFS-encrypted tables:
> 1) The staging directory is created at compile time. For non-HDFS-encrypted 
> tables, the "mkdir" flag is set to false. There is no such flag for 
> HDFS-encrypted tables.
> 2) The "FileSystem.deleteOnFileExit()" method is not called from the 
> HiveMaterializedViewRegistry thread.
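
A hedged sketch of the cleanup pattern issue 2) refers to, assuming the method 
meant is Hadoop's {{FileSystem.deleteOnExit}}: a staging directory that has to be 
materialized at compile time should at least be registered for deletion so the 
background registry thread does not leak it. The flag and method names below are 
illustrative, not the actual HiveMaterializedViewRegistry code.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StagingDirSketch {

  // For HDFS encryption zones the staging directory ends up being created at compile
  // time; register it with deleteOnExit so it is removed when the process shuts down
  // even if the query that needed it never runs.
  static Path prepareStagingDir(Configuration conf, Path stagingPath, boolean inEncryptionZone)
      throws IOException {
    FileSystem fs = stagingPath.getFileSystem(conf);
    if (inEncryptionZone) {
      fs.mkdirs(stagingPath);
      fs.deleteOnExit(stagingPath);
    }
    return stagingPath;
  }
}
{code}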





[jira] [Resolved] (HIVE-28214) HPLSQL not using the hive variables passed through beeline using --hivevar option

2024-05-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28214.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~Dayakar]  for the patch.

> HPLSQL not using the hive variables passed through beeline using --hivevar 
> option
> -
>
> Key: HIVE-28214
> URL: https://issues.apache.org/jira/browse/HIVE-28214
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HPL/SQL does not use the Hive variables passed through beeline's --hivevar 
> option.
> Steps to reproduce:
> {noformat}
> beeline -u 
> 'jdbc:hive2://localhost:1/default;user=hive;password=hive;mode=hplsql' 
> --hivevar hivedb=sys --hivevar hivetbl=tbls{noformat}
> {noformat}
> 0: jdbc:hive2://localhost> DECLARE hivedb_tbl string;
>  . . . . . . . . . . . . . . . . . . . . . . .> SELECT hivedb || '.' || 
> hivetbl into hivedb_tbl;
>  . . . . . . . . . . . . . . . . . . . . . . .> PRINT hivedb_tbl;
>  . . . . . . . . . . . . . . . . . . . . . . .> /
> INFO  : Compiling 
> command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb): 
> SELECT CONCAT(hivedb, '.', hivetbl) 
> ERROR : FAILED: SemanticException [Error 10004]: Line 1:14 Invalid table 
> alias or column reference 'hivedb': (possible column names are: ) 
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:14 Invalid table 
> alias or column reference 'hivedb': (possible column names are: )
>  
> INFO  : Completed compiling 
> command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb); 
> Time taken: 3.976 seconds 
> ERROR : Unhandled exception in HPL/SQL 
> No rows affected (4.901 seconds)
> 0: jdbc:hive2://localhost>
> {noformat}





[jira] [Resolved] (HIVE-28203) Fix qtest mv_iceberg_orc5.q

2024-04-18 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28203.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~zhangbutao] for the patch and [~dkuzmenko] for 
review.

> Fix qtest mv_iceberg_orc5.q
> ---
>
> Key: HIVE-28203
> URL: https://issues.apache.org/jira/browse/HIVE-28203
> Project: Hive
>  Issue Type: Task
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5190/3/tests]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5195/4/tests]
>  
> Flaky CI report:
> [http://ci.hive.apache.org/job/hive-flaky-check/837/testReport/] 
>  
> {code:java}
> Execution succeeded but contained differences (error code = 1) after 
> executing mv_iceberg_orc5.q 
> 101c101
> < HiveJoin(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT 
> FROM($1, $6))], joinType=[right], algorithm=[BucketJoin], cost=[not 
> available])
> ---
> > HiveJoin(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT 
> > FROM($1, $6))], joinType=[right], algorithm=[none], cost=[not available])
> 106c106
> <   HiveJoin(condition=[=($0, $3)], joinType=[inner], 
> algorithm=[CommonJoin], cost=[not available])
> ---
> >   HiveJoin(condition=[=($0, $3)], joinType=[inner], 
> > algorithm=[none], cost=[not available]) {code}





[jira] [Assigned] (HIVE-28203) Fix qtest mv_iceberg_orc5.q

2024-04-18 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-28203:
-

Assignee: Butao Zhang

> Fix qtest mv_iceberg_orc5.q
> ---
>
> Key: HIVE-28203
> URL: https://issues.apache.org/jira/browse/HIVE-28203
> Project: Hive
>  Issue Type: Task
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5190/3/tests]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5195/4/tests]
>  
> Flaky CI report:
> [http://ci.hive.apache.org/job/hive-flaky-check/837/testReport/] 
>  
> {code:java}
> Execution succeeded but contained differences (error code = 1) after 
> executing mv_iceberg_orc5.q 
> 101c101
> < HiveJoin(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT 
> FROM($1, $6))], joinType=[right], algorithm=[BucketJoin], cost=[not 
> available])
> ---
> > HiveJoin(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT 
> > FROM($1, $6))], joinType=[right], algorithm=[none], cost=[not available])
> 106c106
> <   HiveJoin(condition=[=($0, $3)], joinType=[inner], 
> algorithm=[CommonJoin], cost=[not available])
> ---
> >   HiveJoin(condition=[=($0, $3)], joinType=[inner], 
> > algorithm=[none], cost=[not available]) {code}





[jira] [Resolved] (HIVE-28127) Exception when rebuilding materialized view with calculated columns on iceberg sources

2024-04-16 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28127.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~dkuzmenko] for review.

> Exception when rebuilding materialized view with calculated columns on 
> iceberg sources
> --
>
> Key: HIVE-28127
> URL: https://issues.apache.org/jira/browse/HIVE-28127
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='1');
> insert into tbl_ice values (1, 'one', 50), (4, 'four', 53), (5, 'five', 54);
> create materialized view iceberg_mat2 stored by iceberg stored as orc 
> tblproperties ('format-version'='2') as
> select tbl_ice.b, sum(tbl_ice.c), count(tbl_ice.c), avg(tbl_ice.c)
> from tbl_ice
> group by tbl_ice.b;
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54);
> alter materialized view iceberg_mat2 rebuild;
> {code}
> {code}
>  org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column 
> reference '_c3'
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:13598)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:13540)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4931)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4719)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11554)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11496)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12432)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12298)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:634)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13162)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>   at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:178)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestIcebergLlapLocalCliDriver.testCliDriver(TestIcebergLlapLocalCliDriver.java:60)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Commented] (HIVE-28082) HiveAggregateReduceFunctionsRule could generate an inconsistent result

2024-04-15 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837244#comment-17837244
 ] 

Krisztian Kasa commented on HIVE-28082:
---

Some more details about the issue:
{code}
explain cbo
select avg('text');
select avg('text');
{code}
{{avg('text')}} is converted to {{sum('text')/count('text')}}
{code}
HiveProject(_o__c0=[/($0, $1)])
  HiveAggregate(group=[{}], agg#0=[sum($0)], agg#1=[count()])
HiveProject($f0=[_UTF-16LE'text':VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE"])
  HiveTableScan(table=[[_dummy_database, _dummy_table]], 
table:alias=[_dummy_table])
{code}
and {{sum('text')}} throws an exception at execution time, which is logged as a 
warning:
{code}
 2024-04-15T04:47:57,568  WARN [TezTR-671313_1_1_0_0_0] generic.GenericUDAFSum: 
GenericUDAFSumDouble java.lang.NumberFormatException: For input string: "text"
at 
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDouble(PrimitiveObjectInspectorUtils.java:867)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumDouble.iterate(GenericUDAFSum.java:444)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:215)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:620)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:792)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:701)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

2024-04-15T04:47:57,568  WARN [TezTR-671313_1_1_0_0_0] generic.GenericUDAFSum: 
GenericUDAFSumDouble ignoring similar exceptions.
{code}
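
The root of the warning is the string-to-double coercion inside the sum aggregator; 
a minimal stand-in for what the {{PrimitiveObjectInspectorUtils.getDouble}} call in 
the trace ends up doing with a non-numeric literal (illustration only, not Hive code):

{code:java}
public class AvgDecompositionSketch {
  public static void main(String[] args) {
    // avg('text') is rewritten by HiveAggregateReduceFunctionsRule to
    // sum('text') / count('text'); the sum side must coerce 'text' to a double,
    // which is where the NumberFormatException in the log comes from.
    try {
      double d = Double.parseDouble("text");   // throws NumberFormatException
      System.out.println(d);
    } catch (NumberFormatException e) {
      // Per the log above, Hive catches this inside GenericUDAFSum and only logs a
      // warning, so the query keeps running with the failing value effectively ignored.
      System.out.println("ignored: " + e.getMessage());
    }
  }
}
{code}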

The behavior is similar when CBO is turned off:
{code}
set hive.cbo.enable=false;
select avg('text');
{code}
{code}
2024-04-15T04:55:29,444  WARN [TezTR-126305_1_1_0_0_0] 
generic.GenericUDAFAverage: Ignoring similar exceptions
java.lang.NumberFormatException: For input string: "text"
at 
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043) 
~[?:1.8.0_301]
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) 
~[?:1.8.0_301]
at java.lang.Double.parseDouble(Double.java:538) ~[?:1.8.0_301]
at 

[jira] [Assigned] (HIVE-28173) Issues with staging dirs with materialized views on HDFS encrypted table

2024-04-12 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-28173:
-

Assignee: Steve Carlin

> Issues with staging dirs with materialized views on HDFS encrypted table
> 
>
> Key: HIVE-28173
> URL: https://issues.apache.org/jira/browse/HIVE-28173
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>
> In the materialized view registry thread, which runs in the background, there 
> are 2 issues involving staging directories on HDFS-encrypted tables:
> 1) The staging directory is created at compile time. For non-HDFS-encrypted 
> tables, the "mkdir" flag is set to false. There is no such flag for 
> HDFS-encrypted tables.
> 2) The "FileSystem.deleteOnFileExit()" method is not called from the 
> HiveMaterializedViewRegistry thread.





[jira] [Resolved] (HIVE-28126) Use added record count in cost model when rebuilding materialized view stored by iceberg

2024-04-09 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28126.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~dkuzmenko]  and [~okumin] for review.

> Use added record count  in cost model when rebuilding materialized view 
> stored by iceberg
> -
>
> Key: HIVE-28126
> URL: https://issues.apache.org/jira/browse/HIVE-28126
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> There is a cost-based decision when rebuilding materialized views that have an 
> aggregate.
> Currently the cost model uses the total row count of the source table, however 
> the incremental rebuild plan scans only the rows inserted since the last MV 
> rebuild.
> The goal is to update the row count in the cost model, in the case of Iceberg 
> source tables, with the sum of the {{added-records}} values stored in the 
> snapshot summaries since the last MV rebuild.
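
To make the {{added-records}} idea concrete, here is a hedged sketch of summing 
that snapshot-summary property over the snapshots written after the last rebuild. 
The last-rebuild snapshot id is assumed to be known and snapshots are assumed to 
iterate oldest-first; this is not the actual Hive cost-model code.

{code:java}
import java.util.Map;

import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

public class AddedRecordsSketch {

  // Sum the "added-records" entries of every snapshot created after the one the
  // materialized view was last rebuilt from; this is the row count the incremental
  // rebuild plan would actually scan.
  static long addedRecordsSince(Table source, long lastRebuildSnapshotId) {
    long added = 0;
    boolean afterRebuild = false;
    for (Snapshot s : source.snapshots()) {
      if (afterRebuild) {
        Map<String, String> summary = s.summary();
        if (summary != null && summary.containsKey("added-records")) {
          added += Long.parseLong(summary.get("added-records"));
        }
      }
      if (s.snapshotId() == lastRebuildSnapshotId) {
        afterRebuild = true;
      }
    }
    return added;
  }
}
{code}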





[jira] [Updated] (HIVE-27366) Incorrect incremental rebuild mode shown of materialized view with Iceberg sources

2024-04-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27366:
--
Parent: HIVE-26497
Issue Type: Sub-task  (was: Bug)

> Incorrect incremental rebuild mode shown of materialized view with Iceberg 
> sources
> --
>
> Key: HIVE-27366
> URL: https://issues.apache.org/jira/browse/HIVE-27366
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> CREATE TABLE shtb_test1(KEY INT, VALUE STRING) PARTITIONED BY(ds STRING)
> stored by iceberg stored as orc tblproperties ('format-version'='2');
> CREATE MATERIALIZED VIEW shtb_test1_view1 stored by iceberg stored as orc 
> tblproperties ('format-version'='1') AS
> SELECT * FROM shtb_test1 where KEY > 1000 and KEY < 2000;
> SHOW MATERIALIZED VIEWS;
> {code}
> {code}
> # MV Name          Rewriting Enabled   Mode            Incremental rebuild
> shtb_test1_view1   Yes                 Manual refresh  Available
> {code}
> It should be
> {code}
> # MV Name          Rewriting Enabled   Mode            Incremental rebuild
> shtb_test1_view1   Yes                 Manual refresh  Available in presence of insert operations only
> {code}
> because deleted rows can not be identified in case of Iceberg source tables.
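
The last sentence is the key constraint: only plain appends can be replayed 
incrementally. A hedged sketch of what an "insert operations only" check against 
Iceberg snapshot metadata could look like; this is illustrative, not Hive's actual 
logic, and snapshot iteration is assumed oldest-first.

{code:java}
import org.apache.iceberg.DataOperations;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

public class InsertOnlyRebuildSketch {

  // Incremental rebuild is only applicable if every snapshot written after the last
  // rebuild is an append; a delete/overwrite/replace forces a full rebuild because
  // the removed rows cannot be identified from the snapshot alone.
  static boolean canRebuildIncrementally(Table source, long lastRebuildSnapshotId) {
    boolean afterRebuild = false;
    for (Snapshot s : source.snapshots()) {
      if (afterRebuild && !DataOperations.APPEND.equals(s.operation())) {
        return false;
      }
      if (s.snapshotId() == lastRebuildSnapshotId) {
        afterRebuild = true;
      }
    }
    return true;
  }
}
{code}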





[jira] [Updated] (HIVE-27307) NPE when generating incremental rebuild plan of materialized view with empty Iceberg source table

2024-04-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27307:
--
Parent: HIVE-26497
Issue Type: Sub-task  (was: Bug)

> NPE when generating incremental rebuild plan of materialized view with empty 
> Iceberg source table
> -
>
> Key: HIVE-27307
> URL: https://issues.apache.org/jira/browse/HIVE-27307
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='1');
> create external table tbl_ice_v2(d int, e string, f int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice_v2 values (1, 'one v2', 50), (4, 'four v2', 53), (5, 
> 'five v2', 54);
> create materialized view mat1 as
> select tbl_ice.b, tbl_ice.c, tbl_ice_v2.e from tbl_ice join tbl_ice_v2 on 
> tbl_ice.a=tbl_ice_v2.d where tbl_ice.c > 52;
> -- insert some new values to one of the source tables
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54);
> alter materialized view mat1 rebuild;
> {code}
> {code}
> 2023-04-28T07:34:17,949  WARN [1fb94a8e-8d75-4a1f-8f44-a5beaa8aafb6 Listener 
> at 0.0.0.0/36857] rebuild.AlterMaterializedViewRebuildAnalyzer: Exception 
> loading materialized views
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.getValidMaterializedViews(Hive.java:2298)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.getMaterializedViewForRebuild(Hive.java:2204)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer$MVRebuildCalcitePlannerAction.applyMaterializedViewRewriting(AlterMaterializedViewRebuildAnalyzer.java:215)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1722)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1591)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1343)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12824)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:135)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:326)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 

[jira] [Updated] (HIVE-27101) Support incremental materialized view rebuild when Iceberg source tables have insert operation only.

2024-04-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27101:
--
Parent: HIVE-26497
Issue Type: Sub-task  (was: Improvement)

> Support incremental materialized view rebuild when Iceberg source tables have 
> insert operation only.
> 
>
> Key: HIVE-27101
> URL: https://issues.apache.org/jira/browse/HIVE-27101
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-27867) Incremental materialized view throws NPE when Iceberg source table is empty

2024-04-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27867:
--
Parent: HIVE-26497
Issue Type: Sub-task  (was: Bug)

> Incremental materialized view throws NPE when Iceberg source table is empty
> ---
>
> Key: HIVE-27867
> URL: https://issues.apache.org/jira/browse/HIVE-27867
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: iceberg, materializedviews, pull-request-available
> Fix For: 4.0.0
>
>
> Repro
> https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/test/queries/positive/mv_iceberg_orc.q
> in hive.log
> {code}
> 2023-11-09T05:17:05,625  WARN [e35c7637-b0ba-4e30-8448-5bdc0d0e4779 main] 
> rebuild.AlterMaterializedViewRebuildAnalyzer: Exception loading materialized 
> views
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getValidMaterializedViews(Hive.java:2321)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getMaterializedViewForRebuild(Hive.java:2227)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer$MVRebuildCalcitePlannerAction.applyMaterializedViewRewriting(AlterMaterializedViewRebuildAnaly
> zer.java:215) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1700)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1569)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1321)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13113)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:135)
>  ~[hive-exec-4.0.0-beta-2-SNAPSH
> OT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> 

[jira] [Updated] (HIVE-26924) Alter materialized view enable rewrite throws SemanticException for source iceberg table

2024-04-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-26924:
--
Parent: HIVE-26497
Issue Type: Sub-task  (was: Bug)

> Alter materialized view enable rewrite throws SemanticException for source 
> iceberg table
> 
>
> Key: HIVE-26924
> URL: https://issues.apache.org/jira/browse/HIVE-26924
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: Dharmik Thakkar
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> alter materialized view enable rewrite throws SemanticException for source 
> iceberg table
> SQL test
> {code:java}
> >>> create materialized view mv_rewrite as select t, si from all100k where 
> >>> t>115;
> >>> analyze table mv_rewrite compute statistics for columns;
> >>> set hive.explain.user=false;
> >>> explain select si,t from all100k where t>116 and t<120;
> !!! match row_contains
>   alias: iceberg_test_db_hive.mv_rewrite
> >>> alter materialized view mv_rewrite disable rewrite;
> >>> explain select si,t from all100k where t>116 and t<120;
> !!! match row_contains
>   alias: all100k
> >>> alter materialized view mv_rewrite enable rewrite;
> >>> explain select si,t from all100k where t>116 and t<120;
> !!! match row_contains
>   alias: iceberg_test_db_hive.mv_rewrite
> >>> drop materialized view mv_rewrite; {code}
>  
> Error
> {code:java}
> 2023-01-10T18:40:34,303 INFO  [pool-3-thread-1] jdbc.TestDriver: Query: alter 
> materialized view mv_rewrite enable rewrite
> 2023-01-10T18:40:34,365 INFO  [Thread-10] jdbc.TestDriver: INFO  : Compiling 
> command(queryId=hive_20230110184034_f557b4a6-40a0-42ba-8e67-2f273f50af36): 
> alter materialized view mv_rewrite enable rewrite
> 2023-01-10T18:40:34,426 INFO  [Thread-10] jdbc.TestDriver: ERROR : FAILED: 
> SemanticException Automatic rewriting for materialized view cannot be enabled 
> if the materialized view uses non-transactional tables
> 2023-01-10T18:40:34,426 INFO  [Thread-10] jdbc.TestDriver: 
> org.apache.hadoop.hive.ql.parse.SemanticException: Automatic rewriting for 
> materialized view cannot be enabled if the materialized view uses 
> non-transactional tables
> 2023-01-10T18:40:34,426 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rewrite.AlterMaterializedViewRewriteAnalyzer.analyzeInternal(AlterMaterializedViewRewriteAnalyzer.java:75)
> 2023-01-10T18:40:34,426 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:313)
> 2023-01-10T18:40:34,427 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:222)
> 2023-01-10T18:40:34,427 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> 2023-01-10T18:40:34,427 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hadoop.hive.ql.Driver.compile(Driver.java:201)
> 2023-01-10T18:40:34,427 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:657)
> 2023-01-10T18:40:34,427 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:603)
> 2023-01-10T18:40:34,427 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:597)
> 2023-01-10T18:40:34,427 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
> 2023-01-10T18:40:34,427 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
> 2023-01-10T18:40:34,428 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
> 2023-01-10T18:40:34,428 INFO  [Thread-10] jdbc.TestDriver:      at 
> java.base/java.security.AccessController.doPrivileged(Native Method)
> 2023-01-10T18:40:34,428 INFO  [Thread-10] jdbc.TestDriver:      at 
> java.base/javax.security.auth.Subject.doAs(Subject.java:423)
> 2023-01-10T18:40:34,428 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> 2023-01-10T18:40:34,428 INFO  [Thread-10] jdbc.TestDriver:      at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> 2023-01-10T18:40:34,428 INFO  [Thread-10] jdbc.TestDriver:      at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 

[jira] [Updated] (HIVE-28127) Exception when rebuilding materialized view with calculated columns on iceberg sources

2024-04-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-28127:
--
Parent: HIVE-26497
Issue Type: Sub-task  (was: Bug)

> Exception when rebuilding materialized view with calculated columns on 
> iceberg sources
> --
>
> Key: HIVE-28127
> URL: https://issues.apache.org/jira/browse/HIVE-28127
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='1');
> insert into tbl_ice values (1, 'one', 50), (4, 'four', 53), (5, 'five', 54);
> create materialized view iceberg_mat2 stored by iceberg stored as orc 
> tblproperties ('format-version'='2') as
> select tbl_ice.b, sum(tbl_ice.c), count(tbl_ice.c), avg(tbl_ice.c)
> from tbl_ice
> group by tbl_ice.b;
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54);
> alter materialized view iceberg_mat2 rebuild;
> {code}
> {code}
>  org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column 
> reference '_c3'
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:13598)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:13540)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4931)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4719)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11554)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11496)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12432)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12298)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:634)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13162)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>   at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:178)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestIcebergLlapLocalCliDriver.testCliDriver(TestIcebergLlapLocalCliDriver.java:60)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> 

[jira] [Updated] (HIVE-28126) Use added record count in cost model when rebuilding materialized view stored by iceberg

2024-04-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-28126:
--
Parent: HIVE-26497
Issue Type: Sub-task  (was: Improvement)

> Use added record count  in cost model when rebuilding materialized view 
> stored by iceberg
> -
>
> Key: HIVE-28126
> URL: https://issues.apache.org/jira/browse/HIVE-28126
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> There is a cost-based decision when rebuilding materialized views that have an 
> aggregate.
> Currently the cost model uses the total row count of the source table, however 
> the incremental rebuild plan scans only the rows inserted since the last MV 
> rebuild.
> The goal is to update the row count in the cost model, in the case of Iceberg 
> source tables, with the sum of the {{added-records}} values stored in the 
> snapshot summaries since the last MV rebuild.





[jira] [Created] (HIVE-28127) Exception when rebuilding materialized view with calculated columns on iceberg sources

2024-03-19 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-28127:
-

 Summary: Exception when rebuilding materialized view with 
calculated columns on iceberg sources
 Key: HIVE-28127
 URL: https://issues.apache.org/jira/browse/HIVE-28127
 Project: Hive
  Issue Type: Bug
  Components: Iceberg integration, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create external table tbl_ice(a int, b string, c int) stored by iceberg stored 
as orc tblproperties ('format-version'='1');

insert into tbl_ice values (1, 'one', 50), (4, 'four', 53), (5, 'five', 54);

create materialized view iceberg_mat2 stored by iceberg stored as orc 
tblproperties ('format-version'='2') as
select tbl_ice.b, sum(tbl_ice.c), count(tbl_ice.c), avg(tbl_ice.c)
from tbl_ice
group by tbl_ice.b;

insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
(4, 'four', 53), (5, 'five', 54);

alter materialized view iceberg_mat2 rebuild;
{code}
{code}
 org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column 
reference '_c3'
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:13598)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:13540)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4931)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4719)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11554)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11496)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12432)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12298)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:634)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13162)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
at 
org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:178)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestIcebergLlapLocalCliDriver.testCliDriver(TestIcebergLlapLocalCliDriver.java:60)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
at 

[jira] [Updated] (HIVE-28126) Use added record count in cost model when rebuilding materialized view stored by iceberg

2024-03-19 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-28126:
--
Summary: Use added record count  in cost model when rebuilding materialized 
view stored by iceberg  (was: Use added row count in cost model when rebuilding 
materialized view stored by iceberg)

> Use added record count  in cost model when rebuilding materialized view 
> stored by iceberg
> -
>
> Key: HIVE-28126
> URL: https://issues.apache.org/jira/browse/HIVE-28126
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> There is a cost-based decision when rebuilding materialized views that have an
> aggregate.
> Currently the cost model uses the total row count of the source table, however
> the incremental rebuild plan scans only the rows inserted since the last MV
> rebuild.
> The goal is, in the case of Iceberg source tables, to update the row count in
> the cost model with the sum of the {{added-records}} values stored in the
> snapshot summaries since the last MV rebuild.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28126) Use added row count in cost model when rebuilding materialized view stored by iceberg

2024-03-19 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-28126:
-

 Summary: Use added row count in cost model when rebuilding 
materialized view stored by iceberg
 Key: HIVE-28126
 URL: https://issues.apache.org/jira/browse/HIVE-28126
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


There is a cost-based decision when rebuilding materialized views that have an 
aggregate.
Currently the cost model uses the total row count of the source table, however 
the incremental rebuild plan scans only the rows inserted since the last MV rebuild.

The goal is, in the case of Iceberg source tables, to update the row count in the 
cost model with the sum of the {{added-records}} values stored in the snapshot 
summaries since the last MV rebuild.
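
As a rough illustration only (not part of any patch): Iceberg records the per-snapshot 
insert counts in the snapshot summary under the {{added-records}} key, so the quantity 
described above could, for example, be obtained with a query against the snapshots 
metadata table. The metadata-table syntax, the {{committed_at}}/{{summary}} column names 
and the table name {{tbl_ice}} below are assumptions for this sketch, and the cast 
timestamp stands in for the time of the last MV rebuild:
{code:sql}
-- Hypothetical sketch: total 'added-records' over all snapshots committed
-- after the last materialized view rebuild.
SELECT sum(cast(summary['added-records'] AS bigint)) AS rows_added_since_rebuild
FROM default.tbl_ice.snapshots
WHERE committed_at > cast('2024-03-01 00:00:00' AS timestamp);
{code}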



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28006) Materialized view with aggregate function incorrectly shows it allows incremental rebuild

2024-03-11 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28006.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~abstractdog] and [~amansinha100] for the review.

> Materialized view with aggregate function incorrectly shows it allows 
> incremental rebuild
> -
>
> Key: HIVE-28006
> URL: https://issues.apache.org/jira/browse/HIVE-28006
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0, 4.0.0-beta-1, 4.1.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table store_sales (
>   ss_sold_date_sk int,
>   ss_ext_sales_price int,
>   ss_customer_sk int
> ) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into store_sales (ss_sold_date_sk, ss_ext_sales_price, ss_customer_sk) 
> values (2, 2, 2);
> create materialized view mat1 stored as orc tblproperties 
> ('format-version'='2') as
> select ss_customer_sk
>   ,min(ss_ext_sales_price)
>   ,count(*)
>  from store_sales
>  group by ss_customer_sk;
> delete from store_sales where ss_sold_date_sk = 1;
> show materialized views;
> explain cbo
> alter materialized view mat1 rebuild;
> {code}
> Incremental rebuild is available
> {code}
> # MV Name   Rewriting Enabled   Mode             Incremental rebuild
> mat1        Yes                 Manual refresh   Available
> {code}
> vs full rebuild plan
> {code}
> CBO PLAN:
> HiveAggregate(group=[{2}], agg#0=[min($1)], agg#1=[count()])
>   HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28098) Fails to copy empty column statistics of materialized CTE

2024-03-11 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-28098:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged to master. Thanks [~okumin] for the patch.

> Fails to copy empty column statistics of materialized CTE
> -
>
> Key: HIVE-28098
> URL: https://issues.apache.org/jira/browse/HIVE-28098
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-28080 introduced the optimization of materialized CTEs, but it turned 
> out that it failed when statistics were empty.
> This query reproduces the issue.
> {code:java}
> set hive.stats.autogather=false;
> CREATE TABLE src_no_stats AS SELECT '123' as key, 'val123' as value UNION ALL 
> SELECT '9' as key, 'val9' as value;
> set hive.optimize.cte.materialize.threshold=2;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN WITH materialized_cte1 AS (
>   SELECT * FROM src_no_stats
> ),
> materialized_cte2 AS (
>   SELECT a.key
>   FROM materialized_cte1 a
>   JOIN materialized_cte1 b ON (a.key = b.key)
> )
> SELECT a.key
> FROM materialized_cte2 a
> JOIN materialized_cte2 b ON (a.key = b.key); {code}
> It throws an error.
> {code:java}
> Error: Error while compiling statement: FAILED: IllegalStateException The 
> size of col stats must be equal to that of schema. Stats = [], Schema = [key] 
> (state=42000,code=4) {code}
> Attaching a debugger shows that the FSO (FileSinkOperator) of materialized_cte2 
> has empty stats because the JoinOperator loses the stats.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28076) Selecting data from a bucketed table with decimal column type throwing NPE.

2024-03-05 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28076.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the patch.

> Selecting data from a bucketed table with decimal column type throwing NPE.
> ---
>
> Key: HIVE-28076
> URL: https://issues.apache.org/jira/browse/HIVE-28076
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Selecting data from a bucketed table with a decimal bucket column type throws 
> an NPE.
> Steps to reproduce:
> {noformat}
> create table bucket_table(id decimal(38,0), name string) clustered by(id) 
> into 3 buckets;
> insert into bucket_table values(5999640711, 'Cloud');
> select * from bucket_table bt where id = 5999640711;{noformat}
> HS2 log contains NPE:
> {noformat}
> ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCodeMurmur(ObjectInspectorUtils.java:889)
>at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getBucketHashCode(ObjectInspectorUtils.java:805)
>at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getBucketNumber(ObjectInspectorUtils.java:638)
>at 
> org.apache.hadoop.hive.ql.optimizer.FixedBucketPruningOptimizer$BucketBitsetGenerator.generatePredicate(FixedBucketPruningOptimizer.java:225)
>at 
> org.apache.hadoop.hive.ql.optimizer.PrunerOperatorFactory$FilterPruner.process(PrunerOperatorFactory.java:87)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>at 
> org.apache.hadoop.hive.ql.optimizer.PrunerUtils.walkOperatorTree(PrunerUtils.java:84)
>at 
> org.apache.hadoop.hive.ql.optimizer.FixedBucketPruningOptimizer.transform(FixedBucketPruningOptimizer.java:331)
>at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:249)
>at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12995)
>at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
>at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:303)
>at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
>at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
>at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194)
>at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:621)
>at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:567)
>at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:561)
>at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
>at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:260)
>at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:204)
>at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:130)
>at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:429)
>at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:360)
>at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:857)
>at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:827)
>at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:191)
>at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:498)
>at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>at 
> 

[jira] [Resolved] (HIVE-27490) HPL/SQL says it support default value for parameters but not considering them when no value is passed

2024-02-29 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27490.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the patch.

> HPL/SQL says it support default value for parameters but not considering them 
> when no value is passed
> -
>
> Key: HIVE-27490
> URL: https://issues.apache.org/jira/browse/HIVE-27490
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HPL/SQL claims to support default values for parameters but does not apply 
> them when no value is passed.
> {noformat}
> CREATE OR replace PROCEDURE test123(a NUMBER DEFAULT -110)
> AS
> BEGIN
> dbms_output.put_line (a);
> end;{noformat}
> Oracle shows the default value-
> {noformat}
> SQL> call test123();
> -110{noformat}
> Hive shows the variable name instead of the default value-
> {noformat}
> call test123();
> INFO : a{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28080) Propagate statistics from a source table to the materialized CTE

2024-02-27 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-28080:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged to master. Thanks [~okumin] for the patch.

> Propagate statistics from a source table to the materialized CTE
> 
>
> Key: HIVE-28080
> URL: https://issues.apache.org/jira/browse/HIVE-28080
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 2.3.8, 3.1.3, 4.0.0-beta-1
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Hive doesn't fill in the statistics of materialized CTEs, and the size of 
> those TableScans is underestimated. That causes Tez to run with fewer tasks 
> or to fail with OOM because MapJoin could be wrongly applied.
>  
> The following example shows Map 1 reading `src` generates 493 rows, but Map 3 
> reading `cte` is expected to scan only 1 row.
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> EXPLAIN WITH cte AS (
> . . . . . . . . . . . . . . . . . . . . . . .>   SELECT * FROM src
> . . . . . . . . . . . . . . . . . . . . . . .> )
> . . . . . . . . . . . . . . . . . . . . . . .> SELECT *
> . . . . . . . . . . . . . . . . . . . . . . .> FROM cte a
> . . . . . . . . . . . . . . . . . . . . . . .> JOIN cte b ON (a.key = b.key)
> . . . . . . . . . . . . . . . . . . . . . . .> JOIN cte c ON (a.key = c.key);
> ...
> ++
> |                      Explain                       |
> ++
> | Plan optimized by CBO.                             |
> |                                                    |
> | Vertex dependency in Stage-4                       |
> | Map 2 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE) |
> | Reducer 4 <- Map 3 (SIMPLE_EDGE)                   |
> |                                                    |
> | Stage-3                                            |
> |   Fetch Operator                                   |
> |     limit:-1                                       |
> |     Stage-4                                        |
> |       Map 2 vectorized                             |
> |       File Output Operator [FS_69]                 |
> |         Map Join Operator [MAPJOIN_68] (rows=1 width=444) |
> |           
> Conds:MAPJOIN_67._col0=RS_61._col0(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5"]
>  |
> |         <-Map 3 [BROADCAST_EDGE] vectorized        |
> |           SHUFFLE [RS_61]                          |
> |             PartitionCols:_col0                    |
> |             Select Operator [SEL_60] (rows=1 width=368) |
> |               Output:["_col0","_col1"]             |
> |               Filter Operator [FIL_59] (rows=1 width=368) |
> |                 predicate:key is not null          |
> |                 TableScan [TS_11] (rows=1 width=368) |
> |                   
> default@cte,c,Tbl:COMPLETE,Col:NONE,Output:["key","value"] |
> |         <-Map Join Operator [MAPJOIN_67] (rows=1 width=404) |
> |             
> Conds:SEL_66._col0=RS_64._col0(Inner),Output:["_col0","_col1","_col2","_col3"]
>  |
> |           <-Reducer 4 [BROADCAST_EDGE] vectorized  |
> |             BROADCAST [RS_64]                      |
> |               PartitionCols:_col0                  |
> |               Select Operator [SEL_63]             |
> |                 Output:["_col0","_col1"]           |
> |           <-Select Operator [SEL_66] (rows=1 width=368) |
> |               Output:["_col0","_col1"]             |
> |               Filter Operator [FIL_65] (rows=1 width=368) |
> |                 predicate:key is not null          |
> |                 TableScan [TS_5] (rows=1 width=368) |
> |                   
> default@cte,a,Tbl:COMPLETE,Col:NONE,Output:["key","value"] |
> |         Stage-2                                    |
> |           Dependency Collection{}                  |
> |             Stage-1                                |
> |               Map 1 vectorized                     |
> |               File Output Operator [FS_4]          |
> |                 table:{"name:":"default.cte"}      |
> |                 Select Operator [SEL_3] (rows=493 width=350) |
> |                   Output:["_col0","_col1"]         |
> |                   TableScan [TS_0] (rows=493 width=350) |
> |                     
> default@src,src,Tbl:COMPLETE,Col:NONE,Output:["key","value"] |
> |         Stage-0                                    |
> |           Move Operator                            |
> |              Please refer to the previous Stage-1  |
> |         

[jira] [Updated] (HIVE-27636) Exception in HiveMaterializedViewsRegistry is leaving staging directories behind

2024-02-22 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27636:
--
Description: 
If any exception occurs during query parsing in 
`HiveMaterializedViewsRegistry.createMaterialization`, we bail out and there is 
no HDFS dir cleanup until JVM exit. This leaves the staging directories behind. 
For a long-running HS2, these staging directories keep accumulating and can 
cause a limit-reached exception.
{code:java}
Error: Error while compiling statement: FAILED: RuntimeException Cannot create 
staging directory 
'hdfs://aidaprd01/warehouse/tablespace/managed/hive/test.db/testTable/.hive-staging_hive_2023-08-05_06-17-06_711_5516272990801215078-168329:
 The directory item limit of 
/warehouse/tablespace/managed/hive/test.db/testTable is exceeded: limit=1048576 
items=1048576 {code}
We should do HDFS directory cleanup for the `HiveMaterializedViewsRegistry` thread 
[here|https://github.com/apache/hive/blob/1a574783afee13e33ecf3ed6fc60bdc94fe47bb1/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMaterializedViewsRegistry.java#L231]

  was:
In case of any exception while query parsing in 
`HiveMaterializedViewsRegistry.createMaterialization`, we bail out and there is 
no hdfs dir cleanup until JVM exit. This leaves behind the staging directories. 
For a long-running HS2, these staging directories keeps on increasing and can 
cause limit reached exception.
{code:java}
Error: Error while compiling statement: FAILED: RuntimeException Cannot create 
staging directory 
'hdfs://aidaprd01/warehouse/tablespace/managed/hive/test.db/testTable/.hive-staging_hive_2023-08-05_06-17-06_711_5516272990801215078-168329:
 The directory item limit of 
/warehouse/tablespace/managed/hive/test.db/testTable is exceeded: limit=1048576 
items=1048576 {code}
We should do hdfs directory cleanup for `HiveMaterializedViewsRegistry` thread 
[here|https://github.infra.cloudera.com/CDH/hive/blob/39b9e39e5167c8fcd35683f8e9e2c9a89fe86555/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMaterializedViewsRegistry.java#L226]


> Exception in HiveMaterializedViewsRegistry is leaving staging directories 
> behind
> 
>
> Key: HIVE-27636
> URL: https://issues.apache.org/jira/browse/HIVE-27636
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Riju Trivedi
>Priority: Major
>
> If any exception occurs during query parsing in 
> `HiveMaterializedViewsRegistry.createMaterialization`, we bail out and there 
> is no HDFS dir cleanup until JVM exit. This leaves the staging directories 
> behind. For a long-running HS2, these staging directories keep accumulating 
> and can cause a limit-reached exception.
> {code:java}
> Error: Error while compiling statement: FAILED: RuntimeException Cannot 
> create staging directory 
> 'hdfs://aidaprd01/warehouse/tablespace/managed/hive/test.db/testTable/.hive-staging_hive_2023-08-05_06-17-06_711_5516272990801215078-168329:
>  The directory item limit of 
> /warehouse/tablespace/managed/hive/test.db/testTable is exceeded: 
> limit=1048576 items=1048576 {code}
> We should do HDFS directory cleanup for the `HiveMaterializedViewsRegistry` 
> thread 
> [here|https://github.com/apache/hive/blob/1a574783afee13e33ecf3ed6fc60bdc94fe47bb1/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMaterializedViewsRegistry.java#L231]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2024-02-21 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27924:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Incremental rebuild goes wrong when inserts and deletes overlap between the 
> source tables
> -
>
> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
>  * Hive version : 4.0.0-beta-1
>  * Driver version : Hive JDBC (4.0.0-beta-1)
>  * Beeline version : 4.0.0-beta-1
>Reporter: Wenhao Li
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: bug, hive, hive-4.1.0-must, known_issue, 
> materializedviews, pull-request-available
> Fix For: 4.1.0
>
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
> 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>
>
> h1. Summary
> The incremental rebuild plan and execution output are incorrect when one side 
> of the table join has inserted/deleted join keys that the other side has 
> deleted/inserted (note the order).
> The argument is that tuples that have never been present simultaneously 
> should not interact with one another, i.e., one's inserts should not join the 
> other's deletes.
> h1. Related Test Case
> The bug was discovered during replication of the test case:
> ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??
> h1. Steps to Reproduce the Issue
>  # Configurations:
> {code:sql}
> SET hive.vectorized.execution.enabled=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.materializedview.rewriting=true;{code}
>  # 
> {code:sql}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'alfred', 10.30, 2),
> (1, 'charlie', 20.30, 2); {code}
>  # 
> {code:sql}
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_2_n3 values
> (1, 'bob', 30.30, 2),
> (1, 'bonnie', 40.30, 2);{code}
>  # 
> {code:sql}
> CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES 
> ('transactional'='true') AS
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;{code}
>  # 
> {code:sql}
> show tables; {code}
> !截图.PNG!
> # Select tuples, including deleted rows and virtual columns, from the MV 
> and source tables. We see that the MV is correctly built upon creation:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图1.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图2.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图3.PNG!
>  # Now make an insert to the LHS and a delete to the RHS source table:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'kevin', 50.30, 2);
> DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code}
>  # Select again to see what happened:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图4.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图5.PNG!
>  # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, 
> which is incorrect already:
> {code:sql}
> EXPLAIN CBO
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code}
> !截图6.PNG!
>  # Rebuild MV and see (incorrect) results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图7.PNG!
>  # Run MV definition directly, which outputs incorrect results because the MV 
> is enabled for MV-based query rewrite, i.e., the following query will output 
> what's in the MV for the time being:
> {code:sql}
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0; {code}
> !截图8.PNG!
>  # Disable MV-based query rewrite 

[jira] [Commented] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2024-02-21 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819506#comment-17819506
 ] 

Krisztian Kasa commented on HIVE-27924:
---

Merged to master. Thanks [~dkuzmenko] for reviewing the patch and [~wenhaoli] for 
the detailed repro steps.

> Incremental rebuild goes wrong when inserts and deletes overlap between the 
> source tables
> -
>
> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
>  * Hive version : 4.0.0-beta-1
>  * Driver version : Hive JDBC (4.0.0-beta-1)
>  * Beeline version : 4.0.0-beta-1
>Reporter: Wenhao Li
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: bug, hive, hive-4.1.0-must, known_issue, 
> materializedviews, pull-request-available
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
> 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>
>
> h1. Summary
> The incremental rebuild plan and execution output are incorrect when one side 
> of the table join has inserted/deleted join keys that the other side has 
> deleted/inserted (note the order).
> The argument is that tuples that have never been present simultaneously 
> should not interact with one another, i.e., one's inserts should not join the 
> other's deletes.
> h1. Related Test Case
> The bug was discovered during replication of the test case:
> ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??
> h1. Steps to Reproduce the Issue
>  # Configurations:
> {code:sql}
> SET hive.vectorized.execution.enabled=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.materializedview.rewriting=true;{code}
>  # 
> {code:sql}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'alfred', 10.30, 2),
> (1, 'charlie', 20.30, 2); {code}
>  # 
> {code:sql}
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_2_n3 values
> (1, 'bob', 30.30, 2),
> (1, 'bonnie', 40.30, 2);{code}
>  # 
> {code:sql}
> CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES 
> ('transactional'='true') AS
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;{code}
>  # 
> {code:sql}
> show tables; {code}
> !截图.PNG!
> # Select tuples, including deleted rows and virtual columns, from the MV 
> and source tables. We see that the MV is correctly built upon creation:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图1.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图2.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图3.PNG!
>  # Now make an insert to the LHS and a delete to the RHS source table:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'kevin', 50.30, 2);
> DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code}
>  # Select again to see what happened:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图4.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图5.PNG!
>  # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, 
> which is incorrect already:
> {code:sql}
> EXPLAIN CBO
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code}
> !截图6.PNG!
>  # Rebuild MV and see (incorrect) results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图7.PNG!
>  # Run MV definition directly, which outputs incorrect results because the MV 
> is enabled for MV-based query rewrite, i.e., the following query will output 
> what's in the MV for the time being:
> {code:sql}
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0; {code}
> !截图8.PNG!
>  # Disable MV-based 

[jira] [Resolved] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-02-11 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28050.
---
Resolution: Fixed

Merged to master. Thanks [~amansinha100] and [~zabetak] for review.

> Disable Incremental non aggregated materialized view rebuild in presence of 
> delete operations
> -
>
> Key: HIVE-28050
> URL: https://issues.apache.org/jira/browse/HIVE-28050
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> To support incremental rebuild of materialized views whose definition does not 
> have an aggregate, in the presence of delete operations on any of their source 
> tables, the records of the source tables need to be uniquely identified and 
> right joined with the records present in the view.
> The join keys should be the unique columns of each table in the definition 
> query, but we cannot determine which columns those are.
> One possibility is to project the ROW_IDs of each source table in the view 
> definition, but the writeId component changes at delete.
> Another way is to project the columns of primary or unique keys, but these 
> constraints are not enforced in Hive.
> Current implementation leads to data correctness issues:
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_n6 values
>  (1, 'bob', 10.30, 2),
>  (1, 'alfred', 10.30, 2),
>  (2, 'bob', 3.14, 3),
>  (2, 'bonnie', 172342.2, 3),
>  (3, 'calvin', 978.76, 3),
>  (3, 'charlie', 9.8, 1);
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_2_n3 values
>  (1, 'alfred', 10.30, 2),
>  (3, 'calvin', 978.76, 3);
> CREATE MATERIALIZED VIEW cmv_mat_view_n6
>   TBLPROPERTIES ('transactional'='true') AS
>   SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
>   FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
>   WHERE cmv_basetable_2_n3.c > 10.0;
> delete from cmv_basetable_n6 where b = 'bob';
> explain cbo
> alter materialized view cmv_mat_view_n6 rebuild;
> alter materialized view cmv_mat_view_n6 rebuild;
> select * from cmv_mat_view_n6;
> {code}
> {code}
> 3 978.76
> 3 978.76
> {code}
> but it should be
> {code}
> 1 10.30
> 3 978.76
> 3 978.76
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-02-07 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815523#comment-17815523
 ] 

Krisztian Kasa commented on HIVE-28050:
---

[~zabetak]
Another example without join:
{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table t1 (a int, b int) stored as orc TBLPROPERTIES 
('transactional'='true');

insert into t1 values
(3, 3),
(2, 1),
(2, 2),
(1, 2),
(1, 1);

CREATE MATERIALIZED VIEW mat1
  TBLPROPERTIES ('transactional'='true') AS
SELECT a
FROM t1
WHERE b < 10;

delete from t1 where b = 2;

alter materialized view mat1 rebuild;

SELECT a
FROM t1
WHERE b < 10;
{code}
{code}
3
{code}
vs
{code}
drop materialized view mat1;

SELECT a
FROM t1
WHERE b < 10;
{code}
{code}
3
2
1
{code}

> Disable Incremental non aggregated materialized view rebuild in presence of 
> delete operations
> -
>
> Key: HIVE-28050
> URL: https://issues.apache.org/jira/browse/HIVE-28050
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> To support incremental rebuild of materialized views whose definition does not 
> have an aggregate, in the presence of delete operations on any of their source 
> tables, the records of the source tables need to be uniquely identified and 
> right joined with the records present in the view.
> The join keys should be the unique columns of each table in the definition 
> query, but we cannot determine which columns those are.
> One possibility is to project the ROW_IDs of each source table in the view 
> definition, but the writeId component changes at delete.
> Another way is to project the columns of primary or unique keys, but these 
> constraints are not enforced in Hive.
> Current implementation leads to data correctness issues:
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_n6 values
>  (1, 'bob', 10.30, 2),
>  (1, 'alfred', 10.30, 2),
>  (2, 'bob', 3.14, 3),
>  (2, 'bonnie', 172342.2, 3),
>  (3, 'calvin', 978.76, 3),
>  (3, 'charlie', 9.8, 1);
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_2_n3 values
>  (1, 'alfred', 10.30, 2),
>  (3, 'calvin', 978.76, 3);
> CREATE MATERIALIZED VIEW cmv_mat_view_n6
>   TBLPROPERTIES ('transactional'='true') AS
>   SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
>   FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
>   WHERE cmv_basetable_2_n3.c > 10.0;
> delete from cmv_basetable_n6 where b = 'bob';
> explain cbo
> alter materialized view cmv_mat_view_n6 rebuild;
> alter materialized view cmv_mat_view_n6 rebuild;
> select * from cmv_mat_view_n6;
> {code}
> {code}
> 3 978.76
> 3 978.76
> {code}
> but it should be
> {code}
> 1 10.30
> 3 978.76
> 3 978.76
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-02-07 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815276#comment-17815276
 ] 

Krisztian Kasa commented on HIVE-28050:
---

[~zabetak]
Updated the description with an example.
The current implementation should check whether there are unique columns 
projected from each source table in the view definition query. Unfortunately we 
cannot identify unique columns. If a method to do so is found, the feature can 
be restored in the future.

> Disable Incremental non aggregated materialized view rebuild in presence of 
> delete operations
> -
>
> Key: HIVE-28050
> URL: https://issues.apache.org/jira/browse/HIVE-28050
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> To support incremental rebuild of materialized views whose definition does not 
> have an aggregate, in the presence of delete operations on any of their source 
> tables, the records of the source tables need to be uniquely identified and 
> right joined with the records present in the view.
> The join keys should be the unique columns of each table in the definition 
> query, but we cannot determine which columns those are.
> One possibility is to project the ROW_IDs of each source table in the view 
> definition, but the writeId component changes at delete.
> Another way is to project the columns of primary or unique keys, but these 
> constraints are not enforced in Hive.
> Current implementation leads to data correctness issues:
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_n6 values
>  (1, 'bob', 10.30, 2),
>  (1, 'alfred', 10.30, 2),
>  (2, 'bob', 3.14, 3),
>  (2, 'bonnie', 172342.2, 3),
>  (3, 'calvin', 978.76, 3),
>  (3, 'charlie', 9.8, 1);
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_2_n3 values
>  (1, 'alfred', 10.30, 2),
>  (3, 'calvin', 978.76, 3);
> CREATE MATERIALIZED VIEW cmv_mat_view_n6
>   TBLPROPERTIES ('transactional'='true') AS
>   SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
>   FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
>   WHERE cmv_basetable_2_n3.c > 10.0;
> delete from cmv_basetable_n6 where b = 'bob';
> explain cbo
> alter materialized view cmv_mat_view_n6 rebuild;
> alter materialized view cmv_mat_view_n6 rebuild;
> select * from cmv_mat_view_n6;
> {code}
> {code}
> 3 978.76
> 3 978.76
> {code}
> but it should be
> {code}
> 1 10.30
> 3 978.76
> 3 978.76
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-02-07 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-28050:
--
Description: 
To support incremental rebuild of materialized views whose definition does not 
have an aggregate, in the presence of delete operations on any of their source 
tables, the records of the source tables need to be uniquely identified and 
right joined with the records present in the view.
The join keys should be the unique columns of each table in the definition 
query, but we cannot determine which columns those are.

One possibility is to project the ROW_IDs of each source table in the view 
definition, but the writeId component changes at delete.

Another way is to project the columns of primary or unique keys, but these 
constraints are not enforced in Hive.


Current implementation leads to data correctness issues:
{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');

insert into cmv_basetable_n6 values
 (1, 'bob', 10.30, 2),
 (1, 'alfred', 10.30, 2),
 (2, 'bob', 3.14, 3),
 (2, 'bonnie', 172342.2, 3),
 (3, 'calvin', 978.76, 3),
 (3, 'charlie', 9.8, 1);

create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');

insert into cmv_basetable_2_n3 values
 (1, 'alfred', 10.30, 2),
 (3, 'calvin', 978.76, 3);

CREATE MATERIALIZED VIEW cmv_mat_view_n6
  TBLPROPERTIES ('transactional'='true') AS
  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
  WHERE cmv_basetable_2_n3.c > 10.0;

delete from cmv_basetable_n6 where b = 'bob';

explain cbo
alter materialized view cmv_mat_view_n6 rebuild;
alter materialized view cmv_mat_view_n6 rebuild;

select * from cmv_mat_view_n6;
{code}

{code}
3   978.76
3   978.76
{code}

but it should be
{code}
1   10.30
3   978.76
3   978.76
{code}

  was:
To support incremental rebuild of materialized views which definition does not 
have aggregate in presence of delete operations in any of its source tables the 
records of the source tables need to be uniquely identified and joined with the 
records present in the view.

One possibility is to project ROW_IDs of each source table in the view 
definition but the writeId component is changing at delete.

Another way is to project columns of primary keys or unique keys but these 
constraints are not enforced in Hive.

Current implementation leads to data correctness issues:
{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');

insert into cmv_basetable_n6 values
 (1, 'bob', 10.30, 2),
 (1, 'alfred', 10.30, 2),
 (2, 'bob', 3.14, 3),
 (2, 'bonnie', 172342.2, 3),
 (3, 'calvin', 978.76, 3),
 (3, 'charlie', 9.8, 1);

create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');

insert into cmv_basetable_2_n3 values
 (1, 'alfred', 10.30, 2),
 (3, 'calvin', 978.76, 3);

CREATE MATERIALIZED VIEW cmv_mat_view_n6
  TBLPROPERTIES ('transactional'='true') AS
  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
  WHERE cmv_basetable_2_n3.c > 10.0;

delete from cmv_basetable_n6 where b = 'bob';

explain cbo
alter materialized view cmv_mat_view_n6 rebuild;
alter materialized view cmv_mat_view_n6 rebuild;

select * from cmv_mat_view_n6;
{code}

{code}
3   978.76
3   978.76
{code}

but it should be
{code}
1   10.30
3   978.76
3   978.76
{code}


> Disable Incremental non aggregated materialized view rebuild in presence of 
> delete operations
> -
>
> Key: HIVE-28050
> URL: https://issues.apache.org/jira/browse/HIVE-28050
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> To support incremental rebuild of materialized views whose definition does not 
> have an aggregate, in the presence of delete operations on any of their source 
> tables, the records of the source tables need to be uniquely identified and 
> right joined with the records present in the view.
> The join keys should be the unique columns of each table in the definition 
> query, but we cannot determine 

[jira] [Updated] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-02-07 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-28050:
--
Description: 
To support incremental rebuild of materialized views whose definition does not 
have an aggregate, in the presence of delete operations on any of their source 
tables, the records of the source tables need to be uniquely identified and 
joined with the records present in the view.

One possibility is to project the ROW_IDs of each source table in the view 
definition, but the writeId component changes at delete.

Another way is to project the columns of primary or unique keys, but these 
constraints are not enforced in Hive.

Current implementation leads to data correctness issues:
{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');

insert into cmv_basetable_n6 values
 (1, 'bob', 10.30, 2),
 (1, 'alfred', 10.30, 2),
 (2, 'bob', 3.14, 3),
 (2, 'bonnie', 172342.2, 3),
 (3, 'calvin', 978.76, 3),
 (3, 'charlie', 9.8, 1);

create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');

insert into cmv_basetable_2_n3 values
 (1, 'alfred', 10.30, 2),
 (3, 'calvin', 978.76, 3);

CREATE MATERIALIZED VIEW cmv_mat_view_n6
  TBLPROPERTIES ('transactional'='true') AS
  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
  WHERE cmv_basetable_2_n3.c > 10.0;

delete from cmv_basetable_n6 where b = 'bob';

explain cbo
alter materialized view cmv_mat_view_n6 rebuild;
alter materialized view cmv_mat_view_n6 rebuild;

select * from cmv_mat_view_n6;
{code}

{code}
3   978.76
3   978.76
{code}

but it should be
{code}
1   10.30
3   978.76
3   978.76
{code}

  was:
To support incremental rebuild of materialized views which definition does not 
have aggregate in presence of delete operations in any of its source tables the 
records of the source tables need to be uniquely identified and joined with the 
records present in the view.

One possibility is to project ROW_IDs of each source table in the view 
definition but the writeId component is changing at delete.

Another way is to project columns of primary keys or unique keys but these 
constraints are not enforced in Hive.


> Disable Incremental non aggregated materialized view rebuild in presence of 
> delete operations
> -
>
> Key: HIVE-28050
> URL: https://issues.apache.org/jira/browse/HIVE-28050
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> To support incremental rebuild of materialized views whose definition does not 
> have an aggregate, in the presence of delete operations on any of their source 
> tables, the records of the source tables need to be uniquely identified and 
> joined with the records present in the view.
> One possibility is to project the ROW_IDs of each source table in the view 
> definition, but the writeId component changes at delete.
> Another way is to project the columns of primary or unique keys, but these 
> constraints are not enforced in Hive.
> Current implementation leads to data correctness issues:
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_n6 values
>  (1, 'bob', 10.30, 2),
>  (1, 'alfred', 10.30, 2),
>  (2, 'bob', 3.14, 3),
>  (2, 'bonnie', 172342.2, 3),
>  (3, 'calvin', 978.76, 3),
>  (3, 'charlie', 9.8, 1);
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_2_n3 values
>  (1, 'alfred', 10.30, 2),
>  (3, 'calvin', 978.76, 3);
> CREATE MATERIALIZED VIEW cmv_mat_view_n6
>   TBLPROPERTIES ('transactional'='true') AS
>   SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
>   FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
>   WHERE cmv_basetable_2_n3.c > 10.0;
> delete from cmv_basetable_n6 where b = 'bob';
> explain cbo
> alter materialized view cmv_mat_view_n6 rebuild;
> alter materialized view cmv_mat_view_n6 rebuild;
> select * from cmv_mat_view_n6;
> {code}
> {code}
> 3 978.76
> 3 978.76
> {code}
> but it should be
> 

[jira] [Resolved] (HIVE-28054) SemanticException for join condition in subquery

2024-02-07 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28054.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~soumyakanti.das] for the patch and [~henrib] for 
review.

> SemanticException for join condition in subquery
> 
>
> Key: HIVE-28054
> URL: https://issues.apache.org/jira/browse/HIVE-28054
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-beta-1
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Running the following query:
> {code:java}
> create table t1 (id int);
> create table t2 (id int);
> explain cbo select id,
>   (select count(*) from t1 join t2 on t1.id=t2.id)
>   from t2
> order by id; {code}
> or:
> {code:java}
> explain cbo select id,
>   (select count(*) from t1 join t2 using (id))
>   from t2
> order by id; {code}
> throws:
> {code:java}
>  
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
> Could not resolve column name
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSubQueryRelNode(CalcitePlanner.java:3346)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.internalGenSelectLogicalPlan(CalcitePlanner.java:4580)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSelectLogicalPlan(CalcitePlanner.java:4405)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5074)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1625)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1569)
>     at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
>     at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>     at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
>     at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1321)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13113)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>     at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>     at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
>     at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
>     at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>     at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:733)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:703)
>     at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
>     at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>     at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>  {code}
> Workaround:
> {code:java}
> explain cbo select id,
>   (select count(*) from t1 join t2 where t1.id=t2.id)
>   from t2
> order by id; {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28000) Hive QL : "not in" clause gives incorrect results when type coercion cannot take place.

2024-01-31 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28000.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~AnmolSun] for the patch and [~sjc362000] and 
[~aturoczy] for review.

>  Hive QL : "not in" clause gives incorrect results when type coercion cannot 
> take place.
> 
>
> Key: HIVE-28000
> URL: https://issues.apache.org/jira/browse/HIVE-28000
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Anmol Sundaram
>Assignee: Anmol Sundaram
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: not_in_examples.q
>
>
> There are certain scenarios where the "not in" clause gives incorrect results 
> when type coercion cannot take place.
> These occur when the IN list contains at least one operand that cannot be 
> type-coerced to the column to which the IN clause is applied.
>  
> Please refer to the attached query examples for more details. 
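
Since the attached {{not_in_examples.q}} is not reproduced here, the following is only a 
hypothetical illustration of the query shape described above (the table {{t_not_in}} and 
its data are made up): the string literal 'x' cannot be coerced to the {{int}} column, 
such a failed coercion typically surfaces as NULL, and a NULL comparison can make the 
NOT IN predicate evaluate to NULL instead of TRUE or FALSE, which is the kind of 
situation where the reported inconsistencies show up:
{code:sql}
-- Hypothetical illustration, not the attached repro.
create table t_not_in (id int);
insert into t_not_in values (1), (2);
-- 'x' cannot be type-coerced to int; under standard NULL semantics the
-- coercion yields NULL, so the predicate is NULL (not TRUE) for id = 1.
select * from t_not_in where id not in (2, 'x');
{code}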



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28000) Hive QL : "not in" clause gives incorrect results when type coercion cannot take place.

2024-01-31 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-28000:
-

Assignee: Anmol Sundaram

>  Hive QL : "not in" clause gives incorrect results when type coercion cannot 
> take place.
> 
>
> Key: HIVE-28000
> URL: https://issues.apache.org/jira/browse/HIVE-28000
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Anmol Sundaram
>Assignee: Anmol Sundaram
>Priority: Major
>  Labels: pull-request-available
> Attachments: not_in_examples.q
>
>
> There are certain scenarios where the "not in" clause gives incorrect results 
> when type coercion cannot take place. 
> These occur when the in clause contains at least one operand which cannot be 
> type-coerced to the column to which the in clause is applied. 
>  
> Please refer to the attached query examples for more details. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-01-30 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-28050:
-

 Summary: Disable Incremental non aggregated materialized view 
rebuild in presence of delete operations
 Key: HIVE-28050
 URL: https://issues.apache.org/jira/browse/HIVE-28050
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.1.0


To support incremental rebuild of materialized views whose definition does not 
contain an aggregate, in the presence of delete operations on any of their source 
tables, the records of the source tables need to be uniquely identified and joined 
with the records present in the view.

One possibility is to project the ROW__ID of each source table in the view 
definition, but the writeId component of ROW__ID changes at delete.

Another way is to project the primary key or unique key columns, but these 
constraints are not enforced in Hive.
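
For illustration, a minimal sketch of the problem (table names are hypothetical, 
not part of this ticket): a non-aggregated join view, and the first option above, 
projecting each source table's ROW__ID into the view definition.

{code:sql}
-- a join MV without aggregates; rebuilding it incrementally after deletes
-- requires matching MV rows back to individual source rows
CREATE MATERIALIZED VIEW mv1 TBLPROPERTIES ('transactional'='true') AS
SELECT t1.a, t2.c
FROM t1 JOIN t2 ON (t1.a = t2.a);

-- option 1 above: additionally project the ROW__ID of each source table;
-- as noted, the writeId component of ROW__ID changes at delete, so this alone
-- does not reliably identify the affected MV rows
CREATE MATERIALIZED VIEW mv1_with_rowids TBLPROPERTIES ('transactional'='true') AS
SELECT t1.ROW__ID AS rid1, t2.ROW__ID AS rid2, t1.a, t2.c
FROM t1 JOIN t2 ON (t1.a = t2.a);
{code}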



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28009) Shared work optimizer ignores schema merge setting in case of virtual column difference

2024-01-22 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28009.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~dkuzmenko] for review!

> Shared work optimizer ignores schema merge setting in case of virtual column 
> difference
> ---
>
> Key: HIVE-28009
> URL: https://issues.apache.org/jira/browse/HIVE-28009
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> {code:java}
> set hive.optimize.shared.work.merge.ts.schema=false;
> create table t1(a int);
> explain
> WITH t AS (
>   select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a from (
> select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a, row_number() 
> OVER (partition by INPUT__FILE__NAME) rn from t1
> where a = 1
>   ) q
>   where rn=1
> )
> select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a from t1 where NOT (a 
> = 1) AND INPUT__FILE__NAME IN (select INPUT__FILE__NAME from t)
> union all
> select * from t
> {code}
> Before SharedWorkOptimizer:
> {code:java}
> TS[0]-FIL[32]-SEL[2]-RS[14]-MERGEJOIN[42]-SEL[17]-UNION[27]-FS[29]
> TS[3]-FIL[34]-RS[5]-SEL[6]-PTF[7]-FIL[33]-SEL[8]-GBY[13]-RS[15]-MERGEJOIN[42]
> TS[18]-FIL[36]-RS[20]-SEL[21]-PTF[22]-FIL[35]-SEL[23]-UNION[27]
> {code}
> After SharedWorkOptimizer:
> {code:java}
> TS[0]-FIL[32]-SEL[2]-RS[14]-MERGEJOIN[42]-SEL[17]-UNION[27]-FS[29]
>  -FIL[34]-RS[5]-SEL[6]-PTF[7]-FIL[33]-SEL[8]-GBY[13]-RS[15]-MERGEJOIN[42]
> TS[18]-FIL[36]-RS[20]-SEL[21]-PTF[22]-FIL[35]-SEL[23]-UNION[27]
> {code}
> TS[3] and TS[18] are merged, but their schemas don't match and 
> {{hive.optimize.shared.work.merge.ts.schema}} was turned off in the test:
> {code:java}
> TS[3]: 0 = FILENAME
> TS[18]: 0 = BLOCKOFFSET,  FILENAME
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27489) HPL/SQL does not support table aliases on column names in loops

2024-01-21 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27489.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the patch and [~aturoczy] for review.

> HPL/SQL does not support table aliases on column names in loops
> ---
>
> Key: HIVE-27489
> URL: https://issues.apache.org/jira/browse/HIVE-27489
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HPL/SQL does not support table aliases on column names in cursor loops, 
> whereas the same works in Oracle's PL/SQL.
>  
> This works in Oracle:
>  
> {noformat}
> CREATE OR replace PROCEDURE test2
> AS
> BEGIN
> FOR rec IN (select tab.a from test tab) LOOP
> dbms_output.put_line(rec.a);
> END LOOP;
> END;
> SQL> call test2();
> one
> two
> {noformat}
>  
> This does not work in Hive; it fails with:
> ERROR : Unhandled exception in HPL/SQL
> No other errors are shown.
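> For reference, the aliased form that fails in HPL/SQL (the same loop as in the 
> Oracle example above, without the procedure wrapper):
> {noformat}
> BEGIN
> FOR rec IN (select tab.a from test tab) LOOP
> dbms_output.put_line(rec.a);
> END LOOP;
> END;{noformat}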
> Without the alias, it works in Hive:
> {noformat}
> BEGIN
> FOR rec IN (select a from test tab) LOOP
> dbms_output.put_line(rec.a);
> END LOOP;
> END;{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28009) Shared work optimizer ignores schema merge setting in case of virtual column difference

2024-01-18 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-28009:
-

 Summary: Shared work optimizer ignores schema merge setting in 
case of virtual column difference
 Key: HIVE-28009
 URL: https://issues.apache.org/jira/browse/HIVE-28009
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 4.0.0-beta-1, 4.0.0
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code:java}
set hive.optimize.shared.work.merge.ts.schema=false;

create table t1(a int);

explain
WITH t AS (
  select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a from (
select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a, row_number() OVER 
(partition by INPUT__FILE__NAME) rn from t1
where a = 1
  ) q
  where rn=1
)
select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a from t1 where NOT (a = 
1) AND INPUT__FILE__NAME IN (select INPUT__FILE__NAME from t)
union all
select * from t
{code}
Before SharedWorkOptimizer:
{code:java}
TS[0]-FIL[32]-SEL[2]-RS[14]-MERGEJOIN[42]-SEL[17]-UNION[27]-FS[29]
TS[3]-FIL[34]-RS[5]-SEL[6]-PTF[7]-FIL[33]-SEL[8]-GBY[13]-RS[15]-MERGEJOIN[42]
TS[18]-FIL[36]-RS[20]-SEL[21]-PTF[22]-FIL[35]-SEL[23]-UNION[27]
{code}
After SharedWorkOptimizer:
{code:java}
TS[0]-FIL[32]-SEL[2]-RS[14]-MERGEJOIN[42]-SEL[17]-UNION[27]-FS[29]
 -FIL[34]-RS[5]-SEL[6]-PTF[7]-FIL[33]-SEL[8]-GBY[13]-RS[15]-MERGEJOIN[42]
TS[18]-FIL[36]-RS[20]-SEL[21]-PTF[22]-FIL[35]-SEL[23]-UNION[27]
{code}
TS[3] and TS[18] are merged, but their schemas don't match and 
{{hive.optimize.shared.work.merge.ts.schema}} was turned off in the test:
{code:java}
TS[3]: 0 = FILENAME
TS[18]: 0 = BLOCKOFFSET,  FILENAME
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28006) Materialized view with aggregate function incorrectly shows it allows incremental rebuild

2024-01-17 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-28006:
-

 Summary: Materialized view with aggregate function incorrectly 
shows it allows incremental rebuild
 Key: HIVE-28006
 URL: https://issues.apache.org/jira/browse/HIVE-28006
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table store_sales (
  ss_sold_date_sk int,
  ss_ext_sales_price int,
  ss_customer_sk int
) stored as orc TBLPROPERTIES ('transactional'='true');

insert into store_sales (ss_sold_date_sk, ss_ext_sales_price, ss_customer_sk) 
values (2, 2, 2);

create materialized view mat1 stored as orc tblproperties 
('format-version'='2') as
select ss_customer_sk
  ,min(ss_ext_sales_price)
  ,count(*)
 from store_sales
 group by ss_customer_sk;

delete from store_sales where ss_sold_date_sk = 1;

show materialized views;

explain cbo
alter materialized view mat1 rebuild;
{code}
Incremental rebuild is available
{code}
# MV Name        Rewriting Enabled    Mode              Incremental rebuild
mat1             Yes                  Manual refresh    Available
{code}
vs the full rebuild plan that is actually produced
{code}
CBO PLAN:
HiveAggregate(group=[{2}], agg#0=[min($1)], agg#1=[count()])
  HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27492) HPL/SQL built-in functions like sysdate not working

2024-01-14 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27492.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the fix and [~amansinha100] and 
[~aturoczy] for the review.

> HPL/SQL built-in functions like sysdate not working
> ---
>
> Key: HIVE-27492
> URL: https://issues.apache.org/jira/browse/HIVE-27492
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HPL/SQL built-in functions like sysdate not working
>  
> {noformat}
> select sysdate;
> /
> ERROR : FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias 
> or column reference 'sysdate': (possible column names are: )
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:7 Invalid table 
> alias or column reference 'sysdate': (possible column names are: ){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2024-01-08 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27948:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged to master. Thanks [~zabetak] for reporting and reproducing the issue and 
reviewing the PR.

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions, the queries that 
> use this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. To make this easily reproducible in a 
> self-contained test, the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created, it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem, run the following queries with view-based 
> rewriting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor  1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor  1702302786
> Alex  1702302788
> {noformat}
> Naturally, the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view-based rewriting is 
> in use, the second query will use the results from the materialized view, which 
> are by now obsolete (the Bob entry is missing).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2024-01-08 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27948:
--
Component/s: Materialized views

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Materialized views
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions, the queries that 
> use this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. To make this easily reproducible in a 
> self-contained test, the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created, it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem, run the following queries with view-based 
> rewriting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor  1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor  1702302786
> Alex  1702302788
> {noformat}
> Naturally, the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view-based rewriting is 
> in use, the second query will use the results from the materialized view, which 
> are by now obsolete (the Bob entry is missing).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27161) MetaException when executing CTAS query in Druid storage handler

2023-12-21 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27161:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged to master. Thanks [~dkuzmenko] for review!

> MetaException when executing CTAS query in Druid storage handler
> 
>
> Key: HIVE-27161
> URL: https://issues.apache.org/jira/browse/HIVE-27161
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Any kind of CTAS query targeting the Druid storage handler fails with the 
> following exception:
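> (For instance, a CTAS of this shape is enough to hit it; the table, column, and 
> source names below are illustrative only.)
> {code:sql}
> CREATE TABLE druid_ctas
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> AS
> SELECT CAST(current_timestamp AS timestamp with local time zone) AS `__time`, cstring1
> FROM alltypesorc;
> {code}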
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:LOCATION may not be specified for Druid)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1347) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1352) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:158)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:116)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:228) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) 
> ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) 
> ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) 
> ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) 
> ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) 
> ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
>   at 
> org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.initDataset(QTestDatasetHandler.java:86)
>  ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.beforeTest(QTestDatasetHandler.java:190)
>  ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.qoption.QTestOptionDispatcher.beforeTest(QTestOptionDispatcher.java:79)
>  ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.QTestUtil.cliInit(QTestUtil.java:607) 
> ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:112)
>  ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) 
> ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> 

[jira] [Resolved] (HIVE-27876) Incorrect query results on tables with ClusterBy & SortBy

2023-12-20 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27876.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~rameshkumar] for the patch and [~aturoczy] for 
review.

> Incorrect query results on tables with ClusterBy & SortBy
> -
>
> Key: HIVE-27876
> URL: https://issues.apache.org/jira/browse/HIVE-27876
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Repro:
>  
> {code:java}
> create external table test_bucket(age int, name string, dept string) 
> clustered by (age, name) sorted by (age asc, name asc) into 2 buckets stored 
> as orc;
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> -- empty (wrong) results
> select age, name, count(*) from test_bucket group by age, name having 
> count(*) > 1; 
> +--+---+--+
> | age  | name  | _c2  |
> +--+---+--+
> +--+---+--+
> -- Workaround
> set hive.map.aggr=false;
> select age, name, count(*) from test_bucket group by age, name having 
> count(*) > 1; 
> +--++--+
> | age  |  name  | _c2  |
> +--++--+
> | 1    | user1  | 2    |
> | 2    | user2  | 2    |
> +--++--+ {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-12-19 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27924:
--
Status: Patch Available  (was: In Progress)

> Incremental rebuild goes wrong when inserts and deletes overlap between the 
> source tables
> -
>
> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
>  * Hive version : 4.0.0-beta-1
>  * Driver version : Hive JDBC (4.0.0-beta-1)
>  * Beeline version : 4.0.0-beta-1
>Reporter: Wenhao Li
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: bug, hive, hive-4.0.0-must, known_issue, 
> materializedviews, pull-request-available
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
> 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>
>
> h1. Summary
> The incremental rebuild plan and execution output are incorrect when one side 
> of the table join has inserted/deleted join keys that the other side has 
> deleted/inserted (note the order).
> The argument is that tuples that have never been present simultaneously 
> should not interact with one another, i.e., one's inserts should not join the 
> other's deletes.
> h1. Related Test Case
> The bug was discovered during replication of the test case:
> ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??
> h1. Steps to Reproduce the Issue
>  # Configurations:
> {code:sql}
> SET hive.vectorized.execution.enabled=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.materializedview.rewriting=true;{code}
>  # 
> {code:sql}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'alfred', 10.30, 2),
> (1, 'charlie', 20.30, 2); {code}
>  # 
> {code:sql}
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_2_n3 values
> (1, 'bob', 30.30, 2),
> (1, 'bonnie', 40.30, 2);{code}
>  # 
> {code:sql}
> CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES 
> ('transactional'='true') AS
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;{code}
>  # 
> {code:sql}
> show tables; {code}
> !截图.PNG!
>  # Select tuples, including deletion and with VirtualColumn's, from the MV 
> and source tables. We see that the MV is correctly built upon creation:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图1.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图2.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图3.PNG!
>  # Now make an insert to the LHS and a delete to the RHS source table:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'kevin', 50.30, 2);
> DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code}
>  # Select again to see what happened:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图4.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图5.PNG!
>  # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, 
> which is incorrect already:
> {code:sql}
> EXPLAIN CBO
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code}
> !截图6.PNG!
>  # Rebuild MV and see (incorrect) results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图7.PNG!
>  # Run MV definition directly, which outputs incorrect results because the MV 
> is enabled for MV-based query rewrite, i.e., the following query will output 
> what's in the MV for the time being:
> {code:sql}
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0; {code}
> !截图8.PNG!
>  # Disable MV-based query rewrite for the MV and re-run the definition, which 
> should give the correct results:
> 

[jira] [Resolved] (HIVE-27428) CTAS fails with SemanticException when join subquery has complex type column and false filter predicate

2023-12-17 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27428.
---
Fix Version/s: 4.1.0
   Resolution: Resolved

Resolved by HIVE-27690

> CTAS fails with SemanticException when join subquery has complex type column 
> and false filter predicate
> ---
>
> Key: HIVE-27428
> URL: https://issues.apache.org/jira/browse/HIVE-27428
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Repro steps:
> {code:java}
> drop table if exists table1;
> drop table if exists table2;
> create table table1 (a string, b string);
> create table table2 (complex_column 
> array<struct<values:array<string>, description:string, categories:array<string>>>);
> -- CTAS failing query
> create table table3 as with t1 as (select * from table1), t2 as (select * 
> from table2 where 1=0) select t1.*, t2.* from t1 left join t2;{code}
> Exception:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 
> CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the 
> type, near field:  t2.complex_column
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8171)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8129)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7822)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11248)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11120)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12050)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11916)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12730)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:722)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12831)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:442)
>  
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:300)
>  
>         at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
>         at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) 
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194)  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27690) Handle casting NULL literal to complex type

2023-12-17 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27690.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master. Thanks [~lvegh] for review.

> Handle casting NULL literal to complex type
> ---
>
> Key: HIVE-27690
> URL: https://issues.apache.org/jira/browse/HIVE-27690
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {{NULL}} literal values of a complex type column are treated as void typed 
> literals.
> {code:java}
> create table explain_npe_map( c1 map );
> explain select c1 from explain_npe_map where c1 is null;
> {code}
> [https://github.com/apache/hive/blob/88bc8269a64d31eee372bf3602933c75283c686b/ql/src/test/results/clientpositive/llap/analyze_npe.q.out#L142]
> The goal of this patch is to use the original complex type:
> {code:java}
>   Select Operator
> expressions: Const map null (type: 
> map)
> {code}
> Void typed {{NULL}} literals make CTAS statements fail since the 
> original complex type cannot be inferred.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27690) Handle casting NULL literal to complex type

2023-12-17 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27690:
--
Fix Version/s: 4.1.0
   (was: 4.0.0)

> Handle casting NULL literal to complex type
> ---
>
> Key: HIVE-27690
> URL: https://issues.apache.org/jira/browse/HIVE-27690
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> {{NULL}} literal values of a complex type column are treated as void typed 
> literals.
> {code:java}
> create table explain_npe_map( c1 map );
> explain select c1 from explain_npe_map where c1 is null;
> {code}
> [https://github.com/apache/hive/blob/88bc8269a64d31eee372bf3602933c75283c686b/ql/src/test/results/clientpositive/llap/analyze_npe.q.out#L142]
> The goal of this patch is to use the original complex type:
> {code:java}
>   Select Operator
> expressions: Const map null (type: 
> map)
> {code}
> Void typed {{NULL}} literals make CTAS statements fail since the 
> original complex type cannot be inferred.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-15 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17797157#comment-17797157
 ] 

Krisztian Kasa commented on HIVE-27948:
---

The general rule is that if there is any error message during materialized view 
definition validation, the MV cannot be used by Calcite to perform query 
rewrites, but it can still be used by the text/AST based rewrite algorithm, so MV 
creation succeeds with a warning message.

As the description shows, this logic cannot be applied to every type of MV, so 
I submitted a patch to filter out MVs which do not support any type of rewrite 
algorithm.

Thanks for reporting this bug.

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions, the queries that 
> use this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. To make this easily reproducible in a 
> self-contained test, the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created, it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem, run the following queries with view-based 
> rewriting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor  1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor  1702302786
> Alex  1702302788
> {noformat}
> Naturally, the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view-based rewriting is 
> in use, the second query will use the results from the materialized view, which 
> are by now obsolete (the Bob entry is missing).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-14 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-27948:
-

Assignee: Krisztian Kasa

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Critical
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions, the queries that 
> use this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. To make this easily reproducible in a 
> self-contained test, the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created, it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem, run the following queries with view-based 
> rewriting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor  1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor  1702302786
> Alex  1702302788
> {noformat}
> Naturally, the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view-based rewriting is 
> in use, the second query will use the results from the materialized view, which 
> are by now obsolete (the Bob entry is missing).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27446) Exception when rebuild materialized view incrementally in presence of delete operations

2023-12-12 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27446.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master. Thanks [~lvegh] for review.

> Exception when rebuild materialized view incrementally in presence of delete 
> operations
> ---
>
> Key: HIVE-27446
> URL: https://issues.apache.org/jira/browse/HIVE-27446
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_n6 values
>  (1, 'alfred', 10.30, 2),
>  (2, 'bob', 3.14, 3),
>  (2, 'bonnie', 172342.2, 3),
>  (3, 'calvin', 978.76, 3),
>  (3, 'charlie', 9.8, 1);
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_2_n3 values
>  (1, 'alfred', 10.30, 2),
>  (3, 'calvin', 978.76, 3);
> CREATE MATERIALIZED VIEW cmv_mat_view_n6
>   TBLPROPERTIES ('transactional'='true') AS
>   SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
>   FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
>   WHERE cmv_basetable_2_n3.c > 10.0;
> DELETE from cmv_basetable_2_n3 WHERE a=1;
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> DELETE FROM cmv_basetable_n6 WHERE a=1;
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> {code}
> The second rebuild fails
> {code}
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Reducer 3, vertexId=vertex_1686925588164_0001_7_06, 
> diagnostics=[Task failed, taskId=task_1686925588164_0001_7_06_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1686925588164_0001_7_06_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:313)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:387)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:303)
>   ... 17 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:36)
>   at 
> 

[jira] [Commented] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-12-12 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795740#comment-17795740
 ] 

Krisztian Kasa commented on HIVE-27924:
---

A draft patch has been created to address the issue when the MV definition has 
an aggregate. I'm working on the part which handles the non-aggregate case.

> Incremental rebuild goes wrong when inserts and deletes overlap between the 
> source tables
> -
>
> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
>  * Hive version : 4.0.0-beta-1
>  * Driver version : Hive JDBC (4.0.0-beta-1)
>  * Beeline version : 4.0.0-beta-1
>Reporter: Wenhao Li
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: bug, hive, hive-4.0.0-must, known_issue, 
> materializedviews, pull-request-available
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
> 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>
>
> h1. Summary
> The incremental rebuild plan and execution output are incorrect when one side 
> of the table join has inserted/deleted join keys that the other side has 
> deleted/inserted (note the order).
> The argument is that tuples that have never been present simultaneously 
> should not interact with one another, i.e., one's inserts should not join the 
> other's deletes.
> h1. Related Test Case
> The bug was discovered during replication of the test case:
> ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??
> h1. Steps to Reproduce the Issue
>  # Configurations:
> {code:sql}
> SET hive.vectorized.execution.enabled=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.materializedview.rewriting=true;{code}
>  # 
> {code:sql}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'alfred', 10.30, 2),
> (1, 'charlie', 20.30, 2); {code}
>  # 
> {code:sql}
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_2_n3 values
> (1, 'bob', 30.30, 2),
> (1, 'bonnie', 40.30, 2);{code}
>  # 
> {code:sql}
> CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES 
> ('transactional'='true') AS
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;{code}
>  # 
> {code:sql}
> show tables; {code}
> !截图.PNG!
>  # Select tuples, including deletion and with VirtualColumn's, from the MV 
> and source tables. We see that the MV is correctly built upon creation:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图1.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图2.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图3.PNG!
>  # Now make an insert to the LHS and a delete to the RHS source table:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'kevin', 50.30, 2);
> DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code}
>  # Select again to see what happened:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图4.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图5.PNG!
>  # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, 
> which is incorrect already:
> {code:sql}
> EXPLAIN CBO
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code}
> !截图6.PNG!
>  # Rebuild MV and see (incorrect) results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图7.PNG!
>  # Run MV definition directly, which outputs incorrect results because the MV 
> is enabled for MV-based query rewrite, i.e., the following query will output 
> what's in the MV for the time being:
> {code:sql}
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0; 

[jira] [Work started] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-12-05 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27924 started by Krisztian Kasa.
-
> Incremental rebuild goes wrong when inserts and deletes overlap between the 
> source tables
> -
>
> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
>  * Hive version : 4.0.0-beta-1
>  * Driver version : Hive JDBC (4.0.0-beta-1)
>  * Beeline version : 4.0.0-beta-1
>Reporter: Wenhao Li
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: bug, hive, materializedviews
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
> 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>
>
> h1. Summary
> The incremental rebuild plan and execution output are incorrect when one side 
> of the table join has inserted/deleted join keys that the other side has 
> deleted/inserted (note the order).
> The argument is that tuples that have never been present simultaneously 
> should not interact with one another, i.e., one's inserts should not join the 
> other's deletes.
> h1. Related Test Case
> The bug was discovered during replication of the test case:
> ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??
> h1. Steps to Reproduce the Issue
>  # Configurations:
> {code:sql}
> SET hive.vectorized.execution.enabled=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.materializedview.rewriting=true;{code}
>  # 
> {code:sql}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'alfred', 10.30, 2),
> (1, 'charlie', 20.30, 2); {code}
>  # 
> {code:sql}
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_2_n3 values
> (1, 'bob', 30.30, 2),
> (1, 'bonnie', 40.30, 2);{code}
>  # 
> {code:sql}
> CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES 
> ('transactional'='true') AS
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;{code}
>  # 
> {code:sql}
> show tables; {code}
> !截图.PNG!
>  # Select tuples, including deletion and with VirtualColumn's, from the MV 
> and source tables. We see that the MV is correctly built upon creation:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图1.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图2.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图3.PNG!
>  # Now make an insert to the LHS and a delete to the RHS source table:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'kevin', 50.30, 2);
> DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code}
>  # Select again to see what happened:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图4.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图5.PNG!
>  # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, 
> which is incorrect already:
> {code:sql}
> EXPLAIN CBO
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code}
> !截图6.PNG!
>  # Rebuild MV and see (incorrect) results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图7.PNG!
>  # Run MV definition directly, which outputs incorrect results because the MV 
> is enabled for MV-based query rewrite, i.e., the following query will output 
> what's in the MV for the time being:
> {code:sql}
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0; {code}
> !截图8.PNG!
>  # Disable MV-based query rewrite for the MV and re-run the definition, which 
> should give the correct results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 DISABLE REWRITE;
> SELECT 

[jira] [Assigned] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-12-05 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-27924:
-

Assignee: Krisztian Kasa

> Incremental rebuild goes wrong when inserts and deletes overlap between the 
> source tables
> -
>
> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
>  * Hive version : 4.0.0-beta-1
>  * Driver version : Hive JDBC (4.0.0-beta-1)
>  * Beeline version : 4.0.0-beta-1
>Reporter: Wenhao Li
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: bug, hive, materializedviews
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
> 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>
>
> h1. Summary
> The incremental rebuild plan and execution output are incorrect when one side 
> of the table join has inserted/deleted join keys that the other side has 
> deleted/inserted (note the order).
> The argument is that tuples that have never been present simultaneously 
> should not interact with one another, i.e., one's inserts should not join the 
> other's deletes.
> h1. Related Test Case
> The bug was discovered during replication of the test case:
> ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??
> h1. Steps to Reproduce the Issue
>  # Configurations:
> {code:sql}
> SET hive.vectorized.execution.enabled=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.materializedview.rewriting=true;{code}
>  # 
> {code:sql}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'alfred', 10.30, 2),
> (1, 'charlie', 20.30, 2); {code}
>  # 
> {code:sql}
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_2_n3 values
> (1, 'bob', 30.30, 2),
> (1, 'bonnie', 40.30, 2);{code}
>  # 
> {code:sql}
> CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES 
> ('transactional'='true') AS
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;{code}
>  # 
> {code:sql}
> show tables; {code}
> !截图.PNG!
>  # Select tuples, including deletion and with VirtualColumn's, from the MV 
> and source tables. We see that the MV is correctly built upon creation:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图1.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图2.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图3.PNG!
>  # Now make an insert to the LHS and a delete to the RHS source table:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'kevin', 50.30, 2);
> DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code}
>  # Select again to see what happened:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图4.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图5.PNG!
>  # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, 
> which is incorrect already:
> {code:sql}
> EXPLAIN CBO
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code}
> !截图6.PNG!
>  # Rebuild MV and see (incorrect) results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图7.PNG!
>  # Run MV definition directly, which outputs incorrect results because the MV 
> is enabled for MV-based query rewrite, i.e., the following query will output 
> what's in the MV for the time being:
> {code:sql}
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0; {code}
> !截图8.PNG!
>  # Disable MV-based query rewrite for the MV and re-run the definition, which 
> should give the correct results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 DISABLE REWRITE;
> 

[jira] [Assigned] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions

2023-11-30 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-26505:
-

Assignee: Krisztian Kasa

> Case When Some result data is lost when there are common column conditions 
> and partitioned column conditions 
> -
>
> Key: HIVE-26505
> URL: https://issues.apache.org/jira/browse/HIVE-26505
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0-alpha-1
>Reporter: GuangMing Lu
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: check, hive-4.0.0-must
>
> {code:java}
> create table test0831 (id string) partitioned by (cp string);
> insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', 
> '2022-08-23');
> insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24');
> select * from test0831;
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-23   |
> | c            | 2022-08-23   |
> | d            | 2022-08-23   |
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 
> else 0 end)=0;  
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27867) Incremental materialized view throws NPE when Iceberg source table is empty

2023-11-29 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27867.
---
Resolution: Fixed

Merged to master. Thanks [~dkuzmenko] for review.

> Incremental materialized view throws NPE when Iceberg source table is empty
> ---
>
> Key: HIVE-27867
> URL: https://issues.apache.org/jira/browse/HIVE-27867
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: iceberg, materializedviews, pull-request-available
> Fix For: 4.0.0
>
>
> Repro
> https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/test/queries/positive/mv_iceberg_orc.q
> in hive.log
> {code}
> 2023-11-09T05:17:05,625  WARN [e35c7637-b0ba-4e30-8448-5bdc0d0e4779 main] 
> rebuild.AlterMaterializedViewRebuildAnalyzer: Exception loading materialized 
> views
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getValidMaterializedViews(Hive.java:2321)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getMaterializedViewForRebuild(Hive.java:2227)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer$MVRebuildCalcitePlannerAction.applyMaterializedViewRewriting(AlterMaterializedViewRebuildAnaly
> zer.java:215) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1700)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1569)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1321)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13113)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:135)
>  ~[hive-exec-4.0.0-beta-2-SNAPSH
> OT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> 

[jira] [Commented] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions

2023-11-27 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17790021#comment-17790021
 ] 

Krisztian Kasa commented on HIVE-26505:
---

This is the expression sent by {{PartitionPruner}} to HMS 
get_partitions_spec_by_expr
{code}
(true and (cp = '2022-08-23')) is not true
{code}
This removes partition cp = '2022-08-23' from the set of partitions that should be 
scanned.

CBO plan
{code}
HiveProject(id=[$0], cp=[$1])
  HiveFilter(condition=[IS NOT TRUE(AND(=($0, _UTF-16LE'a'), =($1, 
_UTF-16LE'2022-08-23')))])
HiveTableScan(table=[[default, test0831]], table:alias=[test0831])
{code}

Expression 
{code}
IS NOT TRUE(AND(=($0, _UTF-16LE'a'), =($1, _UTF-16LE'2022-08-23')))
{code}
is converted to
{code}
GenericUDFOPNotTrue(GenericUDFOPAnd(Const boolean true, 
GenericUDFOPEqual(Column[cp], Const string 2022-08-23)))
{code}
in RelOptHiveTable.computePartitionList
https://github.com/apache/hive/blob/f4c4d6b2ec1555abb844b6f348f879719b0a67c3/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L476-L478
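
A worked evaluation of why this over-approximation drops rows (a sketch based on the test data in the description):
{code}
-- full predicate:                        (id = 'a' AND cp = '2022-08-23') IS NOT TRUE
-- row ('c', '2022-08-23'):               (FALSE AND TRUE) IS NOT TRUE  => TRUE,  row must be returned
-- pruner expression for that partition:  (TRUE AND cp = '2022-08-23') IS NOT TRUE
-- partition cp = '2022-08-23':           (TRUE AND TRUE)  IS NOT TRUE  => FALSE, partition is skipped
-- hence rows ('c', '2022-08-23') and ('d', '2022-08-23') are lost from the result
{code}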


> Case When Some result data is lost when there are common column conditions 
> and partitioned column conditions 
> -
>
> Key: HIVE-26505
> URL: https://issues.apache.org/jira/browse/HIVE-26505
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0-alpha-1
>Reporter: GuangMing Lu
>Priority: Critical
>  Labels: check, hive-4.0.0-must
>
> {code:java}
> create table test0831 (id string) partitioned by (cp string);
> insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', 
> '2022-08-23');
> insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24');
> select * from test0831;
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-23   |
> | c            | 2022-08-23   |
> | d            | 2022-08-23   |
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 
> else 0 end)=0;  
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27876) Incorrect query results on tables with ClusterBy & SortBy

2023-11-23 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789050#comment-17789050
 ] 

Krisztian Kasa edited comment on HIVE-27876 at 11/23/23 10:10 AM:
--

I found another data correctness issue regarding this optimization:
{code:java}
create table test_bucket(age int, name string, dept string) clustered by (age) 
sorted by (age asc) into 2 buckets stored as orc;

insert into test_bucket values (10, 'user1', 'dept1'), (10, 'user2' , 'dept2'), 
( 2, 'user2' , 'dept2');
insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');

select * from test_bucket;
{code}
Order is not global:
{code:java}
2   user2   dept2
10  user1   dept1
10  user2   dept2
2   user2   dept2
1   user1   dept1
{code}
{code:java}
select age, count(*) from test_bucket group by age;
{code}
Records with key {{age = 2}} are not aggregated:
{code:java}
2   1
10  2
2   1
1   1
{code}
First insert creates one file:
{code:java}
itests/qtest/target/localfs/warehouse/test_bucket/00_0
{"age":2,"name":"user2","dept":"dept2"}
{"age":10,"name":"user1","dept":"dept1"}
{"age":10,"name":"user2","dept":"dept2"}
{code}
Second insert creates 2 files:
{code:java}
itests/qtest/target/localfs/warehouse/test_bucket/00_0_copy_1
{"age":2,"name":"user2","dept":"dept2"}

itests/qtest/target/localfs/warehouse/test_bucket/01_0
{"age":1,"name":"user1","dept":"dept1"}
{code}


was (Author: kkasa):
I found another data correctness issue regarding this optimization:
{code}
create table test_bucket(age int, name string, dept string) clustered by (age) 
sorted by (age asc) into 2 buckets stored as orc;

insert into test_bucket values (10, 'user1', 'dept1'), (10, 'user2' , 'dept2'), 
( 2, 'user2' , 'dept2');
insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');

select * from test_bucket;
{code}
Order is not global:
{code}
2   user2   dept2
10  user1   dept1
10  user2   dept2
2   user2   dept2
1   user1   dept1
{code}
{code}
select age, count(*) from test_bucket group by age;
{code}
Records with key {{age = 2}} are not aggregated:
{code}
2   1
10  2
2   1
1   1
{code}
First insert creates one file:
{code}
itests/qtest/target/localfs/warehouse/test_bucket/00_0
{"age":2,"name":"user2","dept":"dept2"}
{"age":10,"name":"user1","dept":"dept1"}
{"age":10,"name":"user2","dept":"dept2"}
{code}

Second insert creates 2 files:
{code}
itests/qtest/target/localfs/warehouse/test_bucket/00_0_copy_1
{"age":2,"name":"user2","dept":"dept2"}

itests/qtest/target/localfs/warehouse/test_bucket/01_0
{"age":1,"name":"user1","dept":"dept1"}
{code}

> Incorrect query results on tables with ClusterBy & SortBy
> -
>
> Key: HIVE-27876
> URL: https://issues.apache.org/jira/browse/HIVE-27876
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Repro:
>  
> {code:java}
> create external table test_bucket(age int, name string, dept string) 
> clustered by (age, name) sorted by (age asc, name asc) into 2 buckets stored 
> as orc;
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> //empty wrong results
> select age, name, count(*) from test_bucket group by  age, name having 
> count(*) > 1; 
> +--+---+--+
> | age  | name  | _c2  |
> +--+---+--+
> +--+---+--+
> // Workaround
> set hive.map.aggr=false;
> select age, name, count(*) from test_bucket group by  age, name having 
> count(*) > 1; 
> +--++--+
> | age  |  name  | _c2  |
> +--++--+
> | 1    | user1  | 2    |
> | 2    | user2  | 2    |
> +--++--+ {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27876) Incorrect query results on tables with ClusterBy & SortBy

2023-11-23 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789050#comment-17789050
 ] 

Krisztian Kasa commented on HIVE-27876:
---

I found another data correctness issue regarding this optimization:
{code}
create table test_bucket(age int, name string, dept string) clustered by (age) 
sorted by (age asc) into 2 buckets stored as orc;

insert into test_bucket values (10, 'user1', 'dept1'), (10, 'user2' , 'dept2'), 
( 2, 'user2' , 'dept2');
insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');

select * from test_bucket;
{code}
Order is not global:
{code}
2   user2   dept2
10  user1   dept1
10  user2   dept2
2   user2   dept2
1   user1   dept1
{code}
{code}
select age, count(*) from test_bucket group by age;
{code}
Records with key {{age = 2}} are not aggregated:
{code}
2   1
10  2
2   1
1   1
{code}
First insert creates one file:
{code}
itests/qtest/target/localfs/warehouse/test_bucket/00_0
{"age":2,"name":"user2","dept":"dept2"}
{"age":10,"name":"user1","dept":"dept1"}
{"age":10,"name":"user2","dept":"dept2"}
{code}

Second insert creates 2 files:
{code}
itests/qtest/target/localfs/warehouse/test_bucket/00_0_copy_1
{"age":2,"name":"user2","dept":"dept2"}

itests/qtest/target/localfs/warehouse/test_bucket/01_0
{"age":1,"name":"user1","dept":"dept1"}
{code}

> Incorrect query results on tables with ClusterBy & SortBy
> -
>
> Key: HIVE-27876
> URL: https://issues.apache.org/jira/browse/HIVE-27876
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Repro:
>  
> {code:java}
> create external table test_bucket(age int, name string, dept string) 
> clustered by (age, name) sorted by (age asc, name asc) into 2 buckets stored 
> as orc;
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> //empty wrong results
> select age, name, count(*) from test_bucket group by  age, name having 
> count(*) > 1; 
> +--+---+--+
> | age  | name  | _c2  |
> +--+---+--+
> +--+---+--+
> // Workaround
> set hive.map.aggr=false;
> select age, name, count(*) from test_bucket group by  age, name having 
> count(*) > 1; 
> +--++--+
> | age  |  name  | _c2  |
> +--++--+
> | 1    | user1  | 2    |
> | 2    | user2  | 2    |
> +--++--+ {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27876) Incorrect query results on tables with ClusterBy & SortBy

2023-11-23 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789045#comment-17789045
 ] 

Krisztian Kasa commented on HIVE-27876:
---

[~rameshkumar]
1. Based on the cwiki, sort by only ensures record order at the bucket file level:
[https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl]
Another page shares more details:
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-DifferencebetweenSortByandOrderBy]

When all records are inserted with a single insert statement, only one bucket file 
is created, so the global order matches the record order in that single file.

2. The goal of this optimization is to remove the RS because the shuffle takes most 
of the execution time.
Group by can distribute key values in two ways:
 * using a hash table, which has limitations (memory);
 * relying on the data being sorted by the group by keys, which is ensured by the RS.

IIUC the optimization assumes that if the physical order of the records in the table 
matches what the group by needs, the RS can be removed. However, it relies on the 
bucketing keys which, according to 1) and also to the repro in the description, do 
not ensure global ordering of the data.
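
A hedged illustration of the distinction in 1), reusing the test_bucket table from the description:
{code}
-- SORT BY only guarantees ordering within each reducer output / bucket file
select age, name from test_bucket sort by age;
-- ORDER BY guarantees a single, globally ordered result
select age, name from test_bucket order by age;
{code}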

> Incorrect query results on tables with ClusterBy & SortBy
> -
>
> Key: HIVE-27876
> URL: https://issues.apache.org/jira/browse/HIVE-27876
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Repro:
>  
> {code:java}
> create external table test_bucket(age int, name string, dept string) 
> clustered by (age, name) sorted by (age asc, name asc) into 2 buckets stored 
> as orc;
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> //empty wrong results
> select age, name, count(*) from test_bucket group by  age, name having 
> count(*) > 1; 
> +--+---+--+
> | age  | name  | _c2  |
> +--+---+--+
> +--+---+--+
> // Workaround
> set hive.map.aggr=false;
> select age, name, count(*) from test_bucket group by  age, name having 
> count(*) > 1; 
> +--++--+
> | age  |  name  | _c2  |
> +--++--+
> | 1    | user1  | 2    |
> | 2    | user2  | 2    |
> +--++--+ {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26618) Add setting to turn on/off removing sections of a query plan known to never produce rows

2023-11-22 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-26618.
---
Resolution: Won't Fix

> Add setting to turn on/off removing sections of a query plan known to never 
> produce rows
> --
>
> Key: HIVE-26618
> URL: https://issues.apache.org/jira/browse/HIVE-26618
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HIVE-26524 introduced an optimization to remove sections of a query plan known 
> to never produce rows.
> Add a setting to HiveConf to turn this optimization on/off.
> When the optimization is turned off, restore the legacy behavior:
> * represent the empty result operator with a {{HiveSortLimit}} 0
> * disable {{HiveRemoveEmptySingleRules}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27788) Exception when join has 2 Group By operators in the same branch in the same reducer

2023-11-21 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27788.
---
Resolution: Fixed

Merged to master. Thanks [~zabetak] for review.

> Exception when join has 2 Group By operators in the same branch in the same 
> reducer
> ---
>
> Key: HIVE-27788
> URL: https://issues.apache.org/jira/browse/HIVE-27788
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 4.0.0-beta-1
>Reporter: Riju Trivedi
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: auto_sortmerge_join_17.q
>
>
> Sort-merge join with Group By + PTF operator leads to a runtime exception 
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:313)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:387)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:303)
>   ... 17 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:392)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:372)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:316)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:127)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.handleOutputRows(PTFOperator.java:337)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:372)
>   ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:534)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:488)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:390)
>   ... 31 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:313)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:522)
>   ... 33 more {code}
> Issue can be reproduced with [^auto_sortmerge_join_17.q]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27876) Incorrect query results on tables with ClusterBy & SortBy

2023-11-20 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788031#comment-17788031
 ] 

Krisztian Kasa commented on HIVE-27876:
---

The query in the description has a plan:
{code}
POSTHOOK: query: explain
select age, name, count(*) from test_bucket group by  age, name having count(*) 
> 1
POSTHOOK: type: QUERY
POSTHOOK: Input: default@test_bucket
 A masked pattern was here 
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: test_bucket
  Select Operator
expressions: age (type: int), name (type: string)
outputColumnNames: age, name
Group By Operator
  aggregations: count()
  keys: age (type: int), name (type: string)
  mode: final
  outputColumnNames: _col0, _col1, _col2
  Filter Operator
predicate: (_col2 > 1L) (type: boolean)
ListSink
{code}

In this case 2 bucket files are created. Both are sorted, but only at the file 
level. The records are fetched in this order by the FetchOperator:
{code}
1   user1   dept1
2   user2   dept2
1   user1   dept1
2   user2   dept2
{code}
The data is not sorted globally, so the group by operator treats all {{age, name}} 
column values as distinct values, hence {{count( * )}} is 1 for every key value 
and the Filter operator filters out all records.

A possible workaround is to turn off the map-side sorted group by optimization:
https://github.com/apache/hive/blob/feda35389dc28c8c9bf3c8a3d39de53ba90e41c0/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2019-L2022
{code}
set hive.map.groupby.sorted=false;
{code}
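
A sketch of applying this workaround to the repro from the description (the expected output is an assumption based on the inserted data):
{code}
set hive.map.groupby.sorted=false;
-- with the sorted map-side group by disabled the plan keeps the shuffle,
-- so both duplicated keys should be aggregated:
select age, name, count(*) from test_bucket group by age, name having count(*) > 1;
-- 1  user1  2
-- 2  user2  2
{code}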

> Incorrect query results on tables with ClusterBy & SortBy
> -
>
> Key: HIVE-27876
> URL: https://issues.apache.org/jira/browse/HIVE-27876
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Priority: Major
>
> Repro:
>  
> {code:java}
> create external table test_bucket(age int, name string, dept string) 
> clustered by (age, name) sorted by (age asc, name asc) into 2 buckets stored 
> as orc;
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> insert into test_bucket values (1, 'user1', 'dept1'), ( 2, 'user2' , 'dept2');
> //empty wrong results
> select age, name, count(*) from test_bucket group by  age, name having 
> count(*) > 1; 
> +--+---+--+
> | age  | name  | _c2  |
> +--+---+--+
> +--+---+--+
> // Workaround
> set hive.map.aggr=false;
> select age, name, count(*) from test_bucket group by  age, name having 
> count(*) > 1; 
> +--++--+
> | age  |  name  | _c2  |
> +--++--+
> | 1    | user1  | 2    |
> | 2    | user2  | 2    |
> +--++--+ {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27491) HPL/SQL does not allow variables in update statements

2023-11-19 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27491.
---
Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the patch.

> HPL/SQL does not allow variables in update statements
> -
>
> Key: HIVE-27491
> URL: https://issues.apache.org/jira/browse/HIVE-27491
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
>
> HPL/SQL does not allow variables in update statements
> Works in Oracle:
> {noformat}
> DECLARE
> val_to_update varchar(10);
> BEGIN
> val_to_update := 'one';
> FOR REC in (select a,b from test1 where a = val_to_update) LOOP
> dbms_output.put_line (rec.a);
> dbms_output.put_line (rec.b);
> END LOOP;
> update test1 set b = 'another'
> where a = val_to_update;
> end;{noformat}
> Doesn't work in Hive:
> {noformat}
> DECLARE
> val_to_update STRING;
> BEGIN
> val_to_update := 'one';
> FOR REC in (select a,b from test where a = val_to_update) LOOP
> print (rec.a);
> print (rec.b);
> END LOOP;
> update test set b = 'another test'
> where a = val_to_update;
> end;
> /
> ERROR : FAILED: SemanticException [Error 10004]: Line 2:14 Invalid table 
> alias or column reference 'val_to_update': (possible column names are: a, b)
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 2:14 Invalid table 
> alias or column reference 'val_to_update': (possible column names are: a, b)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:13636)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:13575)
> ...
> {noformat}
>  
> Select (not update) does work in hive:
> {noformat}
> DECLARE
> val_to_update STRING;
> BEGIN
> val_to_update := 'one';
> FOR REC in (select a,b from test where a = val_to_update) LOOP
> print (rec.a);
> print (rec.b);
> END LOOP;
> select * from test
> where a = val_to_update;
> end;
> /{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26660) TPC-DS query 71 returns wrong results

2023-11-17 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787207#comment-17787207
 ] 

Krisztian Kasa commented on HIVE-26660:
---

[~glapark]
1. Could you please describe how HIVE-26986 solves this issue? Having 
unnecessary RS operators in the plan shouldn't cause wrong results, only 
unnecessary shuffle at runtime and hence performance degradation.
2. Please provide simple repro steps (DDL, some test data and a simple query) 
for the issue mentioned in the description.
3. If the issue somehow relates to SharedWorkOptimizer, OperatorGraph and/or 
ParallelEdgeFixer, please try running the query with
{code:java}
set hive.optimize.shared.work.parallel.edge.support=false;
{code}
and see if the issue persists.

> TPC-DS query 71 returns wrong results
> -
>
> Key: HIVE-26660
> URL: https://issues.apache.org/jira/browse/HIVE-26660
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Priority: Major
>
> TPC-DS query 71 returns wrong results when tested with 100GB dataset. The 
> query fails with an error:
>  
> Caused by: org.apache.hadoop.hive.common.NoDynamicValuesException: Value does 
> not exist in registry: RS_39_item_i_item_sk_min
>     at 
> org.apache.hadoop.hive.ql.exec.tez.DynamicValueRegistryTez.getValue(DynamicValueRegistryTez.java:77)
>     at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:128)
>     at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongColumnBetweenDynamicValue.evaluate(FilterLongColumnBetweenDynamicValue.java:88)
>     at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprAndExpr.evaluate(FilterExprAndExpr.java:42)
>     at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:125)
>     at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
>     at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
>     at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
>     at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:842)
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27788) Exception when join has 2 Group By operators in the same branch in the same reducer

2023-11-16 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27788:
--
Summary: Exception when join has 2 Group By operators in the same branch in 
the same reducer  (was: Exception in Sort Merge join with Group By + PTF 
Operator)

> Exception when join has 2 Group By operators in the same branch in the same 
> reducer
> ---
>
> Key: HIVE-27788
> URL: https://issues.apache.org/jira/browse/HIVE-27788
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 4.0.0-beta-1
>Reporter: Riju Trivedi
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: auto_sortmerge_join_17.q
>
>
> Sort-merge join with Group By + PTF operator leads to a runtime exception 
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:313)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:387)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:303)
>   ... 17 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:392)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:372)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:316)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:127)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.handleOutputRows(PTFOperator.java:337)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:372)
>   ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:534)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:488)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:390)
>   ... 31 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:313)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:522)
>   ... 33 more {code}
> Issue can be reproduced with [^auto_sortmerge_join_17.q]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27828) Iceberg integration: enable copy on write update when split update is on

2023-11-16 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27828.
---
Resolution: Fixed

Merged to master. Thanks [~dkuzmenko] for review.

> Iceberg integration: enable copy on write update when split update is on
> 
>
> Key: HIVE-27828
> URL: https://issues.apache.org/jira/browse/HIVE-27828
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently {{hive.split.update}} has to be set to {{false}} if a copy-on-write 
> update should be executed when updating an Iceberg table.
> [https://github.com/apache/hive/blob/0233dcc7f1f09198c093cb4b69bd2b2598c97303/iceberg/iceberg-handler/src/test/queries/positive/update_iceberg_copy_on_write_unpartitioned.q#L1]
> [https://github.com/apache/hive/blob/0233dcc7f1f09198c093cb4b69bd2b2598c97303/ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java#L78-L81]
>  
> Copy-on-write mode should be independent of split update because split 
> update uses positional deletes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27788) Exception in Sort Merge join with Group By + PTF Operator

2023-11-15 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786353#comment-17786353
 ] 

Krisztian Kasa commented on HIVE-27788:
---

[~zabetak], [~amansinha]
I think the summary of this jira is misleading. As per my analysis the issue 
occurs when
 * the operator tree in a reducer has a merge join operator and any of the join 
branches has more than one GBY: 
{code}
RS-...-GBY-...-GBY-...-MERGEJOIN-...
   RS-...-/
{code}
 * the data has unique values in the GBY key(s) processed by that branch, or at 
least the last 3 records in the record stream do.

The presence of the PTF operator is irrelevant to this issue; it can be any 
operator. Please see another example:
[https://github.com/apache/hive/blob/17525f169b9a08cd715bfb42899e45b7c689c77a/ql/src/test/results/clientpositive/llap/subquery_in_having.q.out#L263-L391]
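
For reference, a sketch of a query shape that yields two GBY operators in the same join branch, modeled on the subquery_in_having.q case above (table and column names are illustrative only):
{code}
select key, count(*)
from src
group by key
having count(*) in (select count(*) from src s1 where s1.key > '9' group by s1.key);
{code}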

 

> Exception in Sort Merge join with Group By + PTF Operator
> -
>
> Key: HIVE-27788
> URL: https://issues.apache.org/jira/browse/HIVE-27788
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 4.0.0-beta-1
>Reporter: Riju Trivedi
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: auto_sortmerge_join_17.q
>
>
> Sort-merge join with Group By + PTF operator leads to a runtime exception 
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:313)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:387)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:303)
>   ... 17 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:392)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:372)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:316)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:127)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.handleOutputRows(PTFOperator.java:337)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:372)
>   ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:534)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:488)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:390)
>   ... 31 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:313)
> 

[jira] [Resolved] (HIVE-27533) Incorrect FOREIGN KEY constraints in SHOW CREATE TABLE

2023-11-14 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27533.
---
Resolution: Fixed

Merged to master. Thanks [~soumyakanti.das]  for the patch.

> Incorrect FOREIGN KEY constraints in SHOW CREATE TABLE
> --
>
> Key: HIVE-27533
> URL: https://issues.apache.org/jira/browse/HIVE-27533
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>
> For a table that has a composite foreign key, SHOW CREATE TABLE returns 
> multiple ALTER TABLE statements, which is incorrect.
> For example, in show_create_table.q, we have FK constraints on table TEST3 
> referencing table TEST:
> {code:java}
> foreign key(col1, col2) references TEST(col1, col2) disable novalidate rely 
> {code}
> In the output of {{{}SHOW CREATE TABLE TEST3;{}}}, we see that there are two 
> ALTER TABLE statements for the composite key, which is incorrect as an FK 
> constraint cannot be defined on a subset of a composite PK:
> {code:java}
> ALTER TABLE `default`.`test3` ADD CONSTRAINT ` A masked pattern was here 
> ` FOREIGN KEY (`col1`) REFERENCES `default`.`test`(`col1`) DISABLE 
> NOVALIDATE RELY;
> ALTER TABLE `default`.`test3` ADD CONSTRAINT ` A masked pattern was here 
> ` FOREIGN KEY (`col2`) REFERENCES `default`.`test`(`col2`) DISABLE 
> NOVALIDATE RELY; {code}
> For this case, we should get a single ALTER TABLE statement like:
> {code:java}
> ALTER TABLE `default`.`test3` ADD CONSTRAINT ` A masked pattern was here 
> ` FOREIGN KEY (`col1`, `col2`) REFERENCES `default`.`test`(`col1`, 
> `col2`) DISABLE NOVALIDATE RELY; {code}
> To reproduce this, please run:
> {code:java}
> mvn test -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite=true 
> -Dqfile=show_create_table.q {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.

2023-11-10 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784956#comment-17784956
 ] 

Krisztian Kasa commented on HIVE-26986:
---

[~seonggon] 
1. It is not clear why adding an extra concentrator RS leads to a data correctness 
issue.
Could you please share a simple repro on a small dataset that has only the 
necessary records? It could also be added to the PR to extend the test 
coverage of SWO and ParallelEdgeFixer.
2. IIUC parallel edge support can be controlled via a config setting. Could you 
please verify whether the correctness issue still stands when
{code:java}
set hive.optimize.shared.work.parallel.edge.support=false;
{code}

> A DAG created by OperatorGraph is not equal to the Tez DAG.
> ---
>
> Key: HIVE-26986
> URL: https://issues.apache.org/jira/browse/HIVE-26986
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0-alpha-2
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.0-must, pull-request-available
> Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A DAG created by OperatorGraph is not equal to the corresponding DAG that is 
> submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges as 
> a parallel edge.
> We observe this problem by comparing OperatorGraph and Tez DAG when running 
> TPC-DS query 71 on 1TB ORC format managed table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-27867) Incremental materialized view throws NPE when Iceberg source table is empty

2023-11-09 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27867 started by Krisztian Kasa.
-
> Incremental materialized view throws NPE when Iceberg source table is empty
> ---
>
> Key: HIVE-27867
> URL: https://issues.apache.org/jira/browse/HIVE-27867
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: iceberg, materializedviews, pull-request-available
> Fix For: 4.0.0
>
>
> Repro
> https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/test/queries/positive/mv_iceberg_orc.q
> in hive.log
> {code}
> 2023-11-09T05:17:05,625  WARN [e35c7637-b0ba-4e30-8448-5bdc0d0e4779 main] 
> rebuild.AlterMaterializedViewRebuildAnalyzer: Exception loading materialized 
> views
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getValidMaterializedViews(Hive.java:2321)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getMaterializedViewForRebuild(Hive.java:2227)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer$MVRebuildCalcitePlannerAction.applyMaterializedViewRewriting(AlterMaterializedViewRebuildAnaly
> zer.java:215) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1700)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1569)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1321)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13113)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:135)
>  ~[hive-exec-4.0.0-beta-2-SNAPSH
> OT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) 
> 

[jira] [Created] (HIVE-27867) Incremental materialized view throws NPE when Iceberg source table is empty

2023-11-09 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-27867:
-

 Summary: Incremental materialized view throws NPE when Iceberg 
source table is empty
 Key: HIVE-27867
 URL: https://issues.apache.org/jira/browse/HIVE-27867
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


Repro
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/test/queries/positive/mv_iceberg_orc.q

in hive.log
{code}
2023-11-09T05:17:05,625  WARN [e35c7637-b0ba-4e30-8448-5bdc0d0e4779 main] 
rebuild.AlterMaterializedViewRebuildAnalyzer: Exception loading materialized 
views
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.metadata.Hive.getValidMaterializedViews(Hive.java:2321)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.metadata.Hive.getMaterializedViewForRebuild(Hive.java:2227)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer$MVRebuildCalcitePlannerAction.applyMaterializedViewRewriting(AlterMaterializedViewRebuildAnaly
zer.java:215) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1700)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1569)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1321)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13113)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:135)
 ~[hive-exec-4.0.0-beta-2-SNAPSH
OT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) 
~[hive-cli-4.0.0-beta-2-SNAPSHOT.jar:?]

[jira] [Updated] (HIVE-27867) Incremental materialized view throws NPE when Iceberg source table is empty

2023-11-09 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27867:
--
Labels: iceberg materializedviews  (was: )

> Incremental materialized view throws NPE when Iceberg source table is empty
> ---
>
> Key: HIVE-27867
> URL: https://issues.apache.org/jira/browse/HIVE-27867
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: iceberg, materializedviews
> Fix For: 4.0.0
>
>
> Repro
> https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/test/queries/positive/mv_iceberg_orc.q
> in hive.log
> {code}
> 2023-11-09T05:17:05,625  WARN [e35c7637-b0ba-4e30-8448-5bdc0d0e4779 main] 
> rebuild.AlterMaterializedViewRebuildAnalyzer: Exception loading materialized 
> views
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getValidMaterializedViews(Hive.java:2321)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getMaterializedViewForRebuild(Hive.java:2227)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer$MVRebuildCalcitePlannerAction.applyMaterializedViewRewriting(AlterMaterializedViewRebuildAnaly
> zer.java:215) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1700)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1569)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1321)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:570)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13113)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.ddl.view.materialized.alter.rebuild.AlterMaterializedViewRebuildAnalyzer.analyzeInternal(AlterMaterializedViewRebuildAnalyzer.java:135)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) 
> 

[jira] [Assigned] (HIVE-27788) Exception in Sort Merge join with Group By + PTF Operator

2023-11-08 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-27788:
-

Assignee: Krisztian Kasa

> Exception in Sort Merge join with Group By + PTF Operator
> -
>
> Key: HIVE-27788
> URL: https://issues.apache.org/jira/browse/HIVE-27788
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 4.0.0-beta-1
>Reporter: Riju Trivedi
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: auto_sortmerge_join_17.q
>
>
> Sort- merge join with Group By + PTF operator leads  to Runtime exception 
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:313)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:387)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:303)
>   ... 17 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:392)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:372)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:316)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:127)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.handleOutputRows(PTFOperator.java:337)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:372)
>   ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:534)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:488)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:390)
>   ... 31 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:313)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:522)
>   ... 33 more {code}
> Issue can be reproduced with [^auto_sortmerge_join_17.q]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27533) Incorrect FOREIGN KEY constraints in SHOW CREATE TABLE

2023-11-07 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783601#comment-17783601
 ] 

Krisztian Kasa edited comment on HIVE-27533 at 11/7/23 12:26 PM:
-

[~soumyakanti.das] 
Could you please add all repro steps to the description: create table statement 
and the show create table statement etc.
 


was (Author: kkasa):
[~soumyakanti.das] 
Could you please add the full repro steps to the description: create table 
statement and the show create table statement.
 

> Incorrect FOREIGN KEY constraints in SHOW CREATE TABLE
> --
>
> Key: HIVE-27533
> URL: https://issues.apache.org/jira/browse/HIVE-27533
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>
> For a table that has a composite foreign key, SHOW CREATE TABLE returns 
> multiple  ALTER STATEMENTS, which is incorrect.
> For example, for tpcds table {{{}catalog_returns{}}}, we see:
> {code:java}
> | ALTER TABLE tpcds_orc_def.catalog_returns ADD CONSTRAINT 
> 3abe2c00-25ec-47ca-a5f9-38773995e8c1 FOREIGN KEY (cr_item_sk) REFERENCES 
> tpcds_orc_def.catalog_sales(cs_item_sk) DISABLE NOVALIDATE RELY; |
> | ALTER TABLE tpcds_orc_def.catalog_returns ADD CONSTRAINT 
> 3abe2c00-25ec-47ca-a5f9-38773995e8c1 FOREIGN KEY (cr_order_number) REFERENCES 
> tpcds_orc_def.catalog_sales(cs_order_number) DISABLE NOVALIDATE RELY; | {code}
> Here we see two ALTER STATEMENTS with the same name, which reference primary 
> keys of table {{{}catalog_sales{}}}. However, this is incorrect as a FK 
> constraint cannot be on a subset of a composite PK.
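
For illustration, the single composite-key form implied by the description would be one statement covering both columns (a sketch reusing the constraint name and columns quoted above; the backticks assume the identifier needs quoting):
{code:sql}
-- sketch: one ALTER TABLE covering the whole composite foreign key
ALTER TABLE tpcds_orc_def.catalog_returns ADD CONSTRAINT
  `3abe2c00-25ec-47ca-a5f9-38773995e8c1`
  FOREIGN KEY (cr_item_sk, cr_order_number)
  REFERENCES tpcds_orc_def.catalog_sales(cs_item_sk, cs_order_number)
  DISABLE NOVALIDATE RELY;
{code}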



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27533) Incorrect FOREIGN KEY constraints in SHOW CREATE TABLE

2023-11-07 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783601#comment-17783601
 ] 

Krisztian Kasa commented on HIVE-27533:
---

[~soumyakanti.das] 
Could you please add the full repro steps to the description: create table 
statement and the show create table statement.
 

> Incorrect FOREIGN KEY constraints in SHOW CREATE TABLE
> --
>
> Key: HIVE-27533
> URL: https://issues.apache.org/jira/browse/HIVE-27533
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>
> For a table that has a composite foreign key, SHOW CREATE TABLE returns 
> multiple  ALTER STATEMENTS, which is incorrect.
> For example, for tpcds table {{{}catalog_returns{}}}, we see:
> {code:java}
> | ALTER TABLE tpcds_orc_def.catalog_returns ADD CONSTRAINT 
> 3abe2c00-25ec-47ca-a5f9-38773995e8c1 FOREIGN KEY (cr_item_sk) REFERENCES 
> tpcds_orc_def.catalog_sales(cs_item_sk) DISABLE NOVALIDATE RELY; |
> | ALTER TABLE tpcds_orc_def.catalog_returns ADD CONSTRAINT 
> 3abe2c00-25ec-47ca-a5f9-38773995e8c1 FOREIGN KEY (cr_order_number) REFERENCES 
> tpcds_orc_def.catalog_sales(cs_order_number) DISABLE NOVALIDATE RELY; | {code}
> Here we see two ALTER STATEMENTS with the same name, which reference primary 
> keys of table {{{}catalog_sales{}}}. However, this is incorrect as a FK 
> constraint cannot be on a subset of a composite PK.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27531) Unparse identifiers in show create table output

2023-11-07 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27531.
---
Resolution: Fixed

Merged to master. Thanks [~soumyakanti.das] for the patch.

> Unparse identifiers in show create table output
> ---
>
> Key: HIVE-27531
> URL: https://issues.apache.org/jira/browse/HIVE-27531
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>
> Currently, {{SHOW CREATE TABLE}} on tables with constraints returns {{ALTER 
> TABLE}} statements with incompatible constraint names. Running these ALTER 
> TABLE statements throws a ParseException.
> For example:
> {code:java}
> 0: jdbc:hive2://localhost:11050/default> ALTER TABLE reason ADD CONSTRAINT 
> 2e47abb2-b6c7-450a-8229-395d6b1ff168 PRIMARY KEY (r_reason_sk) DISABLE 
> NOVALIDATE RELY;
> Error: Error while compiling statement: FAILED: ParseException line 1:43 
> cannot recognize input near 'CONSTRAINT' '2e47abb2' '-' in add constraint 
> statement (state=42000,code=4) {code}
> Ideally all identifiers should be unparsed in the output of show create table.
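
The unparsed form the ticket asks for would quote such identifiers with backticks, e.g. (a sketch based on the failing statement above):
{code:sql}
-- sketch: backtick-quoted constraint name that the parser accepts
ALTER TABLE reason ADD CONSTRAINT `2e47abb2-b6c7-450a-8229-395d6b1ff168`
  PRIMARY KEY (r_reason_sk) DISABLE NOVALIDATE RELY;
{code}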



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27828) Iceberg integration: enable copy on write update when split update is on

2023-10-27 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-27828:
-

 Summary: Iceberg integration: enable copy on write update when 
split update is on
 Key: HIVE-27828
 URL: https://issues.apache.org/jira/browse/HIVE-27828
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


Currently {{hive.split.update}} has to be set to {{false}} for a copy-on-write 
update to be executed when updating an Iceberg table.


[https://github.com/apache/hive/blob/0233dcc7f1f09198c093cb4b69bd2b2598c97303/iceberg/iceberg-handler/src/test/queries/positive/update_iceberg_copy_on_write_unpartitioned.q#L1]

[https://github.com/apache/hive/blob/0233dcc7f1f09198c093cb4b69bd2b2598c97303/ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java#L78-L81]

 

Copy-on-write mode should be independent of split update because split update 
uses positional deletes.
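
A minimal sketch of the current behaviour (the table property name is taken from Iceberg's write options and is an assumption here, as is the table definition):
{code:sql}
-- sketch: today a copy-on-write update is only produced with split update off
create table ice_t (id int, val string) stored by iceberg
  tblproperties ('write.update.mode'='copy-on-write');
set hive.split.update=false;   -- currently required for the copy-on-write plan
update ice_t set val = 'x' where id = 1;
{code}
With this change the copy-on-write plan should be chosen from the table property alone, regardless of {{hive.split.update}}.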



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27788) Exception in Sort Merge join with Group By + PTF Operator

2023-10-17 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776189#comment-17776189
 ] 

Krisztian Kasa commented on HIVE-27788:
---

Another repro with 3 records and an inner join:
{code}
set hive.optimize.semijoin.conversion = false;

CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS;

insert into tbl1_n5(key, value)
values
(0, 'val_0'),
(2, 'val_2'),
(9, 'val_9');

explain
SELECT t1.key from
(SELECT  key , row_number() over(partition by key order by value desc) as rk 
from tbl1_n5) t1
join
( SELECT key,count(distinct value) as cp_count from tbl1_n5 group by key) t2
on t1.key = t2.key where rk = 1;
{code}
{code}
POSTHOOK: query: explain
SELECT t1.key from
(SELECT  key , row_number() over(partition by key order by value desc) as rk 
from tbl1_n5) t1
join
( SELECT key,count(distinct value) as cp_count from tbl1_n5 group by key) t2
on t1.key = t2.key where rk = 1
POSTHOOK: type: QUERY
POSTHOOK: Input: default@tbl1_n5
 A masked pattern was here 
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
 A masked pattern was here 
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
 A masked pattern was here 
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: tbl1_n5
  filterExpr: key is not null (type: boolean)
  Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE 
Column stats: COMPLETE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 3 Data size: 279 Basic stats: 
COMPLETE Column stats: COMPLETE
Reduce Output Operator
  key expressions: key (type: int), value (type: string)
  null sort order: aa
  sort order: +-
  Map-reduce partition columns: key (type: int)
  Statistics: Num rows: 3 Data size: 279 Basic stats: 
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map 3 
Map Operator Tree:
TableScan
  alias: tbl1_n5
  filterExpr: key is not null (type: boolean)
  Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE 
Column stats: COMPLETE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 3 Data size: 279 Basic stats: 
COMPLETE Column stats: COMPLETE
Group By Operator
  keys: key (type: int), value (type: string)
  minReductionHashAggr: 0.4
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 3 Data size: 279 Basic stats: 
COMPLETE Column stats: COMPLETE
  Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: string)
null sort order: zz
sort order: ++
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 3 Data size: 279 Basic stats: 
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Reducer 2 
Reduce Operator Tree:
  Group By Operator
keys: KEY._col0 (type: int), KEY._col1 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE 
Column stats: COMPLETE
Select Operator
  expressions: _col0 (type: int)
  outputColumnNames: _col0
  Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE 
Column stats: COMPLETE
  Group By Operator
keys: _col0 (type: int)
mode: complete
outputColumnNames: _col0
Statistics: Num rows: 3 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
Dummy Store
Execution mode: llap
Reduce Operator Tree:
  Select Operator
expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 
(type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE 
Column stats: COMPLETE
PTF Operator
  Function definitions:
  Input definition
input alias: ptf_0
output shape: 

[jira] [Resolved] (HIVE-27761) Compilation of nested CTEs throws SemanticException

2023-10-05 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27761.
---
Resolution: Fixed

Merged to master. Thanks [~soumyakanti.das] for the patch.

> Compilation of nested CTEs throws SemanticException
> ---
>
> Key: HIVE-27761
> URL: https://issues.apache.org/jira/browse/HIVE-27761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>
> Currently, nested CTEs are not supported in Hive. Simple repro:
> {code:java}
> with
> test1 as (
> with t1 as (select 1)
> select 1
> )
> select * from test1;
>  org.apache.hadoop.hive.ql.parse.SemanticException: Line 5:13 Ambiguous table 
> alias 't1'
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processCTE(SemanticAnalyzer.java:1310)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1980)
>  {code}
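
Until nested CTEs are supported, a possible workaround (a sketch; equivalent here only because the inner CTE is not referenced elsewhere) is to hoist the inner CTE to the top level:
{code:sql}
-- sketch: flattened form of the repro above
with
t1 as (select 1),
test1 as (select 1)
select * from test1;
{code}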



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27754) Query Filter with OR condition updates every record in the table

2023-10-02 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771097#comment-17771097
 ] 

Krisztian Kasa commented on HIVE-27754:
---

If the expression in the WHERE clause has logical operators ({{OR}}, 
{{AND}}, ...), the operands are implicitly cast to boolean:
https://github.com/apache/hive/blob/85f6162becb8723ff6c9f85875048ced6ca7ae89/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L842-L847
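
A sketch of the effect on the statement from the description: the standalone string literal is treated as a boolean operand, so the filter degenerates to {{last_name='Pierce' OR true}}.
{code:sql}
-- updates every row: 'Taylor' is implicitly cast to boolean (non-empty string => true)
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor';

-- intended predicate: compare the column on both sides of the OR
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR last_name='Taylor';
{code}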

> Query Filter with OR condition updates every record in the table
> 
>
> Key: HIVE-27754
> URL: https://issues.apache.org/jira/browse/HIVE-27754
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
>  
> {noformat}
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' 
> ;{noformat}
> After the above statement, all the records are updated. The condition 
> {{'Taylor'}} is a constant string, and it will always evaluate to true 
> because it's a non-empty string. So, effectively, the {{UPDATE}} statement 
> updates all rows in {{customers_man}}.
> Repro:
> {noformat}
> create  table customers_man (customer_id bigint, first_name string) 
> PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES 
> ('transactional'='true');
>  insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", 
> "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", 
> "Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", 
> "Johnson"), (3, "Trudy", "Henderson");
>  select * from customers_man;
>  
> ++---+--+
>  | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>  
> ++---+--+
>  | 3  | Blake | Burr  
>|
>  | 2  | Jake  | Donnel
>|
>  | 3  | Trudy | Henderson 
>|
>  | 3  | Trudy | Johnson   
>|
>  | 2  | Susan | Morrison  
>|
>  | 1  | Joanna| Pierce
>|
>  | 2  | Joanna| Silver
>|
>  | 2  | Bob   | Silver
>|
>  | 1  | Sharon| Taylor
>|
>  
> ++---+--+
>  UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
> last_name='Taylor' ;
>  select * from customers_man;
>  
> ++---+--+
>  | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>  
> ++---+--+
>  | 3  | Blake | Burr  
>|
>  | 2  | Jake  | Donnel
>|
>  | 3  | Trudy | Henderson 
>|
>  | 3  | Trudy | Johnson   
>|
>  | 2  | Susan | Morrison  
>|
>  | 22 | Joanna| Pierce
>|
>  | 2  | Joanna| Silver
>|
>  | 2  | Bob   | Silver
>|
>  | 22 | Sharon| Taylor
>|
>  
> ++---+--+
>   UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
> 'Taylor' ;
>   select * from customers_man;
>   
> ++---+--+
>   | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>   
> ++---+--+
>   | 22 | Blake | Burr 
> |
>   | 22 | Jake  | Donnel   
> |
>   | 22 | Trudy | 

[jira] [Commented] (HIVE-27754) Query Filter with OR condition updates every record in the table

2023-09-28 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770037#comment-17770037
 ] 

Krisztian Kasa commented on HIVE-27754:
---

{code}
set hive.cbo.fallback.strategy=NEVER;
{code}
Can be used to prevent running these statements.
see also:
https://github.com/apache/hive/blob/2dbfbeefc1a73d6a50f1c829658846fc827fc780/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L687-L688
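
A minimal sketch of the intended effect (assuming the setting is applied in the same session as the statement):
{code:sql}
set hive.cbo.fallback.strategy=NEVER;
-- with CBO fallback disabled, the non-boolean filter should fail at compile
-- time instead of silently updating every row
UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor';
{code}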

> Query Filter with OR condition updates every record in the table
> 
>
> Key: HIVE-27754
> URL: https://issues.apache.org/jira/browse/HIVE-27754
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>
>  
> {noformat}
> UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 'Taylor' 
> ;{noformat}
> After the above statement, all the records are updated. The condition 
> {{'Taylor'}} is a constant string, and it will always evaluate to true 
> because it's a non-empty string. So, effectively, the {{UPDATE}} statement 
> updates all rows in {{customers_man}}.
> Repro:
> {noformat}
> create  table customers_man (customer_id bigint, first_name string) 
> PARTITIONED BY (last_name string) STORED AS orc TBLPROPERTIES 
> ('transactional'='true');
>  insert into customers_man values(1, "Joanna", "Pierce"),(1, "Sharon", 
> "Taylor"), (2, "Joanna", "Silver"), (2, "Bob", "Silver"), (2, "Susan", 
> "Morrison") ,(2, "Jake", "Donnel") , (3, "Blake", "Burr"), (3, "Trudy", 
> "Johnson"), (3, "Trudy", "Henderson");
>  select * from customers_man;
>  
> ++---+--+
>  | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>  
> ++---+--+
>  | 3  | Blake | Burr  
>|
>  | 2  | Jake  | Donnel
>|
>  | 3  | Trudy | Henderson 
>|
>  | 3  | Trudy | Johnson   
>|
>  | 2  | Susan | Morrison  
>|
>  | 1  | Joanna| Pierce
>|
>  | 2  | Joanna| Silver
>|
>  | 2  | Bob   | Silver
>|
>  | 1  | Sharon| Taylor
>|
>  
> ++---+--+
>  UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
> last_name='Taylor' ;
>  select * from customers_man;
>  
> ++---+--+
>  | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>  
> ++---+--+
>  | 3  | Blake | Burr  
>|
>  | 2  | Jake  | Donnel
>|
>  | 3  | Trudy | Henderson 
>|
>  | 3  | Trudy | Johnson   
>|
>  | 2  | Susan | Morrison  
>|
>  | 22 | Joanna| Pierce
>|
>  | 2  | Joanna| Silver
>|
>  | 2  | Bob   | Silver
>|
>  | 22 | Sharon| Taylor
>|
>  
> ++---+--+
>   UPDATE customers_man SET customer_id=22 WHERE last_name='Pierce' OR 
> 'Taylor' ;
>   select * from customers_man;
>   
> ++---+--+
>   | customers_man.customer_id  | customers_man.first_name  | 
> customers_man.last_name  |
>   
> ++---+--+
>   | 22 | Blake | Burr 
> |
>   | 22 | Jake  | Donnel   
> |
>   | 22 | Trudy | Henderson
> |
>   | 22 

[jira] [Commented] (HIVE-27754) Query Filter with OR condition updates every record in the table

2023-09-28 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770032#comment-17770032
 ] 

Krisztian Kasa commented on HIVE-27754:
---

A simple repro with a query:
{code} 
create table t1 (a int);

insert into t1(a) values (1), (2), (NULL);

select * from t1 where 'anything';
{code}
returns
{code}
1
2
NULL
{code}

CBO is failing in this case. From hive.log
{code}
2023-09-28T05:14:55,578 ERROR [08def54d-804f-44fc-8452-c9873eb3a06e Listener at 
0.0.0.0/36139] parse.CalcitePlanner: CBO failed, skipping CBO. 
org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Filter 
expression with non-boolean return type.
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3216)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3202)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3399)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3410)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5084)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1649)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1593)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1345)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13023)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:467)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) 

[jira] [Resolved] (HIVE-27727) Materialized view query rewrite fails if query has decimal derived aggregate

2023-09-26 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27727.
---
Resolution: Fixed

Merged to master. Thanks [~zabetak] for review.

> Materialized view query rewrite fails if query has decimal derived aggregate
> 
>
> Key: HIVE-27727
> URL: https://issues.apache.org/jira/browse/HIVE-27727
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: cbo, materializedviews, pull-request-available
>
> {code}
> create table t1 (a int, b decimal(3,2)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> create materialized view mv1 as
> select a, sum(b), count(b) from t1 group by a;
> explain cbo
> select a, avg(b) from t1 group by a;
> {code}
> MV is not used
> {code}
> CBO PLAN:
> HiveProject(a=[$0], _o__c1=[CAST(/($1, $2)):DECIMAL(7, 6)])
>   HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[count($1)])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}
> If {{avg}} input is not decimal but for example {{int}} the query plan is 
> rewritten to use the MV



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27727) Materialized view query rewrite fails if query has decimal derived aggregate

2023-09-25 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27727:
--
Description: 
{code}
create table t1 (a int, b decimal(3,2)) stored as orc TBLPROPERTIES 
('transactional'='true');

create materialized view mv1 as
select a, sum(b), count(b) from t1 group by a;

explain cbo
select a, avg(b) from t1 group by a;
{code}
MV is not used
{code}
CBO PLAN:
HiveProject(a=[$0], _o__c1=[CAST(/($1, $2)):DECIMAL(7, 6)])
  HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[count($1)])
HiveTableScan(table=[[default, t1]], table:alias=[t1])
{code}

If the {{avg}} input is not decimal but, for example, {{int}}, the query plan is 
rewritten to use the MV.
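
For contrast, a hedged sketch of the {{int}} variant mentioned above (table and view names are illustrative, not from the ticket):
{code:sql}
-- per the description, with an int measure the same pattern is rewritten to the MV
create table t_int (a int, b int) stored as orc tblproperties ('transactional'='true');
create materialized view mv_int as select a, sum(b), count(b) from t_int group by a;
explain cbo select a, avg(b) from t_int group by a;  -- expected to scan mv_int
{code}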

> Materialized view query rewrite fails if query has decimal derived aggregate
> 
>
> Key: HIVE-27727
> URL: https://issues.apache.org/jira/browse/HIVE-27727
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: cbo, materializedviews
>
> {code}
> create table t1 (a int, b decimal(3,2)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> create materialized view mv1 as
> select a, sum(b), count(b) from t1 group by a;
> explain cbo
> select a, avg(b) from t1 group by a;
> {code}
> MV is not used
> {code}
> CBO PLAN:
> HiveProject(a=[$0], _o__c1=[CAST(/($1, $2)):DECIMAL(7, 6)])
>   HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[count($1)])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}
> If {{avg}} input is not decimal but for example {{int}} the query plan is 
> rewritten to use the MV



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27727) Materialized view query rewrite fails if query has decimal derived aggregate

2023-09-25 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27727:
--
Labels: cbo materializedviews  (was: )

> Materialized view query rewrite fails if query has decimal derived aggregate
> 
>
> Key: HIVE-27727
> URL: https://issues.apache.org/jira/browse/HIVE-27727
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: cbo, materializedviews
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27727) Materialized view query rewrite fails if query has decimal derived aggregate

2023-09-25 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-27727:
-

Assignee: Krisztian Kasa

> Materialized view query rewrite fails if query has decimal derived aggregate
> 
>
> Key: HIVE-27727
> URL: https://issues.apache.org/jira/browse/HIVE-27727
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27727) Materialized view query rewrite fails if query has decimal derived aggregate

2023-09-25 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-27727:
-

 Summary: Materialized view query rewrite fails if query has 
decimal derived aggregate
 Key: HIVE-27727
 URL: https://issues.apache.org/jira/browse/HIVE-27727
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27558) HBase table query does not push BETWEEN predicate to storage layer

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27558.
---
Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the patch.

> HBase table query does not push BETWEEN predicate to storage layer
> --
>
> Key: HIVE-27558
> URL: https://issues.apache.org/jira/browse/HIVE-27558
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> INSERT INTO TABLE target_tbl
> SELECT
>   ...
> FROM
>   (
> SELECT
>   ...
> FROM
>   hbase_tbl
> WHERE
>   CDS_PK >= '2-00OZG-0'
>   and CDS_PK <= '2-00OZG-g'
>   ) CDS_VIEW;
> {code}
> The statement predicate is not pushed to the storage layer, causing the job to 
> execute longer than it needs to.
> Possible solutions:
> 1. Support pushing down BETWEEN clause to HBaseStorageHandler
> 2. Don't convert specific filters on key column in case of HBaseStorageHandler 
> or don't apply this optimization for HBaseStorageHandler tables.
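
The key-range filter above is equivalent to a BETWEEN predicate, which is what option 1 would push down to the storage handler (a sketch of the filter only; the select list is elided as in the description):
{code:sql}
-- sketch: equivalent BETWEEN form of the key-range filter
SELECT ...
FROM hbase_tbl
WHERE CDS_PK BETWEEN '2-00OZG-0' AND '2-00OZG-g';
{code}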



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27558) HBase table query does not push BETWEEN predicate to storage layer

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27558:
--
Fix Version/s: 4.0.0

> HBase table query does not push BETWEEN predicate to storage layer
> --
>
> Key: HIVE-27558
> URL: https://issues.apache.org/jira/browse/HIVE-27558
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> INSERT INTO TABLE target_tbl
> SELECT
>   ...
> FROM
>   (
> SELECT
>   ...
> FROM
>   hbase_tbl
> WHERE
>   CDS_PK >= '2-00OZG-0'
>   and CDS_PK <= '2-00OZG-g'
>   ) CDS_VIEW;
> {code}
> The statement predicate is not pushed to the storage layer, causing the job to 
> execute longer than it needs to.
> Possible solutions:
> 1. Support pushing down BETWEEN clause to HBaseStorageHandler
> 2. Don't convert specific filters on key column in case of HBaseStorageHandler 
> or don't apply this optimization for HBaseStorageHandler tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27138) MapJoinOperator throws NPE when computing OuterJoin with filter expressions on small table

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27138.
---
Resolution: Fixed

Merged to master. Thanks [~seonggon] for the patch.

> MapJoinOperator throws NPE when computing OuterJoin with filter expressions 
> on small table
> --
>
> Key: HIVE-27138
> URL: https://issues.apache.org/jira/browse/HIVE-27138
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Blocker
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive throws NPE when running mapjoin_filter_on_outerjoin.q using Tez engine. 
> (I used TestMiniLlapCliDriver.)
> The NPE is thrown by CommonJoinOperator.getFilterTag(), which just retrieves 
> the last object from the given list.
> To the best of my knowledge, if Hive selects MapJoin to perform Join 
> operation, filterTag should be computed and appended to a row before the row 
> is passed to MapJoinOperator.
> In the case of MapReduce engine, this is done by HashTableSinkOperator.
> However, I cannot find any logic preparing filterTag for small tables when 
> Hive uses Tez engine.
> I think there are 2 available options:
> 1. Don't use MapJoinOperator if a small table has filter expression.
> 2. Add a new logic that computes and passes filterTag to MapJoinOperator.
> I am working on the second option and am ready to discuss it.
> I would be grateful for any opinions on this issue.
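
A possible session-level workaround sketch (an assumption, not the fix that was merged): keep the join from being converted to a map join so the untagged small-table rows never reach MapJoinOperator.
{code:sql}
-- sketch: disable automatic common-join to map-join conversion for the session
set hive.auto.convert.join=false;
{code}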



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27138) MapJoinOperator throws NPE when computing OuterJoin with filter expressions on small table

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-27138:
-

Assignee: Seonggon Namgung  (was: Sönke Liebau)

> MapJoinOperator throws NPE when computing OuterJoin with filter expressions 
> on small table
> --
>
> Key: HIVE-27138
> URL: https://issues.apache.org/jira/browse/HIVE-27138
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Blocker
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive throws NPE when running mapjoin_filter_on_outerjoin.q using Tez engine. 
> (I used TestMiniLlapCliDriver.)
> The NPE is thrown by CommonJoinOperator.getFilterTag(), which just retrieves 
> the last object from the given list.
> To the best of my knowledge, if Hive selects MapJoin to perform Join 
> operation, filterTag should be computed and appended to a row before the row 
> is passed to MapJoinOperator.
> In the case of MapReduce engine, this is done by HashTableSinkOperator.
> However, I cannot find any logic preparing filterTag for small tables when 
> Hive uses Tez engine.
> I think there are 2 available options:
> 1. Don't use MapJoinOperator if a small table has filter expression.
> 2. Add a new logic that computes and passes filterTag to MapJoinOperator.
> I am working on the second option and am ready to discuss it.
> I would be grateful for any opinions on this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24606) Multi-stage materialized CTEs can lose intermediate data

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-24606:
--
Fix Version/s: 4.0.0

> Multi-stage materialized CTEs can lose intermediate data
> 
>
> Key: HIVE-24606
> URL: https://issues.apache.org/jira/browse/HIVE-24606
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> With complex multi-stage CTEs, Hive can start a later stage before its 
> previous stage finishes.
>  That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve 
> dependencies between multi-stage materialized CTEs when a non-materialized CTE 
> cuts in.
>  
> [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414]
>  
> For example, when submitting this query,
> {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTask` will traverse the CTEs in order of `a1`, `x`, and `a2`. It 
> means the dependency between `a1` and `a2` will be ignored and `a2` can start 
> without waiting for `a1`. As a result, the above query returns the following 
> result.
> {code:java}
> +-+
> | id  |
> +-+
> | a1  |
> | x   |
> +-+
> {code}
> For your information, I ran this test with revision = 
> 425e1ff7c054f87c4db87e77d004282d529599ae.
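
For contrast, a sketch of the rows one would expect once {{a2}} waits for {{a1}} (assuming {{||}} is string concatenation and standard UNION ALL semantics):
{code}
+-----------+
| id        |
+-----------+
| a1        |
| x         |
| a2 <- a1  |
| a2 <- a1  |
+-----------+
{code}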



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-24606) Multi-stage materialized CTEs can lose intermediate data

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-24606.
---
Resolution: Fixed

Merged [4363|https://github.com/apache/hive/pull/4363] to master. Thanks 
[~okumin] for the patch.

> Multi-stage materialized CTEs can lose intermediate data
> 
>
> Key: HIVE-24606
> URL: https://issues.apache.org/jira/browse/HIVE-24606
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> With complex multi-stage CTEs, Hive can start a later stage before its 
> previous stage finishes.
>  That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve 
> dependencies between multi-stage materialized CTEs when a non-materialized CTE 
> cuts in.
>  
> [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414]
>  
> For example, when submitting this query,
> {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTask` will traverse the CTEs in order of `a1`, `x`, and `a2`. It 
> means the dependency between `a1` and `a2` will be ignored and `a2` can start 
> without waiting for `a1`. As a result, the above query returns the following 
> result.
> {code:java}
> +-+
> | id  |
> +-+
> | a1  |
> | x   |
> +-+
> {code}
> For your information, I ran this test with revision = 
> 425e1ff7c054f87c4db87e77d004282d529599ae.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27690) Handle casting NULL literal to complex type

2023-09-14 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-27690:
-

 Summary: Handle casting NULL literal to complex type
 Key: HIVE-27690
 URL: https://issues.apache.org/jira/browse/HIVE-27690
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{{NULL}} literal values of a complex type column are treated as void typed 
literals.
{code:java}
create table explain_npe_map( c1 map );
explain select c1 from explain_npe_map where c1 is null;
{code}
[https://github.com/apache/hive/blob/88bc8269a64d31eee372bf3602933c75283c686b/ql/src/test/results/clientpositive/llap/analyze_npe.q.out#L142]

The goal of this patch is to use the original complex type:
{code:java}
  Select Operator
expressions: Const map null (type: 
map)
{code}

Void typed {{NULL}} literals make CTAS statements fail since the original 
complex type cannot be inferred.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27648) CREATE TABLE with CHECK constraint fails with SemanticException

2023-09-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27648.
---
Resolution: Fixed

Merged to master. Thanks [~dkuzmenko] and [~soumyakanti.das] for review.

> CREATE TABLE with CHECK constraint fails with SemanticException
> ---
>
> Key: HIVE-27648
> URL: https://issues.apache.org/jira/browse/HIVE-27648
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When we run:
> {code:java}
> create table test (
> col1 int,
> `col 2` int check (`col 2` > 10) enable novalidate rely,
> constraint check_constraint check (col1 + `col 2` > 15) enable novalidate 
> rely
> );
> {code}
> It fails with:
>  
> {code:java}
>  org.apache.hadoop.hive.ql.parse.SemanticException: Invalid Constraint syntax 
> Invalid CHECK constraint expression: col 2 > 10.
>     at 
> org.apache.hadoop.hive.ql.ddl.table.constraint.ConstraintsUtils.validateCheckConstraint(ConstraintsUtils.java:462)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:13839)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12618)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12787)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:467)
>     at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
>     at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>     at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:733)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:703)
>     at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
>     at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>     at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>  {code}
>  
> I noticed while debugging that the check constraint expression in 
> [cc.getCheck_expression()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/table/constraint/ConstraintsUtils.java#L446]
>  doesn't include the backticks (`), and this results in wrong token 
> generation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


  1   2   3   4   5   6   7   8   9   10   >