[jira] [Assigned] (HIVE-25358) Remove reviewer pattern

2021-07-20 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-25358:
--

Assignee: Jesus Camacho Rodriguez  (was: Panagiotis Garefalakis)

> Remove reviewer pattern
> ---
>
> Key: HIVE-25358
> URL: https://issues.apache.org/jira/browse/HIVE-25358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Jesus Camacho Rodriguez
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25358) Remove reviewer pattern

2021-07-20 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-25358:
---
Reporter: Jesus Camacho Rodriguez  (was: Panagiotis Garefalakis)

> Remove reviewer pattern
> ---
>
> Key: HIVE-25358
> URL: https://issues.apache.org/jira/browse/HIVE-25358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>






[jira] [Assigned] (HIVE-25358) Remove reviewer pattern

2021-07-20 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-25358:
--


> Remove reviewer pattern
> ---
>
> Key: HIVE-25358
> URL: https://issues.apache.org/jira/browse/HIVE-25358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>






[jira] [Assigned] (HIVE-25283) Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup conversion fails on output mismatch after alter table

2021-06-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-25283:
--

Assignee: Steve Carlin

> Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup 
> conversion fails on output mismatch after alter table
> ---
>
> Key: HIVE-25283
> URL: https://issues.apache.org/jira/browse/HIVE-25283
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-25283) Schema evolution fails on output mismatch after alter table

2021-06-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-25283.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~scarlin]!

> Schema evolution fails on output mismatch after alter table
> ---
>
> Key: HIVE-25283
> URL: https://issues.apache.org/jira/browse/HIVE-25283
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25283) Schema evolution fails on output mismatch after alter table

2021-06-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-25283:
---
Summary: Schema evolution fails on output mismatch after alter table  (was: 
Schema Evolution tests for IntToNumericGroup and BigintToNumericGroup 
conversion fails on output mismatch after alter table)

> Schema evolution fails on output mismatch after alter table
> ---
>
> Key: HIVE-25283
> URL: https://issues.apache.org/jira/browse/HIVE-25283
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-22254) Mappings.NoElementException: no target in mapping, in `MaterializedViewAggregateRule

2021-06-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-22254.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Mappings.NoElementException: no target in mapping, in 
> `MaterializedViewAggregateRule
> 
>
> Key: HIVE-22254
> URL: https://issues.apache.org/jira/browse/HIVE-22254
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: ojoin_full.sql
>
>
> A Mappings.NoElementException happens on an edge condition for a query using 
> a materialized view.
> The query contains a "group by" clause which contains fields from both sides 
> of a join.  There is no real reason to group by this same field twice, but 
> there is also no reason that this shouldn't succeed.
> Attached is a script which causes this failure.  The query causing the 
> problem looks like this:
> explain extended select sum(1)
> from fact inner join dim1
> on fact.f1 = dim1.pk1
> group by f1, pk1;
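
The claim that grouping by both join keys is redundant but valid can be checked outside Hive. The following is a minimal sketch using Python's built-in sqlite3 module, not Hive; the table contents are made up for illustration, and only the table and column names (fact, dim1, f1, pk1) come from the report.

```python
import sqlite3

# Hypothetical miniature of the reported schema; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact (f1 INTEGER);
    CREATE TABLE dim1 (pk1 INTEGER PRIMARY KEY);
    INSERT INTO dim1 VALUES (1), (2);
    INSERT INTO fact VALUES (1), (1), (2), (3);
""")

# Same shape as the failing query: both sides of the join key in GROUP BY.
both_keys = conn.execute("""
    SELECT sum(1) FROM fact INNER JOIN dim1 ON fact.f1 = dim1.pk1
    GROUP BY f1, pk1 ORDER BY f1
""").fetchall()

# Grouping by one side only. The join condition makes f1 equal to pk1,
# so the groups are the same.
one_key = conn.execute("""
    SELECT sum(1) FROM fact INNER JOIN dim1 ON fact.f1 = dim1.pk1
    GROUP BY f1 ORDER BY f1
""").fetchall()

print(both_keys, one_key)
```

Because the inner join forces f1 = pk1 on every surviving row, both queries produce the same groups, which is why the original query should plan successfully even though the second key adds nothing.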





[jira] [Updated] (HIVE-23456) Upgrade Calcite version to 1.25.0

2021-06-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-23456:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~soumyakanti.das], [~zabetak]!

> Upgrade Calcite version to 1.25.0
> -
>
> Key: HIVE-23456
> URL: https://issues.apache.org/jira/browse/HIVE-23456
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23456.01.patch, HIVE-23456.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-22806) Missing materialized view rewrite in case the filter is further narrowed

2021-06-23 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-22806.

Fix Version/s: 4.0.0
   Resolution: Duplicate

> Missing materialized view rewrite in case the filter is further narrowed
> 
>
> Key: HIVE-22806
> URL: https://issues.apache.org/jira/browse/HIVE-22806
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Zoltan Haindrich
>Priority: Major
> Fix For: 4.0.0
>
>
> I was checking some basic things when I noticed that MV rewriting doesn't 
> kick in for some cases:
> {code}
> explain
> SELECT empid, deptname
> FROM emps
> JOIN depts
>   using (deptno)
> WHERE hire_date >= 600
> AND hire_date <= 1200 -- depending on the presence of this condition
> -- the rewrite may not happen
> ;
> {code}
> qtest:
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.stats.fetch.column.stats=true;
> set hive.materializedview.rewriting=true;
> -- create some tables
> CREATE TABLE emps (
>   empid INT,
>   deptno INT,
>   name VARCHAR(256),
>   salary FLOAT,
>   hire_date int)
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
>  
> CREATE TABLE depts (
>   deptno INT,
>   deptname VARCHAR(256),
>   locationid INT)
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
> -- load data
> insert into emps values (100, 10, 'Bill', 1, 1000), (200, 20, 'Eric', 
> 8000, 500),
>   (150, 10, 'Sebastian', 7000, null), (110, 10, 'Theodore', 1, 250), 
> (120, 10, 'Bill', 1, 250)
>   ;
> insert into depts values (10, 'Sales', 10), (30, 'Marketing', null), (20, 
> 'HR', 20);
> alter table emps add constraint pk1 primary key (empid) disable novalidate 
> rely;
> alter table depts add constraint pk2 primary key (deptno) disable novalidate 
> rely;
> alter table emps add constraint fk1 foreign key (deptno) references 
> depts(deptno) disable novalidate rely;
> -- create mv
> CREATE MATERIALIZED VIEW mv1
> AS
> SELECT empid, deptname, hire_date
> FROM emps JOIN depts
>   using (deptno)
>   -- ON (emps.deptno = depts.deptno)
> WHERE hire_date >= 500;
> -- expected to see that the materialized view is being used; however, it isn't:
> explain
> SELECT empid, deptname
> FROM emps
> JOIN depts
>   using (deptno)
> WHERE hire_date >= 600
> AND hire_date <= 1200 
> ;
> -- now we can see that the materialized view is being used:
> explain
> SELECT empid, deptname
> FROM emps
> JOIN depts
>   using (deptno)
> WHERE hire_date >= 600
> --AND hire_date <= 1200  
> ;
> {code}
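
Why the narrowed query should still be answerable from mv1 can be sketched as a range-containment check. This is an illustrative Python sketch, not the logic Hive or Calcite actually uses; the helper names are hypothetical, and the hire_date values are the ones inserted by the qtest above.

```python
import math

def range_contains(outer, inner):
    """True when every value in the closed range `inner` also lies in `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def in_range(value, bounds):
    """SQL-like check: NULL (None) never satisfies a range predicate."""
    return value is not None and bounds[0] <= value <= bounds[1]

view_filter = (500, math.inf)     # mv1: hire_date >= 500
narrowed_query = (600, 1200)      # hire_date >= 600 AND hire_date <= 1200
open_query = (600, math.inf)      # hire_date >= 600

# empid -> hire_date, taken from the INSERT statement in the qtest.
hire_dates = {100: 1000, 200: 500, 150: None, 110: 250, 120: 250}

# Rows the view would hold, then the query filter applied on top of them.
view_rows = {e: d for e, d in hire_dates.items() if in_range(d, view_filter)}
via_view = sorted(e for e, d in view_rows.items() if in_range(d, narrowed_query))

# The same query evaluated directly against the base rows.
direct = sorted(e for e, d in hire_dates.items() if in_range(d, narrowed_query))

print(range_contains(view_filter, narrowed_query), direct, via_view)
```

Both query ranges are contained in the view's range, so applying the query predicate as a compensating filter on top of the view yields the same rows as scanning the base table. Adding the upper bound only narrows the range further, so it should not prevent the rewrite.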





[jira] [Resolved] (HIVE-20473) Optimization for materialized views

2021-06-23 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-20473.

Fix Version/s: 4.0.0
   Resolution: Duplicate

> Optimization for materialized views
> ---
>
> Key: HIVE-20473
> URL: https://issues.apache.org/jira/browse/HIVE-20473
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0
> Environment: Can be reproduced on a Single node pseudo cluster. 
>Reporter: Shyam Rai
>Priority: Critical
>  Labels: materializedviews
> Fix For: 4.0.0
>
>
> The optimizer takes advantage of a materialized view only when the query syntax 
> matches the way the view was created. Here is an example.
> *Source table on which materialized views are created*
> {code}
> ++
> |   createtab_stmt   |
> ++
> | CREATE TABLE `mysource`(   |
> |   `id` int,|
> |   `name` string,   |
> |   `start_date` date)   |
> | ROW FORMAT SERDE   |
> |   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  |
> | WITH SERDEPROPERTIES ( |
> |   'field.delim'=',',   |
> |   'serialization.format'=',')  |
> | STORED AS INPUTFORMAT  |
> |   'org.apache.hadoop.mapred.TextInputFormat'   |
> | OUTPUTFORMAT   |
> |   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
> | LOCATION   |
> |   
> 'hdfs://xlhive3.openstacklocal:8020/warehouse/tablespace/managed/hive/mysource'
>  |
> | TBLPROPERTIES (|
> |   'bucketing_version'='2', |
> |   'transactional'='true',  |
> |   'transactional_properties'='insert_only',|
> |   'transient_lastDdlTime'='1535392655')|
> ++
> {code}
> One of the materialized views "view_1" is created to fetch the data between 
> IDs 1 and 2 using this statement
> {code}
> select `mysource`.`id`, `mysource`.`name`, `mysource`.`start_date` from 
> `default`.`mysource` where `mysource`.`id` between 1 and 2
> {code}
> *When a SELECT is executed against the source table using the following 
> SELECT statement, this works fine and can be validated with the explain plan.*
> {code}
> 0: jdbc:hive2://localhost:1/default> explain select * from mysource where 
> id between 1 and 2;
> INFO  : Compiling 
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c): 
> explain select * from mysource where id between 1 and 2
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, 
> type:string, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c); 
> Time taken: 0.224 seconds
> INFO  : Executing 
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c): 
> explain select * from mysource where id between 1 and 2
> INFO  : Starting task [Stage-1:EXPLAIN] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c); 
> Time taken: 0.006 seconds
> INFO  : OK
> ++
> |  Explain   |
> ++
> | STAGE DEPENDENCIES:|
> |   Stage-0 is a root stage  |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-0   |
> | Fetch Operator |
> |   limit: -1|
> |   Processor Tree:  |
> | TableScan  |
> |   alias: default.view_1|
> |   Select Operator  |
> | expressions: id (type: int), name (type: string), start_date 
> (type: date) |
> | outputColumnNames: _col0, _col1, _col2 |
> | ListSink   |
> ||
> ++
> {code}
> If the rewrite of the same SELECT is written using >= and 

[jira] [Resolved] (HIVE-25229) Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW

2021-06-21 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-25229.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~soumyakanti.das]!

> Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW
> -
>
> Key: HIVE-25229
> URL: https://issues.apache.org/jira/browse/HIVE-25229
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While creating a materialized view, HookContext is supposed to send lineage 
> info, but that info is missing.
> CREATE MATERIALIZED VIEW tbl1_view as select * from tbl1;
> The HookContext passed from hive.ql.Driver to the Atlas Hive hook through the 
> hookRunner.runPostExecHooks call doesn't have lineage info.





[jira] [Resolved] (HIVE-23987) Upgrade arrow version to 0.11.0

2021-06-08 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-23987.

Fix Version/s: 4.0.0
 Assignee: Jesus Camacho Rodriguez  (was: Barnabas Maidics)
   Resolution: Fixed

> Upgrade arrow version to 0.11.0
> ---
>
> Key: HIVE-23987
> URL: https://issues.apache.org/jira/browse/HIVE-23987
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], 
> we're introducing flatbuffers as a dependency. 
> Arrow 0.10.0 has an unofficial flatbuffer dependency, which is incompatible 
> with the official ones: https://issues.apache.org/jira/browse/ARROW-3175
> It was fixed in 0.11.0. We should upgrade to that version.





[jira] [Updated] (HIVE-23756) Added more constraints to the package.jdo file

2021-06-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-23756:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~scarlin]!

> Added more constraints to the package.jdo file
> --
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23756.1.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
> at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
> at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277)
> at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
> at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
> at org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
> at org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
> at org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
> ... 36 more
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot delete or update a parent row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))
> at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
> at com.mysql.jdbc.Util.getInstance(Util.java:360)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
> {code}
> Although HIVE-19994 resolves this issue, the FK constraint name of the 
> COLUMNS_V2 table specified in the package.jdo file is not the same as the FK 
> constraint name used while creating the COLUMNS_V2 table ([Ref|#L60]]). 
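
The class of error in the stack trace (deleting a parent row while a child row still references it) can be reproduced in isolation. The sketch below uses Python's built-in sqlite3 module rather than MySQL, with a heavily simplified two-table schema; only the table, column, and constraint names are taken from the exception message, and it says nothing about why the failure is intermittent in the metastore.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE CDS (CD_ID INTEGER PRIMARY KEY);
    CREATE TABLE COLUMNS_V2 (
        CD_ID INTEGER NOT NULL,
        COLUMN_NAME TEXT NOT NULL,
        CONSTRAINT COLUMNS_V2_FK1 FOREIGN KEY (CD_ID) REFERENCES CDS (CD_ID)
    );
    INSERT INTO CDS VALUES (1);
    INSERT INTO COLUMNS_V2 VALUES (1, 'eid');
""")

def delete_parent_first(connection):
    """Try to delete the parent row while a child row still points at it."""
    try:
        connection.execute("DELETE FROM CDS WHERE CD_ID = 1")
        return "deleted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

parent_first = delete_parent_first(conn)
conn.rollback()

# Deleting the child rows first, then the parent, satisfies the constraint.
conn.execute("DELETE FROM COLUMNS_V2 WHERE CD_ID = 1")
conn.execute("DELETE FROM CDS WHERE CD_ID = 1")
conn.commit()
remaining = conn.execute("SELECT count(*) FROM CDS").fetchone()[0]

print(parent_first, remaining)
```

The ordering matters: an ORM can only schedule the child deletes before the parent delete if its metadata describes the same constraint the database actually enforces, which is presumably why the mismatch between package.jdo and the schema is worth fixing.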





[jira] [Updated] (HIVE-23756) Added more constraints to the package.jdo file

2021-06-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-23756:
---
Summary: Added more constraints to the package.jdo file  (was: drop table 
command fails with MySQLIntegrityConstraintViolationException:)

> Added more constraints to the package.jdo file
> --
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23756.1.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Assigned] (HIVE-23756) Added more constraints to the package.jdo file

2021-06-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-23756:
--

Assignee: Steve Carlin  (was: Ganesha Shreedhara)

> Added more constraints to the package.jdo file
> --
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23756.1.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Resolved] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-06-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-25104.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> Backward incompatible timestamp serialization in Parquet for certain timezones
> --
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HIVE-12192 and HIVE-20007 changed the way that timestamp computations are 
> performed and, to some extent, how timestamps are serialized and deserialized 
> in files (Parquet, Avro, Orc).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in 
> Parquet files is not backwards compatible. In other words writing timestamps 
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them 
> with another (not including the previous issues) may lead to different 
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.
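
The 7 minute 2 second difference for eid=1 matches the gap between Los Angeles local mean time (UTC-07:52:58 in the IANA tz database, used before standard time was adopted in November 1883) and Pacific Standard Time (UTC-08:00). The sketch below reproduces the three rows under an assumption that the report does not confirm: that the writer converts to UTC using the historical offset while the reader converts back using a fixed -08:00. The function names and the time of day of the 1883 switch are simplifications.

```python
from datetime import datetime, timedelta

# Offsets for US/Pacific. The LMT value is the one recorded in the IANA tz
# database; which Hive version applies which rule is an assumption here.
LMT_LOS_ANGELES = timedelta(hours=-7, minutes=-52, seconds=-58)
PST = timedelta(hours=-8)
STANDARD_TIME_START = datetime(1883, 11, 18, 12, 0, 0)  # simplified local switch time

def historical_offset(local):
    """Offset a history-aware time zone library would use for a winter date."""
    return LMT_LOS_ANGELES if local < STANDARD_TIME_START else PST

def round_trip(local):
    """Write with the historical offset, read back with a fixed -08:00."""
    stored_utc = local - historical_offset(local)
    return stored_utc + PST

for ts in (datetime(1880, 1, 1), datetime(1884, 1, 1), datetime(1990, 1, 1)):
    print(ts, "->", round_trip(ts))
```

Under these assumptions only the 1880 value moves, to 1879-12-31 23:52:58, while the 1884 and 1990 values survive unchanged, which is the pattern shown in the branch-2.3 output.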





[jira] [Resolved] (HIVE-25183) Parsing error for Correlated Inner Joins

2021-06-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-25183.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~soumyakanti.das]!

> Parsing error for Correlated Inner Joins
> 
>
> Key: HIVE-25183
> URL: https://issues.apache.org/jira/browse/HIVE-25183
> Project: Hive
>  Issue Type: Sub-task
>  Components: Parser
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-25090





[jira] [Commented] (HIVE-25129) Wrong results when timestamps stored in Avro/Parquet fall into the DST shift

2021-06-01 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355217#comment-17355217
 ] 

Jesus Camacho Rodriguez commented on HIVE-25129:


It's been a while, but I assume if we do timezone shifting, e.g., we use the 
old write path, this may still occur. On the other hand, I think this would be 
fixed once we write timestamp as it is represented internally, i.e., in UTC.

> Wrong results when timestamps stored in Avro/Parquet fall into the DST shift
> 
>
> Key: HIVE-25129
> URL: https://issues.apache.org/jira/browse/HIVE-25129
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Attachments: parquet_timestamp_dst.q
>
>
> Timestamp values falling into the daylight saving time shift of the system 
> timezone cannot be retrieved as is when they are stored in Parquet/Avro 
> tables. The respective SELECT query shifts those timestamps by +1 hour, 
> reflecting the DST shift.
> +Example+
> {code:sql}
> --! qt:timezone:US/Pacific
> create table employee (eid int, birthdate timestamp) stored as parquet;
> insert into employee values (0, '2019-03-10 02:00:00');
> insert into employee values (1, '2020-03-08 02:00:00');
> insert into employee values (2, '2021-03-14 02:00:00');
> select eid, birthdate from employee order by eid;{code}
> +Actual results+
> |0|2019-03-10 03:00:00|
> |1|2020-03-08 03:00:00|
> |2|2021-03-14 03:00:00|
> +Expected results+
> |0|2019-03-10 02:00:00|
> |1|2020-03-08 02:00:00|
> |2|2021-03-14 02:00:00|
> Storing and retrieving values in columns using the [timestamp data 
> type|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types]
>  (equivalent to the Java LocalDateTime API) should not alter in any way the 
> value that the user is seeing. The results are correct for {{TEXTFILE}} and 
> {{ORC}} tables.
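
The +1 hour shift is what a zone-shifting round trip produces for wall-clock times that do not exist in the local zone. The sketch below hand-rolls the US/Pacific spring-forward rule for the three dates in the example; it is an illustration only, the helper names are hypothetical, the offset table is valid only around the March transitions, and treating a nonexistent local time with the pre-transition offset is an assumption that mirrors common library behaviour.

```python
from datetime import datetime, timedelta

PST = timedelta(hours=-8)
PDT = timedelta(hours=-7)

# Spring-forward instants in UTC: 02:00 PST is 10:00 UTC on these dates.
SPRING_FORWARD_UTC = {
    2019: datetime(2019, 3, 10, 10),
    2020: datetime(2020, 3, 8, 10),
    2021: datetime(2021, 3, 14, 10),
}

def offset_for_local(local):
    """Offset applied on write; times inside the gap are treated as PST."""
    gap_end = SPRING_FORWARD_UTC[local.year] + PDT  # 03:00 local
    return PST if local < gap_end else PDT

def offset_for_utc(utc):
    """Offset applied on read, decided by the UTC instant."""
    return PST if utc < SPRING_FORWARD_UTC[utc.year] else PDT

def round_trip_with_shift(local):
    """Convert wall-clock time to UTC on write and back to local on read."""
    stored_utc = local - offset_for_local(local)
    return stored_utc + offset_for_utc(stored_utc)

def round_trip_without_shift(local):
    """Store the wall-clock fields as they are, with no zone conversion."""
    return local

for ts in (datetime(2019, 3, 10, 2), datetime(2020, 3, 8, 2), datetime(2021, 3, 14, 2)):
    print(ts, "->", round_trip_with_shift(ts), "|", round_trip_without_shift(ts))
```

With shifting, each 02:00 value comes back as 03:00, matching the actual results above; without shifting, the value is returned unchanged, which is the behaviour expected of a LocalDateTime-like type.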





[jira] [Resolved] (HIVE-25054) Upgrade jodd-core due to CVE-2018-21234

2021-05-21 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-25054.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~achennagiri]!

> Upgrade jodd-core due to CVE-2018-21234
> ---
>
> Key: HIVE-25054
> URL: https://issues.apache.org/jira/browse/HIVE-25054
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Abhay
>Assignee: Abhay
>Priority: Major
> Fix For: 4.0.0
>
>
> Hive uses version 3.5.2 of the `jodd-core` library, which is 
> susceptible to CVE-2018-21234. Below is a description of that vulnerability.
> CVE-2018-21234
> Jodd before 5.0.4 performs Deserialization of Untrusted JSON Data when 
> setClassMetadataName is set.
> CWE-502 Deserialization of Untrusted Data
> CVSSv2:
> Base Score: HIGH (7.5)
> Vector: /AV:N/AC:L/Au:N/C:P/I:P/A:P
> CVSSv3:
> Base Score: CRITICAL (9.8)
> Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
> References:
> MISC - https://github.com/oblac/jodd/commit/9bffc3913aeb8472c11bb543243004b4b4376f16
> MISC - https://github.com/oblac/jodd/compare/v5.0.3...v5.0.4
> MISC - https://github.com/oblac/jodd/issues/628
> Vulnerable Software & Versions:
> cpe:2.3:a:jodd:jodd:*:*:*:*:*:*:*:* versions up to (excluding) 5.0.4
>  
> This library needs to be upgraded. We use a couple of its classes, 
> `JDateTime` ([https://github.infra.cloudera.com/CDH/hive/blob/cdpd-master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java]
>  ) and `HtmlEncoder`, which have been deprecated and/or moved to a 
> different package called jodd-util.
>  





[jira] [Resolved] (HIVE-25046) Log CBO plans right after major transformations

2021-05-20 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-25046.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> Log CBO plans right after major transformations
> ---
>
> Key: HIVE-25046
> URL: https://issues.apache.org/jira/browse/HIVE-25046
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the results of various CBO transformations are logged (in DEBUG 
> mode) at the end of the optimization 
> [phase|https://github.com/apache/hive/blob/9f5bd72e908244b2fe915e8dc39f55afa94bbffa/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L2106]
>  and only if we are not in test mode. This has some disadvantages:
> * If there is a failure (exception) in some intermediate step we will miss 
> all the intermediate  plans, possibly losing track of what plan led to the 
> problem.
> * Intermediate logs are very useful for identifying plan problems while 
> working on a patch; unfortunately the logs are explicitly disabled in test 
> mode, which means that the respective code needs to be changed every time 
> we need to see those logs.
> * Logging at the end necessitates keeping additional local variables that 
> make code harder to read.
> The goal of this issue is to place DEBUG logging right after major 
> transformations, regardless of whether we are running in test mode, to 
> alleviate the shortcomings mentioned above.
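
The benefit of logging right after each transformation can be shown with a small Python sketch. It is not Hive code: `optimize`, the step names, and the list-based "plan" are hypothetical stand-ins chosen only to show that plans logged per step survive a failure in a later step.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
LOG = logging.getLogger("planner_sketch")
LOG.setLevel(logging.DEBUG)
LOG.addHandler(handler)


def optimize(plan, steps):
    """Apply each (name, transform) step and log the plan right after it."""
    for name, transform in steps:
        plan = transform(plan)
        # Logged immediately, so earlier plans are kept if a later step fails.
        LOG.debug("Plan after %s: %s", name, plan)
    return plan


def failing_step(plan):
    raise RuntimeError("simulated failure in an intermediate step")


steps = [
    ("step A", lambda p: p + ["A"]),
    ("step B", lambda p: p + ["B"]),
    ("step C", failing_step),
]

try:
    optimize([], steps)
except RuntimeError:
    pass

# The plans produced before the failure are still available.
print(handler.messages)
```

Had the sketch logged only once at the end of `optimize`, the exception in the third step would have left no trace of the two intermediate plans.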





[jira] [Updated] (HIVE-25105) Support Parquet as default MV storage format

2021-05-14 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-25105:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for reviewing [~kkasa]!

> Support Parquet as default MV storage format
> 
>
> Key: HIVE-25105
> URL: https://issues.apache.org/jira/browse/HIVE-25105
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the default supported storage formats do not include Parquet:
> {code}
> ...
> HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
> "ORC",
> new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
> ...
> {code}





[jira] [Updated] (HIVE-25105) Support Parquet as default MV storage format

2021-05-12 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-25105:
---
Description: 
Currently the default supported storage formats do not include Parquet:

{code}
...
HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
"ORC",
new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
...
{code}

  was:
Currently the support storage formats do not include Parquet:

{code}
...
HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
"ORC",
new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
...
{code}


> Support Parquet as default MV storage format
> 
>
> Key: HIVE-25105
> URL: https://issues.apache.org/jira/browse/HIVE-25105
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the default supported storage formats do not include Parquet:
> {code}
> ...
> HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
> "ORC",
> new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
> ...
> {code}





[jira] [Updated] (HIVE-25105) Support Parquet as default MV storage format

2021-05-12 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-25105:
---
Summary: Support Parquet as default MV storage format  (was: Support 
Parquet as MV storage format)

> Support Parquet as default MV storage format
> 
>
> Key: HIVE-25105
> URL: https://issues.apache.org/jira/browse/HIVE-25105
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the supported storage formats do not include Parquet:
> {code}
> ...
> HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
> "ORC",
> new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
> ...
> {code}





[jira] [Updated] (HIVE-25105) Support Parquet as MV storage format

2021-05-11 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-25105:
---
Status: Patch Available  (was: In Progress)

> Support Parquet as MV storage format
> 
>
> Key: HIVE-25105
> URL: https://issues.apache.org/jira/browse/HIVE-25105
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>
> Currently the supported storage formats do not include Parquet:
> {code}
> ...
> HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
> "ORC",
> new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
> ...
> {code}





[jira] [Work started] (HIVE-25105) Support Parquet as MV storage format

2021-05-11 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25105 started by Jesus Camacho Rodriguez.
--
> Support Parquet as MV storage format
> 
>
> Key: HIVE-25105
> URL: https://issues.apache.org/jira/browse/HIVE-25105
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>
> Currently the supported storage formats do not include Parquet:
> {code}
> ...
> HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
> "ORC",
> new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
> ...
> {code}





[jira] [Assigned] (HIVE-25105) Support Parquet as MV storage format

2021-05-11 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-25105:
--


> Support Parquet as MV storage format
> 
>
> Key: HIVE-25105
> URL: https://issues.apache.org/jira/browse/HIVE-25105
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>
> Currently the supported storage formats do not include Parquet:
> {code}
> ...
> HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
> "ORC",
> new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
> ...
> {code}





[jira] [Updated] (HIVE-24840) Materialized View incremental rebuild produces wrong result set after compaction

2021-05-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24840:
---
Fix Version/s: 4.0.0

> Materialized View incremental rebuild produces wrong result set after 
> compaction
> 
>
> Key: HIVE-24840
> URL: https://issues.apache.org/jira/browse/HIVE-24840
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, 
> NULL);
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as 
> select a,b,c from t1 where a > 0 or a is null;
> delete from t1 where a = 1;
> alter table t1 compact 'major';
> -- Wait until compaction finished.
> alter materialized view mat1 rebuild;
> {code}
> Expected result of query
> {code}
> select * from mat1;
> {code}
> {code}
> 2 two 2
> NULL NULL NULL
> {code}
> but if incremental rebuild is enabled the result is
> {code}
> 1 one 1
> 2 two 2
> NULL NULL NULL
> {code}
> Cause: Incremental rebuild checks whether the source tables of a 
> materialized view have had delete or update transactions since the last 
> rebuild by querying the COMPLETED_TXN_COMPONENTS table in the metastore. 
> However, when a major compaction is performed on the source tables, the 
> records related to these tables are deleted from COMPLETED_TXN_COMPONENTS.
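
The cause described above can be modelled with a toy Python sketch. It is a deliberately simplified assumption, not the metastore schema or Hive's code: the list of `(table, operation)` pairs stands in for COMPLETED_TXN_COMPONENTS, and both helper functions are hypothetical.

```python
# Simplified stand-in for COMPLETED_TXN_COMPONENTS: one (table, operation)
# record per completed transaction component. Not the real schema.
completed_txn_components = [
    ("t1", "insert"),
    ("t1", "delete"),
]


def has_update_or_delete(records, table):
    """The check the issue describes: look for delete/update records."""
    return any(t == table and op in ("delete", "update") for t, op in records)


def major_compaction(records, table):
    """Per the issue, compaction removes the records of the compacted table."""
    return [(t, op) for t, op in records if t != table]


before = has_update_or_delete(completed_txn_components, "t1")
after = has_update_or_delete(
    major_compaction(completed_txn_components, "t1"), "t1"
)

# The delete is visible before compaction and invisible afterwards.
print(before, after)
```

Once the delete record is gone, the check can no longer tell that a delete happened, which matches the stale row reported in the repro.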





[jira] [Commented] (HIVE-25066) Show whether a materialized view supports incremental review or not

2021-04-29 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335577#comment-17335577
 ] 

Jesus Camacho Rodriguez commented on HIVE-25066:


[~kkasa], can we store this information in the HS2 
{{HiveMaterializedViewsRegistry}} itself when the MV is loaded, without 
persisting it in HMS? For instance, using some variables in 
{{HiveRelOptMaterialization}}? That will remove the burden of maintaining 
redundant information for MVs, modifying HMS tables, etc. At the same time, it 
will make {{show materialized views}} faster. Would that be possible?

> Show whether a materialized view supports incremental review or not
> ---
>
> Key: HIVE-25066
> URL: https://issues.apache.org/jira/browse/HIVE-25066
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>
> Add information about whether a materialized view supports incremental 
> rebuild or not in an additional column in
> {code:java}
> SHOW MATERIALIZED VIEWS
> {code}
> statement.





[jira] [Resolved] (HIVE-24998) IS [NOT] DISTINCT FROM failing with SemanticException

2021-04-28 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24998.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~soumyakanti.das]!

> IS [NOT] DISTINCT FROM failing with SemanticException
> -
>
> Key: HIVE-24998
> URL: https://issues.apache.org/jira/browse/HIVE-24998
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Hive: INSERT statements failing with UDFArgumentException and 
> SemanticException
> Problem Statement:
> {code:java}
> CREATE TABLE t2(c0 boolean , c1 FLOAT );
> INSERT INTO t2(c0) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641));
> -- Insert failing with: Error: Error while compiling statement: FAILED: 
> UDFArgumentException UDF tables only one argument (state=42000,code=4)
> INSERT INTO t2(c0,c1) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641), 0.2);
> -- Insert failing with: SemanticException 0:0 Expected 2 columns for 
> insclause-0/database52@t2; select produces 1 columns. Error encountered near 
> token '0.2' (state=42000,code=4) {code}
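
For reference, the value the first INSERT is expected to store follows from the standard SQL meaning of IS NOT DISTINCT FROM (null-safe equality). The Python sketch below models only that meaning, with `None` standing in for NULL; `is_not_distinct_from` is a hypothetical helper, not Hive's implementation.

```python
def is_not_distinct_from(a, b):
    """Null-safe equality: NULL matches NULL, and never matches a value."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


# NOT (0.379 IS NOT DISTINCT FROM 641), the expression used in the repro.
value_c0 = not is_not_distinct_from(0.379, 641)
print(value_c0)
```

Unlike plain `=`, the operator never yields NULL, so the expression in the repro is an ordinary boolean that fits column `c0`.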
>  





[jira] [Updated] (HIVE-24976) CBO: count(distinct) in a window function fails CBO

2021-04-26 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24976:
---
Fix Version/s: 4.0.0

> CBO: count(distinct) in a window function fails CBO
> ---
>
> Key: HIVE-24976
> URL: https://issues.apache.org/jira/browse/HIVE-24976
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Gopal Vijayaraghavan
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> create temporary table tmp_tbl(
> `rule_id` string,
> `severity` string,
> `alert_id` string,
> `alert_type` string);
> explain cbo
> select `k`.`rule_id`,
> count(distinct `k`.`alert_id`) over(partition by `k`.`rule_id`) `subj_cnt`
> from tmp_tbl k
> ;
> explain
> select `k`.`rule_id`,
> count(distinct `k`.`alert_id`) over(partition by `k`.`rule_id`) `subj_cnt`
> from tmp_tbl k
> ;
> {code}
> Fails CBO, because the count(distinct) is not being recognized as belonging 
> to a windowing operation.
> So it throws the following exception
> {code}
> throw new CalciteSemanticException("Distinct without an aggregation.",
>     UnsupportedFeature.Distinct_without_an_aggreggation);
> {code}
> https://github.com/apache/hive/blob/73c3770d858b063c69dea6c64a759f8fdacad460/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L4914
> This prevents a query like this from using a materialized view which already 
> exists in the system (the MV obviously does not contain this expression, but 
> represents a complex transform from a JSON structure into a columnar layout).
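
What the windowed `count(distinct ...)` in the query is meant to compute can be sketched in plain Python. The sample rows are hypothetical (the issue gives no data), and `count_distinct_over_partition` is an illustrative helper, not a Hive or Calcite function.

```python
from collections import defaultdict

# Hypothetical sample rows as (rule_id, alert_id); the issue gives no data.
rows = [("r1", "a1"), ("r1", "a1"), ("r1", "a2"), ("r2", "a3")]


def count_distinct_over_partition(rows):
    """count(distinct alert_id) over (partition by rule_id), one value per row."""
    distinct = defaultdict(set)
    for rule_id, alert_id in rows:
        if alert_id is not None:  # count() ignores NULLs
            distinct[rule_id].add(alert_id)
    # A window function keeps every input row and attaches the partition value.
    return [(rule_id, len(distinct[rule_id])) for rule_id, _ in rows]


result = count_distinct_over_partition(rows)
print(result)
```

The output has as many rows as the input, which is what distinguishes the windowed form from a `GROUP BY` aggregation.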





[jira] [Updated] (HIVE-24999) HiveSubQueryRemoveRule generates invalid plan for IN subquery with multiple correlations

2021-04-22 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24999:
---
Fix Version/s: 4.0.0

> HiveSubQueryRemoveRule generates invalid plan for IN subquery with multiple 
> correlations
> 
>
> Key: HIVE-24999
> URL: https://issues.apache.org/jira/browse/HIVE-24999
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
>
> The problem can be reproduced by using the following query which at the 
> moment can be found in {{subquery_in.q}} file:
> {code:sql}
> explain cbo select * from part where p_name IN (select p_name from part p 
> where p.p_size = part.p_size AND part.p_size + 121150 = p.p_partkey );
> {code}
> The plans before and after {{HiveSubQueryRemoveRule}} are shown below:
> {noformat}
> 2021-04-09T14:29:08,031 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] 
> parse.CalcitePlanner: Plan before removing subquery:
> HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], 
> p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], 
> p_comment=[$8])
>   HiveFilter(condition=[IN($1, {
> HiveProject(p_name=[$1])
>   HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 121150), 
> $0))])
> HiveTableScan(table=[[default, part]], table:alias=[p])
> })])
> HiveTableScan(table=[[default, part]], table:alias=[part])
> 2021-04-09T14:29:08,056 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] 
> parse.CalcitePlanner: Plan just after removing subquery:
> HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], 
> p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], 
> p_comment=[$8])
>   HiveFilter(condition=[=($1, $12)])
> LogicalCorrelate(correlation=[$cor0], joinType=[semi], 
> requiredColumns=[{5}])
>   HiveTableScan(table=[[default, part]], table:alias=[part])
>   HiveProject(p_name=[$1])
> HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 
> 121150), $0))])
>   HiveTableScan(table=[[default, part]], table:alias=[p])
> {noformat}
> The plan after applying the rule is invalid. The 
> {{HiveFilter(condition=[=($1, $12)])}} above the correlate references columns 
> ($12) from the right input which do not exist since the correlate is of type 
> SEMI. Running the test with {{-Dcalcite.debug}} property enabled raises an 
> {{AssertionError}} when building the {{HiveFilter}}.
> The problem is hidden at the moment since there is a specific hack in 
> {{HiveRelDecorrelator}} that turns this invalid plan into a valid one. This 
> mechanism is very brittle and it can break easily as it happened while fixing 
> HIVE-24957.
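
The invalid reference described above can be illustrated with a small width check. This is only a sketch in Python, not Calcite or Hive code: `output_width` and `out_of_range` are hypothetical helpers, and the left input width of 12 fields is an assumption inferred from the issue's statement that $12 belongs to the right input.

```python
LEFT_WIDTH = 12   # assumption: the left input exposes fields $0..$11
RIGHT_WIDTH = 1   # the subquery side projects only p_name


def output_width(join_type, left_width, right_width):
    """Number of fields visible above the join or correlate."""
    if join_type == "semi":
        return left_width  # right-side fields are not part of the output
    return left_width + right_width


def out_of_range(refs, width):
    """Field references that do not exist in a row of the given width."""
    return [ref for ref in refs if ref >= width]


filter_refs = [1, 12]  # from HiveFilter(condition=[=($1, $12)])

semi_errors = out_of_range(filter_refs, output_width("semi", LEFT_WIDTH, RIGHT_WIDTH))
inner_errors = out_of_range(filter_refs, output_width("inner", LEFT_WIDTH, RIGHT_WIDTH))
print(semi_errors, inner_errors)
```

Under these assumptions the same filter would be valid above an inner join but not above a semi correlate, which is the inconsistency the issue reports.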





[jira] [Updated] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-04-22 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24957:
---
Fix Version/s: 4.0.0

> Wrong results when subquery has COALESCE in correlation predicate
> -
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey   int,
> a_name varchar(50));
> create table book (
> b_bookkey   int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
>   (select a_authorkey
>from author a
>where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set, but it should be present: after 
> applying the COALESCE operator it should match the UNKNOWN author.
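
The expected results can be cross-checked on another engine. The Python sketch below runs the same tables and query through SQLite using the standard-library `sqlite3` module; the `order by` clause is an addition to make the output order deterministic, and the run says nothing about Hive's own behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table author (a_authorkey int, a_name varchar(50));
    create table book (b_bookkey int, b_title varchar(50), b_authorkey int);
    insert into author values (10, 'Victor Hugo');
    insert into author values (20, 'Alexandre Dumas');
    insert into author values (300, 'UNKNOWN');
    insert into book values (1, 'Les Miserables', 10);
    insert into book values (2, 'The Count of Monte Cristo', 20);
    insert into book values (3, 'Men Without Women', 30);
    insert into book values (4, 'Odyssey', null);
    """
)

# Same correlated EXISTS query as in the issue; ORDER BY added for stable output.
titles = [
    row[0]
    for row in conn.execute(
        """
        select b.b_title
        from book b
        where exists
          (select a_authorkey
           from author a
           where coalesce(b.b_authorkey, 300) = a.a_authorkey)
        order by b.b_bookkey
        """
    )
]
print(titles)
```

The NULL `b_authorkey` of 'Odyssey' is replaced by 300 before the comparison, so the row pairs with the UNKNOWN author, while author key 30 has no match at all.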





[jira] [Updated] (HIVE-25028) Hive: Select query with IS operator producing unexpected result

2021-04-19 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-25028:
---
Reporter: Manthan B Y  (was: Soumyakanti Das)

> Hive: Select query with IS operator producing unexpected result
> ---
>
> Key: HIVE-25028
> URL: https://issues.apache.org/jira/browse/HIVE-25028
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>
> Hive: Select query with IS operator is producing unexpected result.
> The following was executed on postgres:
> {code:java}
> sqlancer=# create table if not exists emp(name text, age int);
> CREATE TABLE
> sqlancer=# insert into emp values ('a', 5), ('b', 15), ('c', 12);
> INSERT 0 3
> sqlancer=# select emp.age from emp where emp.age > 10;
>  age
> -
>   15
>   12
> (2 rows)
> sqlancer=# select emp.age > 10 is true from emp;
>  ?column?
> --
>  f
>  t
>  t
> (3 rows)
> {code}
> This is happening because the IS operator has higher precedence than comparison 
> operators in Hive. In most other databases, comparison operators have higher 
> precedence. The grammar needs to be changed to fix the precedence.
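
The effect of the two possible groupings can be seen on another engine. The Python sketch below uses the standard-library `sqlite3` module and assumes an SQLite build of 3.23.0 or later, which recognises `IS TRUE`. The second query only shows what SQLite computes when the IS test is forced to bind first; Hive's actual output is not given in the issue and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table emp(name text, age int)")
conn.execute("insert into emp values ('a', 5), ('b', 15), ('c', 12)")

# Unparenthesized: SQLite applies '>' before IS, i.e. (age > 10) IS TRUE.
comparison_first = [
    row[0]
    for row in conn.execute("select age > 10 is true from emp order by rowid")
]

# Parenthesized to force the IS test first, i.e. age > (10 IS TRUE).
is_first = [
    row[0]
    for row in conn.execute("select age > (10 is true) from emp order by rowid")
]

print(comparison_first, is_first)
```

The two groupings disagree on the first row (age 5), so a grammar that binds IS more tightly than `>` changes query results rather than just the parse tree.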





[jira] [Updated] (HIVE-24854) Incremental Materialized view refresh in presence of update/delete operations

2021-04-13 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24854:
---
Fix Version/s: 4.0.0

> Incremental Materialized view refresh in presence of update/delete operations
> -
>
> Key: HIVE-24854
> URL: https://issues.apache.org/jira/browse/HIVE-24854
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The current implementation of incremental materialized view rebuild cannot 
> be used if any of the materialized view's source tables has had an update or 
> delete operation since the last rebuild. In such cases a full rebuild should 
> be performed.
> Steps to enable incremental rebuild:
> 1. Introduce a new virtual column to mark a row as deleted.
> 2. Execute the query in the view definition.
> 2.a. Add a filter to each table scan in order to pull only the rows from each 
> source table which have a higher writeId than the writeId of the last rebuild 
> - this is already implemented by the current incremental rebuild.
> 2.b. Add the row-is-deleted virtual column to each table scan. In join nodes, if 
> any of the branches has a deleted row, the result row is also deleted.
> We should distinguish two types of view definition queries: with and without 
> Aggregate.
> 3.a. No aggregate path:
> Rewrite the plan of the full rebuild to a multi-insert statement with two 
> insert branches: one branch to insert new rows into the materialized view 
> table and a second one to insert deleted rows into the materialized view 
> delete delta.
> 3.b. Aggregate path: TBD
> Prerequisite:
> The source tables haven't been compacted since the last MV rebuild.
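
Steps 2.a, 2.b and 3.a of the description can be sketched as a toy pipeline in Python. Everything here is a hypothetical illustration that follows the description literally: the `Row` layout, the write IDs, and the helpers `scan`, `join` and `split` are assumptions, not Hive operators.

```python
from collections import namedtuple

# Hypothetical row layout: 'deleted' plays the role of the new virtual column.
Row = namedtuple("Row", "key value write_id deleted")

LAST_REBUILD_WRITE_ID = 10  # hypothetical writeId of the last rebuild


def scan(rows, last_write_id):
    """Step 2.a: keep only rows written after the last rebuild."""
    return [r for r in rows if r.write_id > last_write_id]


def join(left, right):
    """Step 2.b: a joined row is deleted if either input row is deleted."""
    return [
        (a.key, a.value, b.value, a.deleted or b.deleted)
        for a in left
        for b in right
        if a.key == b.key
    ]


def split(joined):
    """Step 3.a: one branch for new rows, one for deleted rows."""
    inserts = [row[:3] for row in joined if not row[3]]
    deletes = [row[:3] for row in joined if row[3]]
    return inserts, deletes


t1 = [Row(1, "x", 5, False), Row(2, "y", 11, False), Row(3, "z", 12, True)]
t2 = [Row(2, "p", 11, False), Row(3, "q", 12, False)]

joined = join(scan(t1, LAST_REBUILD_WRITE_ID), scan(t2, LAST_REBUILD_WRITE_ID))
inserts, deletes = split(joined)
print(inserts, deletes)
```

The deleted flag on the `t1` row with key 3 propagates through the join, so that combined row lands in the delete branch instead of the insert branch.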





[jira] [Updated] (HIVE-24998) IS [NOT] DISTINCT FROM failing with SemanticException

2021-04-09 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24998:
---
Reporter: Manthan B Y  (was: Soumyakanti Das)

> IS [NOT] DISTINCT FROM failing with SemanticException
> -
>
> Key: HIVE-24998
> URL: https://issues.apache.org/jira/browse/HIVE-24998
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive: INSERT statements failing with UDFArgumentException and 
> SemanticException
> Problem Statement:
> {code:java}
> CREATE TABLE t2(c0 boolean , c1 FLOAT );
> INSERT INTO t2(c0) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641));
> -- Insert failing with: Error: Error while compiling statement: FAILED: 
> UDFArgumentException UDF tables only one argument (state=42000,code=4)
> INSERT INTO t2(c0,c1) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641), 0.2);
> -- Insert failing with: SemanticException 0:0 Expected 2 columns for 
> insclause-0/database52@t2; select produces 1 columns. Error encountered near 
> token '0.2' (state=42000,code=4) {code}
>  





[jira] [Commented] (HIVE-24595) Vectorization causing incorrect results for scalar subquery

2021-04-05 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17315128#comment-17315128
 ] 

Jesus Camacho Rodriguez commented on HIVE-24595:


Good catch :)

Would it still be valuable to vectorize the UDF from a performance standpoint?

> Vectorization causing incorrect results for scalar subquery
> ---
>
> Key: HIVE-24595
> URL: https://issues.apache.org/jira/browse/HIVE-24595
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Vineet Garg
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> *Repro*
> {code:sql}
>  CREATE EXTERNAL TABLE `alltypessmall`( 
>`id` int,
>`bool_col` boolean,  
>`tinyint_col` tinyint,   
>`smallint_col` smallint, 
>`int_col` int,   
>`bigint_col` bigint, 
>`float_col` float,   
>`double_col` double, 
>`date_string_col` string,
>`string_col` string, 
>`timestamp_col` timestamp)   
>  PARTITIONED BY (   
>`year` int,  
>`month` int) 
>  ROW FORMAT SERDE   
>'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
>  WITH SERDEPROPERTIES ( 
>'escape.delim'='\\', 
>'field.delim'=',',   
>'serialization.format'=',')  
>  STORED AS INPUTFORMAT  
>'org.apache.hadoop.mapred.TextInputFormat'   
>  OUTPUTFORMAT   
>'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
>  TBLPROPERTIES (
>'DO_NOT_UPDATE_STATS'='true',
>'OBJCAPABILITIES'='EXTREAD,EXTWRITE',
>'STATS_GENERATED'='TASK',
>'impala.lastComputeStatsTime'='1608312793',  
>'transient_lastDdlTime'='1608310442');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,40,3434,5.4,44.3,'str1','str2', '01-01-2001');
> {code}
> The following query should fail, but it succeeds:
> {code:sql}
> SELECT id FROM alltypessmall
> WHERE int_col =
>   (SELECT int_col
>FROM alltypessmall)
> ORDER BY id;
> {code}
> *Explain plan*
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
>   Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE)
>   DagName: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: alltypessmall
>   filterExpr: int_col is not null (type: boolean)
>   Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Filter Operator
> predicate: int_col is not null (type: boolean)
> Statistics: Num rows: 3 Data size: 24 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: id (type: int), int_col (type: int)
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 3 Data size: 24 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> keys:
>   0
>   1
> outputColumnNames: _col0, _col1
> input vertices:
>   1 Reducer 4
> Statistics: Num rows: 3 Data size: 24 Basic stats: 
> COMPLETE 

[jira] [Commented] (HIVE-24595) Vectorization causing incorrect results for scalar subquery

2021-04-05 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17315084#comment-17315084
 ] 

Jesus Camacho Rodriguez commented on HIVE-24595:


[~mustafaiman], fyi HIVE-24638 has been fixed.

> Vectorization causing incorrect results for scalar subquery
> ---
>
> Key: HIVE-24595
> URL: https://issues.apache.org/jira/browse/HIVE-24595
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Vineet Garg
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> *Repro*
> {code:sql}
>  CREATE EXTERNAL TABLE `alltypessmall`( 
>`id` int,
>`bool_col` boolean,  
>`tinyint_col` tinyint,   
>`smallint_col` smallint, 
>`int_col` int,   
>`bigint_col` bigint, 
>`float_col` float,   
>`double_col` double, 
>`date_string_col` string,
>`string_col` string, 
>`timestamp_col` timestamp)   
>  PARTITIONED BY (   
>`year` int,  
>`month` int) 
>  ROW FORMAT SERDE   
>'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
>  WITH SERDEPROPERTIES ( 
>'escape.delim'='\\', 
>'field.delim'=',',   
>'serialization.format'=',')  
>  STORED AS INPUTFORMAT  
>'org.apache.hadoop.mapred.TextInputFormat'   
>  OUTPUTFORMAT   
>'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
>  TBLPROPERTIES (
>'DO_NOT_UPDATE_STATS'='true',
>'OBJCAPABILITIES'='EXTREAD,EXTWRITE',
>'STATS_GENERATED'='TASK',
>'impala.lastComputeStatsTime'='1608312793',  
>'transient_lastDdlTime'='1608310442');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,4,3434,5.4,44.3,'str1','str2', '01-01-2001');
> insert into alltypessmall partition(year=2002,month=1) values(1, true, 
> 3,3,40,3434,5.4,44.3,'str1','str2', '01-01-2001');
> {code}
> The following query should fail, but it succeeds:
> {code:sql}
> SELECT id FROM alltypessmall
> WHERE int_col =
>   (SELECT int_col
>FROM alltypessmall)
> ORDER BY id;
> {code}
> *Explain plan*
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
>   Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE)
>   DagName: vgarg_20210106115838_3fe73bf6-66c2-4281-92e8-fd75fd8ad400:17
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: alltypessmall
>   filterExpr: int_col is not null (type: boolean)
>   Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Filter Operator
> predicate: int_col is not null (type: boolean)
> Statistics: Num rows: 3 Data size: 24 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: id (type: int), int_col (type: int)
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 3 Data size: 24 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> keys:
>   0
>   1
> outputColumnNames: _col0, _col1
> input vertices:
>   1 Reducer 4
> Statistics: Num rows: 3 Data size: 24 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map Join Operator
>  

[jira] [Resolved] (HIVE-24638) Redundant filter in scalar subquery

2021-04-05 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24638.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~soumyakanti.das]!

> Redundant filter in scalar subquery 
> 
>
> Key: HIVE-24638
> URL: https://issues.apache.org/jira/browse/HIVE-24638
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Soumyakanti Das
>Priority: Major
> Fix For: 4.0.0
>
>
> Look at the query and CBO plan in 
> https://issues.apache.org/jira/browse/HIVE-24595 .
> Note that there is a filter to guarantee that the subquery returns only one row: 
> "HiveFilter(condition=[<=(sq_count_check($0), 1)])" . This condition is 
> redundant, as either sq_count_check fails at runtime or the condition is true for 
> all rows.
> Look at the stacktrace
> {code:java}
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSQCountCheck.evaluate(GenericUDFSQCountCheck.java:70)
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSQCountCheck.evaluate(GenericUDFSQCountCheck.java:70)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:88)
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqualOrLessThan.evaluate(GenericUDFOPEqualOrLessThan.java:111)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
>  at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:113)
>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>  at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1004)
>  at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1028)
> {code}
> GenericUDFOPEqualOrLessThan is redundant here as GenericUDFSQCountCheck does 
> the same check.
>  
>  
>  
>  
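The redundancy described above can be illustrated with a small model. This is a simplified Python stand-in for the two functions named in the stack trace, not Hive's actual implementation; the function names, the exception type, and the error text are assumptions made for illustration.

```python
def sq_count_check(row_count: int) -> int:
    """Simplified stand-in for GenericUDFSQCountCheck (assumed behaviour):
    fail at runtime when a scalar subquery yields more than one row."""
    if row_count > 1:
        raise RuntimeError("scalar subquery returned more than one row")
    return row_count


def filter_condition(row_count: int) -> bool:
    """Models HiveFilter(condition=[<=(sq_count_check($0), 1)])."""
    return sq_count_check(row_count) <= 1


# Whenever sq_count_check returns at all, the <= 1 comparison is already true,
# so the outer comparison never filters anything out.
for n in (0, 1):
    assert filter_condition(n) is True
```

Under this model the only way the filter can reject a row is by raising, which the inner check already does on its own.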



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24966) RuntimeException in CBO if HMS stats are modified externally

2021-04-01 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24966 started by Jesus Camacho Rodriguez.
--
> RuntimeException in CBO if HMS stats are modified externally
> 
>
> Key: HIVE-24966
> URL: https://issues.apache.org/jira/browse/HIVE-24966
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While we want to expose this case so the user can take action, currently we 
> throw a RuntimeException. Rather than failing the query, it may be better to 
> show this information to the user and suggest recomputing stats.





[jira] [Resolved] (HIVE-24966) RuntimeException in CBO if HMS stats are modified externally

2021-04-01 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24966.

Fix Version/s: 4.0.0
   Resolution: Fixed

> RuntimeException in CBO if HMS stats are modified externally
> 
>
> Key: HIVE-24966
> URL: https://issues.apache.org/jira/browse/HIVE-24966
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While we want to expose this case so the user can take action, currently we 
> throw a RuntimeException. Rather than failing the query, it may be better to 
> show this information to the user and suggest recomputing stats.





[jira] [Assigned] (HIVE-24966) RuntimeException in CBO if HMS stats are modified externally

2021-03-31 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24966:
--


> RuntimeException in CBO if HMS stats are modified externally
> 
>
> Key: HIVE-24966
> URL: https://issues.apache.org/jira/browse/HIVE-24966
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> While we want to expose this case so the user can take action, currently we 
> throw a RuntimeException. Rather than failing the query, it may be better to 
> show this information to the user and suggest recomputing stats.





[jira] [Resolved] (HIVE-24886) Support simple equality operations between MAP/LIST/STRUCT data types

2021-03-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24886.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> Support simple equality operations between MAP/LIST/STRUCT data types
> -
>
> Key: HIVE-24886
> URL: https://issues.apache.org/jira/browse/HIVE-24886
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning, Query Processor
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, equality operations between non-primitive data types (MAP, LIST, 
> STRUCT) work only in some very limited cases, e.g.:
> {code:sql}
> create table table_map_types (id int, c1 map, c2 map);
> select id from table_map_types where map(1,1) IN (map(1,1), map(1,2), 
> map(1,3)); 
> {code}
> but this feature was never introduced explicitly (zero tests & JIRAs around 
> the subject) and the vast majority of queries involving comparisons between 
> non-primitive data types now fail at compile time.
> The goal of this issue is to support simple equality operations:
> * EQUALS(=)
> * NOT_EQUALS(<>)
> * IN
> * IS DISTINCT FROM
> * IS NOT DISTINCT FROM
> between MAP/LIST/STRUCT data types when the compared types are identical 
> (same type category and identical component types). The following examples 
> illustrate the idea of types being identical:
> {noformat}
> MAP EQUALS MAP OK
> MAP EQUALS MAP KO
> STRUCT EQUALS STRUCT KO
> STRUCT EQUALS STRUCT OK
> LIST EQUALS LIST OK
> LIST EQUALS LIST KO
> {noformat}
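The notion of "identical types" (same type category and identical component types) can be sketched as a recursive check. The `HiveType` descriptor and the concrete example types below are hypothetical and chosen only for illustration; they are not Hive's type classes, and the component types are not taken from the issue.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HiveType:
    """Hypothetical type descriptor: a category plus its component types."""
    category: str                          # e.g. "INT", "STRING", "MAP", "LIST", "STRUCT"
    components: Tuple["HiveType", ...] = ()


def identical(left: HiveType, right: HiveType) -> bool:
    """True when the category and every component type match, recursively."""
    return (
        left.category == right.category
        and len(left.components) == len(right.components)
        and all(identical(l, r) for l, r in zip(left.components, right.components))
    )


# Assumed example types, for illustration only.
INT, STRING = HiveType("INT"), HiveType("STRING")
map_int_int = HiveType("MAP", (INT, INT))
map_int_string = HiveType("MAP", (INT, STRING))

assert identical(map_int_int, HiveType("MAP", (INT, INT)))   # comparison allowed
assert not identical(map_int_int, map_int_string)            # comparison rejected
```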





[jira] [Resolved] (HIVE-24934) VectorizedExpressions annotation is not needed in GenericUDFSQCountCheck

2021-03-29 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24934.

Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master, thanks [~soumyakanti.das]!

> VectorizedExpressions annotation is not needed in GenericUDFSQCountCheck
> 
>
> Key: HIVE-24934
> URL: https://issues.apache.org/jira/browse/HIVE-24934
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
> Fix For: 4.0.0
>
>
> This looks like a copy-paste error, with a trivial fix.





[jira] [Resolved] (HIVE-24817) "not in" clause returns incorrect data when there is coercion

2021-03-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24817.

Fix Version/s: 4.0.0
 Assignee: Steve Carlin
   Resolution: Fixed

Pushed to master, thanks [~scarlin]!

> "not in" clause returns incorrect data when there is coercion
> -
>
> Key: HIVE-24817
> URL: https://issues.apache.org/jira/browse/HIVE-24817
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When a query's WHERE clause checks an integer column with "not in" against a 
> decimal value, the decimal value is changed to null, causing incorrect results.
> This is a sample query of a failure:
> select count(*) from my_tbl where int_col not in (355.8);
> Since int_col can never be 355.8, one would expect all the rows to be 
> returned, but the 355.8 is changed into a null value, causing no rows to 
> be returned.
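The effect of the literal turning into NULL follows from SQL's three-valued logic and can be reproduced outside Hive. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in engine; the table contents are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_tbl (int_col INTEGER)")
conn.executemany("INSERT INTO my_tbl VALUES (?)", [(1,), (2,), (3,)])


def count_where(predicate: str) -> int:
    """Count the rows of my_tbl that satisfy the given WHERE predicate."""
    return conn.execute(f"SELECT count(*) FROM my_tbl WHERE {predicate}").fetchone()[0]


# No integer equals 355.8, so NOT IN is true for every row.
expected = count_where("int_col NOT IN (355.8)")
# If the literal is replaced by NULL, NOT IN evaluates to NULL and no row passes.
after_bad_coercion = count_where("int_col NOT IN (NULL)")
```

Here `expected` counts every row while `after_bad_coercion` counts none, which mirrors the symptom reported in the issue.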





[jira] [Updated] (HIVE-24855) Introduce virtual column ROW__IS__DELETED

2021-03-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24855:
---
Fix Version/s: 4.0.0

> Introduce virtual column ROW__IS__DELETED
> 
>
> Key: HIVE-24855
> URL: https://issues.apache.org/jira/browse/HIVE-24855
> Project: Hive
>  Issue Type: New Feature
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24908) Adding Respect/Ignore nulls as a UDAF parameter is ambiguous

2021-03-23 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24908:
---
Fix Version/s: 4.0.0

> Adding Respect/Ignore nulls as a UDAF parameter is ambiguous
> 
>
> Key: HIVE-24908
> URL: https://issues.apache.org/jira/browse/HIVE-24908
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Both function calls are translated to the same UDAF call:
> {code}
> SELECT lead(a, 2, true) ...
> SELECT lead(a, 2) IGNORE NULLS ...
> {code}
> IGNORE NULLS is passed as an extra constant boolean parameter to the UDAF
> https://github.com/apache/hive/blob/eed78dfdcb6dfc2de400397a60de12e6f62b96e2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L743
> However, the two function calls have different semantics:
> * *lead(a, 2, true)* - 'true' is the default value: "The value of DEFAULT is 
> returned as the result if there is no row corresponding to the OFFSET number 
> of rows before R within P (for the lag function) or after R within P (for the 
> lead function)"
> * *lead(a, 2) IGNORE NULLS* - For each row in the current window, find the 2nd 
> non-NULL value starting directly after the current row. 
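The difference between the two readings can be made concrete with a small model over one partition. This is an illustrative Python sketch of the two semantics quoted above, not Hive's UDAF code; the function names and the sample data are assumptions, and `None` plays the role of NULL.

```python
from typing import Any, List, Optional


def lead_with_default(values: List[Any], offset: int, default: Any) -> List[Any]:
    """lead(a, offset, default): the value `offset` rows ahead, or `default`
    when no such row exists in the partition."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


def lead_ignore_nulls(values: List[Any], offset: int) -> List[Optional[Any]]:
    """lead(a, offset) IGNORE NULLS: the offset-th non-NULL value found
    directly after the current row (offset is assumed to be >= 1)."""
    result = []
    for i in range(len(values)):
        non_null = [v for v in values[i + 1:] if v is not None]
        result.append(non_null[offset - 1] if len(non_null) >= offset else None)
    return result


partition = [1, None, 3, None, 5]
print(lead_with_default(partition, 2, True))  # [3, None, 5, True, True]
print(lead_ignore_nulls(partition, 2))        # [5, 5, None, None, None]
```

The two calls disagree on every row of this partition, which is why translating both to the same UDAF call is ambiguous.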





[jira] [Assigned] (HIVE-24638) Redundant filter in scalar subquery

2021-03-21 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24638:
--

Assignee: Soumyakanti Das  (was: Vineet Garg)

> Redundant filter in scalar subquery 
> 
>
> Key: HIVE-24638
> URL: https://issues.apache.org/jira/browse/HIVE-24638
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Soumyakanti Das
>Priority: Major
>
> Look at the query and CBO plan in 
> https://issues.apache.org/jira/browse/HIVE-24595 .
> Note that there is a filter to guarantee that the subquery returns at most one row: 
> "HiveFilter(condition=[<=(sq_count_check($0), 1)])". This condition is 
> redundant: either sq_count_check fails at runtime or the condition is true for 
> all rows.
> Look at the stack trace:
> {code:java}
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSQCountCheck.evaluate(GenericUDFSQCountCheck.java:70)
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSQCountCheck.evaluate(GenericUDFSQCountCheck.java:70)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:88)
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqualOrLessThan.evaluate(GenericUDFOPEqualOrLessThan.java:111)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
>  at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:113)
>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>  at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1004)
>  at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1028)
> {code}
> GenericUDFOPEqualOrLessThan is redundant here as GenericUDFSQCountCheck does 
> the same check.
>  
>  
>  
>  





[jira] [Updated] (HIVE-24868) Support specifying Respect/Ignore Nulls in function parameter list

2021-03-21 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24868:
---
Fix Version/s: 4.0.0

> Support specifying Respect/Ignore Nulls in function parameter list
> --
>
> Key: HIVE-24868
> URL: https://issues.apache.org/jira/browse/HIVE-24868
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> select last_value(b ignore nulls) over(partition by a order by b) from t1;
> {code}





[jira] [Updated] (HIVE-24865) Implement Respect/Ignore Nulls in first/last_value

2021-03-11 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24865:
---
Fix Version/s: 4.0.0

> Implement Respect/Ignore Nulls in first/last_value
> --
>
> Key: HIVE-24865
> URL: https://issues.apache.org/jira/browse/HIVE-24865
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser, UDF
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
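> <null treatment> ::=
> RESPECT NULLS | IGNORE NULLS
> <first or last value function> ::=
> <first or last value> <left paren> <value expression> <right paren> [ <null treatment> ]
> <first or last value> ::=
> FIRST_VALUE | LAST_VALUE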
> {code}
> Example:
> {code:java}
> select last_value(b) ignore nulls over(partition by a order by b) from t1;
> {code}
> Existing non-standard implementation:
> {code:java}
> select last_value(b, true) over(partition by a order by b) from t1;
> {code}
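What RESPECT NULLS and IGNORE NULLS mean for these two functions can be sketched over a single window frame. This is an illustrative Python model, not Hive's implementation; the frame is passed in as an already-ordered list, and `None` stands for NULL.

```python
from typing import Any, List, Optional


def first_value(frame: List[Any], ignore_nulls: bool = False) -> Optional[Any]:
    """First value of the frame; with ignore_nulls, NULL entries are skipped."""
    candidates = [v for v in frame if v is not None] if ignore_nulls else frame
    return candidates[0] if candidates else None


def last_value(frame: List[Any], ignore_nulls: bool = False) -> Optional[Any]:
    """Last value of the frame; with ignore_nulls, NULL entries are skipped."""
    candidates = [v for v in frame if v is not None] if ignore_nulls else frame
    return candidates[-1] if candidates else None


frame = [None, 7, 8, None]
print(last_value(frame))                      # None (RESPECT NULLS, the default)
print(last_value(frame, ignore_nulls=True))   # 8
print(first_value(frame, ignore_nulls=True))  # 7
```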





[jira] [Updated] (HIVE-24863) Wrong property value in UDAF percentile_cont/disc description

2021-03-10 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24863:
---
Fix Version/s: 4.0.0

> Wrong property value in UDAF percentile_cont/disc description
> -
>
> Key: HIVE-24863
> URL: https://issues.apache.org/jira/browse/HIVE-24863
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-15757) Allow EXISTS/NOT EXISTS correlated subquery with aggregates

2021-03-09 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15757:
---
Fix Version/s: 4.0.0

> Allow EXISTS/NOT EXISTS correlated subquery with aggregates
> ---
>
> Key: HIVE-15757
> URL: https://issues.apache.org/jira/browse/HIVE-15757
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available, sub-query
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently Hive doesn't allow correlated EXISTS/NOT EXISTS subqueries that 
> have an aggregate without group by, e.g.
> {code}
> select *
> from src b 
> where exists 
>   (select count(*) 
>   from src a 
>   where b.value = a.value  and a.key = b.key and a.value > 'val_9'
>   ) ;
> {code}
> Such queries could be rewritten to replace EXISTS/NOT EXISTS predicate with 
> {{true}} or {{false}} based on aggregate.
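The reason the rewrite is safe is that an aggregate without group by always produces exactly one row, so EXISTS over it is always true. The sketch below checks this with Python's built-in `sqlite3` module as a stand-in engine; the table contents are assumed for illustration, and the query mirrors the one in the issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [("1", "val_1"), ("2", "val_2"), ("9", "val_95")],
)

query = """
SELECT * FROM src b
WHERE {negate} EXISTS (
    SELECT count(*) FROM src a
    WHERE b.value = a.value AND a.key = b.key AND a.value > 'val_9'
)
"""
# count(*) yields one row even when no inner row matches, so EXISTS is always true.
exists_rows = conn.execute(query.format(negate="")).fetchall()
not_exists_rows = conn.execute(query.format(negate="NOT")).fetchall()
```

`exists_rows` contains every row of `src` and `not_exists_rows` is empty, regardless of the correlation predicate.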





[jira] [Resolved] (HIVE-16432) Improve plan for subquery with EXISTS

2021-03-08 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-16432.

Resolution: Duplicate

> Improve plan for subquery with EXISTS
> -
>
> Key: HIVE-16432
> URL: https://issues.apache.org/jira/browse/HIVE-16432
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Krisztian Kasa
>Priority: Major
>
> If an EXISTS/NOT EXISTS subquery contains an aggregate (with no group by or 
> windowing), it is guaranteed to produce at least one row. Since such a subquery 
> in a WHERE clause only tests for the existence of a row, it could be replaced by 
> true at compile time.





[jira] [Assigned] (HIVE-16432) Improve plan for subquery with EXISTS

2021-03-08 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-16432:
--

Assignee: Krisztian Kasa  (was: Vineet Garg)

> Improve plan for subquery with EXISTS
> -
>
> Key: HIVE-16432
> URL: https://issues.apache.org/jira/browse/HIVE-16432
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Krisztian Kasa
>Priority: Major
>
> If an EXISTS/NOT EXISTS subquery contains an aggregate (with no group by or 
> windowing), it is guaranteed to produce at least one row. Since such a subquery 
> in a WHERE clause only tests for the existence of a row, it could be replaced by 
> true at compile time.





[jira] [Updated] (HIVE-24199) Incorrect result when subquery in exists contains limit

2021-03-05 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24199:
---
Fix Version/s: 4.0.0

> Incorrect result when subquery in exists contains limit
> --
>
> Key: HIVE-24199
> URL: https://issues.apache.org/jira/browse/HIVE-24199
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:java}
> create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as 
> orc;
> insert into web_sales values
> (1, 1),
> (1, 2),
> (2, 1),
> (2, 2);
> select * from web_sales ws1
> where exists (select 1 from web_sales ws2 where ws1.ws_order_number = 
> ws2.ws_order_number limit 1);
> 1 1
> 1 2
> {code}
> {code:java}
> CBO PLAN:
> HiveSemiJoin(condition=[=($0, $2)], joinType=[semi])
>   HiveProject(ws_order_number=[$0], ws_warehouse_sk=[$1])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
>   HiveProject(ws_order_number=[$0])
> HiveSortLimit(fetch=[1])  <-- This shouldn't be added
>   HiveProject(ws_order_number=[$0])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[default, web_sales]], table:alias=[ws2])
> {code}
> Limit n on the right side of the join reduces the result set coming from the 
> right to only n records, hence not all the ws_order_number values are included, 
> which leads to a correctness issue.
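The two behaviours can be compared with a small model of the plans. This is an illustrative Python sketch using the data from the issue; the function names are hypothetical, and the model takes the first row for the limit, whereas a real engine may pick any row.

```python
from typing import List, Tuple

Row = Tuple[int, int]  # (ws_order_number, ws_warehouse_sk)
web_sales: List[Row] = [(1, 1), (1, 2), (2, 1), (2, 2)]


def exists_with_limit(rows: List[Row], limit: int) -> List[Row]:
    """Intended semantics: the LIMIT applies to each correlated subquery
    evaluation, so it cannot change whether a match exists."""
    result = []
    for outer in rows:
        matches = [inner for inner in rows if inner[0] == outer[0]][:limit]
        if matches:
            result.append(outer)
    return result


def semi_join_limit_pushed(rows: List[Row], limit: int) -> List[Row]:
    """Plan from the issue: the limit runs once on the right input,
    before the semi join, and drops join keys."""
    right_keys = {inner[0] for inner in rows[:limit]}
    return [outer for outer in rows if outer[0] in right_keys]


print(exists_with_limit(web_sales, 1))       # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(semi_join_limit_pushed(web_sales, 1))  # [(1, 1), (1, 2)]
```

The second result matches the incorrect output shown in the issue: only the rows whose order number survived the limit are returned.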





[jira] [Updated] (HIVE-24685) Remove HiveSubQRemoveRelBuilder

2021-03-02 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24685:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Remove HiveSubQRemoveRelBuilder
> ---
>
> Key: HIVE-24685
> URL: https://issues.apache.org/jira/browse/HIVE-24685
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Fix For: 4.0.0
>
>
> The class seems to be a close clone of {{RelBuilder}} created due to some 
> bugs in the original implementation. Those issues seem to be fixed now 
> and we should be able to get rid of the copy. In the worst case, if 
> we need to keep it for the time being, we could try to make it extend 
> {{RelBuilder}} and override only the necessary methods.





[jira] [Resolved] (HIVE-24747) Backport HIVE-24569 to branch-3.1

2021-02-09 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24747.

Fix Version/s: 3.1.3
   Resolution: Fixed

> Backport HIVE-24569 to branch-3.1
> -
>
> Key: HIVE-24747
> URL: https://issues.apache.org/jira/browse/HIVE-24747
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-24740) Invalid table alias or column reference: Can't order by an unselected column

2021-02-05 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24740:
--

Assignee: (was: Pengcheng Xiong)

> Invalid table alias or column reference: Can't order by an unselected column
> 
>
> Key: HIVE-24740
> URL: https://issues.apache.org/jira/browse/HIVE-24740
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleksiy Sayankin
>Priority: Blocker
>
> {code}
> CREATE TABLE t1 (column1 STRING);
> {code}
> {code}
> select substr(column1,1,4), avg(column1) from t1 group by substr(column1,1,4) 
> order by column1;
> {code}
> {code}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:87 Invalid table 
> alias or column reference 'column1': (possible column names are: _c0, _c1, 
> .(tok_function substr (tok_table_or_col column1) 1 4), .(tok_function avg 
> (tok_table_or_col column1)))
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5645)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5576)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.getOrderByExpression(CalcitePlanner.java:4326)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.beginGenOBLogicalPlan(CalcitePlanner.java:4230)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genOBLogicalPlan(CalcitePlanner.java:4136)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5326)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1864)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1810)
>   at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1571)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:562)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12538)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:315)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver(TestCliDriver.java:62)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> 

[jira] [Resolved] (HIVE-24664) Support column aliases in Values clause

2021-02-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24664.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~kkasa]!

> Support column aliases in Values clause
> ---
>
> Key: HIVE-24664
> URL: https://issues.apache.org/jira/browse/HIVE-24664
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Enable explicitly specifying column aliases in the first row of a Values clause. 
> If not all the columns have an alias specified, generate one for the rest.
> {code:java}
> values(1, 2 b, 3 c),(4, 5, 6);
> {code}
> {code:java}
> _col1   b   c
>   1 2   3
>   4 5   6
> {code}
>  This is not a standard SQL feature, but some database engines like Impala 
> support it.
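The naming rule can be sketched as follows. This is an illustrative Python model, not Hive's parser code; the `_col<position>` pattern with 1-based positions is inferred only from the sample output above and should be treated as an assumption.

```python
from typing import List, Optional


def values_column_names(first_row_aliases: List[Optional[str]]) -> List[str]:
    """Use the alias given in the first row when present; otherwise
    generate a name from the 1-based column position (assumed pattern)."""
    return [
        alias if alias is not None else f"_col{position}"
        for position, alias in enumerate(first_row_aliases, start=1)
    ]


# values(1, 2 b, 3 c),(4, 5, 6): only the second and third columns carry an alias.
print(values_column_names([None, "b", "c"]))  # ['_col1', 'b', 'c']
```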





[jira] [Resolved] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-03 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-23553.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~pgaref]! This was long overdue.

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on ORC version 1.5.X, and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.), the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.





[jira] [Updated] (HIVE-24564) Extend PPD filter transitivity to be able to discover new opportunities

2021-01-27 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24564:
---
Fix Version/s: 4.0.0

> Extend PPD filter transitivity to be able to discover new opportunities
> ---
>
> Key: HIVE-24564
> URL: https://issues.apache.org/jira/browse/HIVE-24564
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> If a predicate references a value column of one of the parent ReduceSink 
> operators of a Join, the predicate cannot be copied and pushed down to the 
> other side of the join. However, if a parent equijoin exists in the branch 
> of the RS where 
>  1. the referenced value column is a key column of that join
>  2. and the other side of that join expression is the key column of the RS
>  the column in the predicate can be replaced and the new predicate can be 
> pushed down.
> {code:java}
>                Join(... = wr_on)
>               /                 \
>             ...            RS(key: wr_on)
>                                  |
>                    Join(ws1.ws_on = ws2.ws_on)
>                   (ws1.ws_on, ws2.ws_on, wr_on)
>                     /                       \
>             RS(key:ws_on)              RS(key:ws_on)
>            (value: wr_on)
>                   |                          |
>      Join(ws1.ws_on = wr.wr_on)           TS(ws2)
>         /                  \
>  RS(key:ws_on)        RS(key:wr_on)
>        |                    |
>     TS(ws1)              TS(wr)
> {code}
> A predicate like
> {code}
> (wr_on in (...))
> {code}
> cannot be pushed to TS(ws2) because wr_on is not a key column in 
> Join(ws1.ws_on = ws2.ws_on). But we know that wr_on is equal to ws_on 
> because of the join in the left branch. 
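The transitivity argument can be sketched by grouping columns into equivalence classes derived from the equijoin conditions. This is an illustrative Python model only: it assumes inner equijoins, ignores the ReduceSink key/value conditions listed above, and is not Hive's optimizer code. The function names and the qualified column names are assumptions.

```python
from typing import Dict, Iterable, Set, Tuple


def equivalence_classes(equijoins: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group columns that are equal by transitivity of the equijoin conditions."""
    classes: Dict[str, Set[str]] = {}
    for left, right in equijoins:
        merged = classes.get(left, {left}) | classes.get(right, {right})
        for column in merged:          # re-point every member at the merged class
            classes[column] = merged
    return classes


def rewrite_targets(column: str, equijoins: Iterable[Tuple[str, str]]) -> Set[str]:
    """Columns that a predicate on `column` could be rewritten to reference."""
    return equivalence_classes(equijoins).get(column, {column}) - {column}


# Join conditions taken from the operator tree above.
joins = [("ws1.ws_on", "wr.wr_on"), ("ws1.ws_on", "ws2.ws_on")]
targets = rewrite_targets("wr.wr_on", joins)
assert "ws2.ws_on" in targets  # so (wr_on in (...)) may become (ws2.ws_on in (...))
```

In this model the predicate on wr_on can be restated on ws2.ws_on, which is a key column of the upper join and can therefore be pushed to TS(ws2).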





[jira] [Commented] (HIVE-24685) Remove HiveSubQRemoveRelBuilder

2021-01-26 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272455#comment-17272455
 ] 

Jesus Camacho Rodriguez commented on HIVE-24685:


https://github.com/apache/hive/pull/1878

> Remove HiveSubQRemoveRelBuilder
> ---
>
> Key: HIVE-24685
> URL: https://issues.apache.org/jira/browse/HIVE-24685
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> The class seems to be a close clone of {{RelBuilder}} created due to some 
> bugs existing in original implementation. Those issues seem to be fixed now 
> and we should be able to get rid of the copy. In the worst case scenario, if 
> we need to keep it for the time being, we could try to make it extend 
> {{RelBuilder}} and override only necessary methods.





[jira] [Updated] (HIVE-24685) Remove HiveSubQRemoveRelBuilder

2021-01-25 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24685:
---
Status: Patch Available  (was: Open)

> Remove HiveSubQRemoveRelBuilder
> ---
>
> Key: HIVE-24685
> URL: https://issues.apache.org/jira/browse/HIVE-24685
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> The class seems to be a close clone of {{RelBuilder}} created due to some 
> bugs existing in original implementation. Those issues seem to be fixed now 
> and we should be able to get rid of the copy. In the worst case scenario, if 
> we need to keep it for the time being, we could try to make it extend 
> {{RelBuilder}} and override only necessary methods.





[jira] [Assigned] (HIVE-24685) Remove HiveSubQRemoveRelBuilder

2021-01-25 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24685:
--


> Remove HiveSubQRemoveRelBuilder
> ---
>
> Key: HIVE-24685
> URL: https://issues.apache.org/jira/browse/HIVE-24685
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> The class seems to be a close clone of {{RelBuilder}} created due to some 
> bugs existing in original implementation. Those issues seem to be fixed now 
> and we should be able to get rid of the copy. In the worst case scenario, if 
> we need to keep it for the time being, we could try to make it extend 
> {{RelBuilder}} and override only necessary methods.





[jira] [Resolved] (HIVE-24633) Support CTE with column labels

2021-01-25 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24633.

Resolution: Fixed

Pushed to master, thanks [~kkasa]!

> Support CTE with column labels
> --
>
> Key: HIVE-24633
> URL: https://issues.apache.org/jira/browse/HIVE-24633
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> with cte1(a, b) as (select int_col x, bigint_col y from t1)
> select a, b from cte1{code}
> {code}
> a b
> 1 2
> 3 4
> {code}
> {code}
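> <query expression> ::=
>   [ <with clause> ] <query expression body>
>   [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]
> <with clause> ::=
>   WITH [ RECURSIVE ] <with list>
> <with list> ::=
>   <with list element> [ { <comma> <with list element> }... ]
> <with list element> ::=
>   <query name> [ <left paren> <with column list> <right paren> ]
>   AS <table subquery> [ <search or cycle clause> ]
> <with column list> ::=
>   <column name list>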
> {code}





[jira] [Updated] (HIVE-24633) Support CTE with column labels

2021-01-25 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24633:
---
Fix Version/s: 4.0.0

> Support CTE with column labels
> --
>
> Key: HIVE-24633
> URL: https://issues.apache.org/jira/browse/HIVE-24633
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> with cte1(a, b) as (select int_col x, bigint_col y from t1)
> select a, b from cte1{code}
> {code}
> a b
> 1 2
> 3 4
> {code}
> {code}
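> <query expression> ::=
>   [ <with clause> ] <query expression body>
>   [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]
> <with clause> ::=
>   WITH [ RECURSIVE ] <with list>
> <with list> ::=
>   <with list element> [ { <comma> <with list element> }... ]
> <with list element> ::=
>   <query name> [ <left paren> <with column list> <right paren> ]
>   AS <table subquery> [ <search or cycle clause> ]
> <with column list> ::=
>   <column name list>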
> {code}





[jira] [Resolved] (HIVE-24646) Strict type checks are not enforced between bigints and doubles

2021-01-19 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24646.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Strict type checks are not enforced between bigints and doubles 
> 
>
> Key: HIVE-24646
> URL: https://issues.apache.org/jira/browse/HIVE-24646
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
>
> When the {{hive.strict.checks.type.safety}} property is set to true, queries 
> with comparisons between bigints and doubles should fail according to the 
> description of the property. 
> At the moment a warning message is displayed in the console but the query 
> doesn't fail no matter the value of the property.
> {noformat}
> WARNING: Comparing a bigint and a double may result in a loss of precision.
> {noformat}
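
The intended behaviour can be sketched as a check that warns in lenient mode and fails in strict mode. This is a hypothetical illustration, not Hive's implementation: StrictTypeSafetyCheck, ColType, and checkComparison are made-up names, and the boolean flag stands in for the value of hive.strict.checks.type.safety.

```java
// Hypothetical sketch of a type-safety check for comparisons.
class StrictTypeSafetyCheck {
  enum ColType { BIGINT, DOUBLE, STRING, DECIMAL }

  static final String WARNING =
      "WARNING: Comparing a bigint and a double may result in a loss of precision.";

  static boolean isBigintDoublePair(ColType left, ColType right) {
    return (left == ColType.BIGINT && right == ColType.DOUBLE)
        || (left == ColType.DOUBLE && right == ColType.BIGINT);
  }

  // Lenient mode: returns the warning text, or null if the comparison is safe.
  // Strict mode: throws for a bigint/double comparison instead of warning.
  static String checkComparison(ColType left, ColType right, boolean strict) {
    if (!isBigintDoublePair(left, right)) {
      return null;
    }
    if (strict) {
      throw new IllegalStateException("Unsafe comparison between bigint and double");
    }
    return WARNING;
  }

  public static void main(String[] args) {
    System.out.println(checkComparison(ColType.BIGINT, ColType.DOUBLE, false));
    // The precision loss behind the warning: 2^53 + 1 has no exact double value,
    // so the cast below is compared against its rounded neighbour.
    System.out.println((double) 9007199254740993L == 9007199254740992.0);
  }
}
```

The point of the sketch is only the control flow: the same detection logic feeds either a warning or an error, depending on the property.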





[jira] [Resolved] (HIVE-24534) Prevent comparisons between characters and decimals types when strict checks enabled

2021-01-19 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24534.

Resolution: Fixed

> Prevent comparisons between characters and decimals types when strict checks 
> enabled
> 
>
> Key: HIVE-24534
> URL: https://issues.apache.org/jira/browse/HIVE-24534
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When we compare decimal and character types, implicit conversions take place 
> that can lead to unexpected and surprising results. 
> {code:sql}
> create table t_str (str_col string);
> insert into t_str values ('1208925742523269458163819');
> select * from t_str where str_col=1208925742523269479013976;
> {code}
> The SELECT query returns one row even though the filtering value is not the 
> same as the one present in the string column of the table. The problem is that 
> both values are converted to doubles and, due to loss of precision, they 
> are deemed equal.
> Even if we change the implicit conversion to use another type (HIVE-24528), 
> there will always be some cases that may lead to unexpected results. 
> The goal of this issue is to prevent comparisons between decimal and 
> character types when hive.strict.checks.type.safety is enabled and to throw an 
> error instead. 
>  
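
The precision loss can be reproduced outside Hive with the two values from the example. The sketch below is plain Java and only mimics the conversion to double; the class and method names are hypothetical, and it makes no claim about how Hive performs the conversion internally.

```java
import java.math.BigDecimal;

class DecimalStringComparison {
  // Compares after converting both sides to double, the lossy path described above.
  static boolean equalAsDouble(String stringValue, String numericLiteral) {
    return Double.parseDouble(stringValue) == Double.parseDouble(numericLiteral);
  }

  // Compares exactly using arbitrary-precision decimals.
  static boolean equalAsDecimal(String stringValue, String numericLiteral) {
    return new BigDecimal(stringValue).compareTo(new BigDecimal(numericLiteral)) == 0;
  }

  public static void main(String[] args) {
    String stored = "1208925742523269458163819";  // value in str_col
    String literal = "1208925742523269479013976"; // value in the WHERE clause
    System.out.println("equal as double:  " + equalAsDouble(stored, literal));
    System.out.println("equal as decimal: " + equalAsDecimal(stored, literal));
  }
}
```

Both values lie just below 2^80, where adjacent doubles are 2^27 apart, while the values differ by only about 2 * 10^7, so they round to the same double even though they differ as decimals.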





[jira] [Comment Edited] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders

2021-01-15 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266250#comment-17266250
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-24569 at 1/15/21, 5:59 PM:
--

[~prasanth_j], [~mustafaiman], could any of you review this patch? Thanks


was (Author: jcamachorodriguez):
[~prasanthj], [~mustafaiman], could any of you review this patch? Thanks

> LLAP daemon leaks file descriptors/log4j appenders
> --
>
> Key: HIVE-24569
> URL: https://issues.apache.org/jira/browse/HIVE-24569
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: llap-appender-gc-roots.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With HIVE-9756 query logs in LLAP are directed to different files (file per 
> query) using a Log4j2 routing appender. Without a purge policy in place, 
> appenders are created dynamically by the routing appender, one for each 
> query, and remain in memory forever. The dynamic appenders write to files so 
> each appender holds on to a file descriptor. 
> Further work in HIVE-14224 has mitigated the issue by introducing a custom 
> purging policy (LlapRoutingAppenderPurgePolicy) which deletes the dynamic 
> appenders (and closes the respective files) when the query is completed 
> (org.apache.hadoop.hive.llap.daemon.impl.QueryTracker#handleLogOnQueryCompletion).
>  
> However, in the presence of multiple threads appending to the logs there are 
> race conditions. In an internal Hive cluster the number of file descriptors 
> started going up, with approximately one descriptor leaking per query. After 
> some debugging it turned out that one thread (running 
> QueryTracker#handleLogOnQueryCompletion) signals that the query has finished 
> and thus the purge policy should get rid of the respective appender (and 
> close the file) while another (Task-Executor-0) attempts to append another 
> log message for the same query. The initial appender is closed after the 
> request from the query tracker but a new one is created to accommodate the 
> message from the task executor and the latter is never removed, thus creating 
> a leak. 
> Similar leaks have been identified and fixed for HS2, with the most similar 
> one being that described 
> [here|https://issues.apache.org/jira/browse/HIVE-22753?focusedCommentId=17021041&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17021041].
>  
> The problem relies on the timing of threads so it may not manifest in all 
> versions between 2.2.0 and 4.0.0. Usually the leak can be seen via 
> lsof (or another similar command) with the following output:
> {noformat}
> # 1494391 is the PID of the LLAP daemon process
> ls -ltr /proc/1494391/fd
> ...
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 978 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121724_66ce273d-54a9-4dcd-a9fb-20cb5691cef7-dag_1608659125567_0008_194.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 977 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121804_ce53eeb5-c73f-4999-b7a4-b4dd04d4e4de-dag_1608659125567_0008_197.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 974 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224122002_1693bd7d-2f0e-4673-a8d1-b7cb14a02204-dag_1608659125567_0008_204.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 989 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121909_6a56218f-06c7-4906-9907-4b6dd824b100-dag_1608659125567_0008_201.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 984 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121754_78ef49a0-bc23-478f-9a16-87fa25e7a287-dag_1608659125567_0008_196.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 983 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121855_e65b9ebf-b2ec-4159-9570-1904442b7048-dag_1608659125567_0008_200.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 981 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121818_e9051ae3-1316-46af-aabb-22c53ed2fda7-dag_1608659125567_0008_198.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 980 -> 
> 
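
The interleaving described in the issue can be replayed deterministically in a small model. The sketch below is hypothetical: RoutingAppenderModel, append, and purge are illustrative names rather than Log4j2 or Hive APIs, a StringBuilder stands in for an open log file, and the guard shown is only one possible mitigation, not necessarily the fix adopted for HIVE-24569.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a routing appender that creates one appender per query.
class RoutingAppenderModel {
  private final Map<String, StringBuilder> appenders = new HashMap<>();
  // Assumption: a real implementation would also expire entries from this set.
  private final Set<String> completed = new HashSet<>();
  private final boolean rejectAfterCompletion;

  RoutingAppenderModel(boolean rejectAfterCompletion) {
    this.rejectAfterCompletion = rejectAfterCompletion;
  }

  // Appends a message, creating the per-query appender on demand.
  synchronized void append(String queryId, String message) {
    if (rejectAfterCompletion && completed.contains(queryId)) {
      return; // drop the late message instead of re-creating the appender
    }
    appenders.computeIfAbsent(queryId, id -> new StringBuilder()).append(message);
  }

  // Called when the query completes: removes (closes) its appender.
  synchronized void purge(String queryId) {
    completed.add(queryId);
    appenders.remove(queryId);
  }

  synchronized int openAppenders() {
    return appenders.size();
  }

  public static void main(String[] args) {
    for (boolean guard : new boolean[] {false, true}) {
      RoutingAppenderModel model = new RoutingAppenderModel(guard);
      model.append("query-1", "task started"); // task executor logs
      model.purge("query-1");                  // query tracker signals completion
      model.append("query-1", "late message"); // task executor logs once more
      System.out.println("guard=" + guard + " open appenders=" + model.openAppenders());
    }
  }
}
```

Without the guard, the late message re-creates an appender that nothing will purge again, which corresponds to one leaked descriptor per query.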

[jira] [Commented] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders

2021-01-15 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266250#comment-17266250
 ] 

Jesus Camacho Rodriguez commented on HIVE-24569:


[~prasanthj], [~mustafaiman], could any of you review this patch? Thanks

> LLAP daemon leaks file descriptors/log4j appenders
> --
>
> Key: HIVE-24569
> URL: https://issues.apache.org/jira/browse/HIVE-24569
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: llap-appender-gc-roots.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With HIVE-9756 query logs in LLAP are directed to different files (file per 
> query) using a Log4j2 routing appender. Without a purge policy in place, 
> appenders are created dynamically by the routing appender, one for each 
> query, and remain in memory forever. The dynamic appenders write to files so 
> each appender holds on to a file descriptor. 
> Further work in HIVE-14224 has mitigated the issue by introducing a custom 
> purging policy (LlapRoutingAppenderPurgePolicy) which deletes the dynamic 
> appenders (and closes the respective files) when the query is completed 
> (org.apache.hadoop.hive.llap.daemon.impl.QueryTracker#handleLogOnQueryCompletion).
>  
> However, in the presence of multiple threads appending to the logs there are 
> race conditions. In an internal Hive cluster the number of file descriptors 
> started going up, with approximately one descriptor leaking per query. After 
> some debugging it turned out that one thread (running 
> QueryTracker#handleLogOnQueryCompletion) signals that the query has finished 
> and thus the purge policy should get rid of the respective appender (and 
> close the file) while another (Task-Executor-0) attempts to append another 
> log message for the same query. The initial appender is closed after the 
> request from the query tracker but a new one is created to accommodate the 
> message from the task executor and the latter is never removed, thus creating 
> a leak. 
> Similar leaks have been identified and fixed for HS2, with the most similar 
> one being that described 
> [here|https://issues.apache.org/jira/browse/HIVE-22753?focusedCommentId=17021041&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17021041].
>  
> The problem relies on the timing of threads so it may not manifest in all 
> versions between 2.2.0 and 4.0.0. Usually the leak can be seen via 
> lsof (or another similar command) with the following output:
> {noformat}
> # 1494391 is the PID of the LLAP daemon process
> ls -ltr /proc/1494391/fd
> ...
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 978 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121724_66ce273d-54a9-4dcd-a9fb-20cb5691cef7-dag_1608659125567_0008_194.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 977 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121804_ce53eeb5-c73f-4999-b7a4-b4dd04d4e4de-dag_1608659125567_0008_197.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 974 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224122002_1693bd7d-2f0e-4673-a8d1-b7cb14a02204-dag_1608659125567_0008_204.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 989 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121909_6a56218f-06c7-4906-9907-4b6dd824b100-dag_1608659125567_0008_201.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 984 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121754_78ef49a0-bc23-478f-9a16-87fa25e7a287-dag_1608659125567_0008_196.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 983 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121855_e65b9ebf-b2ec-4159-9570-1904442b7048-dag_1608659125567_0008_200.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 981 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121818_e9051ae3-1316-46af-aabb-22c53ed2fda7-dag_1608659125567_0008_198.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 980 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121744_fcf37921-4351-4368-95ee-b5be2592d89a-dag_1608659125567_0008_195.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 979 -> 
> 

[jira] [Commented] (HIVE-24638) Redundant filter in scalar subquery

2021-01-14 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265609#comment-17265609
 ] 

Jesus Camacho Rodriguez commented on HIVE-24638:


{quote}
Other idea was to replace this filter with project
{quote}
Yes, I discussed this with [~mustafaiman], but the problem would be that the 
trimmer (or some rules) may remove that project column if it is not referenced in 
subsequent operators in the plan?

> Redundant filter in scalar subquery 
> 
>
> Key: HIVE-24638
> URL: https://issues.apache.org/jira/browse/HIVE-24638
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Vineet Garg
>Priority: Major
>
> Look at the query and CBO plan in 
> https://issues.apache.org/jira/browse/HIVE-24595 .
> Note that there is a filter to guarantee that the subquery returns only one row: 
> "HiveFilter(condition=[<=(sq_count_check($0), 1)])". This condition is 
> redundant, as either sq_count_check fails at runtime or the condition is true for 
> all rows.
> Look at the stack trace:
> {code:java}
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSQCountCheck.evaluate(GenericUDFSQCountCheck.java:70)
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSQCountCheck.evaluate(GenericUDFSQCountCheck.java:70)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:88)
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqualOrLessThan.evaluate(GenericUDFOPEqualOrLessThan.java:111)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
>  at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:113)
>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>  at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1004)
>  at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1028)
> {code}
> GenericUDFOPEqualOrLessThan is redundant here as GenericUDFSQCountCheck does 
> the same check.
>  
>  
>  
>  
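
Why the filter is redundant can be seen in a few lines. The sketch below is a hypothetical stand-in, not the code of GenericUDFSQCountCheck: sqCountCheck is assumed to throw when the subquery produced more than one row and to return the count otherwise, which is the behaviour the description relies on.

```java
class SqCountCheckSketch {
  // Hypothetical stand-in for sq_count_check: fails when the scalar subquery
  // produced more than one row, otherwise returns the count unchanged.
  static long sqCountCheck(long rowCount) {
    if (rowCount > 1) {
      throw new IllegalStateException("scalar subquery returned more than one row");
    }
    return rowCount;
  }

  // The filter condition from the plan: <=(sq_count_check($0), 1).
  static boolean filterCondition(long rowCount) {
    return sqCountCheck(rowCount) <= 1;
  }

  public static void main(String[] args) {
    // Whenever sqCountCheck returns at all, the comparison is already known to hold.
    System.out.println(filterCondition(0));
    System.out.println(filterCondition(1));
  }
}
```

Any value that survives sqCountCheck is at most 1, so the outer comparison can never evaluate to false; it either passes every row or is never reached.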





[jira] [Comment Edited] (HIVE-24638) Redundant filter in scalar subquery

2021-01-14 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265606#comment-17265606
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-24638 at 1/15/21, 12:34 AM:
---

[~vgarg], thoughts? This should not be too difficult, {{sq\_count\_check}} 
would return a boolean rather than the value itself? Apparently this has an 
impact on vectorization ([~mustafaiman] can add further details).

Cc [~scarlin]


was (Author: jcamachorodriguez):
[~vgarg], thoughts? This should not be too difficult, {{sq_count_check }} would 
return a boolean rather than the value itself? Apparently this has an impact on 
vectorization ([~mustafaiman] can add further details).

Cc [~scarlin]

> Redundant filter in scalar subquery 
> 
>
> Key: HIVE-24638
> URL: https://issues.apache.org/jira/browse/HIVE-24638
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Priority: Major
>
> Look at the query and CBO plan in 
> https://issues.apache.org/jira/browse/HIVE-24595 .
> Note that there is a filter to guarantee that the subquery returns only one row: 
> "HiveFilter(condition=[<=(sq_count_check($0), 1)])". This condition is 
> redundant, as either sq_count_check fails at runtime or the condition is true for 
> all rows.
> Look at the stack trace:
> {code:java}
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSQCountCheck.evaluate(GenericUDFSQCountCheck.java:70)
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSQCountCheck.evaluate(GenericUDFSQCountCheck.java:70)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:88)
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqualOrLessThan.evaluate(GenericUDFOPEqualOrLessThan.java:111)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
>  at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:113)
>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>  at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1004)
>  at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1028)
> {code}
> GenericUDFOPEqualOrLessThan is redundant here as GenericUDFSQCountCheck does 
> the same check.
>  
>  
>  
>  





[jira] [Commented] (HIVE-24638) Redundant filter in scalar subquery

2021-01-14 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265606#comment-17265606
 ] 

Jesus Camacho Rodriguez commented on HIVE-24638:


[~vgarg], thoughts? This should not be too difficult, {{sq_count_check }} would 
return a boolean rather than the value itself? Apparently this has an impact on 
vectorization ([~mustafaiman] can add further details).

Cc [~scarlin]

> Redundant filter in scalar subquery 
> 
>
> Key: HIVE-24638
> URL: https://issues.apache.org/jira/browse/HIVE-24638
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Priority: Major
>
> Look at the query and CBO plan in 
> https://issues.apache.org/jira/browse/HIVE-24595 .
> Note that there is a filter to guarantee that the subquery returns only one row: 
> "HiveFilter(condition=[<=(sq_count_check($0), 1)])". This condition is 
> redundant, as either sq_count_check fails at runtime or the condition is true for 
> all rows.
> Look at the stack trace:
> {code:java}
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSQCountCheck.evaluate(GenericUDFSQCountCheck.java:70)
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFSQCountCheck.evaluate(GenericUDFSQCountCheck.java:70)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:88)
>  at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqualOrLessThan.evaluate(GenericUDFOPEqualOrLessThan.java:111)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
>  at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:113)
>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>  at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1004)
>  at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1028)
> {code}
> GenericUDFOPEqualOrLessThan is redundant here as GenericUDFSQCountCheck does 
> the same check.
>  
>  
>  
>  





[jira] [Updated] (HIVE-24613) Support Values clause without Insert

2021-01-14 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24613:
---
Fix Version/s: 4.0.0

> Support Values clause without Insert
> 
>
> Key: HIVE-24613
> URL: https://issues.apache.org/jira/browse/HIVE-24613
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standalone:
> {code}
> VALUES(1,2,3),(4,5,6);
> {code}
> {code}
> 1 2   3
> 4 5   6
> {code}
> In subquery:
> {code}
> SELECT * FROM (VALUES(1,2,3),(4,5,6)) as FOO;
> {code}
> {code}
> 1 2   3
> 4 5   6
> {code}
>  





[jira] [Resolved] (HIVE-24588) Run tests using specific log4j2 configuration conveniently

2021-01-11 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24588.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> Run tests using specific log4j2 configuration conveniently
> --
>
> Key: HIVE-24588
> URL: https://issues.apache.org/jira/browse/HIVE-24588
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In order to reproduce a problem (e.g., HIVE-24569) or validate that a log4j2 
> configuration is working as expected it is necessary to run a test and 
> explicitly specify which configuration should be used. Moreover, after the 
> end of the test in question it is desirable to restore the old logging 
> configuration that was used before launching the test to avoid affecting the 
> overall logging output.
> The goal of this issue is to introduce a convenient & declarative way of 
> running tests with log4j2 configurations based on Jupiter extensions and 
> annotations. The test could look like the one below:
> {code:java}
>   @Test
>   @Log4jConfig("test-log4j2.properties")
>   void testUseExplicitConfig() {
>     // Do something and assert
>   }
> {code}





[jira] [Assigned] (HIVE-24027) Add support for `intersect` keyword in MV

2021-01-07 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24027:
--

Assignee: Krisztian Kasa

> Add support for `intersect` keyword in MV
> -
>
> Key: HIVE-24027
> URL: https://issues.apache.org/jira/browse/HIVE-24027
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>
> {noformat}
> explain create materialized view mv as  select distinct c_last_name, 
> c_first_name, d_date
> from store_sales, date_dim, customer
>   where store_sales.ss_sold_date_sk = date_dim.d_date_sk
>   and store_sales.ss_customer_sk = customer.c_customer_sk
>   and d_month_seq between 1186 and 1186 + 11
>   intersect
> select distinct c_last_name, c_first_name, d_date
> from catalog_sales, date_dim, customer
>   where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
>   and catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
>   and d_month_seq between 1186 and 1186 + 11
>   intersect
> select distinct c_last_name, c_first_name, d_date
> from web_sales, date_dim, customer
>   where web_sales.ws_sold_date_sk = date_dim.d_date_sk
>   and web_sales.ws_bill_customer_sk = customer.c_customer_sk
>   and d_month_seq between 1186 and 1186 + 11
> {noformat}
> This query fails with the following error msg
> {noformat}
> Error: Error while compiling statement: FAILED: SemanticException Cannot 
> enable automatic rewriting for materialized view. Statement has unsupported 
> operator: intersect. (state=42000,code=4)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24027) Add support for `intersect` keyword in MV

2021-01-07 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24027.

Fix Version/s: 4.0.0
   Resolution: Fixed

This has been fixed in HIVE-24274.

> Add support for `intersect` keyword in MV
> -
>
> Key: HIVE-24027
> URL: https://issues.apache.org/jira/browse/HIVE-24027
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Priority: Major
> Fix For: 4.0.0
>
>
> {noformat}
> explain create materialized view mv as  select distinct c_last_name, 
> c_first_name, d_date
> from store_sales, date_dim, customer
>   where store_sales.ss_sold_date_sk = date_dim.d_date_sk
>   and store_sales.ss_customer_sk = customer.c_customer_sk
>   and d_month_seq between 1186 and 1186 + 11
>   intersect
> select distinct c_last_name, c_first_name, d_date
> from catalog_sales, date_dim, customer
>   where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
>   and catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
>   and d_month_seq between 1186 and 1186 + 11
>   intersect
> select distinct c_last_name, c_first_name, d_date
> from web_sales, date_dim, customer
>   where web_sales.ws_sold_date_sk = date_dim.d_date_sk
>   and web_sales.ws_bill_customer_sk = customer.c_customer_sk
>   and d_month_seq between 1186 and 1186 + 11
> {noformat}
> This query fails with the following error msg
> {noformat}
> Error: Error while compiling statement: FAILED: SemanticException Cannot 
> enable automatic rewriting for materialized view. Statement has unsupported 
> operator: intersect. (state=42000,code=4)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24566) Add Parquet Stats Optimization

2020-12-24 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254633#comment-17254633
 ] 

Jesus Camacho Rodriguez commented on HIVE-24566:


[~belugabehr], yes, I think this approach could potentially improve performance 
for such queries. I guess you mentioned a 'single multi-threaded processor' to 
avoid launching any jobs to compute these queries. For tables with a large 
number of files, computing from metadata would still be a useful optimization 
even if jobs are launched.

> Add  Parquet Stats Optimization
> ---
>
> Key: HIVE-24566
> URL: https://issues.apache.org/jira/browse/HIVE-24566
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Major
>
> Parquet files store min/max/count data in footer metadata.
> When a query is submitted to a Parquet table and stats are not available, 
> Hive should launch a single multi-threaded processor that simply reads the 
> metadata of each Parquet file instead of walking through every single record 
> in the table. 
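
A minimal sketch of the idea in the description, assuming the per-file footer statistics are already available as plain values. `FooterStats`, `FooterStatsAggregator`, and the `footerReader` callback are hypothetical names used only for illustration; they are not Hive or Parquet APIs, and the callback stands in for whatever code actually reads a Parquet footer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical holder for the min/max/count of one column in one file.
final class FooterStats {
  final long min;
  final long max;
  final long count;

  FooterStats(long min, long max, long count) {
    this.min = min;
    this.max = max;
    this.count = count;
  }

  // Combines the statistics of two files into one summary.
  FooterStats merge(FooterStats other) {
    return new FooterStats(
        Math.min(min, other.min), Math.max(max, other.max), count + other.count);
  }
}

final class FooterStatsAggregator {
  // Reads every file's footer on a fixed-size thread pool and merges the results.
  // Returns null when the list of files is empty.
  static FooterStats aggregate(
      List<String> files, Function<String, FooterStats> footerReader, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<FooterStats>> pending = new ArrayList<>();
      for (String file : files) {
        pending.add(pool.submit(() -> footerReader.apply(file)));
      }
      FooterStats total = null;
      for (Future<FooterStats> future : pending) {
        FooterStats stats;
        try {
          stats = future.get();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException(e);
        } catch (ExecutionException e) {
          throw new IllegalStateException(e.getCause());
        }
        total = (total == null) ? stats : total.merge(stats);
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }
}
```

No record is scanned in this sketch: the cost grows with the number of files rather than the number of rows, which is the point made in the issue.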





[jira] [Commented] (HIVE-24564) Extend PPD filter transitivity to be able to discover new opportunities

2020-12-23 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254362#comment-17254362
 ] 

Jesus Camacho Rodriguez commented on HIVE-24564:


[~kkasa], to understand clearly what you are trying to achieve, you would like 
to end up with a predicate {{ws2.ws_on IN RS(key:wr_on)}} on top of 
{{TS(ws2)}}? I guess {{wr.wr_on IN RS(key:ws_on)}} (both from {{TS(ws1)}} and 
{{TS(ws2)}}) are already generated with current logic?

> Extend PPD filter transitivity to be able to discover new opportunities
> ---
>
> Key: HIVE-24564
> URL: https://issues.apache.org/jira/browse/HIVE-24564
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a predicate references a value column of one of the parent ReduceSink 
> operators of a Join, the predicate cannot be copied and pushed down to the 
> other side of the join. However, if a parent equijoin exists in the branch 
> of the RS where 
>  1. the referenced value column is a key column of that join
>  2. and the other side of that join expression is the key column of the RS
>  the column in the predicate can be replaced and the new predicate can be 
> pushed down.
> {code:java}
>    Join(... = wr_on)
>       /         \
>     ...        RS(key: wr_on)
>                     |
>         Join(ws1.ws_on = ws2.ws_on)
>         (ws1.ws_on, ws2.ws_on, wr_on)
>            /                  \
>    RS(key:ws_on)          RS(key:ws_on)
>    (value: wr_on)
>          |                      |
>  Join(ws1.ws_on = wr.wr_on)  TS(ws2)
>      /           \
> RS(key:ws_on)  RS(key:wr_on)
>      |              |
>   TS(ws1)        TS(wr)
> {code}
> A predicate like
> {code}
> (wr_on in (...))
> {code}
> cannot be pushed to TS(ws2) because wr_on is not a key column in 
> Join(ws1.ws_on = ws2.ws_on). But we know that wr_on is equal to ws_on 
> because of the join in the left branch. 
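
The reasoning in the description can be sketched as a set of column equivalence classes built from equijoin conditions. This is a simplified illustration, not the logic of the Hive optimizer: `ColumnEquivalence` is a hypothetical helper, it assumes inner equijoins only, and it ignores the distinction between ReduceSink key and value columns.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks which columns are known to be equal through equijoins.
final class ColumnEquivalence {
  private final Map<String, String> parent = new HashMap<>();

  // Returns the representative of the equivalence class that contains the column.
  private String find(String column) {
    parent.putIfAbsent(column, column);
    String p = parent.get(column);
    if (!p.equals(column)) {
      p = find(p);
      parent.put(column, p); // path compression
    }
    return p;
  }

  // Records an equijoin condition of the form left = right.
  void addEquality(String left, String right) {
    parent.put(find(left), find(right));
  }

  // True if the two columns are known to hold equal values.
  boolean areEqual(String a, String b) {
    return find(a).equals(find(b));
  }
}
```

With `ws1.ws_on = wr.wr_on` and `ws1.ws_on = ws2.ws_on` registered, `areEqual("wr.wr_on", "ws2.ws_on")` holds, so under these assumptions a predicate on `wr_on` could be rewritten as a predicate on `ws2.ws_on` and pushed to `TS(ws2)`.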





[jira] [Resolved] (HIVE-24556) Optimize DefaultGraphWalker for case when node has no grandchildren

2020-12-22 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24556.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks for your contribution [~jfs]!

> Optimize DefaultGraphWalker for case when node has no grandchildren
> ---
>
> Key: HIVE-24556
> URL: https://issues.apache.org/jira/browse/HIVE-24556
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Investigating a query with a large IN clause of constant strings (100k+) that 
> took significant time during compilation revealed a possible optimization 
> within DefaultGraphWalker.





[jira] [Commented] (HIVE-23987) Upgrade arrow version to 0.11.0

2020-12-17 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251388#comment-17251388
 ] 

Jesus Camacho Rodriguez commented on HIVE-23987:


[~b.maidics], [~ShubhamChaurasia], any news around this JIRA? Do you plan to 
push this forward? Thanks

> Upgrade arrow version to 0.11.0
> ---
>
> Key: HIVE-23987
> URL: https://issues.apache.org/jira/browse/HIVE-23987
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], 
> we're introducing flatbuffers as a dependency. 
> Arrow 0.10.0 has an unofficial flatbuffer dependency, which is incompatible 
> with the official ones: https://issues.apache.org/jira/browse/ARROW-3175
> It was fixed in 0.11.0. We should upgrade to that version





[jira] [Commented] (HIVE-24519) Optimize MV: Materialized views should not rebuild when tables are not modified

2020-12-16 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250318#comment-17250318
 ] 

Jesus Camacho Rodriguez commented on HIVE-24519:


{quote}
In this test an MV is created with rewriting.time.window=5min. After that an 
insert is executed on one of its source tables, but the MV is considered to be 
up to date because no timeout has occurred when the rebuild is requested. Also, 
the query rewritten to use the MV returns fewer records than the query with the 
original plan would return.
{quote}
[~kkasa], that should not be the behavior. For rebuild purposes, whether an MV 
is outdated or not should be determined using only the write id lists for the 
tables it uses.
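
A rough sketch of that rule, under a strong simplification: each source table is reduced to a single highest committed write id, whereas a real write id list also tracks open and aborted writes. `MvStaleness` and both map parameters are hypothetical and not part of Hive.

```java
import java.util.Map;

// Hypothetical helper: decides whether a materialized view needs a rebuild.
final class MvStaleness {
  // snapshotAtRebuild: table name -> highest write id seen by the last rebuild.
  // current: table name -> highest write id committed now.
  // The rewriting time window is deliberately not consulted here.
  static boolean isOutdated(Map<String, Long> snapshotAtRebuild, Map<String, Long> current) {
    for (Map.Entry<String, Long> entry : snapshotAtRebuild.entrySet()) {
      Long now = current.get(entry.getKey());
      // A missing table is treated conservatively as a change.
      if (now == null || now > entry.getValue()) {
        return true;
      }
    }
    return false;
  }
}
```

Under this sketch, an insert into any source table advances its write id and marks the MV as outdated for rebuild purposes, regardless of how recently the MV was built.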

> Optimize MV: Materialized views should not rebuild when tables are not 
> modified
> ---
>
> Key: HIVE-24519
> URL: https://issues.apache.org/jira/browse/HIVE-24519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> e.g
> {noformat}
> create materialized view c_c_address as 
> select c_customer_sk from customer c, customer_address ca where 
> c_current_addr_sk = ca.ca_address_id;
> ALTER MATERIALIZED VIEW c_c_address REBUILD; <-- This shouldn't trigger 
> rebuild, when source tables are not modified
>  {noformat}





[jira] [Updated] (HIVE-24527) Allow triggering materialized view rewriting for external tables

2020-12-14 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24527:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Allow triggering materialized view rewriting for external tables
> 
>
> Key: HIVE-24527
> URL: https://issues.apache.org/jira/browse/HIVE-24527
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Although we will not be able to check data staleness, this can be useful for 
> debugging purposes.





[jira] [Updated] (HIVE-24527) Allow triggering materialized view rewriting for external tables

2020-12-11 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24527:
---
Status: Patch Available  (was: Open)

> Allow triggering materialized view rewriting for external tables
> 
>
> Key: HIVE-24527
> URL: https://issues.apache.org/jira/browse/HIVE-24527
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> Although we will not be able to check data staleness, this can be useful for 
> debugging purposes.





[jira] [Assigned] (HIVE-24527) Allow triggering materialized view rewriting for external tables

2020-12-11 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24527:
--


> Allow triggering materialized view rewriting for external tables
> 
>
> Key: HIVE-24527
> URL: https://issues.apache.org/jira/browse/HIVE-24527
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> Although we will not be able to check data staleness, this can be useful for 
> debugging purposes.





[jira] [Assigned] (HIVE-24519) Optimize MV: Materialized views should not rebuild when tables are not modified

2020-12-10 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24519:
--

Assignee: Krisztian Kasa

> Optimize MV: Materialized views should not rebuild when tables are not 
> modified
> ---
>
> Key: HIVE-24519
> URL: https://issues.apache.org/jira/browse/HIVE-24519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Major
>
> e.g
> {noformat}
> create materialized view c_c_address as 
> select c_customer_sk from customer c, customer_address ca where 
> c_current_addr_sk = ca.ca_address_id;
> ALTER MATERIALIZED VIEW c_c_address REBUILD; <-- This shouldn't trigger 
> rebuild, when source tables are not modified
>  {noformat}





[jira] [Resolved] (HIVE-24489) TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL metastore table

2020-12-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24489.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> TPC-DS dockerized tests fail due to stale entries in MIN_HISTORY_LEVEL 
> metastore table
> --
>
> Key: HIVE-24489
> URL: https://issues.apache.org/jira/browse/HIVE-24489
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The failures can be seen here:
> [http://ci.hive.apache.org/job/hive-precommit/job/master/373/]
> The root cause is stale entries inside {{MIN_HISTORY_LEVEL}} table.
> {noformat}
> Caused by: MetaException(message:Unable to select from transaction database 
> org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique 
> constraint "min_history_level_pkey"
>  Detail: Key ("MHL_TXNID")=(7858) already exists.{noformat}
> The content of the respective table inside the docker image is shown below.
> {noformat}
> SELECT * FROM "MIN_HISTORY_LEVEL" ;
>  MHL_TXNID | MHL_MIN_OPEN_TXNID 
>  --+---
>  6853 | 6687
>  7480 | 6947
>  7481 | 6947
>  6870 | 6687
>  7858 | 7858
>  6646 | 5946
>  7397 | 6947
>  7399 | 6947
>  5946 | 5946
>  6947 | 6947
>  7769 | 6947{noformat}





[jira] [Updated] (HIVE-24453) Direct SQL error when parsing create_time value for database

2020-12-02 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24453:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for the review [~kkasa]!

> Direct SQL error when parsing create_time value for database
> 
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for a null value before parsing the integer; otherwise 
> we hit an exception and fall back to the ORM path:
> {code}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {code}
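
A minimal sketch of the kind of null check the description asks for. This is not the committed patch: `NullSafeSql` and its `defaultValue` parameter are assumptions made for illustration, and the cast to `Number` assumes the JDBC driver returns a numeric type for the column.

```java
// Hypothetical helper showing a null-safe variant of an integer extraction.
final class NullSafeSql {
  // Returns defaultValue when the column value is SQL NULL instead of
  // failing with a NullPointerException on the cast below.
  static int extractSqlInt(Object field, int defaultValue) {
    if (field == null) {
      return defaultValue;
    }
    return ((Number) field).intValue();
  }
}
```

For a database created before the `create_time` field existed, a call such as `NullSafeSql.extractSqlInt(null, 0)` would yield the default rather than forcing the fallback to the ORM path.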





[jira] [Updated] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24144:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
> 
>
> Key: HIVE-24144
> URL: https://issues.apache.org/jira/browse/HIVE-24144
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC, JDBC storage handler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
>   public String getIdentifierQuoteString() throws SQLException {
> return " ";
>   }
> {code}
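
For context, the JDBC documentation for `DatabaseMetaData.getIdentifierQuoteString` states that a space is returned when identifier quoting is not supported, so returning `" "` tells clients not to quote identifiers at all, even though Hive quotes identifiers with backticks. The sketch below shows how a client might consume the value; `IdentifierQuoting` is a hypothetical helper, and escaping an embedded quote by doubling it is an assumption of this example.

```java
// Hypothetical client-side helper that applies the reported quote string.
final class IdentifierQuoting {
  // quoteString is the value reported by DatabaseMetaData.getIdentifierQuoteString().
  // A blank value means quoting is unsupported, so the identifier is returned as-is.
  static String quote(String quoteString, String identifier) {
    if (quoteString == null || quoteString.trim().isEmpty()) {
      return identifier;
    }
    // Assumption: an embedded quote character is escaped by doubling it.
    String escaped = identifier.replace(quoteString, quoteString + quoteString);
    return quoteString + escaped + quoteString;
  }
}
```

With the value shown in the description, `quote(" ", "order")` leaves the reserved word unquoted, while a backtick quote string would produce a quoted identifier.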





[jira] [Updated] (HIVE-24453) Direct SQL error when parsing create_time value for database

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24453:
---
Status: Patch Available  (was: Open)

> Direct SQL error when parsing create_time value for database
> 
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for a null value before parsing the integer; otherwise 
> we hit an exception and fall back to the ORM path:
> {code}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {code}





[jira] [Updated] (HIVE-24453) DirectSQL error when parsing create_time value for database

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24453:
---
Description: 
HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
Although the value for that field is always set after that patch, the value 
could be null if the database was created before the feature went in. DirectSQL 
should check for a null value before parsing the integer; otherwise we hit an 
exception and fall back to the ORM path:
{code}
2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
[pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
is not an error): null at 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
 at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
 at 
org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
{code}

  was:
HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
Although the value for that field is always set after that patch, the value 
could be null if the database was created before the feature went in. DirectSQL 
should check for a null value before parsing the integer; otherwise we hit an 
exception and fall back to the ORM path:
{noformat}
2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
[pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
is not an error): null at 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
 at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
 at 
org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
{noformat}


> DirectSQL error when parsing create_time value for database
> ---
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for a null value before parsing the integer; otherwise 
> we hit an exception and fall back to the ORM path:
> {code}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {code}





[jira] [Updated] (HIVE-24453) Direct SQL error when parsing create_time value for database

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24453:
---
Summary: Direct SQL error when parsing create_time value for database  
(was: DirectSQL error when parsing create_time value for database)

> Direct SQL error when parsing create_time value for database
> 
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for a null value before parsing the integer; otherwise 
> we hit an exception and fall back to the ORM path:
> {code}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {code}





[jira] [Assigned] (HIVE-24453) DirectSQL error when parsing create_time value for database

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24453:
--


> DirectSQL error when parsing create_time value for database
> ---
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for a null value before parsing the integer; otherwise 
> we hit an exception and fall back to the ORM path:
> {noformat}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {noformat}





[jira] [Resolved] (HIVE-24408) Upgrade Parquet to 1.11.1

2020-11-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24408.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks for your contribution [~csun]!

> Upgrade Parquet to 1.11.1
> -
>
> Key: HIVE-24408
> URL: https://issues.apache.org/jira/browse/HIVE-24408
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Parquet 1.11.1 has some bug fixes, so Hive should consider upgrading to it.





[jira] [Resolved] (HIVE-12587) Support to add partitioned data set to TestPerfCliDriver

2020-11-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-12587.

Fix Version/s: 4.0.0
 Assignee: Stamatis Zampetakis
   Resolution: Fixed

> Support to add partitioned data set to TestPerfCliDriver
> 
>
> Key: HIVE-12587
> URL: https://issues.apache.org/jira/browse/HIVE-12587
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
>






[jira] [Reopened] (HIVE-12587) Support to add partitioned data set to TestPerfCliDriver

2020-11-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reopened HIVE-12587:


> Support to add partitioned data set to TestPerfCliDriver
> 
>
> Key: HIVE-12587
> URL: https://issues.apache.org/jira/browse/HIVE-12587
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Priority: Major
>






[jira] [Resolved] (HIVE-12587) Support to add partitioned data set to TestPerfCliDriver

2020-11-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-12587.

Resolution: Duplicate

> Support to add partitioned data set to TestPerfCliDriver
> 
>
> Key: HIVE-12587
> URL: https://issues.apache.org/jira/browse/HIVE-12587
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Priority: Major
>






[jira] [Resolved] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-23965.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom Hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23742) Remove unintentional execution of TPC-DS query39 in qtests

2020-11-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-23742.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> Remove unintentional execution of TPC-DS query39 in qtests
> --
>
> Key: HIVE-23742
> URL: https://issues.apache.org/jira/browse/HIVE-23742
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> TPC-DS queries under clientpositive/perf are meant only to check plan 
> regressions, so they should never actually be executed. The execution part 
> should therefore be removed from query39.q and cbo_query39.q.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24395) Intermittent failures to initialize dockerized Postgres metastore in tests

2020-11-24 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24395.

Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> Intermittent failures to initialize dockerized Postgres metastore in tests
> --
>
> Key: HIVE-24395
> URL: https://issues.apache.org/jira/browse/HIVE-24395
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Minor
> Fix For: 4.0.0
>
>
> In some cases tests relying on a dockerized Postgres metastore (see [Postgres 
> JUnit 
> rule|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/Postgres.java])
>  fail to establish a connection with the database; as a consequence the 
> HikariPool cannot be created and after multiple failed attempts the test 
> fails.
> The following exception appears in the logs indicating a problem with reading 
> from the socket. 
> {noformat}
> 2020-11-16T15:52:03,075 DEBUG [main] pool.HikariPool: HikariPool-1 - Cannot 
> acquire connection from data source
> org.postgresql.util.PSQLException: The connection attempt failed.
>   at 
> org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:297)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
>  ~[postgresql-42.2.14.jar:42.2.14]
>   at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:217) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at org.postgresql.Driver.makeConnection(Driver.java:458) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at org.postgresql.Driver.connect(Driver.java:260) 
> ~[postgresql-42.2.14.jar:42.2.14]
>   at 
> com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95)
>  ~[HikariCP-2.6.1.jar:?]
>   at 
> com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101)
>  ~[HikariCP-2.6.1.jar:?]
>   at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:356) 
> ~[HikariCP-2.6.1.jar:?]
>   at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:199) 
> ~[HikariCP-2.6.1.jar:?]
>   at 
> com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:444) 
> ~[HikariCP-2.6.1.jar:?]
>   at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:515) 
> ~[HikariCP-2.6.1.jar:?]
>   at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:112) 
> ~[HikariCP-2.6.1.jar:?]
>   at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:72) 
> ~[HikariCP-2.6.1.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.datasource.HikariCPDataSourceProvider.create(HikariCPDataSourceProvider.java:87)
>  ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.PersistenceManagerProvider.initPMF(PersistenceManagerProvider.java:235)
>  ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.PersistenceManagerProvider.lambda$updatePmfProperties$0(PersistenceManagerProvider.java:212)
>  ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.PersistenceManagerProvider.retry(PersistenceManagerProvider.java:521)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.PersistenceManagerProvider.updatePmfProperties(PersistenceManagerProvider.java:212)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:334) 
> [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77) 
> [hadoop-common-3.1.0.jar:?]
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137) 
> [hadoop-common-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:59) 
> [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:866)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:834)
>  

[jira] [Updated] (HIVE-24387) Metastore access through JDBC handler does not use correct database accessor

2020-11-19 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24387:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Metastore access through JDBC handler does not use correct database accessor
> 
>
> Key: HIVE-24387
> URL: https://issues.apache.org/jira/browse/HIVE-24387
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are some differences in the SQL syntax that the database accessor 
> generates for each RDBMS. For the metastore, we always end up with the 
> default accessor, which leads to errors, e.g., when a limit query is 
> executed for a Postgres-backed metastore.
> {code}
> Error: java.io.IOException: java.io.IOException: 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: ERROR: syntax error at or near "{"
> Position: 200 (state=,code=0)
> SELECT "TBL_COLUMN_GRANT_ID", "COLUMN_NAME", "CREATE_TIME", "GRANT_OPTION", 
> "GRANTOR", "GRANTOR_TYPE", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", 
> "TBL_COL_PRIV", "TBL_ID", "AUTHORIZER" FROM "TBL_COL_PRIVS"
> {LIMIT 1}
> {code}
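
The mismatch described above can be illustrated with a minimal sketch. This is not Hive's actual accessor code: the class `LimitSketch`, the `DbType` enum, and the idea of inferring the type from a JDBC URL prefix are all assumptions made for illustration. It only shows why a dialect-specific choice matters: a generic escape-style `{LIMIT n}` clause, like the one in the failing query, is not valid Postgres syntax, while a plain `LIMIT n` is.

```java
import java.util.Locale;

// Hypothetical sketch; names are illustrative and not taken from Hive.
final class LimitSketch {

    // Database families the sketch distinguishes (assumed set).
    enum DbType { POSTGRES, MYSQL, GENERIC }

    // Infer the database family from the JDBC URL prefix (assumed convention).
    static DbType fromJdbcUrl(String url) {
        String u = url.toLowerCase(Locale.ROOT);
        if (u.startsWith("jdbc:postgresql:")) {
            return DbType.POSTGRES;
        }
        if (u.startsWith("jdbc:mysql:")) {
            return DbType.MYSQL;
        }
        return DbType.GENERIC;
    }

    // Append a row limit using the syntax of the given database family.
    static String addLimit(DbType type, String sql, int limit) {
        switch (type) {
            case POSTGRES:
            case MYSQL:
                // Both accept a trailing "LIMIT n" clause.
                return sql + " LIMIT " + limit;
            default:
                // Escape-style clause, as seen in the failing query.
                return sql + " {LIMIT " + limit + "}";
        }
    }

    public static void main(String[] args) {
        String sql = "SELECT \"TBL_ID\" FROM \"TBL_COL_PRIVS\"";
        DbType type = fromJdbcUrl("jdbc:postgresql://localhost:5432/metastore");
        System.out.println(addLimit(type, sql, 1));
        System.out.println(addLimit(DbType.GENERIC, sql, 1));
    }
}
```

If the default branch is always taken, as the report says happens for the metastore, a Postgres backend receives the escape-style clause and rejects it with a syntax error at `{`.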




