[jira] [Comment Edited] (HIVE-23143) Transactions: PPD in Delete deltas is broken

2020-04-06 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076431#comment-17076431
 ] 

Peter Vary edited comment on HIVE-23143 at 4/6/20, 4:02 PM:


Good point [~asomani]!
Currently we advise customers to avoid using "reserved" column 
names, but I absolutely agree that this is not a good solution. If you have a 
good idea/patch, I would be happy to review it.

Thanks,
Peter


was (Author: pvary):
Good point [~asomani]! Any ideas how to solve this?

> Transactions: PPD in Delete deltas is broken
> 
>
> Key: HIVE-23143
> URL: https://issues.apache.org/jira/browse/HIVE-23143
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Abhishek Somani
> Priority: Major
>
> The optimization introduced in HIVE-16812 seems broken. PPD is not happening 
> for delete deltas and, in fact, it also causes wrong results if data column 
> names conflict with ACID ROW__ID column names (bucket, originalTransactionId, 
> etc.).
> This seems to happen because, after ORC-491, all PPD for ACID ORC files is 
> applied to data columns only, so the filters for delete PPD are never applied 
> to the metadata columns and are applied to data columns instead. When a data 
> column has the same name as a metadata column (like "bucket" in the example 
> below), wrong results are returned.
> Steps to repro:
> {code:sql}
> set hive.fetch.task.conversion=none;
> set hive.query.results.cache.enabled=false;
> create table test(a int, bucket int) stored as orc tblproperties("transactional"="true");
> insert into table test values (1, ), (2, ), (3, );
> delete from test where a = 2;
> select * from test; -- Will return the deleted row as well
> set hive.txn.filter.delete.events=false;
> select * from test; -- Correct results returned. Will not return the deleted row
> {code}
> cc [~pvary] [~gopalv]
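
The name conflict described above can be illustrated with a small toy model. This is only a sketch, not ORC or Hive code: the helper names (build_acid_schema, resolve_root, resolve_data_only) are hypothetical, and the field names assume the usual ACID ORC layout in which metadata columns sit at the root and user columns are nested under "row".

```python
# Toy model only: this is NOT ORC's or Hive's implementation.
# Assumed ACID ORC layout: metadata columns at the root, user columns under "row".
ACID_METADATA_COLUMNS = [
    "operation", "originalTransaction", "bucket", "rowId", "currentTransaction",
]


def build_acid_schema(data_columns):
    """Map each column name to a path string; user columns live under 'row'."""
    schema = {name: "root." + name for name in ACID_METADATA_COLUMNS}
    schema["row"] = {name: "root.row." + name for name in data_columns}
    return schema


def resolve_root(schema, column):
    """Look the name up among root-level metadata columns (what the delete filter intends)."""
    target = schema.get(column)
    return target if isinstance(target, str) else None


def resolve_data_only(schema, column):
    """Look the name up among data columns only (the behaviour the report attributes to ORC-491)."""
    return schema["row"].get(column)


conflicting = build_acid_schema(["a", "bucket"])  # table test(a int, bucket int)
harmless = build_acid_schema(["a", "b"])          # hypothetical table without the clash

print(resolve_root(conflicting, "bucket"))       # root.bucket
print(resolve_data_only(conflicting, "bucket"))  # root.row.bucket
print(resolve_data_only(harmless, "bucket"))     # None
```

In this model a filter on "bucket" silently binds to the user's column when the names clash, and binds to nothing at all otherwise, which matches the two symptoms in the report (wrong results, and no PPD on delete deltas).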





[jira] [Comment Edited] (HIVE-23143) Transactions: PPD in Delete deltas is broken

2020-04-06 Thread Abhishek Somani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076404#comment-17076404
 ] 

Abhishek Somani edited comment on HIVE-23143 at 4/6/20, 3:21 PM:

[~pvary] it doesn't help. I tested on current master branch (as of last week) 
that has the HIVE-22880 patch.

It presumably doesn't help because HIVE-22880 only ignores SARGs on *data* 
columns. The issue is that *metadata* column SARGs (e.g. on transaction IDs, 
bucket, etc.) are applied incorrectly, even after the *data* column SARGs are 
ignored.
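
A toy sketch of how a misapplied metadata SARG could bring a deleted row back. This is not Hive's delete-event filtering code: the exact mechanism is an assumption, and BUCKET_PROPERTY, the data values 10/20/30, and all function names are hypothetical.

```python
# Toy model only; field layout and all values are illustrative assumptions.
BUCKET_PROPERTY = 536870912  # hypothetical encoded value stored in the metadata "bucket" column

# Three inserted rows; the user table also has a data column called "bucket".
data_rows = [
    {"rowId": 0, "bucket": BUCKET_PROPERTY, "row": {"a": 1, "bucket": 10}},
    {"rowId": 1, "bucket": BUCKET_PROPERTY, "row": {"a": 2, "bucket": 20}},
    {"rowId": 2, "bucket": BUCKET_PROPERTY, "row": {"a": 3, "bucket": 30}},
]

# One delete event for the row with a = 2; delete events carry no row payload here.
delete_events = [{"rowId": 1, "bucket": BUCKET_PROPERTY, "row": None}]


def metadata_bucket(event):
    """Read the metadata column the filter is meant to test."""
    return event["bucket"]


def data_bucket(event):
    """Read the user's data column of the same name (the misapplied case)."""
    row = event["row"]
    return row["bucket"] if row else None


def visible_rows(bucket_lookup):
    """Filter delete events with 'bucket == BUCKET_PROPERTY', then apply the survivors."""
    kept = [e for e in delete_events if bucket_lookup(e) == BUCKET_PROPERTY]
    deleted = {(e["bucket"], e["rowId"]) for e in kept}
    return [r["row"]["a"] for r in data_rows
            if (r["bucket"], r["rowId"]) not in deleted]


print(visible_rows(metadata_bucket))  # [1, 3]
print(visible_rows(data_bucket))      # [1, 2, 3]
```

When the filter reads the data column, the delete event is discarded before it can be applied, so the deleted row shows up again in the result.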


was (Author: asomani):
[~pvary] it doesn't help. I tested on current master branch (as of last week) 
that has the HIVE-22880 patch.

It doesn't help presumably because what HIVE-22880 is saying is ignore all 
SARGs for *data* columns. But the issue is *metadata* column SARGs (ie like on 
transactionids, bucket etc) when applied, are applied incorrectly.



[jira] [Comment Edited] (HIVE-23143) Transactions: PPD in Delete deltas is broken

2020-04-06 Thread Abhishek Somani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076404#comment-17076404
 ] 

Abhishek Somani edited comment on HIVE-23143 at 4/6/20, 3:21 PM:

[~pvary] it doesn't help. I tested on current master branch (as of last week) 
that has the HIVE-22880 patch.

It presumably doesn't help because HIVE-22880 only ignores SARGs on *data* 
columns. The issue is that *metadata* column SARGs (e.g. on transaction IDs, 
bucket, etc.) are applied incorrectly.


was (Author: asomani):
[~pvary] it doesn't help. I tested on current master branch (as of last week) 
that has the HIVE-22880 patcg.

It doesn't help presumably because what HIVE-22880 is saying is ignore all 
SARGs for *data* columns. But the issue is *metadata* column SARGs (ie like on 
transactionids, bucket etc) when applied, are applied incorrectly.
