[jira] [Updated] (HIVE-14980) Minor compaction when triggered simultaneously on the same table/partition deletes data

2016-10-15 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14980:
---
Component/s: Transactions

> Minor compaction when triggered simultaneously on the same table/partition 
> deletes data
> ---
>
> Key: HIVE-14980
> URL: https://issues.apache.org/jira/browse/HIVE-14980
> Project: Hive
> Issue Type: Bug
> Components: Metastore, Transactions
> Affects Versions: 2.1.0
> Reporter: Mahipal Jupalli
> Assignee: Mahipal Jupalli
> Priority: Critical
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> I have two tables (TABLEA, TABLEB). If I manually trigger a compaction after 
> each INSERT into TABLEB from TABLEA, each compaction is picked up 
> asynchronously by a random metastore instance, and the compactions step on 
> each other, which causes data to be deleted.
> Example: 
> TABLEA has 10k rows. 
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> Once all the compactions are complete, I should see 20k rows in TABLEB, but I 
> see only 10k rows. Only the rows inserted before the last compaction persist; 
> the old rows are deleted. I believe the old delta files are deleted. 
> To further confirm the bug: if I run only one compaction after the two 
> inserts, I see 20k rows in TABLEB.
> Proposed Fix:
> I have identified the bug in the code. The fix requires an additional check in 
> the org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active 
> compactions on the table/partition. I will share the details of the fix once 
> I test it.
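
The check proposed above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the reporter's patch and not Hive's actual Worker code: the class name CompactionGuard, its methods, and the target key format are hypothetical. It also tracks state inside a single process, whereas the report describes compactions running on different metastore instances, so a real fix would presumably have to consult shared state rather than an in-memory set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard, not part of Hive: remembers which targets are being compacted. */
final class CompactionGuard {
    // One entry per table or partition with a compaction in progress.
    // The key format (for example "mj.tableb") is an assumption for this sketch.
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller may compact the target, false if one is already active. */
    boolean tryStart(String target) {
        // add() is atomic on this set and returns false when the key is already present.
        return active.add(target);
    }

    /** Marks the compaction on the target as finished so a later request can proceed. */
    void finish(String target) {
        active.remove(target);
    }

    public static void main(String[] args) {
        CompactionGuard guard = new CompactionGuard();
        String target = "mj.tableb";
        System.out.println(guard.tryStart(target)); // first request proceeds
        System.out.println(guard.tryStart(target)); // second request is refused while the first runs
        guard.finish(target);
        System.out.println(guard.tryStart(target)); // allowed again once the first has finished
    }
}
```

In this sketch a worker would call tryStart before compacting and skip or requeue the request when it returns false, then call finish in a finally block so a failed compaction does not block the table/partition forever.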



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14980) Minor compaction when triggered simultaneously on the same table/partition deletes data

2016-10-15 Thread Mahipal Jupalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahipal Jupalli updated HIVE-14980:
---
Description: 
I have two tables (TABLEA, TABLEB). If I manually trigger a compaction after 
each INSERT into TABLEB from TABLEA, each compaction is picked up 
asynchronously by a random metastore instance, and the compactions step on 
each other, which causes data to be deleted.

Example: 
TABLEA has 10k rows. 

insert into mj.tableb select * from mj.tablea;
alter table mj.tableb compact 'MINOR';
insert into mj.tableb select * from mj.tablea;
alter table mj.tableb compact 'MINOR';

Once all the compactions are complete, I should see 20k rows in TABLEB, but I 
see only 10k rows. Only the rows inserted before the last compaction persist; 
the old rows are deleted. I believe the old delta files are deleted. 

To further confirm the bug: if I run only one compaction after the two 
inserts, I see 20k rows in TABLEB.

Proposed Fix:
I have identified the bug in the code. The fix requires an additional check in 
the org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active 
compactions on the table/partition. I will share the details of the fix once I 
test it.

  was:
I have two tables (TABLEA, TABLEB). If I manually trigger a compaction after 
each INSERT into TABLEB from TABLEA, each compaction is picked up 
asynchronously by a random metastore instance, and the compactions step on 
each other, which causes data to be deleted.

Example: 
TABLEA has 10k rows. 

insert into mj.tableb select * from mj.tablea;
alter table mj.tableb compact 'MINOR';
insert into mj.tableb select * from mj.tablea;
alter table mj.tableb compact 'MINOR';

Once all the compactions are complete, I should see 20k rows in the table, but 
I see only 10k rows. Only the rows inserted before the last compaction 
persist; the old rows are deleted. I believe the old delta files are deleted. 

To further confirm the bug: if I run only one compaction after the two 
inserts, I see 20k rows in TABLEB.

Proposed Fix:
I have identified the bug in the code. The fix requires an additional check in 
the org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active 
compactions on the table/partition. I will share the details of the fix once I 
test it.


> Minor compaction when triggered simultaneously on the same table/partition 
> deletes data
> ---
>
> Key: HIVE-14980
> URL: https://issues.apache.org/jira/browse/HIVE-14980
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 2.1.0
> Reporter: Mahipal Jupalli
> Assignee: Mahipal Jupalli
> Priority: Critical
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> I have two tables (TABLEA, TABLEB). If I manually trigger a compaction after 
> each INSERT into TABLEB from TABLEA, each compaction is picked up 
> asynchronously by a random metastore instance, and the compactions step on 
> each other, which causes data to be deleted.
> Example: 
> TABLEA has 10k rows. 
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> Once all the compactions are complete, I should see 20k rows in TABLEB, but I 
> see only 10k rows. Only the rows inserted before the last compaction persist; 
> the old rows are deleted. I believe the old delta files are deleted. 
> To further confirm the bug: if I run only one compaction after the two 
> inserts, I see 20k rows in TABLEB.
> Proposed Fix:
> I have identified the bug in the code. The fix requires an additional check in 
> the org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active 
> compactions on the table/partition. I will share the details of the fix once 
> I test it.





[jira] [Updated] (HIVE-14980) Minor compaction when triggered simultaneously on the same table/partition deletes data

2016-10-15 Thread Mahipal Jupalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahipal Jupalli updated HIVE-14980:
---
Labels:   (was: newbie patch-pending)

> Minor compaction when triggered simultaneously on the same table/partition 
> deletes data
> ---
>
> Key: HIVE-14980
> URL: https://issues.apache.org/jira/browse/HIVE-14980
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 2.1.0
> Reporter: Mahipal Jupalli
> Assignee: Mahipal Jupalli
> Priority: Critical
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> I have two tables (TABLEA, TABLEB). If I manually trigger a compaction after 
> each INSERT into TABLEB from TABLEA, each compaction is picked up 
> asynchronously by a random metastore instance, and the compactions step on 
> each other, which causes data to be deleted.
> Example: 
> TABLEA has 10k rows. 
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> Once all the compactions are complete, I should see 20k rows in the table, 
> but I see only 10k rows. Only the rows inserted before the last compaction 
> persist; the old rows are deleted. I believe the old delta files are deleted. 
> To further confirm the bug: if I run only one compaction after the two 
> inserts, I see 20k rows in TABLEB.
> Proposed Fix:
> I have identified the bug in the code. The fix requires an additional check in 
> the org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active 
> compactions on the table/partition. I will share the details of the fix once 
> I test it.


