[jira] [Comment Edited] (HIVE-19124) implement a basic major compactor for MM tables

2018-04-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428599#comment-16428599
 ] 

Eugene Koifman edited comment on HIVE-19124 at 4/6/18 5:27 PM:
---

The flow for Acid compaction is:
1. Initiator - uses some heuristics (like the number of delta files, number of aborted txns, etc.) to schedule compactions.
2. Worker - picks an item from the compaction queue (either put there by the Initiator or via an explicit Alter Table) and runs a job to produce new files.
3. Cleaner - removes 'obsolete' files when it's safe to do so.

For MM tables the flow so far was:
1. Initiator - looks for a sufficient number of aborted txns and schedules compaction.
2. Worker - deletes the delta_x_x dirs where x is aborted.
3. Cleaner - does nothing.

compaction_queue/completed_compaction_queue are metastore tables representing the queue (the latter keeps historical info) and drive SHOW COMPACTIONS.
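For reference, this path can be exercised manually like this (table name is illustrative; shown only as a usage sketch):
{noformat}
-- request a major compaction explicitly; this puts an item on compaction_queue
-- for a Worker to pick up, bypassing the Initiator's heuristics
ALTER TABLE mm_table COMPACT 'major';

-- inspect the current queue plus the historical entries kept in completed_compaction_queue
SHOW COMPACTIONS;
{noformat}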

With this patch, Alter Table Compact for an MM table will do the IOW + delete aborted deltas, if any, but the auto-initiated compaction will also do IOW.  Do we want 
that?  Should the Initiator have some logic to decide if IOW is needed?  Maybe this 
can be a follow-up ticket.

Longer term I'd like to remove the Cleaner part altogether and move that logic 
to the Worker - maybe in the 3.1 timeframe.

{{//txnManager.closeTxnManager();}} - I don't think this is needed unless 
the Session is shut down.

This is done in the Cleaner:
{noformat}
   // TODO: Also delete obsolete directories? How do we account for readers?
/*List<FileStatus> obsolete = dir.getObsolete();
for (FileStatus stat : obsolete) {
  filesToDelete.add(stat.getPath());
}*/
{noformat}

Can you explain this?
{{// TODO: move to global? should be ok if it's always the same thread.}}

There should be some logic to shut down the session if there are any errors.  
I've seen situations where Session init fails, but it's still attached to the 
ThreadLocal, and so every Worker in that thread will always get a bad session.  
HIVE-18808 is an example; it has links to others.

Another issue: currently the compactor will not compact above the smallest open 
writeId in a table, but this violates that.  So if writeId 3 is open and 5 is committed, the IOW may 
produce base_10, for example, which doesn't have any data from 3.  We could 
add some logic to force the instance of the Driver created here to create a 
ValidTxnList with the HWM set to minOpenTxn (from the MIN_HISTORY table) - a poor man's 
flashback query.
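A sketch of that scenario (write ids and directory names are illustrative):
{noformat}
-- writeId 3: open txn, still writing delta_0000003_0000003
-- writeId 5: committed, delta_0000005_0000005
-- writeIds up to 10 have been allocated for the table
ALTER TABLE mm_table COMPACT 'major';
-- the IOW sees only committed data and writes base_0000010, which lacks writeId 3's rows;
-- once txn 3 commits, readers treat delta_0000003_0000003 as covered by base_0000010,
-- so that data is effectively lost
{noformat}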



was (Author: ekoifman):
The flow for Acid compaction is:
1. Initiator - uses some heuristics (like the number of delta files, number of aborted txns, etc.) to schedule compactions.
2. Worker - picks an item from the compaction queue (either put there by the Initiator or via an explicit Alter Table) and runs a job to produce new files.
3. Cleaner - removes 'obsolete' files when it's safe to do so.

For MM tables the flow so far was:
1. Initiator - looks for a sufficient number of aborted txns and schedules compaction.
2. Worker - deletes the delta_x_x dirs where x is aborted.
3. Cleaner - does nothing.

compaction_queue/completed_compaction_queue are metastore tables representing the queue (the latter keeps historical info) and drive SHOW COMPACTIONS.

With this patch, Alter Table Compact for an MM table will do the IOW + delete aborted deltas, if any, but the auto-initiated compaction will also do IOW.  Do we want that?  Should the Initiator have some logic to decide if IOW is needed?  Maybe this can be a follow-up ticket.

Longer term I'd like to remove the Cleaner part altogether and move that logic to the Worker - maybe in the 3.1 timeframe.

{{//txnManager.closeTxnManager();}} - I don't think this is needed unless the Session is shut down.

This is done in the Cleaner:
{noformat}
   // TODO: Also delete obsolete directories? How do we account for readers?
/*List<FileStatus> obsolete = dir.getObsolete();
for (FileStatus stat : obsolete) {
  filesToDelete.add(stat.getPath());
}*/
{noformat}

Can you explain this?
{{// TODO: move to global? should be ok if it's always the same thread.}}

There should be some logic to shut down the session if there are any errors.  I've seen situations where Session init fails, but it's still attached to the ThreadLocal, and so every Worker in that thread will always get a bad session.  HIVE-18808 is an example; it has links to others.




> implement a basic major compactor for MM tables
> ---
>
> Key: HIVE-19124
> URL: https://issues.apache.org/jira/browse/HIVE-19124
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-19124.patch
>
>
> For now, it will run a query directly and only major compactions will be supported.

[jira] [Commented] (HIVE-19124) implement a basic major compactor for MM tables

2018-04-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428599#comment-16428599
 ] 

Eugene Koifman commented on HIVE-19124:
---

The flow for Acid compaction is:
1. Initiator - uses some heuristics (like the number of delta files, number of aborted txns, etc.) to schedule compactions.
2. Worker - picks an item from the compaction queue (either put there by the Initiator or via an explicit Alter Table) and runs a job to produce new files.
3. Cleaner - removes 'obsolete' files when it's safe to do so.

For MM tables the flow so far was:
1. Initiator - looks for a sufficient number of aborted txns and schedules compaction.
2. Worker - deletes the delta_x_x dirs where x is aborted.
3. Cleaner - does nothing.

compaction_queue/completed_compaction_queue are metastore tables representing the queue (the latter keeps historical info) and drive SHOW COMPACTIONS.

With this patch, Alter Table Compact for an MM table will do the IOW + delete aborted deltas, if any, but the auto-initiated compaction will also do IOW.  Do we want that?  Should the Initiator have some logic to decide if IOW is needed?  Maybe this can be a follow-up ticket.

Longer term I'd like to remove the Cleaner part altogether and move that logic to the Worker - maybe in the 3.1 timeframe.

{{//txnManager.closeTxnManager();}} - I don't think this is needed unless the Session is shut down.

This is done in the Cleaner:
{noformat}
   // TODO: Also delete obsolete directories? How do we account for readers?
/*List<FileStatus> obsolete = dir.getObsolete();
for (FileStatus stat : obsolete) {
  filesToDelete.add(stat.getPath());
}*/
{noformat}

Can you explain this?
{{// TODO: move to global? should be ok if it's always the same thread.}}

There should be some logic to shut down the session if there are any errors.  I've seen situations where Session init fails, but it's still attached to the ThreadLocal, and so every Worker in that thread will always get a bad session.  HIVE-18808 is an example; it has links to others.




> implement a basic major compactor for MM tables
> ---
>
> Key: HIVE-19124
> URL: https://issues.apache.org/jira/browse/HIVE-19124
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-19124.patch
>
>
> For now, it will run a query directly and only major compactions will be 
> supported.





[jira] [Updated] (HIVE-19124) implement a basic major compactor for MM tables

2018-04-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19124:
--
Component/s: Transactions

> implement a basic major compactor for MM tables
> ---
>
> Key: HIVE-19124
> URL: https://issues.apache.org/jira/browse/HIVE-19124
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-19124.patch
>
>
> For now, it will run a query directly and only major compactions will be 
> supported.





[jira] [Commented] (HIVE-18739) Add support for Export from Acid table

2018-04-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427813#comment-16427813
 ] 

Eugene Koifman commented on HIVE-18739:
---

Patch 13 for testing - includes support (mostly) for Import - doesn't address 
the security issue yet.

> Add support for Export from Acid table
> --
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch, HIVE-18739.13.patch
>
>






[jira] [Updated] (HIVE-18739) Add support for Export from Acid table

2018-04-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Attachment: HIVE-18739.13.patch

> Add support for Export from Acid table
> --
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch, HIVE-18739.13.patch
>
>






[jira] [Commented] (HIVE-17647) DDLTask.generateAddMmTasks(Table tbl) and other random code should not start transactions

2018-04-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427721#comment-16427721
 ] 

Eugene Koifman commented on HIVE-17647:
---

yes, that would be useful

> DDLTask.generateAddMmTasks(Table tbl) and other random code should not start 
> transactions
> -
>
> Key: HIVE-17647
> URL: https://issues.apache.org/jira/browse/HIVE-17647
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-17647.01.patch, HIVE-17647.patch
>
>
> This method (and other places) have 
> {noformat}
>   if (txnManager.isTxnOpen()) {
> mmWriteId = txnManager.getCurrentTxnId();
>   } else {
> mmWriteId = txnManager.openTxn(new Context(conf), conf.getUser());
> txnManager.commitTxn();
>   }
> {noformat}
> this should throw if there is no open transaction.  It should never open one.
> In general the logic seems suspect.  Looks like the intent is to move all 
> existing files into a delta_x_x/ when a plain table is converted to MM table. 
>  This seems like something that needs to be done from under an Exclusive lock 
> to prevent concurrent Insert operations writing data under table/partition 
> root.  But this is too late to acquire locks which should be done from the 
> Driver.acquireLocks()  (or else have deadlock detector since acquiring them 
> here would break all-or-nothing lock acquisition semantics currently required 
> w/o deadlock detector)





[jira] [Comment Edited] (HIVE-17647) DDLTask.generateAddMmTasks(Table tbl) and other random code should not start transactions

2018-04-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427700#comment-16427700
 ] 

Eugene Koifman edited comment on HIVE-17647 at 4/5/18 10:41 PM:


if you look at acid_vectorization_original_tez.q.out, the same query shows 
rows

{noformat}
PREHOOK: query: select distinct 7 as seven, INPUT__FILE__NAME from 
over10k_orc_bucketed
PREHOOK: type: QUERY
PREHOOK: Input: default@over10k_orc_bucketed
PREHOOK: Output: hdfs://### HDFS PATH ###
POSTHOOK: query: select distinct 7 as seven, INPUT__FILE__NAME from 
over10k_orc_bucketed
POSTHOOK: type: QUERY
POSTHOOK: Input: default@over10k_orc_bucketed
POSTHOOK: Output: hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
{noformat}

Either way, removing rows from output is not really "masking"



was (Author: ekoifman):
if you look at acid_vectorization_original_tez.q.out, the same query shows 
rows

{noformat}
PREHOOK: query: select distinct 7 as seven, INPUT__FILE__NAME from 
over10k_orc_bucketed
PREHOOK: type: QUERY
PREHOOK: Input: default@over10k_orc_bucketed
PREHOOK: Output: hdfs://### HDFS PATH ###
POSTHOOK: query: select distinct 7 as seven, INPUT__FILE__NAME from 
over10k_orc_bucketed
POSTHOOK: type: QUERY
POSTHOOK: Input: default@over10k_orc_bucketed
POSTHOOK: Output: hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
{noformat}

> DDLTask.generateAddMmTasks(Table tbl) and other random code should not start 
> transactions
> -
>
> Key: HIVE-17647
> URL: https://issues.apache.org/jira/browse/HIVE-17647
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-17647.01.patch, HIVE-17647.patch
>
>
> This method (and other places) have 
> {noformat}
>   if (txnManager.isTxnOpen()) {
> mmWriteId = txnManager.getCurrentTxnId();
>   } else {
> mmWriteId = txnManager.openTxn(new Context(conf), conf.getUser());
> txnManager.commitTxn();
>   }
> {noformat}
> this should throw if there is no open transaction.  It should never open one.
> In general the logic seems suspect.  Looks like the intent is to move all 
> existing files into a delta_x_x/ when a plain table is converted to MM table. 
>  This seems like something that needs to be done from under an Exclusive lock 
> to prevent concurrent Insert operations writing data under table/partition 
> root.  But this is too late to acquire locks which should be done from the 
> Driver.acquireLocks()  (or else have deadlock detector since acquiring them 
> here would break all-or-nothing lock acquisition semantics currently required 
> w/o deadlock detector)





[jira] [Commented] (HIVE-17647) DDLTask.generateAddMmTasks(Table tbl) and other random code should not start transactions

2018-04-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427700#comment-16427700
 ] 

Eugene Koifman commented on HIVE-17647:
---

if you look at acid_vectorization_original_tez.q.out, the same query shows 
rows

{noformat}
PREHOOK: query: select distinct 7 as seven, INPUT__FILE__NAME from 
over10k_orc_bucketed
PREHOOK: type: QUERY
PREHOOK: Input: default@over10k_orc_bucketed
PREHOOK: Output: hdfs://### HDFS PATH ###
POSTHOOK: query: select distinct 7 as seven, INPUT__FILE__NAME from 
over10k_orc_bucketed
POSTHOOK: type: QUERY
POSTHOOK: Input: default@over10k_orc_bucketed
POSTHOOK: Output: hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
7   hdfs://### HDFS PATH ###
{noformat}

> DDLTask.generateAddMmTasks(Table tbl) and other random code should not start 
> transactions
> -
>
> Key: HIVE-17647
> URL: https://issues.apache.org/jira/browse/HIVE-17647
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-17647.01.patch, HIVE-17647.patch
>
>
> This method (and other places) have 
> {noformat}
>   if (txnManager.isTxnOpen()) {
> mmWriteId = txnManager.getCurrentTxnId();
>   } else {
> mmWriteId = txnManager.openTxn(new Context(conf), conf.getUser());
> txnManager.commitTxn();
>   }
> {noformat}
> this should throw if there is no open transaction.  It should never open one.
> In general the logic seems suspect.  Looks like the intent is to move all 
> existing files into a delta_x_x/ when a plain table is converted to MM table. 
>  This seems like something that needs to be done from under an Exclusive lock 
> to prevent concurrent Insert operations writing data under table/partition 
> root.  But this is too late to acquire locks which should be done from the 
> Driver.acquireLocks()  (or else have deadlock detector since acquiring them 
> here would break all-or-nothing lock acquisition semantics currently required 
> w/o deadlock detector)





[jira] [Comment Edited] (HIVE-17647) DDLTask.generateAddMmTasks(Table tbl) and other random code should not start transactions

2018-04-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427541#comment-16427541
 ] 

Eugene Koifman edited comment on HIVE-17647 at 4/5/18 8:29 PM:
---

hmm. I use this in all Acid UTs (though not q files), e.g. TestTxnLoadData.
Using virtual columns (except ROW__ID) will disable vectorization but should surely still return data.

acid_vectorization_original.q has queries like
select ROW__ID, t, si, i from over10k_orc_bucketed where b = 4294967363 and t < 100 order by ROW__ID;
which use a VC.



was (Author: ekoifman):
hmm. I use this in all Acid UTs (though not q files), e.g. TestTxnLoadData.
Using virtual columns (except ROW__ID) will disable vectorization but should surely still return data.


> DDLTask.generateAddMmTasks(Table tbl) and other random code should not start 
> transactions
> -
>
> Key: HIVE-17647
> URL: https://issues.apache.org/jira/browse/HIVE-17647
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-17647.patch
>
>
> This method (and other places) have 
> {noformat}
>   if (txnManager.isTxnOpen()) {
> mmWriteId = txnManager.getCurrentTxnId();
>   } else {
> mmWriteId = txnManager.openTxn(new Context(conf), conf.getUser());
> txnManager.commitTxn();
>   }
> {noformat}
> this should throw if there is no open transaction.  It should never open one.
> In general the logic seems suspect.  Looks like the intent is to move all 
> existing files into a delta_x_x/ when a plain table is converted to MM table. 
>  This seems like something that needs to be done from under an Exclusive lock 
> to prevent concurrent Insert operations writing data under table/partition 
> root.  But this is too late to acquire locks which should be done from the 
> Driver.acquireLocks()  (or else have deadlock detector since acquiring them 
> here would break all-or-nothing lock acquisition semantics currently required 
> w/o deadlock detector)





[jira] [Commented] (HIVE-17647) DDLTask.generateAddMmTasks(Table tbl) and other random code should not start transactions

2018-04-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427541#comment-16427541
 ] 

Eugene Koifman commented on HIVE-17647:
---

hmm. I use this in all Acid UTs (though not q files), e.g. TestTxnLoadData.
Using virtual columns (except ROW__ID) will disable vectorization but should surely still return data.


> DDLTask.generateAddMmTasks(Table tbl) and other random code should not start 
> transactions
> -
>
> Key: HIVE-17647
> URL: https://issues.apache.org/jira/browse/HIVE-17647
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-17647.patch
>
>
> This method (and other places) have 
> {noformat}
>   if (txnManager.isTxnOpen()) {
> mmWriteId = txnManager.getCurrentTxnId();
>   } else {
> mmWriteId = txnManager.openTxn(new Context(conf), conf.getUser());
> txnManager.commitTxn();
>   }
> {noformat}
> this should throw if there is no open transaction.  It should never open one.
> In general the logic seems suspect.  Looks like the intent is to move all 
> existing files into a delta_x_x/ when a plain table is converted to MM table. 
>  This seems like something that needs to be done from under an Exclusive lock 
> to prevent concurrent Insert operations writing data under table/partition 
> root.  But this is too late to acquire locks which should be done from the 
> Driver.acquireLocks()  (or else have deadlock detector since acquiring them 
> here would bread all-or-nothing lock acquisition semantics currently required 
> w/o deadlock detector)





[jira] [Commented] (HIVE-17647) DDLTask.generateAddMmTasks(Table tbl) and other random code should not start transactions

2018-04-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427463#comment-16427463
 ] 

Eugene Koifman commented on HIVE-17647:
---

select *, INPUT__FILE__NAME from T


> DDLTask.generateAddMmTasks(Table tbl) and other random code should not start 
> transactions
> -
>
> Key: HIVE-17647
> URL: https://issues.apache.org/jira/browse/HIVE-17647
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-17647.patch
>
>
> This method (and other places) have 
> {noformat}
>   if (txnManager.isTxnOpen()) {
> mmWriteId = txnManager.getCurrentTxnId();
>   } else {
> mmWriteId = txnManager.openTxn(new Context(conf), conf.getUser());
> txnManager.commitTxn();
>   }
> {noformat}
> this should throw if there is no open transaction.  It should never open one.
> In general the logic seems suspect.  Looks like the intent is to move all 
> existing files into a delta_x_x/ when a plain table is converted to MM table. 
>  This seems like something that needs to be done from under an Exclusive lock 
> to prevent concurrent Insert operations writing data under table/partition 
> root.  But this is too late to acquire locks which should be done from the 
> Driver.acquireLocks()  (or else have deadlock detector since acquiring them 
> here would break all-or-nothing lock acquisition semantics currently required 
> w/o deadlock detector)





[jira] [Comment Edited] (HIVE-17647) DDLTask.generateAddMmTasks(Table tbl) and other random code should not start transactions

2018-04-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427398#comment-16427398
 ] 

Eugene Koifman edited comment on HIVE-17647 at 4/5/18 6:29 PM:
---

In DDLTask
{noformat}
if (writeId == null) {
  throw new HiveException("Internal error - write ID not set for MM conversion");
}
{noformat}
could this include the table name or path?

Could you add a test to check that the X lock is actually acquired?  
(TestDbTxnManager2)  I'm not sure {{ddlWork.setNeedLock(true);}} will do 
anything.  The LM relies on a WriteEntity with WriteType.DDL_EXCLUSIVE to get the X lock...

Also, I don't think any .q tests check that the data lands in the right place, i.e. 
in a delta.  That would be useful.
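Something along these lines would do (a sketch; table name and exact directory name are illustrative) - the INPUT__FILE__NAME virtual column exposes the file path, so the output shows whether the rows ended up in a delta after the conversion:
{noformat}
-- after converting a plain table to MM, the pre-existing rows should be read from a delta
SELECT INPUT__FILE__NAME FROM t;
-- expected: paths like .../t/delta_0000001_0000001_0000/000000_0
{noformat}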


was (Author: ekoifman):
In DDLTask
{noformat}
if (writeId == null) {
  throw new HiveException("Internal error - write ID not set for MM conversion");
}
{noformat}
could this include the table name or path?

Could you add a test to check that the X lock is actually acquired?  (TestDbTxnManager2)  I'm not sure {{ddlWork.setNeedLock(true);}} will do anything.  The LM relies on a WriteEntity with WriteType.DDL_EXCLUSIVE to get the X lock...


> DDLTask.generateAddMmTasks(Table tbl) and other random code should not start 
> transactions
> -
>
> Key: HIVE-17647
> URL: https://issues.apache.org/jira/browse/HIVE-17647
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-17647.patch
>
>
> This method (and other places) have 
> {noformat}
>   if (txnManager.isTxnOpen()) {
> mmWriteId = txnManager.getCurrentTxnId();
>   } else {
> mmWriteId = txnManager.openTxn(new Context(conf), conf.getUser());
> txnManager.commitTxn();
>   }
> {noformat}
> this should throw if there is no open transaction.  It should never open one.
> In general the logic seems suspect.  Looks like the intent is to move all 
> existing files into a delta_x_x/ when a plain table is converted to MM table. 
>  This seems like something that needs to be done from under an Exclusive lock 
> to prevent concurrent Insert operations writing data under table/partition 
> root.  But this is too late to acquire locks which should be done from the 
> Driver.acquireLocks()  (or else have deadlock detector since acquiring them 
> here would break all-or-nothing lock acquisition semantics currently required 
> w/o deadlock detector)





[jira] [Commented] (HIVE-17647) DDLTask.generateAddMmTasks(Table tbl) and other random code should not start transactions

2018-04-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427398#comment-16427398
 ] 

Eugene Koifman commented on HIVE-17647:
---

In DDLTask
{noformat}
if (writeId == null) {
  throw new HiveException("Internal error - write ID not set for MM conversion");
}
{noformat}
could this include the table name or path?

Could you add a test to check that the X lock is actually acquired?  (TestDbTxnManager2)  I'm not sure {{ddlWork.setNeedLock(true);}} will do anything.  The LM relies on a WriteEntity with WriteType.DDL_EXCLUSIVE to get the X lock...


> DDLTask.generateAddMmTasks(Table tbl) and other random code should not start 
> transactions
> -
>
> Key: HIVE-17647
> URL: https://issues.apache.org/jira/browse/HIVE-17647
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-17647.patch
>
>
> This method (and other places) have 
> {noformat}
>   if (txnManager.isTxnOpen()) {
> mmWriteId = txnManager.getCurrentTxnId();
>   } else {
> mmWriteId = txnManager.openTxn(new Context(conf), conf.getUser());
> txnManager.commitTxn();
>   }
> {noformat}
> this should throw if there is no open transaction.  It should never open one.
> In general the logic seems suspect.  Looks like the intent is to move all 
> existing files into a delta_x_x/ when a plain table is converted to MM table. 
>  This seems like something that needs to be done from under an Exclusive lock 
> to prevent concurrent Insert operations writing data under table/partition 
> root.  But this is too late to acquire locks which should be done from the 
> Driver.acquireLocks()  (or else have deadlock detector since acquiring them 
> here would break all-or-nothing lock acquisition semantics currently required 
> w/o deadlock detector)





[jira] [Commented] (HIVE-19083) Make partition clause optional for INSERT

2018-04-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427262#comment-16427262
 ] 

Eugene Koifman commented on HIVE-19083:
---

[~vgarg] could you update the wiki on this?
https://cwiki.apache.org/confluence/display/hive/languagemanual+dml#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
https://cwiki.apache.org/confluence/display/hive/languagemanual+dml#LanguageManualDML-InsertingvaluesintotablesfromSQL

Both currently have all their examples using the PARTITION clause.
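For the wiki, an example along these lines could be added (a sketch; table/column names and the src table are made up) showing the clause being omitted:
{noformat}
CREATE TABLE pt (a INT, b STRING) PARTITIONED BY (p STRING);

-- previously these required PARTITION (p='p1') or PARTITION (p);
-- with HIVE-19083 the partition values are supplied as the trailing columns
-- (dynamic partitioning may need to be allowed, e.g. hive.exec.dynamic.partition.mode=nonstrict)
INSERT INTO pt VALUES (1, 'one', 'p1');
INSERT INTO TABLE pt SELECT a, b, p FROM src;
INSERT OVERWRITE TABLE pt SELECT a, b, p FROM src;
{noformat}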

> Make partition clause optional for INSERT
> -
>
> Key: HIVE-19083
> URL: https://issues.apache.org/jira/browse/HIVE-19083
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-19083.1.patch, HIVE-19083.2.patch, 
> HIVE-19083.3.patch, HIVE-19083.4.patch
>
>
> Partition clause should be optional for
>  * INSERT INTO VALUES
>  * INSERT OVERWRITE
>  * INSERT SELECT





[jira] [Updated] (HIVE-19115) Merge: Semijoin hints are dropped by the merge

2018-04-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19115:
--
Component/s: Transactions
 Query Planning

> Merge: Semijoin hints are dropped by the merge
> --
>
> Key: HIVE-19115
> URL: https://issues.apache.org/jira/browse/HIVE-19115
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Transactions
>Reporter: Gopal V
>Priority: Major
>
> {code}
> create table target stored as orc as select ss_ticket_number, ss_item_sk, 
> current_timestamp as `ts` from tpcds_bin_partitioned_orc_1000.store_sales;
> create table source stored as orc as select sr_ticket_number, sr_item_sk, 
> d_date from tpcds_bin_partitioned_orc_1000.store_returns join 
> tpcds_bin_partitioned_orc_1000.date_dim where d_date_sk = sr_returned_date_sk;
> merge /* +semi(T, sr_ticket_number, S, 1) */ into target T using (select 
> * from source where year(d_date) = 1998) S ON T.ss_ticket_number = 
> S.sr_ticket_number and sr_item_sk = ss_item_sk 
> when matched THEN UPDATE SET ts = current_timestamp
> when not matched and sr_item_sk is not null and sr_ticket_number is not null 
> THEN INSERT VALUES(S.sr_ticket_number, S.sr_item_sk, current_timestamp);
> {code}
> The semijoin hints are ignored and the code says 
> {code}
>  todo: do we care to preserve comments in original SQL?
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java#L624
> in this case we do.





[jira] [Updated] (HIVE-18741) Add support for Import into Acid table

2018-04-04 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18741:
--
Status: Patch Available  (was: Open)

> Add support for Import into Acid table
> --
>
> Key: HIVE-18741
> URL: https://issues.apache.org/jira/browse/HIVE-18741
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18741.01.patch
>
>
> This should follow Load Data approach (or use load data directly)
> Note that import supports partition spec
> Does import support loading files not created by Export?  If so, similarly to 
> HIVE-19029 - should check for Acid meta columns and reject





[jira] [Updated] (HIVE-18741) Add support for Import into Acid table

2018-04-04 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18741:
--
Attachment: HIVE-18741.01.patch

> Add support for Import into Acid table
> --
>
> Key: HIVE-18741
> URL: https://issues.apache.org/jira/browse/HIVE-18741
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18741.01.patch
>
>
> This should follow Load Data approach (or use load data directly)
> Note that import supports partition spec
> Does import support loading files not created by Export?  If so, similarly to 
> HIVE-19029 - should check for Acid meta columns and reject





[jira] [Updated] (HIVE-19100) investigate TestStreaming failures

2018-04-04 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19100:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
 Release Note: n/a
   Status: Resolved  (was: Patch Available)

> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-19100.01.patch, HIVE-19100.02.patch, 
> HIVE-19100.03.patch
>
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}





[jira] [Commented] (HIVE-19100) investigate TestStreaming failures

2018-04-04 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426084#comment-16426084
 ] 

Eugene Koifman commented on HIVE-19100:
---

There are a bunch of TestTezPerfCliDriver failures with age = 1, but other runs
https://builds.apache.org/job/PreCommit-HIVE-Build/9992/testReport
https://builds.apache.org/job/PreCommit-HIVE-Build/9991/testReport
contain identical failures

thanks Alan for the review
committed to master

> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-19100.01.patch, HIVE-19100.02.patch, 
> HIVE-19100.03.patch
>
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}





[jira] [Commented] (HIVE-17687) CompactorMR.run() should update compaction_queue table for MM

2018-04-04 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425999#comment-16425999
 ] 

Eugene Koifman commented on HIVE-17687:
---

Currently, the compactor for MM will delete aborted dirs in the Worker and put the 
queue entry in READY FOR CLEANING state - via TxnHandler.markCompacted() - but it 
should probably do TxnHandler.markCleaned() since the Cleaner has nothing 
to do for MM.

it'd be useful but not critical

> CompactorMR.run() should update compaction_queue table for MM
> -
>
> Key: HIVE-17687
> URL: https://issues.apache.org/jira/browse/HIVE-17687
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
>
> for MM it deletes Aborted dirs and bails.  Should probably update 
> compaction_queue so that it's clear why it doesn't have HadoopJobId etc





[jira] [Updated] (HIVE-18570) ACID IOW implemented using base may delete too much data

2018-04-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18570:
--
Target Version/s: 3.0.0
   Fix Version/s: (was: 3.0.0)

> ACID IOW implemented using base may delete too much data
> 
>
> Key: HIVE-18570
> URL: https://issues.apache.org/jira/browse/HIVE-18570
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Priority: Blocker
>
> Suppose we have a table with delta_0 insert data.
> Txn 1 starts an insert into delta_1.
> Txn 2 starts an IOW into base_2.
> Txn 2 commits.
> Txn 1 commits after txn 2 but its results would be invisible.
> Txn 2 deletes rows committed by txn 1 that according to standard ACID 
> semantics it could have never observed and affected; this sequence of events 
> is only possible under read-uncommitted isolation level (so, 2 deletes rows 
> written by 1 before 1 commits them). 
> This is if we look at IOW as transactional delete+insert. Otherwise we are 
> just saying IOW performs "semi"-transactional delete.
> If 1 ran an update on rows instead of an insert, and 2 still ran an 
> IOW/delete, row lock conflict (or equivalent) should cause one of them to 
> fail.





[jira] [Commented] (HIVE-18570) ACID IOW implemented using base may delete too much data

2018-04-03 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424696#comment-16424696
 ] 

Eugene Koifman commented on HIVE-18570:
---

Given the current state of things, the only way to prevent this is to make IOW 
take an X lock, which would block all readers as well.  So perhaps there should 
be an "is strict" type of option to enable this behavior.  Longer term we should 
enhance the LM to have a lock that blocks all writes but not reads for this (it would 
be useful elsewhere as well).

> ACID IOW implemented using base may delete too much data
> 
>
> Key: HIVE-18570
> URL: https://issues.apache.org/jira/browse/HIVE-18570
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Suppose we have a table with delta_0 insert data.
> Txn 1 starts an insert into delta_1.
> Txn 2 starts an IOW into base_2.
> Txn 2 commits.
> Txn 1 commits after txn 2 but its results would be invisible.
> Txn 2 deletes rows committed by txn 1 that according to standard ACID 
> semantics it could have never observed and affected; this sequence of events 
> is only possible under read-uncommitted isolation level (so, 2 deletes rows 
> written by 1 before 1 commits them). 
> This is if we look at IOW as transactional delete+insert. Otherwise we are 
> just saying IOW performs "semi"-transactional delete.
> If 1 ran an update on rows instead of an insert, and 2 still ran an 
> IOW/delete, row lock conflict (or equivalent) should cause one of them to 
> fail.





[jira] [Updated] (HIVE-19100) investigate TestStreaming failures

2018-04-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19100:
--
Attachment: HIVE-19100.03.patch

> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19100.01.patch, HIVE-19100.02.patch, 
> HIVE-19100.03.patch
>
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}





[jira] [Commented] (HIVE-19100) investigate TestStreaming failures

2018-04-03 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424477#comment-16424477
 ] 

Eugene Koifman commented on HIVE-19100:
---

This turned out to be entirely self-inflicted - a consequence of "add partition" w/o 
any data allocating a writeId, which it doesn't need to do.
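For illustration (table and partition spec made up), the operation in question is just:
{noformat}
-- metadata-only: no data is written, so it should not consume a writeId
ALTER TABLE t ADD PARTITION (p='1');
{noformat}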

> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19100.01.patch, HIVE-19100.02.patch
>
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}





[jira] [Updated] (HIVE-19100) investigate TestStreaming failures

2018-04-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19100:
--
Attachment: HIVE-19100.02.patch

> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19100.01.patch, HIVE-19100.02.patch
>
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}





[jira] [Updated] (HIVE-19100) investigate TestStreaming failures

2018-04-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19100:
--
Status: Patch Available  (was: Open)

> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19100.01.patch, HIVE-19100.02.patch
>
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}





[jira] [Updated] (HIVE-19100) investigate TestStreaming failures

2018-04-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19100:
--
Status: Open  (was: Patch Available)

> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19100.01.patch
>
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}





[jira] [Commented] (HIVE-19100) investigate TestStreaming failures

2018-04-03 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424435#comment-16424435
 ] 

Eugene Koifman commented on HIVE-19100:
---

I looked at 2 of the tests: testMultipleTransactionBatchCommits and 
testTransactionBatchAbortAndCommit.
In both, the difference is that the writeIds in the delta name differ from what is 
expected.  This could be due to an additional write to the test table before 
the failing check, or something else consuming a write id - I can't tell what 
could've caused the change.



> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19100.01.patch
>
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}





[jira] [Commented] (HIVE-18814) Support Add Partition For Acid tables

2018-04-03 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424427#comment-16424427
 ] 

Eugene Koifman commented on HIVE-18814:
---

I filed HIVE-19100 to follow up on tests

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18814.01.patch, HIVE-18814.02.patch, 
> HIVE-18814.03.patch, HIVE-18814.04.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this since a delete on this data (added via Add 
> partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this is 
> fixed by then).
>  
> One option is follow Load Data approach and create a new delta_x_x/ and 
> move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
> retains data "outside" of the table tree which make it more likely that this 
> data will be modified in some way which can really break things if done after 
> and SQL update/delete on this data have happened. 
>  
> It performs no validations on add (except for partition spec) so any file 
> with any format can be added.  It allows add to bucketed tables as well.
> Seems like a very dangerous command.  Maybe a better option is to block it 
> and advise using Load Data.  Alternatively, make this do Add partition 
> metadata op followed by Load Data. 
>  
>  





[jira] [Updated] (HIVE-19100) investigate TestStreaming failures

2018-04-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19100:
--
Attachment: HIVE-19100.01.patch

> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19100.01.patch
>
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19100) investigate TestStreaming failures

2018-04-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19100:
--
Status: Patch Available  (was: Open)

> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19100.01.patch
>
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19100) investigate TestStreaming failures

2018-04-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19100:
--
Description: 
{noformat}
[ERROR] Failures: 
[ERROR]   
TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
 expected:<11> but was:<12>
[ERROR]   
TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
expected:<1> but was:<3>
[ERROR]   
TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
expected:<1> but was:<2>
[ERROR]   
TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
 expected:<1> but was:<3>
[ERROR]   
TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
 expected:<1> but was:<3>
[ERROR]   
TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
expected:<1> but was:<3>
[ERROR]   
TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
 expected:<1> but was:<3>
[ERROR]   
TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
 expected:<1> but was:<3>
[INFO] 
[ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0

{noformat}


  was:
[ERROR] Failures: 
[ERROR]   
TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
 expected:<11> but was:<12>
[ERROR]   
TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
expected:<1> but was:<3>
[ERROR]   
TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
expected:<1> but was:<2>
[ERROR]   
TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
 expected:<1> but was:<3>
[ERROR]   
TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
 expected:<1> but was:<3>
[ERROR]   
TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
expected:<1> but was:<3>
[ERROR]   
TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
 expected:<1> but was:<3>
[ERROR]   
TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
 expected:<1> but was:<3>
[INFO] 
[ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0



> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> {noformat}
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-19100) investigate TestStreaming failures

2018-04-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-19100:
-


> investigate TestStreaming failures
> --
>
> Key: HIVE-19100
> URL: https://issues.apache.org/jira/browse/HIVE-19100
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> [ERROR] Failures: 
> [ERROR]   
> TestStreaming.testInterleavedTransactionBatchCommits:1218->checkDataWritten2:619
>  expected:<11> but was:<12>
> [ERROR]   
> TestStreaming.testMultipleTransactionBatchCommits:1157->checkDataWritten2:619 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchAbortAndCommit:1138->checkDataWritten:566 
> expected:<1> but was:<2>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Delimited:861->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_DelimitedUGI:865->testTransactionBatchCommit_Delimited:881->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Json:1011->checkDataWritten:566 
> expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_Regex:928->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [ERROR]   
> TestStreaming.testTransactionBatchCommit_RegexUGI:932->testTransactionBatchCommit_Regex:949->checkDataWritten:566
>  expected:<1> but was:<3>
> [INFO] 
> [ERROR] Tests run: 26, Failures: 8, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-04-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

committed to master

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18814.01.patch, HIVE-18814.02.patch, 
> HIVE-18814.03.patch, HIVE-18814.04.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add 
> Partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any 
> file with any format can be added.  It allows adding to bucketed tables as 
> well.  This seems like a very dangerous command.  Maybe a better option is to 
> block it and advise using Load Data.  Alternatively, make this do the Add 
> Partition metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18814) Support Add Partition For Acid tables

2018-04-03 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424331#comment-16424331
 ] 

Eugene Koifman commented on HIVE-18814:
---

Thanks for the review.
Virtual columns like INPUT__FILE__NAME disable vectorization.  There is special 
handling to make ROW__ID vectorize.

How do you run TestStreaming?
https://builds.apache.org/job/PreCommit-HIVE-Build/9952/testReport/ for patch 4 
shows 8 TestStreaming failures.  I see the same failures locally both with and 
without my patch.
I run {{mvn test -Dtest=TestStreaming}} from hcatalog/streaming.


> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.01.patch, HIVE-18814.02.patch, 
> HIVE-18814.03.patch, HIVE-18814.04.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add 
> Partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any 
> file with any format can be added.  It allows adding to bucketed tables as 
> well.  This seems like a very dangerous command.  Maybe a better option is to 
> block it and advise using Load Data.  Alternatively, make this do the Add 
> Partition metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19084) Test case in Hive Query Language fails with a java.lang.AssertionError.

2018-04-02 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422963#comment-16422963
 ] 

Eugene Koifman commented on HIVE-19084:
---

FYI, [~steveyeom2017]

> Test case in Hive Query Language fails with a java.lang.AssertionError.
> ---
>
> Key: HIVE-19084
> URL: https://issues.apache.org/jira/browse/HIVE-19084
> Project: Hive
>  Issue Type: Bug
>  Components: Test, Transactions
> Environment: uname -a
> Linux pts00607-vm3 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:46 UTC 
> 2018 ppc64le ppc64le ppc64le GNU/Linux
>Reporter: Alisha Prabhu
>Priority: Major
> Attachments: HIVE-19084.1.patch
>
>
> The test case testInsertOverwriteForPartitionedMmTable in 
> TestTxnCommandsForMmTable.java and TestTxnCommandsForOrcMmTable.java fails 
> with a java.lang.AssertionError.
> The Maven command used is mvn 
> -Dtest=TestTxnCommandsForMmTable#testInsertOverwriteForPartitionedMmTable test
> The test case fails because the listStatus function of the FileSystem does 
> not guarantee that the list of file/directory statuses is returned in sorted 
> order.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19084) Test case in Hive Query Language fails with a java.lang.AssertionError.

2018-04-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19084:
--
Component/s: (was: Hive)
 Transactions
 Test

> Test case in Hive Query Language fails with a java.lang.AssertionError.
> ---
>
> Key: HIVE-19084
> URL: https://issues.apache.org/jira/browse/HIVE-19084
> Project: Hive
>  Issue Type: Bug
>  Components: Test, Transactions
> Environment: uname -a
> Linux pts00607-vm3 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:46 UTC 
> 2018 ppc64le ppc64le ppc64le GNU/Linux
>Reporter: Alisha Prabhu
>Priority: Major
> Attachments: HIVE-19084.1.patch
>
>
> The test case testInsertOverwriteForPartitionedMmTable in 
> TestTxnCommandsForMmTable.java and TestTxnCommandsForOrcMmTable.java fails 
> with a java.lang.AssertionError.
> The Maven command used is mvn 
> -Dtest=TestTxnCommandsForMmTable#testInsertOverwriteForPartitionedMmTable test
> The test case fails because the listStatus function of the FileSystem does 
> not guarantee that the list of file/directory statuses is returned in sorted 
> order.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19058) add object owner to HivePrivilegeObject

2018-04-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19058:
--
Attachment: HIVE-19058.03.patch

> add object owner to HivePrivilegeObject
> ---
>
> Key: HIVE-19058
> URL: https://issues.apache.org/jira/browse/HIVE-19058
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19058.01.patch, HIVE-19058.02.patch, 
> HIVE-19058.03.patch
>
>
> this can enable HiveAuthorizer to create policies based on the owner of the 
> object - for example, only let the owner of a table read/write it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-04-02 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422949#comment-16422949
 ] 

Eugene Koifman commented on HIVE-18747:
---

+1 patch 6

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch, 
> HIVE-18747.03.patch, HIVE-18747.04.patch, HIVE-18747.05.patch, 
> HIVE-18747.06.patch
>
>
> Per table write ID implementation (HIVE-18192) maintains a map between txn ID 
> and table write ID in TXN_TO_WRITE_ID meta table. 
> The entries in this table are used to generate a ValidWriteIdList for the 
> given ValidTxnList to ensure snapshot isolation. 
> When a table or database is dropped, these entries are cleaned up. But it is 
> necessary to clean up for active tables too, for better performance.
> TXN_TO_WRITE_ID table keeps a mapping of Transaction ID to Write ID.  The 
> state of each Write ID (open, committed, aborted) is determined by the state 
> of the parent transaction.  In order to be able to get a WriteIdList that is 
> accurate wrt ValidTxnList that is locked in at the start of the transaction, 
> we have to retain txnid<->writeid mapping even after the transaction ends. 
> This is because a reader at Snapshot Isolation that started when transaction 
> X was open, should continue to ignore the data written by X even after X 
> commits.
> So we need a mechanism to know when it is safe to remove TXN_TO_WRITE_ID.  
> There are 2 parts to it. When txn X is opened, it records Y=select 
> min(txn_id) from TXNS where txn_state=’o’ in MIN_HISTORY(txnid,opentxnid) 
> table, i.e. it adds (X, Y) to MIN_HISTORY.  On commit (and abort) of X, it 
> removes its own entry from MIN_HISTORY. In the absence of Aborted 
> transactions, MIN_HISTORY gives us the smallest open txnid across all active 
> reader snapshots.  Let Z=select min(opentxnid) from MIN_HISTORY. We can 
> delete entries from TXN_TO_WRITE_ID once TXN_TO_WRITE_ID.T2W_TXNID < Z since 
> every active reader sees txns < Z as committed.
> If S is an aborted txn, we retain the metadata about it in TXNS as long as 
> any data written by S may be visible to some reader in the system, so that 
> the reader knows to skip this data.  The rules for when that is are complex, 
> but wrt TXN_TO_WRITE_ID, if A=select min(TXN_ID) from TXNS where 
> TXN_STATE=’a’, then it’s safe to delete from TXN_TO_WRITE_ID when 
> TXN_TO_WRITE_ID.T2W_TXNID < min(Z,A).  
> If no open or aborted txns exist in the system, then we need to enable 
> cleanup using the latest allocated value of the NEXT_TXN_ID table. The delete 
> condition would be TXN_TO_WRITE_ID.T2W_TXNID < min(Z,A,NEXT_TXN_ID.ntxn_next).  
> Also, it is proposed to trigger cleanup on TXN_TO_WRITE_ID from the initiator 
> immediately after cleaning up aborted txn metadata from the TXNS table.
>  
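A hedged sketch of the cleanup condition above, written as plain SQL against the table/column names used in this ticket's description (the actual statement lives in the metastore code and may differ in syntax and null handling):

{noformat}
-- Z = smallest open txnid across reader snapshots, A = smallest aborted txnid;
-- fall back to NEXT_TXN_ID.ntxn_next when there are no open/aborted txns.
DELETE FROM TXN_TO_WRITE_ID
WHERE T2W_TXNID < LEAST(
    COALESCE((SELECT MIN(opentxnid) FROM MIN_HISTORY),                -- Z
             (SELECT ntxn_next FROM NEXT_TXN_ID)),
    COALESCE((SELECT MIN(TXN_ID) FROM TXNS WHERE TXN_STATE = 'a'),    -- A
             (SELECT ntxn_next FROM NEXT_TXN_ID)));
{noformat}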



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18814) Support Add Partition For Acid tables

2018-04-02 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422916#comment-16422916
 ] 

Eugene Koifman commented on HIVE-18814:
---

It works just like Load Data - open a transaction, create a delta, and 
copy/move the data there.  This ensures proper transactional semantics (wrt the 
data, anyway).  ROW__IDs are attached at read/compaction time.
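A hedged sketch of that flow (the write id, paths, and directory name below are illustrative only, not taken from the patch):

{noformat}
ALTER TABLE t ADD PARTITION (p=1) LOCATION '/staging/p1';

-- conceptually, within the transaction that allocates write id W:
--   <warehouse>/t/p=1/delta_W_W/      <- new delta created for the partition
--   the files under /staging/p1 are copied/moved into that delta
-- ROW__IDs are not materialized in the files; they are attached at
-- read/compaction time, as noted above.
{noformat}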

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.01.patch, HIVE-18814.02.patch, 
> HIVE-18814.03.patch, HIVE-18814.04.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add 
> Partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any 
> file with any format can be added.  It allows adding to bucketed tables as 
> well.  This seems like a very dangerous command.  Maybe a better option is to 
> block it and advise using Load Data.  Alternatively, make this do the Add 
> Partition metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-04-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Attachment: HIVE-18814.04.patch

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.01.patch, HIVE-18814.02.patch, 
> HIVE-18814.03.patch, HIVE-18814.04.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add 
> Partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any 
> file with any format can be added.  It allows adding to bucketed tables as 
> well.  This seems like a very dangerous command.  Maybe a better option is to 
> block it and advise using Load Data.  Alternatively, make this do the Add 
> Partition metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18814) Support Add Partition For Acid tables

2018-04-01 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421721#comment-16421721
 ] 

Eugene Koifman commented on HIVE-18814:
---

Failures are not related.  There are a number of TestStreaming failures that 
also fail w/o this patch.
[~alangates], could you review please?

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.01.patch, HIVE-18814.02.patch, 
> HIVE-18814.03.patch, HIVE-18814.04.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add 
> Partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any 
> file with any format can be added.  It allows adding to bucketed tables as 
> well.  This seems like a very dangerous command.  Maybe a better option is to 
> block it and advise using Load Data.  Alternatively, make this do the Add 
> Partition metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-03-31 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Attachment: HIVE-18814.03.patch

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.01.patch, HIVE-18814.02.patch, 
> HIVE-18814.03.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add 
> Partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any 
> file with any format can be added.  It allows adding to bucketed tables as 
> well.  This seems like a very dangerous command.  Maybe a better option is to 
> block it and advise using Load Data.  Alternatively, make this do the Add 
> Partition metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19058) add object owner to HivePrivilegeObject

2018-03-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19058:
--
Attachment: HIVE-19058.02.patch

> add object owner to HivePrivilegeObject
> ---
>
> Key: HIVE-19058
> URL: https://issues.apache.org/jira/browse/HIVE-19058
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19058.01.patch, HIVE-19058.02.patch
>
>
> this can enable HiveAuthorizer to create policies based on the owner of the 
> object - for example, only let the owner of a table read/write it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-03-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Attachment: HIVE-18814.02.patch

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.01.patch, HIVE-18814.02.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add 
> Partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any 
> file with any format can be added.  It allows adding to bucketed tables as 
> well.  This seems like a very dangerous command.  Maybe a better option is to 
> block it and advise using Load Data.  Alternatively, make this do the Add 
> Partition metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18021) Insert overwrite on acid table with Union All optimizations

2018-03-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419951#comment-16419951
 ] 

Eugene Koifman commented on HIVE-18021:
---

I think it may generate the same ROW__IDs in each subdir for the IOW case - 
need to check.


> Insert overwrite on acid table with Union All optimizations
> ---
>
> Key: HIVE-18021
> URL: https://issues.apache.org/jira/browse/HIVE-18021
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> This is a follow-up to HIVE-14988.
> T is an unbucketed acid table.
> {noformat}
> insert into T select a,b from S union all select a,b from S1
> {noformat}
> This will create a separate subdirectory for each leg of the union in the 
> target table (automatically on Tez, with some props enabled on MR).
> A regular Insert will make each subdirectory a delta_x_x_0, delta_x_x_1.  
> See HIVE-15899.
> There is no such suffix mechanism for base_x/.  
> Need to figure out how this should work.
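A hedged illustration of the layout in question (write id 7 and the directory names are made up; the suffix scheme follows the delta_x_x_N pattern described above):

{noformat}
insert into T select a,b from S union all select a,b from S1;
-- regular insert: each union leg gets its own suffixed delta
--   t/delta_7_7_0/
--   t/delta_7_7_1/

insert overwrite table T select a,b from S union all select a,b from S1;
-- insert overwrite: there is no suffix scheme for base, so both legs
-- would target the same directory
--   t/base_7/
-- which is the open question (and the possible duplicate ROW__IDs
-- mentioned in the comment above).
{noformat}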



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17661) DBTxnManager.acquireLocks() - MM tables should use shared lock for Insert

2018-03-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419949#comment-16419949
 ] 

Eugene Koifman commented on HIVE-17661:
---

+1 



> DBTxnManager.acquireLocks() - MM tables should use shared lock for Insert
> -
>
> Key: HIVE-17661
> URL: https://issues.apache.org/jira/browse/HIVE-17661
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-17661.patch
>
>
> {noformat}
> case INSERT:
>   assert t != null;
>   if(AcidUtils.isFullAcidTable(t)) {
>     compBuilder.setShared();
>   }
>   else {
>     if (conf.getBoolVar(HiveConf.ConfVars.HIVE_TXN_STRICT_LOCKING_MODE)) {
> {noformat}
> _if(AcidUtils.isFullAcidTable(t)) {_ 
> should probably be 
> _if(AcidUtils.isAcidTable(t)) {_



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419947#comment-16419947
 ] 

Eugene Koifman edited comment on HIVE-18747 at 3/29/18 11:46 PM:
-

There are a few new checkstyle problems; otherwise,
+1 patch 3

Is the change in TxnHandler.getValidWriteIdsForTable() specific to this ticket 
or a general fix?


was (Author: ekoifman):
+1 patch 3

is the change in TxnHandler.getValidWriteIdsForTable() specific to this ticket 
or a general fix?

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch, 
> HIVE-18747.03.patch
>
>
> Per table write ID implementation (HIVE-18192) maintains a map between txn ID 
> and table write ID in TXN_TO_WRITE_ID meta table. 
> The entries in this table are used to generate a ValidWriteIdList for the 
> given ValidTxnList to ensure snapshot isolation. 
> When a table or database is dropped, these entries are cleaned up. But it is 
> necessary to clean up for active tables too, for better performance.
> TXN_TO_WRITE_ID table keeps a mapping of Transaction ID to Write ID.  The 
> state of each Write ID (open, committed, aborted) is determined by the state 
> of the parent transaction.  In order to be able to get a WriteIdList that is 
> accurate wrt ValidTxnList that is locked in at the start of the transaction, 
> we have to retain txnid<->writeid mapping even after the transaction ends. 
> This is because a reader at Snapshot Isolation that started when transaction 
> X was open, should continue to ignore the data written by X even after X 
> commits.
> So we need a mechanism to know when it is safe to remove TXN_TO_WRITE_ID.  
> There are 2 parts to it. When txn X is opened, it records Y=select 
> min(txn_id) from TXNS where txn_state=’o’ in MIN_HISTORY(txnid,opentxnid) 
> table, i.e. it adds (X, Y) to MIN_HISTORY.  On commit (and abort) of X, it 
> removes its own entry from MIN_HISTORY. In the absence of Aborted 
> transactions, MIN_HISTORY gives us the smallest open txnid across all active 
> reader snapshots.  Let Z=select min(opentxnid) from MIN_HISTORY. We can 
> delete entries from TXN_TO_WRITE_ID once TXN_TO_WRITE_ID.T2W_TXNID < Z since 
> every active reader sees txns < Z as committed.
> If S is an aborted txn, we retain the metadata about it in TXNS as long as 
> any data written by S may be visible to some reader in the system, so that 
> the reader knows to skip this data.  The rules for when that is are complex, 
> but wrt TXN_TO_WRITE_ID, if A=select min(TXN_ID) from TXNS where 
> TXN_STATE=’a’, then it’s safe to delete from TXN_TO_WRITE_ID when 
> TXN_TO_WRITE_ID.T2W_TXNID < min(Z,A).  
> If no open or aborted txns exist in the system, then we need to enable 
> cleanup using the latest allocated value of the NEXT_TXN_ID table. The delete 
> condition would be TXN_TO_WRITE_ID.T2W_TXNID < min(Z,A,NEXT_TXN_ID.ntxn_next).  
> Also, it is proposed to trigger cleanup on TXN_TO_WRITE_ID from the initiator 
> immediately after cleaning up aborted txn metadata from the TXNS table.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419947#comment-16419947
 ] 

Eugene Koifman commented on HIVE-18747:
---

+1 patch 3

Is the change in TxnHandler.getValidWriteIdsForTable() specific to this ticket 
or a general fix?

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch, 
> HIVE-18747.03.patch
>
>
> Per table write ID implementation (HIVE-18192) maintains a map between txn ID 
> and table write ID in TXN_TO_WRITE_ID meta table. 
> The entries in this table are used to generate a ValidWriteIdList for the 
> given ValidTxnList to ensure snapshot isolation. 
> When a table or database is dropped, these entries are cleaned up. But it is 
> necessary to clean up for active tables too, for better performance.
> TXN_TO_WRITE_ID table keeps a mapping of Transaction ID to Write ID.  The 
> state of each Write ID (open, committed, aborted) is determined by the state 
> of the parent transaction.  In order to be able to get a WriteIdList that is 
> accurate wrt ValidTxnList that is locked in at the start of the transaction, 
> we have to retain txnid<->writeid mapping even after the transaction ends. 
> This is because a reader at Snapshot Isolation that started when transaction 
> X was open, should continue to ignore the data written by X even after X 
> commits.
> So we need a mechanism to know when it is safe to remove TXN_TO_WRITE_ID.  
> There are 2 parts to it. When txn X is opened, it records Y=select 
> min(txn_id) from TXNS where txn_state=’o’ in MIN_HISTORY(txnid,opentxnid) 
> table, i.e. it adds (X, Y) to MIN_HISTORY.  On commit (and abort) of X, it 
> removes its own entry from MIN_HISTORY. In the absence of Aborted 
> transactions, MIN_HISTORY gives us the smallest open txnid across all active 
> reader snapshots.  Let Z=select min(opentxnid) from MIN_HISTORY. We can 
> delete entries from TXN_TO_WRITE_ID once TXN_TO_WRITE_ID.T2W_TXNID < Z since 
> every active reader sees txns < Z as committed.
> If S is an aborted txn, we retain the metadata about it in TXNS as long as 
> any data written by S may be visible to some reader in the system, so that 
> the reader knows to skip this data.  The rules for when that is are complex, 
> but wrt TXN_TO_WRITE_ID, if A=select min(TXN_ID) from TXNS where 
> TXN_STATE=’a’, then it’s safe to delete from TXN_TO_WRITE_ID when 
> TXN_TO_WRITE_ID.T2W_TXNID < min(Z,A).  
> If no open or aborted txns exist in the system, then we need to enable 
> cleanup using the latest allocated value of the NEXT_TXN_ID table. The delete 
> condition would be TXN_TO_WRITE_ID.T2W_TXNID < min(Z,A,NEXT_TXN_ID.ntxn_next).  
> Also, it is proposed to trigger cleanup on TXN_TO_WRITE_ID from the initiator 
> immediately after cleaning up aborted txn metadata from the TXNS table.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18741) Add support for Import into Acid table

2018-03-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18741:
--
Description: 
This should follow the Load Data approach (or use Load Data directly)

Note that import supports a partition spec

Does import support loading files not created by Export?  If so, then similarly 
to HIVE-19029, it should check for Acid meta columns and reject such files

  was:
This should follow Load Data approach (or use load data directly)

Note that import supports partition spec


> Add support for Import into Acid table
> --
>
> Key: HIVE-18741
> URL: https://issues.apache.org/jira/browse/HIVE-18741
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> This should follow the Load Data approach (or use Load Data directly)
> Note that import supports a partition spec
> Does import support loading files not created by Export?  If so, then 
> similarly to HIVE-19029, it should check for Acid meta columns and reject 
> such files
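A hedged sketch of what a Load Data-style Import could look like (statements, table names, and paths are illustrative, not from a patch):

{noformat}
EXPORT TABLE src PARTITION (p=1) TO '/tmp/export/src_p1';

IMPORT TABLE t_acid PARTITION (p=1) FROM '/tmp/export/src_p1';
-- following the Load Data approach, the import would allocate a write id W and
-- copy the exported data files into a new t_acid/p=1/delta_W_W/ directory
{noformat}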



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-19081) Add partition should prevent loading acid files

2018-03-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-19081:
-


> Add partition should prevent loading acid files
> ---
>
> Key: HIVE-19081
> URL: https://issues.apache.org/jira/browse/HIVE-19081
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> Similar to HIVE-19029:
> {{Alter Table T Add Partition ...}} where T is acid should check to make sure 
> input files were not copied from another Acid table, i.e. make sure the files 
> don't have Acid metadata columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19029) Load Data should prevent loading acid files

2018-03-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19029:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19029.01.patch, HIVE-19029.02.patch, 
> HIVE-19029.04.patch
>
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()
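A hedged illustration of what the check guards against (table names and paths are made up):

{noformat}
-- Files taken from another ACID table's delta_/base_ directories already carry
-- the ACID metadata columns (the ROW__ID fields), so they are not "raw format":
LOAD DATA INPATH '/tmp/copied_from_some_acid_table/delta_5_5/bucket_00000'
  INTO TABLE t_acid;      -- should be rejected by the isRawFormat() check

-- Plain ORC/text files written outside of any ACID table are raw format:
LOAD DATA INPATH '/tmp/plain_orc_files' INTO TABLE t_acid;   -- OK
{noformat}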



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19029) Load Data should prevent loading acid files

2018-03-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419871#comment-16419871
 ] 

Eugene Koifman commented on HIVE-19029:
---

I tried various failed tests locally - they either pass or fail in the same way 
w/o the patch.

Committed to master.

Thanks, Jason, for the review.

> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19029.01.patch, HIVE-19029.02.patch, 
> HIVE-19029.04.patch
>
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-03-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Status: Patch Available  (was: Open)

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.01.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add 
> Partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any 
> file with any format can be added.  It allows adding to bucketed tables as 
> well.  This seems like a very dangerous command.  Maybe a better option is to 
> block it and advise using Load Data.  Alternatively, make this do the Add 
> Partition metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-03-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Attachment: HIVE-18814.01.patch

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.01.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add 
> Partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any 
> file with any format can be added.  It allows adding to bucketed tables as 
> well.  This seems like a very dangerous command.  Maybe a better option is to 
> block it and advise using Load Data.  Alternatively, make this do the Add 
> Partition metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-03-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Attachment: (was: HIVE-18814.wip.patch)

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this, since a delete on this data (added via Add 
> Partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any 
> file with any format can be added.  It allows adding to bucketed tables as 
> well.  This seems like a very dangerous command.  Maybe a better option is to 
> block it and advise using Load Data.  Alternatively, make this do the Add 
> Partition metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-03-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Docs Text: 
todo: update 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions
 wrt transactional tables
1. locking
2. copy/rename

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.wip.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this since a delete on this data (added via Add 
> partition) will use row_ids with txnid:0, so a later upgrade that sees 
> un-compacted data may generate row_ids with a different txnid (assuming this 
> is fixed by then).
>  
> One option is to follow the Load Data approach: create a new delta_x_x/ and 
> move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids the move/copy 
> but retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for the partition spec), so any file 
> with any format can be added.  It allows adding to bucketed tables as well.
> Seems like a very dangerous command.  Maybe a better option is to block it 
> and advise using Load Data.  Alternatively, make this do the Add Partition 
> metadata op followed by Load Data. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19029) Load Data should prevent loading acid files

2018-03-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419418#comment-16419418
 ] 

Eugene Koifman commented on HIVE-19029:
---

todo: make sure Add Partition (HIVE-18814) and Import (HIVE-18741) do the same 
check 

> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19029.01.patch, HIVE-19029.02.patch, 
> HIVE-19029.04.patch
>
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()
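For reference, the kind of check this describes can be sketched as follows (assumptions: 
ORC input, and that inspecting the top-level schema for the six ACID wrapper columns is 
enough; this is not the actual AcidUtils.MetaData.isRawFormat() code):
{noformat}
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.TypeDescription;

public class RawFormatCheck {
  // Column names of the ACID wrapper schema:
  // struct<operation, originalTransaction, bucket, rowId, currentTransaction, row>
  private static final List<String> ACID_COLS = Arrays.asList(
      "operation", "originalTransaction", "bucket", "rowId",
      "currentTransaction", "row");

  /** Returns true if the ORC file does NOT carry the ACID metadata columns. */
  static boolean isRawFormat(Configuration conf, Path file) throws Exception {
    Reader reader = OrcFile.createReader(file, OrcFile.readerOptions(conf));
    TypeDescription schema = reader.getSchema();
    // A file written by an ACID writer has exactly the wrapper columns above.
    return !(schema.getCategory() == TypeDescription.Category.STRUCT
        && schema.getFieldNames().equals(ACID_COLS));
  }
}
{noformat}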



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18774) ACID: Use the _copy_N files copyNumber as the implicit statement-id

2018-03-27 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416236#comment-16416236
 ] 

Eugene Koifman commented on HIVE-18774:
---

This is not as simple as it sounds:
Union All 
may create
warehouse/t/HIVE_UNION_SUBDIR_1/00_0
warehouse/t/HIVE_UNION_SUBDIR_2/00_0
or
warehouse/t/1/00_0
warehouse/t/2/00_0
and I suppose
warehouse/t/1/00_0_copy_1
warehouse/t/2/00_0_copy_1

More generally, if something other than Hive is writing the files, it may not be 
safe to assume a one-level subfolder depth.
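For what it's worth, deriving the statement id from the file name itself is the mechanical 
part; a small illustrative helper (hypothetical names, not Hive code) could look like:
{noformat}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CopyNStatementId {
  // Matches names like 000000_0_copy_2 once any directory prefix is removed
  private static final Pattern COPY_N = Pattern.compile(".*_copy_(\\d+)$");

  /**
   * Returns the implicit statement id for an original file:
   * 0 for a plain bucket file, N for a _copy_N file.
   */
  static int implicitStatementId(String fileName) {
    Matcher m = COPY_N.matcher(fileName);
    return m.matches() ? Integer.parseInt(m.group(1)) : 0;
  }
}
{noformat}
The hard part, as noted above, is that the files may sit at arbitrary depths 
(HIVE_UNION_SUBDIR_*, numeric subdirs, non-Hive writers), so the directory walk 
can't assume a single level.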

> ACID: Use the _copy_N files copyNumber as the implicit statement-id
> ---
>
> Key: HIVE-18774
> URL: https://issues.apache.org/jira/browse/HIVE-18774
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Eugene Koifman
>Priority: Major
>
> When upgrading flat ORC files to ACID, use the _copy_N numbering as a 
> statement-id to avoid having to align the row numbering between _copy_1 and 
> _copy_2 files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19029) Load Data should prevent loading acid files

2018-03-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19029:
--
Attachment: HIVE-19029.04.patch

> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19029.01.patch, HIVE-19029.02.patch, 
> HIVE-19029.04.patch
>
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18739) Add support for Export from Acid table

2018-03-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414837#comment-16414837
 ] 

Eugene Koifman commented on HIVE-18739:
---

I think relying on the user to create the right security config is a must.
We don't have explicit checks to make sure the user has configured 
authentication, or that the user didn't configure a Ranger policy that allows 
everyone access to everything.  Appropriate security config is always the end 
user's responsibility.

> Add support for Export from Acid table
> --
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19058) add object owner to HivePrivilegeObject

2018-03-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19058:
--
Status: Patch Available  (was: Open)

> add object owner to HivePrivilegeObject
> ---
>
> Key: HIVE-19058
> URL: https://issues.apache.org/jira/browse/HIVE-19058
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19058.01.patch
>
>
> this can enable HiveAuthorizer to create policies based on the owner of the 
> object - for example, only let the owner of a table read/write it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19058) add object owner to HivePrivilegeObject

2018-03-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19058:
--
Attachment: HIVE-19058.01.patch

> add object owner to HivePrivilegeObject
> ---
>
> Key: HIVE-19058
> URL: https://issues.apache.org/jira/browse/HIVE-19058
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19058.01.patch
>
>
> this can enable HiveAuthorizer to create policies based on the owner of the 
> object - for example, only let the owner of a table read/write it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-19058) add object owner to HivePrivilegeObject

2018-03-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-19058:
-


> add object owner to HivePrivilegeObject
> ---
>
> Key: HIVE-19058
> URL: https://issues.apache.org/jira/browse/HIVE-19058
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> this can enable HiveAuthorizer to create policies based on the owner of the 
> object - for example, only let the owner of a table read/write it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19056) IllegalArgumentException in FixAcidKeyIndex when ORC file has 0 rows

2018-03-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414810#comment-16414810
 ] 

Eugene Koifman commented on HIVE-19056:
---

+1

> IllegalArgumentException in FixAcidKeyIndex when ORC file has 0 rows
> 
>
> Key: HIVE-19056
> URL: https://issues.apache.org/jira/browse/HIVE-19056
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-19056.1.patch
>
>
> {noformat}
> ERROR recovering /Users/jdere/dev/hwx/gerrit/hive2-gerrit/ql/target/tmp/TestFixAcidKeyIndex.testValidKeyIndex.orc
> java.lang.IllegalArgumentException: Seek to a negative row number -1
>   at org.apache.orc.impl.RecordReaderImpl.seekToRow(RecordReaderImpl.java:1300)
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.seekToRow(RecordReaderImpl.java:101)
>   at org.apache.hadoop.hive.ql.io.orc.FixAcidKeyIndex.recoverFile(FixAcidKeyIndex.java:232)
>   at org.apache.hadoop.hive.ql.io.orc.FixAcidKeyIndex.recoverFiles(FixAcidKeyIndex.java:132)
>   at org.apache.hadoop.hive.ql.io.orc.FixAcidKeyIndex.main(FixAcidKeyIndex.java:104)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18739) Add support for Export from Acid table

2018-03-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414430#comment-16414430
 ] 

Eugene Koifman commented on HIVE-18739:
---

That is precisely the model for SQL Standard auth.
If Ranger is used, yes I'd expect this to be configured in Ranger when this 
'scratch' DB is created.  Even if there is a way to check Ranger policy 
programmatically, shouldn't the end user be able to configure things as they 
wish?  

> Add support for Export from Acid table
> --
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18739) Add support for Export from Acid table

2018-03-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414419#comment-16414419
 ] 

Eugene Koifman commented on HIVE-18739:
---

I didn't mean temp in the "CREATE TEMPORARY" sense - that would be great, but as 
I said before, temp tables don't support partitions.

bq. Can you elaborate on the policy, will this patch include it?
I don't understand what this means

> Add support for Export from Acid table
> --
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18739) Add support for Export from Acid table

2018-03-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414349#comment-16414349
 ] 

Eugene Koifman commented on HIVE-18739:
---

The temp table is created and populated by the user who submits the Export 
command.  
A meaningful default security policy would be configured so that this table is 
only readable by the user who created it, in which case there is no issue.  If 
the default is for all objects to be fully public, then the problem lies with 
that default.

Regarding compaction, any file produced by it will have ROW_IDs embedded in it; 
exporting such files does not make sense in a different table/cluster w/o all 
the metadata in the metastore.  


> Add support for Export from Acid table
> --
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19029) Load Data should prevent loading acid files

2018-03-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414301#comment-16414301
 ] 

Eugene Koifman commented on HIVE-19029:
---

{{//Te is just a simple way to generate test data}} - it's not.  I'll clean 
up on commit.

> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19029.01.patch, HIVE-19029.02.patch
>
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19029) Load Data should prevent loading acid files

2018-03-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414107#comment-16414107
 ] 

Eugene Koifman commented on HIVE-19029:
---

[~jdere] could you review please

> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19029.01.patch, HIVE-19029.02.patch
>
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19029) Load Data should prevent loading acid files

2018-03-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19029:
--
Attachment: HIVE-19029.02.patch

> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19029.01.patch, HIVE-19029.02.patch
>
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18739) Add support for Export from Acid table

2018-03-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410648#comment-16410648
 ] 

Eugene Koifman commented on HIVE-18739:
---

you can't use a temp table for this - it doesn't support partitioned tables

> Add support for Export from Acid table
> --
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19029) Load Data should prevent loading acid files

2018-03-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19029:
--
Attachment: HIVE-19029.01.patch

> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19029.01.patch
>
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19029) Load Data should prevent loading acid files

2018-03-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19029:
--
Status: Patch Available  (was: Open)

> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-19029.01.patch
>
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-19030) Update Wiki with new rules for Load Data

2018-03-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-19030:
-


> Update Wiki with new rules for Load Data
> 
>
> Key: HIVE-19030
> URL: https://issues.apache.org/jira/browse/HIVE-19030
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Deepak Jaiswal
>Priority: Major
>
> [~djaiswal] could you please update
> https://cwiki.apache.org/confluence/display/hive/languagemanual+dml#LanguageManualDML-Loadingfilesintotables
> with latest rules based on HIVE-18125



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19029) Load Data should prevent loading acid files

2018-03-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19029:
--
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-17361

> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-19029) Load Data should prevent loading acid files

2018-03-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-19029:
-


> Load Data should prevent loading acid files
> ---
>
> Key: HIVE-19029
> URL: https://issues.apache.org/jira/browse/HIVE-19029
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> {{Load Data into T}} where T is acid should check to make sure input files 
> were not copied from another Acid table, i.e. make sure the files don't have 
> Acid metadata columns.
> AcidUtils.MetaData.isRawFormat()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-18774) ACID: Use the _copy_N files copyNumber as the implicit statement-id

2018-03-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18774:
-

Assignee: Eugene Koifman

> ACID: Use the _copy_N files copyNumber as the implicit statement-id
> ---
>
> Key: HIVE-18774
> URL: https://issues.apache.org/jira/browse/HIVE-18774
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Eugene Koifman
>Priority: Major
>
> When upgrading flat ORC files to ACID, use the _copy_N numbering as a 
> statement-id to avoid having to align the row numbering between _copy_1 and 
> _copy_2 files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408730#comment-16408730
 ] 

Eugene Koifman commented on HIVE-18825:
---

+1 patch 6 pending tests

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, 
> HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.05.patch, 
> HIVE-18825.06.patch, HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with condition e.g. 
> {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT 
> IN()}} on top of the MVs query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.
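To make the filter concrete, here is a standalone sketch of building the predicate 
described above from a snapshot (the high watermark plus the txns that were open or 
aborted when the materialization was created). This is only an illustration, not the 
actual rewrite code, and the method name is made up:
{noformat}
import java.util.List;
import java.util.stream.Collectors;

public class IncrementalFilter {
  /**
   * Build the condition mentioned in the description:
   * ROW__ID.transactionId < highwatermark AND ROW__ID.transactionId NOT IN (...)
   */
  static String snapshotPredicate(long highWatermark, List<Long> openOrAbortedTxns) {
    String pred = "ROW__ID.transactionId < " + highWatermark;
    if (!openOrAbortedTxns.isEmpty()) {
      pred += " AND ROW__ID.transactionId NOT IN ("
          + openOrAbortedTxns.stream().map(String::valueOf)
              .collect(Collectors.joining(", "))
          + ")";
    }
    return pred;
  }
}
{noformat}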



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18751) ACID table scan through get_splits UDF doesn't receive ValidWriteIdList configuration.

2018-03-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408726#comment-16408726
 ] 

Eugene Koifman commented on HIVE-18751:
---

[~sankarh], should
{noformat}
  // Pass the ValidTxnList and ValidTxnWriteIdList snapshot configurations
  // corresponding to the input query
  HiveConf driverConf = driver.getConf();
  String validTxnString = driverConf.get(ValidTxnList.VALID_TXNS_KEY);
  if (validTxnString != null) {
    jc.set(ValidTxnList.VALID_TXNS_KEY, validTxnString);
  }
  String validWriteIdString = driverConf.get(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY);
  if (validWriteIdString != null) {
    jc.set(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY, validWriteIdString);
  }
{noformat}
do some sort of check to make sure that this value was set as expected? 

This patch crossed with HIVE-18825 and I think it could have caused a really bad bug.
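A sketch of the kind of check suggested above (illustrative only, not a patch): fail 
fast if the compiled query did not leave the txn snapshot in the conf. The helper name 
is made up; the keys and classes are the same ones used in the snippet above:
{noformat}
import org.apache.hadoop.hive.common.ValidTxnList;
import org.apache.hadoop.hive.conf.HiveConf;

public class SnapshotConfigCheck {
  /** Fail fast instead of silently running without a snapshot. */
  static String requireValidTxnList(HiveConf driverConf) {
    String validTxnString = driverConf.get(ValidTxnList.VALID_TXNS_KEY);
    if (validTxnString == null || validTxnString.isEmpty()) {
      throw new IllegalStateException(
          "Expected " + ValidTxnList.VALID_TXNS_KEY + " to be set after compilation");
    }
    return validTxnString;
  }
}
{noformat}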

> ACID table scan through get_splits UDF doesn't receive ValidWriteIdList 
> configuration.
> --
>
> Key: HIVE-18751
> URL: https://issues.apache.org/jira/browse/HIVE-18751
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, UDF, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18751.01.patch
>
>
> Per table write ID (HIVE-18192) have replaced global transaction ID with 
> write ID to version data files in ACID/MM tables,
> To ensure snapshot isolation, need to generate ValidWriteIdList for the given 
> txn/table and use it when scan the ACID/MM tables.
> In case of get_splits UDF which runs on ACID table scan query won't receive 
> it properly through configuration (hive.txn.tables.valid.writeids) and hence 
> throws exception. 
> TestAcidOnTez.testGetSplitsLocks is the test failing for the same. Need to 
> fix it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408489#comment-16408489
 ] 

Eugene Koifman edited comment on HIVE-18825 at 3/21/18 8:00 PM:


GenericUDTFGetSplits.createPlanFragment() now calls compileAndRespond(String, 
true) which removes ValidTxnList from Conf.  But createPlanFragment() then 
tries to access it.  Is that intentional?

if it is, shouldn't validTxnListsGenerated be unset?


was (Author: ekoifman):
GenericUDTFGetSplits.createPlanFragment() now calls compileAndRespond(String, 
true) which removes ValidTxnList from Conf.  But createPlanFragment() then 
tries to access it.  Is that intentional?

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, 
> HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.05.patch, 
> HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with condition e.g. 
> {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT 
> IN()}} on top of the MVs query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408489#comment-16408489
 ] 

Eugene Koifman commented on HIVE-18825:
---

GenericUDTFGetSplits.createPlanFragment() now calls compileAndRespond(String, 
true) which removes ValidTxnList from Conf.  But createPlanFragment() then 
tries to access it.  Is that intentional?

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, 
> HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.05.patch, 
> HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with condition e.g. 
> {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT 
> IN()}} on top of the MVs query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18948) Acquire locks before generating the valid transaction list

2018-03-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408437#comment-16408437
 ] 

Eugene Koifman commented on HIVE-18948:
---

WriteEntity also has a WriteType which determines the lock type - can that be built 
from the AST?

Also, could you add an estimate of how much effort this is?

> Acquire locks before generating the valid transaction list
> --
>
> Key: HIVE-18948
> URL: https://issues.apache.org/jira/browse/HIVE-18948
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Priority: Critical
>
> HIVE-18825 moves the valid transaction list generation logic before query 
> optimization. In order to support lock-based concurrency control correctly, 
> the logic to acquire the locks has to be moved before query optimization too, 
> and before the valid transaction list is generated.
> This requires a bit of work/refactoring, since lock acquisition logic relies 
> on read/write entities, and it is heavily dependent on the QueryPlan object 
> too, e.g., it sets some properties in the file sink descriptors for the plan. 
> Currently, all these data structures (except for read entities) are only 
> available after query has been optimized, hence we will need to 1) generate 
> some of this data structures before query optimization, e.g., write entities, 
> and 2) create and propagate some of the properties so they are set during 
> query optimization, e.g., those properties contained in the file sink 
> descriptors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16669) Fine tune Compaction to take advantage of Acid 2.0

2018-03-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405648#comment-16405648
 ] 

Eugene Koifman commented on HIVE-16669:
---

If the compactor runs in a txn, it must write to min_history.  Suppose txnid:7 is 
aborted and txnid:70 is the Compactor.  Txn 71 starts and sees 7 as aborted and 70 
as open.  If 70 makes 7 empty, we need to make sure {{cleanEmptyAbortedTxns()}} 
doesn't remove 7's entry from TXNS: files produced by 70 are not visible to 71, so 
if 71 reads the older files it will treat 7's data as committed (if 
{{cleanEmptyAbortedTxns()}} has run).



> Fine tune Compaction to take advantage of Acid 2.0
> --
>
> Key: HIVE-16669
> URL: https://issues.apache.org/jira/browse/HIVE-16669
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16669.wip.patch
>
>
> * There is little point using 2.0 vectorized reader since there is no 
> operator pipeline in compaction
> * If minor compaction just concats delete_delta files together, then the 2 
> stage compaction should always ensure that we have a limited number of Orc 
> readers to do the merging and current OrcRawRecordMerger should be fine
> * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18739) Add support for Export from Acid table

2018-03-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405285#comment-16405285
 ] 

Eugene Koifman commented on HIVE-18739:
---

https://reviews.apache.org/r/66148/

> Add support for Export from Acid table
> --
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18739) Add support for Export from Acid table

2018-03-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Summary: Add support for Export from Acid table  (was: Add support for 
Export from unpartitioned Acid table)

> Add support for Export from Acid table
> --
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-18740) Add support for Export from partitioned Acid table

2018-03-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-18740.
---
Resolution: Won't Fix

was done as part of HIVE-18739

> Add support for Export from partitioned Acid table
> --
>
> Key: HIVE-18740
> URL: https://issues.apache.org/jira/browse/HIVE-18740
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> figure out how to translate (partial) partition spec from Export command into 
> a "where" clause



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-03-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Attachment: HIVE-18739.12.patch

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-03-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405273#comment-16405273
 ] 

Eugene Koifman commented on HIVE-18739:
---

The TestCommands.testNoopReplEximCommands failure is related; patch 12 addresses it.

[~sershe] could you review please?

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch, 
> HIVE-18739.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-03-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Attachment: HIVE-18739.11.patch

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch, HIVE-18739.11.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-03-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Attachment: HIVE-18739.10.patch

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch, HIVE-18739.10.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18978) ConditionalTask.addDependentTask(Task t) adds t in the wrong place

2018-03-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18978:
--
Description: 
{{ConditionalTask.addDependentTask(Task t)}} is implemented like this:
{noformat}
/**
 * Add a dependent task on the current conditional task. The task will not be a direct child of
 * conditional task. Actually it will be added as child task of associated tasks.
 *
 * @return true if the task got added false if it already existed
 */
@Override
public boolean addDependentTask(Task dependent) {
  boolean ret = false;
  if (getListTasks() != null) {
    ret = true;
    for (Task tsk : getListTasks()) {
      ret = ret & tsk.addDependentTask(dependent);
    }
  }
  return ret;
}
{noformat}
So let’s say, the tasks in the ConditionalTask are A,B,C, but they have 
children.
{noformat}
CondTask
  |--A
 |--A1
|-A2
  |--B
 |--B1
  |--C
|--C1
{noformat}
The way ConditionalTask.addDependent() is implemented, MyTask becomes a sibling 
of A1,
 B1 and C1. So even if only 1 branch of ConditionalTask is executed (and 
parallel task
 execution is enabled), there is no guarantee (as I see) that MyTask runs after 
A2 or
 B1 or C1, which is really what is needed.

 

Once this is done add a .q file test that records a plan for Export from Acid: 
HIVE-18739

  was:
{{ConditionalTask.addDependentTask(Task t)}} is implemented like this:
{noformat}
/**
 * Add a dependent task on the current conditional task. The task will not be a direct child of
 * conditional task. Actually it will be added as child task of associated tasks.
 *
 * @return true if the task got added false if it already existed
 */
@Override
public boolean addDependentTask(Task dependent) {
  boolean ret = false;
  if (getListTasks() != null) {
    ret = true;
    for (Task tsk : getListTasks()) {
      ret = ret & tsk.addDependentTask(dependent);
    }
  }
  return ret;
}
{noformat}
So let’s say, the tasks in the ConditionalTask are A,B,C, but they have 
children.
{noformat}
CondTask
  |--A
 |--A1
|-A2
  |--B
 |--B1
  |--C
|--C1
{noformat}
The way ConditionalTask.addDependent() is implemented, MyTask becomes a sibling 
of A1,
 B1 and C1. So even if only 1 branch of ConditionalTask is executed (and 
parallel task
 execution is enabled), there is no guarantee (as I see) that MyTask runs after 
A2 or
 B1 or C1, which is really what is needed.

 

Once this is done add a .q file test that records a plan for Export from Acid: 
HIVE-18978


> ConditionalTask.addDependentTask(Task t) adds t in the wrong place
> --
>
> Key: HIVE-18978
> URL: https://issues.apache.org/jira/browse/HIVE-18978
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> {{ConditionalTask.addDependentTask(Task t)}} is implemented like this:
> {noformat}
> /**
>  * Add a dependent task on the current conditional task. The task will not be a direct child of
>  * conditional task. Actually it will be added as child task of associated tasks.
>  *
>  * @return true if the task got added false if it already existed
>  */
> @Override
> public boolean addDependentTask(Task dependent) {
>   boolean ret = false;
>   if (getListTasks() != null) {
>     ret = true;
>     for (Task tsk : getListTasks()) {
>       ret = ret & tsk.addDependentTask(dependent);
>     }
>   }
>   return ret;
> }
> {noformat}
> So let’s say, the tasks in the ConditionalTask are A,B,C, but they have 
> children.
> {noformat}
> CondTask
>   |--A
>  |--A1
> |-A2
>   |--B
>  |--B1
>   |--C
> |--C1
> {noformat}
> The way ConditionalTask.addDependent() is implemented, MyTask becomes a 
> sibling of A1,
>  B1 and C1. So even if only 1 branch of ConditionalTask is executed (and 
> parallel task
>  execution is enabled), there is no guarantee (as I see) that MyTask runs 
> after A2 or
>  B1 or C1, which is really what is needed.
>  
> Once this is done add a .q file test that records a plan for Export from 
> Acid: HIVE-18739
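A conceptual sketch of what the fix implies: walk each branch down to its leaves and 
attach the dependent (MyTask in the description) there, so it can only start once the 
whole branch has finished. This uses a toy Task class defined in the snippet, not Hive's 
actual Task API:
{noformat}
import java.util.ArrayList;
import java.util.List;

public class LeafDependentSketch {
  /** Toy stand-in for Hive's Task: just enough structure to show the idea. */
  static class Task {
    final String name;
    final List<Task> children = new ArrayList<>();
    Task(String name) { this.name = name; }
    Task addChild(Task t) { children.add(t); return t; }
  }

  /** Attach 'dependent' under every leaf of 'branch' instead of under 'branch' itself. */
  static void addDependentToLeaves(Task branch, Task dependent) {
    if (branch.children.isEmpty()) {
      branch.addChild(dependent);
      return;
    }
    for (Task child : branch.children) {
      addDependentToLeaves(child, dependent);
    }
  }

  public static void main(String[] args) {
    // CondTask branch A -> A1 -> A2, as in the example above
    Task a = new Task("A");
    Task a1 = a.addChild(new Task("A1"));
    a1.addChild(new Task("A2"));
    Task myTask = new Task("MyTask");
    addDependentToLeaves(a, myTask);   // MyTask ends up under A2, not next to A1
  }
}
{noformat}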



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-03-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Attachment: HIVE-18739.09.patch

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch, 
> HIVE-18739.09.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18978) ConditionalTask.addDependentTask(Task t) adds t in the wrong place

2018-03-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18978:
--
Description: 
{{ConditionalTask.addDependentTask(Task t)}} is implemented like this:
{noformat}
/**
 * Add a dependent task on the current conditional task. The task will not be a direct child of
 * conditional task. Actually it will be added as child task of associated tasks.
 *
 * @return true if the task got added false if it already existed
 */
@Override
public boolean addDependentTask(Task dependent) {
  boolean ret = false;
  if (getListTasks() != null) {
    ret = true;
    for (Task tsk : getListTasks()) {
      ret = ret & tsk.addDependentTask(dependent);
    }
  }
  return ret;
}
{noformat}
So let’s say, the tasks in the ConditionalTask are A,B,C, but they have 
children.
{noformat}
CondTask
  |--A
 |--A1
|-A2
  |--B
 |--B1
  |--C
|--C1
{noformat}
The way ConditionalTask.addDependent() is implemented, MyTask becomes a sibling 
of A1,
 B1 and C1. So even if only 1 branch of ConditionalTask is executed (and 
parallel task
 execution is enabled), there is no guarantee (as I see) that MyTask runs after 
A2 or
 B1 or C1, which is really what is needed.

 

Once this is done add a .q file test that records a plan for Export from Acid: 
HIVE-18978

  was:
{{ConditionalTask.addDependentTask(Task t)}} is implemented like this:
{noformat}
/**
 * Add a dependent task on the current conditional task. The task will not be a direct child of
 * conditional task. Actually it will be added as child task of associated tasks.
 *
 * @return true if the task got added false if it already existed
 */
@Override
public boolean addDependentTask(Task dependent) {
  boolean ret = false;
  if (getListTasks() != null) {
    ret = true;
    for (Task tsk : getListTasks()) {
      ret = ret & tsk.addDependentTask(dependent);
    }
  }
  return ret;
}
{noformat}


 So let’s say, the tasks in the ConditionalTask are A,B,C, but they have 
children.
{noformat}
CondTask
  |--A
 |--A1
|-A2
  |--B
 |--B1
  |--C
|--C1
{noformat}

 The way ConditionalTask.addDependent() is implemented, MyTask becomes a 
sibling of A1,
 B1 and C1.  So even if only 1 branch of ConditionalTask is executed (and 
parallel task
 execution is enabled), there is no guarantee (as I see) that MyTask runs 
after A2 or
 B1 or C1, which is really what is needed.



> ConditionalTask.addDependentTask(Task t) adds t in the wrong place
> --
>
> Key: HIVE-18978
> URL: https://issues.apache.org/jira/browse/HIVE-18978
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> {{ConditionalTask.addDependentTask(Task t)}} is implemented like this:
> {noformat}
> /**
>  * Add a dependent task on the current conditional task. The task will not be a direct child of
>  * conditional task. Actually it will be added as child task of associated tasks.
>  *
>  * @return true if the task got added false if it already existed
>  */
> @Override
> public boolean addDependentTask(Task dependent) {
>   boolean ret = false;
>   if (getListTasks() != null) {
>     ret = true;
>     for (Task tsk : getListTasks()) {
>       ret = ret & tsk.addDependentTask(dependent);
>     }
>   }
>   return ret;
> }
> {noformat}
> So let’s say, the tasks in the ConditionalTask are A,B,C, but they have 
> children.
> {noformat}
> CondTask
>   |--A
>  |--A1
> |-A2
>   |--B
>  |--B1
>   |--C
> |--C1
> {noformat}
> The way ConditionalTask.addDependent() is implemented, MyTask becomes a 
> sibling of A1,
>  B1 and C1. So even if only 1 branch of ConditionalTask is executed (and 
> parallel task
>  execution is enabled), there is no guarantee (as I see) that MyTask runs 
> after A2 or
>  B1 or C1, which is really what is needed.
>  
> Once this is done add a .q file test that records a plan for Export from 
> Acid: HIVE-18978



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-18978) ConditionalTask.addDependentTask(Task t) adds t in the wrong place

2018-03-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18978:
-

Assignee: Eugene Koifman

> ConditionalTask.addDependentTask(Task t) adds t in the wrong place
> --
>
> Key: HIVE-18978
> URL: https://issues.apache.org/jira/browse/HIVE-18978
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> {{ConditionalTask.addDependentTask(Task t) }} is implemented like this:
> {noformat}
> /**
>  * Add a dependent task on the current conditional task. The task will not be a direct child of
>  * conditional task. Actually it will be added as child task of associated tasks.
>  *
>  * @return true if the task got added false if it already existed
>  */
> @Override
> public boolean addDependentTask(Task dependent) {
>   boolean ret = false;
>   if (getListTasks() != null) {
>     ret = true;
>     for (Task tsk : getListTasks()) {
>       ret = ret & tsk.addDependentTask(dependent);
>     }
>   }
>   return ret;
> }
> {noformat}
>  So let’s say, the tasks in the ConditionalTask are A,B,C, but they have 
> children.
> {noformat}
> CondTask
>   |--A
>  |--A1
> |-A2
>   |--B
>  |--B1
>   |--C
> |--C1
> {noformat}
>  The way ConditionalTask.addDependent() is implemented, MyTask becomes a 
> sibling of A1,
>  B1 and C1.  So even if only 1 branch of ConditionalTask is executed (and 
> parallel task
>  execution is enabled), there is no guarantee (as I see) that MyTask runs 
> after A2 or
>  B1 or C1, which is really what is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-03-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Attachment: HIVE-18739.08.patch

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch, HIVE-18739.08.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18693) Snapshot Isolation does not work for Micromanaged table when a insert transaction is aborted

2018-03-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18693:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
 Release Note: n/a
   Status: Resolved  (was: Patch Available)

Committed to master.
Thanks Steve for the contribution.

> Snapshot Isolation does not work for Micromanaged table when a insert 
> transaction is aborted
> 
>
> Key: HIVE-18693
> URL: https://issues.apache.org/jira/browse/HIVE-18693
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18693.01.patch, HIVE-18693.02.patch, 
> HIVE-18693.03.patch, HIVE-18693.04.patch, HIVE-18693.05.patch, 
> HIVE-18693.06.patch
>
>
> TestTxnCommands2#writeBetweenWorkerAndCleaner with minor 
> changes (changing delete command to insert command) fails on MM table.
> Specifically, the last SELECT command returns wrong results. 
> But this test works fine with full ACID table. 
> ==
> MM table inserts were not making entries into TXN_COMPONENTS, thus the 
> {{txnHandler.cleanEmptyAbortedTxns();}} logic in {{Initiator}} can wipe out 
> TXNS entry for an aborted transaction before the relevant files on disk are 
> removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18693) Snapshot Isolation does not work for Micromanaged table when a insert transaction is aborted

2018-03-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18693:
--
Description: 
TestTxnCommands2#writeBetweenWorkerAndCleaner with minor 
changes (changing delete command to insert command) fails on MM table.

Specifically, the last SELECT command returns wrong results. 

But this test works fine with full ACID table. 
==
MM table inserts were not making entries into TXN_COMPONENTS, thus the 
{{txnHandler.cleanEmptyAbortedTxns();}} logic in {{Initiator}} can wipe out 
TXNS entry for an aborted transaction before the relevant files on disk are 
removed.

  was:
TestTxnCommands2#writeBetweenWorkerAndCleaner with minor 
changes (changing delete command to insert command) fails on MM table.

Specifically, the last SELECT command returns wrong results. 

But this test works fine with full ACID table. 


> Snapshot Isolation does not work for Micromanaged table when a insert 
> transaction is aborted
> 
>
> Key: HIVE-18693
> URL: https://issues.apache.org/jira/browse/HIVE-18693
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
> Attachments: HIVE-18693.01.patch, HIVE-18693.02.patch, 
> HIVE-18693.03.patch, HIVE-18693.04.patch, HIVE-18693.05.patch, 
> HIVE-18693.06.patch
>
>
> TestTxnCommands2#writeBetweenWorkerAndCleaner with minor 
> changes (changing delete command to insert command) fails on MM table.
> Specifically, the last SELECT command returns wrong results. 
> But this test works fine with full ACID table. 
> ==
> MM table inserts were not making entries into TXN_COMPONENTS, thus the 
> {{txnHandler.cleanEmptyAbortedTxns();}} logic in {{Initiator}} can wipe out 
> TXNS entry for an aborted transaction before the relevant files on disk are 
> removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18693) Snapshot Isolation does not work for Micromanaged table when a insert transaction is aborted

2018-03-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400748#comment-16400748
 ] 

Eugene Koifman commented on HIVE-18693:
---

+1

> Snapshot Isolation does not work for Micromanaged table when a insert 
> transaction is aborted
> 
>
> Key: HIVE-18693
> URL: https://issues.apache.org/jira/browse/HIVE-18693
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
> Attachments: HIVE-18693.01.patch, HIVE-18693.02.patch, 
> HIVE-18693.03.patch, HIVE-18693.04.patch, HIVE-18693.05.patch, 
> HIVE-18693.06.patch
>
>
> TestTxnCommands2#writeBetweenWorkerAndCleaner with minor 
> changes (changing delete command to insert command) fails on MM table.
> Specifically, the last SELECT command returns wrong results. 
> But this test works fine with full ACID table. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18864) ValidWriteIdList snapshot seems incorrect if obtained after allocating writeId by current transaction.

2018-03-13 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397472#comment-16397472
 ] 

Eugene Koifman commented on HIVE-18864:
---

+1

> ValidWriteIdList snapshot seems incorrect if obtained after allocating 
> writeId by current transaction.
> --
>
> Key: HIVE-18864
> URL: https://issues.apache.org/jira/browse/HIVE-18864
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18864.01.patch, HIVE-18864.02.patch
>
>
> For multi-statement txns, it is possible that write on a table happens after 
> a read. Let's see the below scenario.
>  # Committed txn=9 writes on table T1 with writeId=5.
>  # Open txn=10. ValidTxnList(open:null, txn_HWM=10),
>  # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5).
>  # Open txn=11, writes on table T1 with writeid=6.
>  # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5).
>  # Write table T1 from txn=10 with writeId=7.
>  # Read table T1 from txn=10. *ValidWriteIdList(open:null, write_HWM=7)*. – 
> This read will be able to see rows added by txn=11, which is still open.
> So, the open/aborted list of ValidWriteIdList needs to be rebuilt based on 
> txn_HWM: any writeId allocated by a txnId > txn_HWM should be marked as open. 
> In this example, *ValidWriteIdList(open:6, write_HWM=7)* should be generated.
> cc [~ekoifman], [~thejas]
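The rule in the last paragraph is small enough to sketch directly; a simplified, 
standalone illustration (not Hive's ValidWriteIdList classes, and the method name is 
made up) of rebuilding the open list based on txn_HWM:
{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WriteIdSnapshotSketch {
  /**
   * Given the writeIds already known for a table (writeId -> allocating txnId)
   * and the txn high watermark of the reader's snapshot, return the writeIds
   * that must be treated as open: anything allocated by a txn above txn_HWM.
   */
  static List<Long> openWriteIds(Map<Long, Long> writeIdToTxnId, long txnHighWatermark) {
    List<Long> open = new ArrayList<>();
    for (Map.Entry<Long, Long> e : writeIdToTxnId.entrySet()) {
      if (e.getValue() > txnHighWatermark) {
        open.add(e.getKey());   // e.g. writeId=6 allocated by txn 11 when txn_HWM=10
      }
    }
    return open;
  }
}
{noformat}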



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

