[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
for such reader.

The issue is if at the same time the Cleaner process is running.  It will see 
everything with txnid<17 as obsolete.  Then it will check lock manger state and 
decide to delete (as there may not be any locks in LM for table A).  The order 
in which the files are deleted is undefined right now.  It may delete 
delta_16_16 and delta_17_17 first and right at this moment the read request 
with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by some 
multi-stmt txn that started some time ago.  It acquires locks after the Cleaner 
checks LM state and calls getAcidState(). This request will choose base_5 but 
it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o 
modifications made by those txns.
[This is not possible currently since we only support autoCommit=true.  The 
reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
locks in the snapshot.  The cleaner won't delete anything for a given 
compaction (partition) if there are locks on it.  Thus for duration of the 
transaction, nothing will be deleted so it's safe to use base_5]


This is a subtle race condition but possible.

1. So the safest thing to do to ensure correctness is to use the latest base_x 
as the "best" and check against exceptions in ValidTxnList and throw an 
exception if there is an exception <=x.

2. A better option is to keep 2 exception lists: aborted and open and only 
throw if there is an open txn <=x.  Compaction throws away data from aborted 
txns and thus there is no harm using base with aborted txns in its range.

3. You could make each txn record the lowest open txn id at its start and 
prevent the cleaner from cleaning anything delta with id range that includes 
this open txn id for any txn that is still running.  This has a drawback of 
potentially delaying GC of old files for arbitrarily long periods.  So this 
should be a user config choice.   The implementation is not trivial.

I would go with 1 now and do 2/3 together with multi-statement txn work.



Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
the other


A more concrete example (autoCommit=true)  (given 2.1 codebase)
Suppose base_2 exists.
1. lock table T
2. Lock in the snapshot, txnid 3 open, hwm 5.
3. a long GC pause or more practically the query is submitted but there are no 
resources to start App Master which is where getAcidState() is called from.
4. getAcidState() is called

It's not unreasonable that during #3 txnid 3 commits and base_5 is produced and 
is seen in #4.




  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete 

[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Attachment: HIVE-13369.branch-1.patch

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch, HIVE-13369.5.patch, 
> HIVE-13369.6.patch, HIVE-13369.branch-1.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Status: Open  (was: Patch Available)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch, HIVE-13369.5.patch, HIVE-13369.6.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Attachment: HIVE-13369.6.patch

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch, HIVE-13369.5.patch, HIVE-13369.6.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Status: Patch Available  (was: Open)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch, HIVE-13369.5.patch, HIVE-13369.6.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Status: Open  (was: Patch Available)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch, HIVE-13369.5.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Attachment: HIVE-13369.5.patch

patch 5 fixes test failure caused by patch 4

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch, HIVE-13369.5.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Attachment: HIVE-13369.4.patch

patch 4 addressing [~owen.omalley]'s comments

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch, HIVE-13369.4.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Status: Patch Available  (was: Open)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Attachment: HIVE-13369.3.patch

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Status: Open  (was: Patch Available)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
for such reader.

The issue is if at the same time the Cleaner process is running.  It will see 
everything with txnid<17 as obsolete.  Then it will check lock manger state and 
decide to delete (as there may not be any locks in LM for table A).  The order 
in which the files are deleted is undefined right now.  It may delete 
delta_16_16 and delta_17_17 first and right at this moment the read request 
with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by some 
multi-stmt txn that started some time ago.  It acquires locks after the Cleaner 
checks LM state and calls getAcidState(). This request will choose base_5 but 
it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o 
modifications made by those txns.
[This is not possible currently since we only support autoCommit=true.  The 
reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
locks in the snapshot.  The cleaner won't delete anything for a given 
compaction (partition) if there are locks on it.  Thus for duration of the 
transaction, nothing will be deleted so it's safe to use base_5]


This is a subtle race condition but possible.

1. So the safest thing to do to ensure correctness is to use the latest base_x 
as the "best" and check against exceptions in ValidTxnList and throw an 
exception if there is an exception <=x.

2. A better option is to keep 2 exception lists: aborted and open and only 
throw if there is an open txn <=x.  Compaction throws away data from aborted 
txns and thus there is no harm using base with aborted txns in its range.

3. You could make each txn record the lowest open txn id at its start and 
prevent the cleaner from cleaning anything delta with id range that includes 
this open txn id for any txn that is still running.  This has a drawback of 
potentially delaying GC of old files for arbitrarily long periods.  So this 
should be a user config choice.   The implementation is not trivial.

I would go with 1 now and do 2/3 together with multi-statement txn work.



Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
the other

  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase 

[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
for such reader.

The issue is if at the same time the Cleaner process is running.  It will see 
everything with txnid<17 as obsolete.  Then it will check lock manger state and 
decide to delete (as there may not be any locks in LM for table A).  The order 
in which the files are deleted is undefined right now.  It may delete 
delta_16_16 and delta_17_17 first and right at this moment the read request 
with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by some 
multi-stmt txn that started some time ago.  It acquires locks after the Cleaner 
checks LM state and calls getAcidState(). This request will choose base_5 but 
it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o 
modifications made by those txns.
[This is not possible currently since we only support autoCommit=true.  The 
reason is the a query (0) opens txn (if appropriate), (1) acquires locks (2) 
locks in the snapshot.  The cleaner won't delete anything for a given 
compaction (partition) if there are locks on it.  Thus for duration of the 
transaction, nothing will be deleted so it's safe to use base_5]


This is a subtle race condition but possible.

1. So the safest thing to do to ensure correctness is to use the latest base_x 
as the "best" and check against exceptions in ValidTxnList and throw an 
exception if there is an exception <=x.

2. A better option is to keep 2 exception lists: aborted and open and only 
throw if there is an open txn <=x.  Compaction throws away data from aborted 
txns and thus there is no harm using base with aborted txns in its range.

3. You could make each txn record the lowest open txn id at its start and 
prevent the cleaner from cleaning anything delta with id range that includes 
this open txn id for any txn that is still running.  This has a drawback of 
potentially delaying GC of old files for arbitrarily long periods.  So this 
should be a user config choice.   The implementation is not trivial.

I would go with 1 now and do 2/3 together with multi-statement txn work.



Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
the other

  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase 

[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
for such reader.

The issue is if at the same time the Cleaner process is running.  It will see 
everything with txnid<17 as obsolete.  Then it will check lock manger state and 
decide to delete (as there may not be any locks in LM for table A).  The order 
in which the files are deleted is undefined right now.  It may delete 
delta_16_16 and delta_17_17 first and right at this moment the read request 
with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by some 
multi-stmt txn that started some time ago.  It acquires locks after the Cleaner 
checks LM state and calls getAcidState(). This request will choose base_5 but 
it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o 
modifications made by those txns.
[This is not possible currently since we only support autoCommit=true.  The 
reason is the a query (1) acquires locks (2) locks in the snapshot.  The 
cleaner won't delete anything for a given compaction (partition) if there are 
locks on it.  Thus for duration of the transaction, nothing will be deleted so 
it's safe to use base_5]


This is a subtle race condition but possible.

1. So the safest thing to do to ensure correctness is to use the latest base_x 
as the "best" and check against exceptions in ValidTxnList and throw an 
exception if there is an exception <=x.

2. A better option is to keep 2 exception lists: aborted and open and only 
throw if there is an open txn <=x.  Compaction throws away data from aborted 
txns and thus there is no harm using base with aborted txns in its range.

3. You could make each txn record the lowest open txn id at its start and 
prevent the cleaner from cleaning anything delta with id range that includes 
this open txn id for any txn that is still running.  This has a drawback of 
potentially delaying GC of old files for arbitrarily long periods.  So this 
should be a user config choice.   The implementation is not trivial.

I would go with 1 now and do 2/3 together with multi-statement txn work.



Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
the other

  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then 

[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Target Version/s: 1.3.0, 2.2.0, 2.1.1  (was: 1.3.0, 2.2.0, 2.1.1, 2.0.2)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago or (at least in theory) by 
> autoCommit one due to long GC pause for example.   It acquires locks after 
> the Cleaner checks LM state and calls getAcidState(). This request will 
> choose base_5 but it won't see delta_16_16 and delta_17_17 and thus return 
> the snapshot w/o modifications made by those txns.
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning anything delta with id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
for such reader.

The issue is if at the same time the Cleaner process is running.  It will see 
everything with txnid<17 as obsolete.  Then it will check lock manger state and 
decide to delete (as there may not be any locks in LM for table A).  The order 
in which the files are deleted is undefined right now.  It may delete 
delta_16_16 and delta_17_17 first and right at this moment the read request 
with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by some 
multi-stmt txn that started some time ago or (at least in theory) by autoCommit 
one due to long GC pause for example.   It acquires locks after the Cleaner 
checks LM state and calls getAcidState(). This request will choose base_5 but 
it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o 
modifications made by those txns.

This is a subtle race condition but possible.

1. So the safest thing to do to ensure correctness is to use the latest base_x 
as the "best" and check against exceptions in ValidTxnList and throw an 
exception if there is an exception <=x.

2. A better option is to keep 2 exception lists: aborted and open and only 
throw if there is an open txn <=x.  Compaction throws away data from aborted 
txns and thus there is no harm using base with aborted txns in its range.

3. You could make each txn record the lowest open txn id at its start and 
prevent the cleaner from cleaning anything delta with id range that includes 
this open txn id for any txn that is still running.  This has a drawback of 
potentially delaying GC of old files for arbitrarily long periods.  So this 
should be a user config choice.   The implementation is not trivial.

I would go with 1 now and do 2/3 together with multi-statement txn work.



Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
the other

  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
for such reader.

The issue is if at the same time the Cleaner process is running.  It will see 
everything 

[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
for such reader.

The issue is if at the same time the Cleaner process is running.  It will see 
everything with txnid<17 as obsolete.  Then it will check lock manger state and 
decide to delete (as there may not be any locks in LM for table A).  The order 
in which the files are deleted is undefined right now.  It may delete 
delta_16_16 and delta_17_17 first and right at this moment the read request 
with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by some 
multi-stmt txn that started some time ago or (at least in theory) by autoCommit 
one due to long GC pause for example.   It acquires locks after the Cleaner 
checks LM state and calls getAcidState(). This request will choose base_5 but 
it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o 
modifications made by those txns.

This is a subtle race condition but possible.

1. So the safest thing to do to ensure correctness is to use the latest base_x 
as the "best" and check against exceptions in ValidTxnList and throw an 
exception if there is an exception <=x.

2. A better option is to keep 2 exception lists: aborted and open and only 
throw if there is an open txn <=x.  Compaction throws away data from aborted 
txns and thus there is no harm using base with aborted txns in its range.

3. You could make each txn record the lowest open txn id at its start and 
prevent the cleaner from cleaning anything delta with id range that includes 
this open txn id for any txn that is still running.  This has a drawback of 
potentially delaying GC of old files for arbitrarily long periods.  So this 
should be a user config choice.   The implementation is not trivial.

I would go with 1 now and do 2/3 together with multi-statement txn work.


  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  


> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> 

[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
for such reader.

The issue is if at the same time the Cleaner process is running.  It will see 
everything with txnid<17 as obsolete.  Then it will check lock manger state and 
decide to delete (as there may not be any locks in LM for table A).  The order 
in which the files are deleted is undefined right now.  It may delete 
delta_16_16 and delta_17_17 first and right at this moment the read request 
with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by some 
multi-stmt txn that started some time ago or (at least in theory) by autoCommit 
one due to long GC pause for example.   It acquires locks after the Cleaner 
checks LM state and calls getAcidState(). This request will choose base_5 but 
it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o 
modifications made by those txns.

This is a subtle race condition but possible.

1. So the safest thing to do to ensure correctness is to use the latest base_x 
as the "best" and check against exceptions in ValidTxnList and throw an 
exception if there is an exception <=x.

2. A better option is to keep 2 exception lists: aborted and open and only 
throw if there is an open txn <=x.  Compaction throws away data from aborted 
txns and thus there is no harm using base with aborted txns in its range.

3. You could make each txn record the lowest open txn id at its start and 
prevent the cleaner from cleaning anything delta with id range that includes 
this open txn id for any txn that is still running.  This has a drawback of 
potentially delaying GC of old files for arbitrarily long periods.  So this 
should be a user config choice.   The implementation is not trivial.

I would go with 1 now and do 2/3 together with multi-statement txn work.


  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
for such reader.

The issue is if at the same time the Cleaner process is running.  It will see 
everything with txnid<17 as obsolete.  Then it will check lock manger state and 
decide to delete (as there may not be any locks in LM for table A).  The order 

[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to aborted the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  


> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-01 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13369:
-
Attachment: HIVE-13369.2.patch

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to aborted the txn.  This can't 
> (reasonably) happen with auto commit mode, but with multi statement txns it's 
> possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-06-27 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13369:
-
Status: Patch Available  (was: Open)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-13369.1.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to aborted the txn.  This can't 
> (reasonably) happen with auto commit mode, but with multi statement txns it's 
> possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-06-27 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13369:
-
Attachment: HIVE-13369.1.patch

Patch 1 is about the first part in the problem description. It also reduce the 
frequency of AcidOpenTxnsCounterService logging.

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-13369.1.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to aborted the txn.  This can't 
> (reasonably) happen with auto commit mode, but with multi statement txns it's 
> possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-05-31 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13369:
---
Target Version/s: 1.3.0, 2.2.0, 2.1.1, 2.0.2  (was: 1.3.0, 2.0.0, 2.1.0, 
2.2.0)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to aborted the txn.  This can't 
> (reasonably) happen with auto commit mode, but with multi statement txns it's 
> possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-05-31 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13369:
---
Target Version/s: 2.0.0, 1.3.0, 2.1.0, 2.2.0  (was: 1.3.0, 2.0.0, 2.2.0)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to aborted the txn.  This can't 
> (reasonably) happen with auto commit mode, but with multi statement txns it's 
> possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-05-31 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13369:
---
Target Version/s: 2.0.0, 1.3.0, 2.2.0  (was: 1.3.0, 2.0.0, 2.1.0)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to aborted the txn.  This can't 
> (reasonably) happen with auto commit mode, but with multi statement txns it's 
> possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-05-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Target Version/s: 2.0.0, 1.3.0, 2.1.0  (was: 1.3.0, 2.0.0)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to aborted the txn.  This can't 
> (reasonably) happen with auto commit mode, but with multi statement txns it's 
> possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-05-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Assignee: Wei Zheng  (was: Eugene Koifman)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to aborted the txn.  This can't 
> (reasonably) happen with auto commit mode, but with multi statement txns it's 
> possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-04-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to aborted the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.


> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to aborted the txn.  This can't 
> (reasonably) happen with auto commit mode, but with multi statement txns it's 
> possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-03-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   * preserve the history, we can't use a base directory that includes a
   * transaction id that we must exclude."

which is correct but there is nothing in the code that does this.


> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)