[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-04-02 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
>                 Key: HIVE-18747
>                 URL: https://issues.apache.org/jira/browse/HIVE-18747
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Minor
>              Labels: ACID, pull-request-available
>             Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch, 
> HIVE-18747.03.patch, HIVE-18747.04.patch, HIVE-18747.05.patch, 
> HIVE-18747.06.patch
>
>
> The per-table write ID implementation (HIVE-18192) maintains a map between
> txn ID and table write ID in the TXN_TO_WRITE_ID meta table.
> The entries in this table are used to generate a ValidWriteIdList for the
> given ValidTxnList to ensure snapshot isolation.
> When a table or database is dropped, these entries are cleaned up. However,
> they also need to be cleaned up for active tables, for better performance.
>
> The TXN_TO_WRITE_ID table keeps a mapping of transaction ID to write ID. The
> state of each write ID (open, committed, aborted) is determined by the state
> of the parent transaction. To get a WriteIdList that is accurate with respect
> to the ValidTxnList locked in at the start of the transaction, we have to
> retain the txnid<->writeid mapping even after the transaction ends.
> This is because a reader at snapshot isolation that started while transaction
> X was open should continue to ignore the data written by X even after X
> commits.
>
> So we need a mechanism to know when it is safe to remove entries from
> TXN_TO_WRITE_ID. There are two parts to it. When txn X is opened, it records
> Y = select min(txn_id) from TXNS where txn_state='o' in the
> MIN_HISTORY(txnid, opentxnid) table, i.e. it adds (X, Y) to MIN_HISTORY.
> On commit (or abort) of X, it removes its own entry from MIN_HISTORY. In the
> absence of aborted transactions, MIN_HISTORY gives us the smallest open txnid
> across all active reader snapshots. Let Z = select min(opentxnid) from
> MIN_HISTORY. We can delete entries from TXN_TO_WRITE_ID where
> TXN_TO_WRITE_ID.T2W_TXNID < Z, since every active reader sees txns < Z as
> committed.
>
> If S is an aborted txn, we retain its metadata in TXNS as long as any data
> written by S may be visible to some reader in the system, so that the reader
> knows to skip this data. The rules for when that holds are complex, but with
> respect to TXN_TO_WRITE_ID, if A = select min(TXN_ID) from TXNS where
> TXN_STATE='a', then it is safe to delete from TXN_TO_WRITE_ID where
> TXN_TO_WRITE_ID.T2W_TXNID < min(Z, A).
>
> If no open or aborted txns exist in the system, then we need to enable
> cleanup using the latest allocated value in the NEXT_TXN_ID table. The delete
> condition would be
> TXN_TO_WRITE_ID.T2W_TXNID < min(Z, A, NEXT_TXN_ID.ntxn_next).
>
> Also, it is proposed to trigger cleanup of TXN_TO_WRITE_ID from the initiator
> immediately after cleaning up aborted txn metadata from the TXNS table.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-04-01 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-04-01 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Attachment: HIVE-18747.06.patch


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-04-01 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Open  (was: Patch Available)


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-31 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Patch Available  (was: Open)

Attached 05.patch with:
 * Rebase against master.
 * Bug fix for the case where writeIdHwm is set to 0 when there are no entries in the TXN_TO_WRITE_ID table.

[~ekoifman], could you please take a look?


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-31 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Attachment: HIVE-18747.05.patch


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-31 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Open  (was: Patch Available)


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-30 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-30 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Attachment: HIVE-18747.04.patch


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-30 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Open  (was: Patch Available)

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch, 
> HIVE-18747.03.patch
>
>
> The per-table write ID implementation (HIVE-18192) maintains a map between
> txn ID and table write ID in the TXN_TO_WRITE_ID meta table.
> The entries in this table are used to generate a ValidWriteIdList for the
> given ValidTxnList to ensure snapshot isolation.
> When a table or database is dropped, these entries are cleaned up. But it
> is necessary to clean up entries for active tables too, for better performance.
> The TXN_TO_WRITE_ID table keeps a mapping of Transaction ID to Write ID.  The
> state of each Write ID (open, committed, aborted) is determined by the state
> of the parent transaction.  In order to get a WriteIdList that is accurate
> wrt the ValidTxnList that is locked in at the start of the transaction,
> we have to retain the txnid<->writeid mapping even after the transaction ends.
> This is because a reader at Snapshot Isolation that started when transaction
> X was open should continue to ignore the data written by X even after X
> commits.
> So we need a mechanism to know when it is safe to remove TXN_TO_WRITE_ID
> entries.  There are 2 parts to it. When txn X is opened, it records Y=select
> min(txn_id) from TXNS where txn_state='o' in the MIN_HISTORY(txnid,opentxnid)
> table, i.e. it adds (X, Y) to MIN_HISTORY.  On commit (and abort) of X, it
> removes its own entry from MIN_HISTORY. In the absence of aborted
> transactions, MIN_HISTORY gives us the smallest open txnid across all active
> reader snapshots.  Let Z=select min(opentxnid) from MIN_HISTORY. We can
> delete entries from TXN_TO_WRITE_ID once TXN_TO_WRITE_ID.T2W_TXNID < Z since
> every active reader sees txns < Z as committed.
> If S is an aborted txn, we retain the metadata about it in TXNS as long as any
> data written by S may be visible to some reader in the system, so that the
> reader knows to skip this data.  The rules for when that is are complex, but wrt
> TXN_TO_WRITE_ID, if A=select min(TXN_ID) from TXNS where TXN_STATE='a', then
> it's safe to delete from TXN_TO_WRITE_ID when TXN_TO_WRITE_ID.T2W_TXNID <
> min(Z,A).
> If no open or aborted txns exist in the system, then we need to enable
> cleanup using the latest allocated value in the NEXT_TXN_ID table. The delete
> condition would be TXN_TO_WRITE_ID.T2W_TXNID < min(Z,A,NEXT_TXN_ID.ntxn_next).
> Also, it is proposed to trigger cleanup on TXN_TO_WRITE_ID from the initiator
> immediately after cleaning up aborted txns metadata from the TXNS table.
>  
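
The cleanup rule quoted above can be sketched with an in-memory SQLite
database. This is only an illustration, not the code from the attached patches:
the table and column names are taken from the description, while the helper
names (make_demo_db, open_txn, end_txn, cleanup_txn_to_write_id, demo) and the
T2W_WRITEID column are hypothetical. As a simplification, the sketch assumes a
committed txn is simply dropped from TXNS and an aborted one stays with state
'a'.

```python
import sqlite3


def make_demo_db():
    """Create the four tables named in the description (simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TXNS (TXN_ID INTEGER PRIMARY KEY, TXN_STATE TEXT);
        CREATE TABLE MIN_HISTORY (txnid INTEGER, opentxnid INTEGER);
        CREATE TABLE NEXT_TXN_ID (ntxn_next INTEGER);
        CREATE TABLE TXN_TO_WRITE_ID (T2W_TXNID INTEGER, T2W_WRITEID INTEGER);
        INSERT INTO NEXT_TXN_ID VALUES (1);
    """)
    return conn


def open_txn(conn):
    """Open txn X and record (X, Y), where Y is the smallest open txn id."""
    cur = conn.cursor()
    x = cur.execute("SELECT ntxn_next FROM NEXT_TXN_ID").fetchone()[0]
    cur.execute("UPDATE NEXT_TXN_ID SET ntxn_next = ?", (x + 1,))
    cur.execute("INSERT INTO TXNS VALUES (?, 'o')", (x,))
    y = cur.execute(
        "SELECT MIN(TXN_ID) FROM TXNS WHERE TXN_STATE = 'o'").fetchone()[0]
    cur.execute("INSERT INTO MIN_HISTORY VALUES (?, ?)", (x, y))
    return x


def end_txn(conn, x, aborted=False):
    """Commit or abort X; either way X removes its own MIN_HISTORY entry."""
    cur = conn.cursor()
    cur.execute("DELETE FROM MIN_HISTORY WHERE txnid = ?", (x,))
    if aborted:
        cur.execute("UPDATE TXNS SET TXN_STATE = 'a' WHERE TXN_ID = ?", (x,))
    else:
        # Assumption of this sketch: committed txns leave TXNS entirely.
        cur.execute("DELETE FROM TXNS WHERE TXN_ID = ?", (x,))


def cleanup_txn_to_write_id(conn):
    """Delete rows with T2W_TXNID < min(Z, A, NEXT_TXN_ID.ntxn_next)."""
    cur = conn.cursor()
    # Z: smallest open txn id seen by any active snapshot (None if no rows).
    z = cur.execute("SELECT MIN(opentxnid) FROM MIN_HISTORY").fetchone()[0]
    # A: smallest aborted txn id still tracked in TXNS (None if no rows).
    a = cur.execute(
        "SELECT MIN(TXN_ID) FROM TXNS WHERE TXN_STATE = 'a'").fetchone()[0]
    # Fallback bound when there are neither open nor aborted txns.
    nxt = cur.execute("SELECT ntxn_next FROM NEXT_TXN_ID").fetchone()[0]
    watermark = min(v for v in (z, a, nxt) if v is not None)
    cur.execute("DELETE FROM TXN_TO_WRITE_ID WHERE T2W_TXNID < ?", (watermark,))
    conn.commit()
    return watermark


def count_mappings(conn):
    return conn.execute("SELECT COUNT(*) FROM TXN_TO_WRITE_ID").fetchone()[0]


def demo():
    """A writer commits while a reader that saw it as open is still active."""
    conn = make_demo_db()
    writer = open_txn(conn)
    conn.execute("INSERT INTO TXN_TO_WRITE_ID VALUES (?, 1)", (writer,))
    reader = open_txn(conn)  # snapshot taken while the writer is still open
    end_txn(conn, writer)
    first_watermark = cleanup_txn_to_write_id(conn)
    kept = count_mappings(conn)  # writer's mapping must survive for the reader
    end_txn(conn, reader)
    second_watermark = cleanup_txn_to_write_id(conn)
    left = count_mappings(conn)
    return first_watermark, kept, second_watermark, left


if __name__ == "__main__":
    print(demo())
```

In demo(), the first cleanup keeps the writer's mapping because the reader's
MIN_HISTORY entry still points at the writer's txn id. Once the reader ends,
only the NEXT_TXN_ID bound remains and the mapping is removed.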





[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-27 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Attachment: HIVE-18747.03.patch

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch, 
> HIVE-18747.03.patch
>
>


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-27 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Patch Available  (was: Open)

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch, 
> HIVE-18747.03.patch
>
>


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-27 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Attachment: (was: HIVE-18747.03.patch)

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch
>
>


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-27 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Open  (was: Patch Available)

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch, 
> HIVE-18747.03.patch
>
>


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-27 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Patch Available  (was: Open)

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch, 
> HIVE-18747.03.patch
>
>


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-27 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Attachment: HIVE-18747.03.patch

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch, 
> HIVE-18747.03.patch
>
>


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-27 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Open  (was: Patch Available)

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch
>
>


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-22 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Patch Available  (was: Open)

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch
>
>


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-22 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Attachment: HIVE-18747.02.patch

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch, HIVE-18747.02.patch
>
>


[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-22 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Open  (was: Patch Available)

> Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.
> --
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18747.01.patch
>
>
> Per table write ID implementation (HIVE-18192) maintains a map between txn ID 
> and table write ID in TXN_TO_WRITE_ID meta table. 
> The entries in this table is used to generate ValidWriteIdList for the given 
> ValidTxnList to ensure snapshot isolation. 
> When table or database is dropped, then these entries are cleaned-up. But, it 
> is necessary to clean-up for active tables too for better performance.
> TXN_TO_WRITE_ID table keeps a mapping of Transaction ID to Write ID.  The 
> state of each Write ID (open, committed, aborted) is determined by the state 
> of the parent transaction.  In order to be able to get a WriteIdList that is 
> accurate wrt ValidTxnList that is locked in at the start of the transaction, 
> we have to retain txnid<->writeid mapping even after the transaction ends. 
> This is because a reader at Snapshot Isolation that started when transaction 
> X was open, should continue to ignore the data written by X even after X 
> commits.
> So we need a mechanism to know when it is safe to remove TXN_TO_WRITE_ID.  
> There are 2 parts to it. When txn X is opened, it records Y=select 
> min(txn_id) from TXNS where txn_state=’o’ in MIN_HISTORY(txnid,opentxnid) 
> table, i.e. it adds (X, Y) to MIN_HISTORY.  On commit (and abort) of X, it 
> removes its own entry from MIN_HISTORY. In the absence of Aborted 
> transactions, MIN_HISTORY gives us the smallest open txnid across all active 
> reader snapshots.  Let Z=select min(opentxnid) from MIN_HISTORY. We can 
> delete entries from TXN_TO_WRITE_ID once TXN_TO_WRITE_ID.T2W_TXNID < Z since 
> every active reader sees txns < Z as committed.
> If S is aborted txns, we retain the metadata about it in TXNS as long as any 
> data written S may be visible to some reader in the system so that the reader 
> knows to skip this data.  The rules for when that is are complex but wrt to 
> TXN_TO_WRITE_ID, if A=select min(TXN_ID) from TXNS where TXN_STATE=’a’, then 
> it’s safe to delete from TXN_TO_WRITE_ID when TXN_TO_WRITE_ID.T2W_TXNID < 
> min(Z,A).  
> If no open or aborted txns exist in the system, then we need to enable 
> cleanup using latest allocated value of NEXT_TXN_ID table. Delete condition 
> would be TXN_TO_WRITE_ID.T2W_TXNID < min(Z,A,NEXT_TXN_ID.ntxn_next).  
> Also, it is proposed to trigger cleanup on TXN_TO_WRITE_ID from initiator 
> immediately after cleaning up aborted txns metadata from TXNS table.
>  





[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-22 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Description: 
The per-table write ID implementation (HIVE-18192) maintains a map between txn ID 
and table write ID in the TXN_TO_WRITE_ID meta table. 

The entries in this table are used to generate the ValidWriteIdList for a given 
ValidTxnList, to ensure snapshot isolation. 

When a table or database is dropped, these entries are cleaned up. However, 
entries for active tables also need to be cleaned up for better performance.

The TXN_TO_WRITE_ID table keeps a mapping of Transaction ID to Write ID.  The 
state of each Write ID (open, committed, aborted) is determined by the state of 
the parent transaction.  To get a WriteIdList that is accurate wrt the 
ValidTxnList that is locked in at the start of the transaction, we have to 
retain the txnid<->writeid mapping even after the transaction ends. This is 
because a reader at Snapshot Isolation that started while transaction X was open 
should continue to ignore the data written by X even after X commits.
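
To make the retention argument concrete, here is a minimal sketch in Python (not Hive code). The `Snapshot` class is a hypothetical stand-in for a ValidTxnList, and the txn IDs and write IDs are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a ValidTxnList captured when a reader starts."""
    high_water_mark: int        # highest txn ID allocated at snapshot time
    open_or_aborted: frozenset  # txn IDs that were open or aborted at that moment

    def is_txn_visible(self, txn_id: int) -> bool:
        # A txn is visible only if it had already committed at snapshot time.
        return txn_id <= self.high_water_mark and txn_id not in self.open_or_aborted


def visible_write_ids(snapshot, txn_to_write_id):
    """Translate the txn-level snapshot into the write IDs the reader may see."""
    return sorted(w for t, w in txn_to_write_id.items()
                  if snapshot.is_txn_visible(t))


# Mapping rows for one table: txn ID -> write ID (illustrative values).
txn_to_write_id = {5: 1, 7: 2, 9: 3}

# The reader starts while txn 7 is still open.
reader = Snapshot(high_water_mark=9, open_or_aborted=frozenset({7}))

# Txn 7 may commit later, but the reader's snapshot does not change.
print(visible_write_ids(reader, txn_to_write_id))  # [1, 3]
```

In this sketch the row for txn 7 has to stay in the mapping while the reader is active: it is what ties write ID 2 to a txn that the reader's snapshot still treats as open.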

So we need a mechanism to know when it is safe to remove entries from 
TXN_TO_WRITE_ID.  There are 2 parts to it. When txn X is opened, it records 
Y=select min(txn_id) from TXNS where txn_state='o' in the 
MIN_HISTORY(txnid,opentxnid) table, i.e. it adds (X, Y) to MIN_HISTORY.  On 
commit (or abort) of X, it removes its own entry from MIN_HISTORY. In the 
absence of aborted transactions, MIN_HISTORY gives us the smallest open txnid 
across all active reader snapshots.  Let Z=select min(opentxnid) from 
MIN_HISTORY. We can delete entries from TXN_TO_WRITE_ID where 
TXN_TO_WRITE_ID.T2W_TXNID < Z, since every active reader sees txns < Z as 
committed.
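
The MIN_HISTORY bookkeeping can be tried out with a small sketch using Python's sqlite3 module. This is not the metastore implementation. The table and column names TXNS, MIN_HISTORY, TXN_TO_WRITE_ID, and T2W_TXNID come from the description; the T2W_WRITEID column, the one-write-per-txn rule, and deleting the TXNS row on commit are assumptions made only to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TXNS (TXN_ID INTEGER PRIMARY KEY, TXN_STATE TEXT);
    CREATE TABLE MIN_HISTORY (TXNID INTEGER, OPENTXNID INTEGER);
    CREATE TABLE TXN_TO_WRITE_ID (T2W_TXNID INTEGER, T2W_WRITEID INTEGER);
""")


def open_txn(x):
    """Open txn X, record (X, Y) in MIN_HISTORY, and allocate a write ID."""
    conn.execute("INSERT INTO TXNS VALUES (?, 'o')", (x,))
    (y,) = conn.execute(
        "SELECT min(TXN_ID) FROM TXNS WHERE TXN_STATE = 'o'").fetchone()
    conn.execute("INSERT INTO MIN_HISTORY VALUES (?, ?)", (x, y))
    # Assumption: one write per txn, and the write ID equals the txn ID.
    conn.execute("INSERT INTO TXN_TO_WRITE_ID VALUES (?, ?)", (x, x))


def commit_txn(x):
    """Commit X: drop it from TXNS (sketch-only choice) and from MIN_HISTORY."""
    conn.execute("DELETE FROM TXNS WHERE TXN_ID = ?", (x,))
    conn.execute("DELETE FROM MIN_HISTORY WHERE TXNID = ?", (x,))


def clean_txn_to_write_id():
    """Delete mappings with T2W_TXNID < Z; skip when MIN_HISTORY is empty."""
    (z,) = conn.execute("SELECT min(OPENTXNID) FROM MIN_HISTORY").fetchone()
    if z is not None:
        conn.execute("DELETE FROM TXN_TO_WRITE_ID WHERE T2W_TXNID < ?", (z,))
    return z


def mapped_txns():
    rows = conn.execute(
        "SELECT T2W_TXNID FROM TXN_TO_WRITE_ID ORDER BY T2W_TXNID")
    return [r[0] for r in rows]


for x in (1, 2, 3):
    open_txn(x)
commit_txn(1)
commit_txn(2)
open_txn(4)                          # records (4, 3): txn 3 is the oldest open txn
z_before = clean_txn_to_write_id()   # txn 3 still holds (3, 1), so Z = 1
kept_before = mapped_txns()
commit_txn(3)
z_after = clean_txn_to_write_id()    # only (4, 3) is left, so Z = 3
kept_after = mapped_txns()
print(z_before, kept_before)         # 1 [1, 2, 3, 4]
print(z_after, kept_after)           # 3 [3, 4]
```

The mappings for txns 1 and 2 survive for as long as txn 3 is active, because txn 3 started while txn 1 was open. They become removable once txn 3 ends.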

If S is an aborted txn, we retain the metadata about it in TXNS as long as any 
data written by S may be visible to some reader in the system, so that the 
reader knows to skip this data.  The rules for when that is are complex, but wrt 
TXN_TO_WRITE_ID, if A=select min(TXN_ID) from TXNS where TXN_STATE='a', then 
it’s safe to delete from TXN_TO_WRITE_ID when TXN_TO_WRITE_ID.T2W_TXNID < 
min(Z,A).  
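
As a rough illustration of the min(Z,A) bound, the following Python/sqlite3 sketch runs the two queries from the description against made-up rows. The T2W_WRITEID column and the sample data are assumptions. The sketch also assumes that at least one of Z and A exists; the description handles the empty case separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TXNS (TXN_ID INTEGER PRIMARY KEY, TXN_STATE TEXT);
    CREATE TABLE MIN_HISTORY (TXNID INTEGER, OPENTXNID INTEGER);
    CREATE TABLE TXN_TO_WRITE_ID (T2W_TXNID INTEGER, T2W_WRITEID INTEGER);
    -- Txn 4 aborted; txn 6 is open and recorded 6 as the smallest open txn.
    INSERT INTO TXNS VALUES (4, 'a'), (6, 'o');
    INSERT INTO MIN_HISTORY VALUES (6, 6);
    INSERT INTO TXN_TO_WRITE_ID VALUES (2, 1), (3, 2), (4, 3), (5, 4), (6, 5);
""")

(z,) = conn.execute("SELECT min(OPENTXNID) FROM MIN_HISTORY").fetchone()
(a,) = conn.execute(
    "SELECT min(TXN_ID) FROM TXNS WHERE TXN_STATE = 'a'").fetchone()

# SQL min() over no rows yields NULL (None here), so skip absent values.
bound = min(v for v in (z, a) if v is not None)
conn.execute("DELETE FROM TXN_TO_WRITE_ID WHERE T2W_TXNID < ?", (bound,))

remaining = [r[0] for r in conn.execute(
    "SELECT T2W_TXNID FROM TXN_TO_WRITE_ID ORDER BY T2W_TXNID")]
print(z, a, bound, remaining)  # 6 4 4 [4, 5, 6]
```

Here Z alone would allow deleting everything below 6, but the aborted txn 4 pulls the bound down, so its own mapping and the later ones are kept.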

If no open or aborted txns exist in the system, then we need to enable cleanup 
using the latest allocated value in the NEXT_TXN_ID table. The delete condition 
would then be TXN_TO_WRITE_ID.T2W_TXNID < min(Z,A,NEXT_TXN_ID.ntxn_next).  
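
A minimal Python/sqlite3 sketch of the three-way bound follows. The helper name `cleanup_bound` is hypothetical, as are the T2W_WRITEID column and the sample rows. It also assumes that NTXN_NEXT holds the next txn ID to be handed out, so every allocated txn ID is below it.

```python
import sqlite3


def cleanup_bound(conn):
    """Return min(Z, A, NEXT_TXN_ID.ntxn_next), ignoring values that are absent."""
    (z,) = conn.execute("SELECT min(OPENTXNID) FROM MIN_HISTORY").fetchone()
    (a,) = conn.execute(
        "SELECT min(TXN_ID) FROM TXNS WHERE TXN_STATE = 'a'").fetchone()
    (n,) = conn.execute("SELECT NTXN_NEXT FROM NEXT_TXN_ID").fetchone()
    return min(v for v in (z, a, n) if v is not None)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TXNS (TXN_ID INTEGER PRIMARY KEY, TXN_STATE TEXT);
    CREATE TABLE MIN_HISTORY (TXNID INTEGER, OPENTXNID INTEGER);
    CREATE TABLE NEXT_TXN_ID (NTXN_NEXT INTEGER);
    CREATE TABLE TXN_TO_WRITE_ID (T2W_TXNID INTEGER, T2W_WRITEID INTEGER);
    -- No open or aborted txns: TXNS and MIN_HISTORY stay empty.
    INSERT INTO NEXT_TXN_ID VALUES (10);
    INSERT INTO TXN_TO_WRITE_ID VALUES (7, 1), (8, 2), (9, 3);
""")

bound = cleanup_bound(conn)
conn.execute("DELETE FROM TXN_TO_WRITE_ID WHERE T2W_TXNID < ?", (bound,))
remaining = conn.execute("SELECT count(*) FROM TXN_TO_WRITE_ID").fetchone()[0]
print(bound, remaining)  # 10 0
```

With Z and A both absent, NTXN_NEXT is the only candidate, so every existing mapping falls below the bound and is removed.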

It is also proposed to trigger the cleanup of TXN_TO_WRITE_ID from the initiator 
immediately after it cleans up aborted txn metadata from the TXNS table.
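
One way to picture the proposed ordering is the Python/sqlite3 sketch below. All function names are hypothetical and this is not the initiator's actual code. Which aborted txns are safe to forget is passed in as a parameter, because the description only says those rules are complex.

```python
import sqlite3


def clean_aborted_txns(conn, safe_to_forget):
    """Drop TXNS rows for aborted txns the caller has deemed safe to forget."""
    conn.executemany(
        "DELETE FROM TXNS WHERE TXN_ID = ? AND TXN_STATE = 'a'",
        [(t,) for t in safe_to_forget])


def clean_txn_to_write_id(conn):
    """Delete mappings below min(Z, A, NTXN_NEXT), skipping absent values."""
    (z,) = conn.execute("SELECT min(OPENTXNID) FROM MIN_HISTORY").fetchone()
    (a,) = conn.execute(
        "SELECT min(TXN_ID) FROM TXNS WHERE TXN_STATE = 'a'").fetchone()
    (n,) = conn.execute("SELECT NTXN_NEXT FROM NEXT_TXN_ID").fetchone()
    bound = min(v for v in (z, a, n) if v is not None)
    conn.execute("DELETE FROM TXN_TO_WRITE_ID WHERE T2W_TXNID < ?", (bound,))
    return bound


def initiator_cycle(conn, safe_to_forget):
    """Hypothetical cycle: aborted-txn cleanup first, mapping cleanup right after."""
    clean_aborted_txns(conn, safe_to_forget)
    return clean_txn_to_write_id(conn)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TXNS (TXN_ID INTEGER PRIMARY KEY, TXN_STATE TEXT);
    CREATE TABLE MIN_HISTORY (TXNID INTEGER, OPENTXNID INTEGER);
    CREATE TABLE NEXT_TXN_ID (NTXN_NEXT INTEGER);
    CREATE TABLE TXN_TO_WRITE_ID (T2W_TXNID INTEGER, T2W_WRITEID INTEGER);
    INSERT INTO TXNS VALUES (4, 'a'), (8, 'a');
    INSERT INTO NEXT_TXN_ID VALUES (10);
    INSERT INTO TXN_TO_WRITE_ID VALUES (3, 1), (4, 2), (5, 3), (8, 4);
""")

bound = initiator_cycle(conn, safe_to_forget=[4])
left = [r[0] for r in conn.execute(
    "SELECT T2W_TXNID FROM TXN_TO_WRITE_ID ORDER BY T2W_TXNID")]
print(bound, left)  # 8 [8]
```

Had the mapping cleanup run before txn 4 was removed from TXNS, A would have been 4 and only the mapping for txn 3 could have been deleted. Running it right after lets the bound advance to 8 in the same cycle.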

 

  was:
The per-table write ID implementation (HIVE-18192) maintains a map between txn ID 
and table write ID in the TXN_TO_WRITE_ID meta table. 

The entries in this table are used to generate the ValidWriteIdList for a given 
ValidTxnList, to ensure snapshot isolation. 

When a table or database is dropped, these entries are cleaned up. However, 
entries for active tables also need to be cleaned up for better performance.

We need another table, MIN_HISTORY_LEVEL, to maintain the least txn that is 
referenced by any active ValidTxnList snapshot as an open/aborted txn. If no 
references are found in this table for a txn, then it is eligible for cleanup.

After cleanup, we need to keep just one entry (highest txn <= 
min_uncommitted_txn) per table to mark the LWM (low water mark).



[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-18747:
--
Labels: ACID pull-request-available  (was: ACID)



[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Description: 
The per-table write ID implementation (HIVE-18192) maintains a map between txn ID 
and table write ID in the TXN_TO_WRITE_ID meta table. 

The entries in this table are used to generate the ValidWriteIdList for a given 
ValidTxnList, to ensure snapshot isolation. 

When a table or database is dropped, these entries are cleaned up. However, 
entries for active tables also need to be cleaned up for better performance.

We need another table, MIN_HISTORY_LEVEL, to maintain the least txn that is 
referenced by any active ValidTxnList snapshot as an open/aborted txn. If no 
references are found in this table for a txn, then it is eligible for cleanup.

After cleanup, we need to keep just one entry (highest txn <= 
min_uncommitted_txn) per table to mark the LWM (low water mark).

  was:
The per-table write ID implementation (HIVE-18192) maintains a map between txn ID 
and table write ID in the TXN_TO_WRITE_ID meta table. 

The entries in this table are used to generate the ValidWriteIdList for a given 
ValidTxnList, to ensure snapshot isolation. 

When a table or database is dropped, these entries are cleaned up. However, 
entries for active tables also need to be cleaned up for better performance.

We need another table, MIN_HISTORY_LEVEL, to maintain the least txn that is 
referenced by any active ValidTxnList snapshot as an open/aborted txn. If no 
references are found in this table for a txn, then it is eligible for cleanup.

After cleanup, we need to keep just one entry (highest committed txn) per 
table to mark the LWM (low water mark).




[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Attachment: HIVE-18747.01.patch



[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL.

2018-03-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-18747:

Summary: Cleaner for TXN_TO_WRITE_ID table entries using MIN_HISTORY_LEVEL. 
 (was: Cleaner for TXN_TO_WRITE_ID table entries/MIN_HISTORY_LEVEL.)
