[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344578#comment-16344578
 ] 

Appy commented on HBASE-17852:
--

bq. My intent was not to squash your opinions, but to avoid being blocked if 
you were not interested/busy as seemed might be the case.
That's reasonable. Sorry for the delay on my part. 

[~elserj] i believe you. Everyone makes mistakes from time-to-time, i'm certain 
i must have done too. Always happy with "acknowledge, learn, and move past 
them" way. All's good (between us two).


> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1631#comment-1631
 ] 

Josh Elser commented on HBASE-17852:


bq. I'm not messing with you, Appy. Check the push logs/comments on the other 
JIRA issue.. I swear to you that I did not push this until after I heard back 
from you. My guess is that this is due to me using git-am or cherry picking a 
commit from a local branch.

My apologies, Appy. I am wrong. I apparently got impatient and pushed this 
because there was silence from the Dec 6th mention and the Jan 12th re-ping. My 
intent was not to squash your opinions, but to avoid being blocked if you were 
not interested/busy as seemed might be the case.

If you have since changed your mind about the reduced patch hitting master, my 
offer to revert stands. My apologies again for arguing with you while in the 
wrong.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344425#comment-16344425
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

Hmm, did I insult someone savagely, [~elserj]?  

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344416#comment-16344416
 ] 

Josh Elser commented on HBASE-17852:


[~vrodionov] that's also plenty shit-slinging from you too on the matter. 
Thanks.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344407#comment-16344407
 ] 

Josh Elser commented on HBASE-17852:


Also, again, if you want this reverted, please say so.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344404#comment-16344404
 ] 

Josh Elser commented on HBASE-17852:


bq. I did say it was okay to go in master, but that's like 4 days after the 
commit - 2018-01-16T12:46:19-0800

I'm not messing with you, [~appy]. Check the push logs/comments on the other 
JIRA issue.. I swear to you that I did not push this until after I heard back 
from you. My guess is that this is due to me using git-am or cherry picking a 
commit from a local branch.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344380#comment-16344380
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
that's precisely the reason why i can't trust you.
{quote}

You can start discussion about trust and respect in HBase community and I 
assure you I have a lot to say about.  


> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344359#comment-16344359
 ] 

Appy commented on HBASE-17852:
--

If your mental radar doesn't tick-off in loud red alarms between the time of 
choosing the second approach and getting someone to commit it, that's precisely 
the reason why i can't trust you.


> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344341#comment-16344341
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}

I did say it was okay to go in master, but that's like 4 days after the commit 
- 2018-01-16T12:46:19-0800

{quote}

OK, there was an issue found during QA testing - HBASE-19568. It turned out 
that HBASE-17852 fixes the issue. Let us say I have had two options:
 # Find out which part of HBASE-17852 fixes the issue and create smaller 
HBASE-19568- specific patch
 # Apply HBASE-17852 patch directly (with some refactoring part stripped down)  
    
  
So I have chosen the latter one. Reasons: time, time, time. We can revert 
HBASE-19568 back if there are so many objections. 


> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344327#comment-16344327
 ] 

Appy commented on HBASE-17852:
--

Commit date is 12th jan
{noformat}
commit a5601c8eac6bfcac7d869574547f505d44e49065
Author: Vladimir Rodionov 
AuthorDate: Wed Jan 10 16:26:09 2018 -0800
Commit: Josh Elser 
CommitDate: Fri Jan 12 13:13:17 2018 -0500

HBASE-19568: Restore of HBase table using incremental backup doesn't 
restore rows from an earlier incremental backup

Signed-off-by: Josh Elser 
{noformat}

I did say it was okay to go in master, but that's like 4 days after the commit 
- 2018-01-16T12:46:19-0800

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344314#comment-16344314
 ] 

Appy commented on HBASE-17852:
--

Okay, f**k it, I really don't want to waste anymore of my time fighting some 
fight. It's obvious from events what happened here, and that it shouldn't have 
- makes me very sad and angry.
I leave its further handling to PMC.
At the very least, someone lost my basic trust and respect.




> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344310#comment-16344310
 ] 

Josh Elser commented on HBASE-17852:


bq. I'll can't believe that because I can't believe that..

 [~appy], truly, boss, if you weren't giving your blessing on the fix going 
into master, say so and I'll revert it when next at a computer. I was operating 
under the assumption that we had time to address design and not look 
gift-contribtuion(horses) in the mouth.

The rest of this is a product of some heavy-handedness about the busted Yetus 
after the JIRA upgrade.

Not trying to tell you something different than what you think happened, did. 
Trying to express that I thought you were ok with this plan against master (not 
branch-2).

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344307#comment-16344307
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}

Wasn't the said patch objected against committing by multiple members of the 
community?

{quote}

 

Calm down,  [~appy]. We are not doing anything criminal here. The result of 
these two patches is what you have agreed on personally :

https://issues.apache.org/jira/browse/HBASE-17852?focusedCommentId=16327774=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16327774

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344302#comment-16344302
 ] 

stack commented on HBASE-17852:
---

{quote}The majority of this code (but not all) went into master in HBASE-19568 
btw.
{quote}
The majority of 'HBASE-17852 Add Fault tolerance to HBASE-14417 (Support bulk 
loaded files in incremental backup)', a contentious issue, went into another 
commit named 'HBASE-19568 Restore of HBase table using incremental backup 
doesn't restore rows from an earlier incremental backup' with no outline of 
what made it and what did not, and no changeset explaination. There is no 
release note. The two JIRAs are not even linked.
{quote}Nope, it turned out that this patch (HBASE-17852) also fixes the issue 
raised in HBASE-19568, that is why it was committed (with refactoring code 
stripped down). No conspiracy here.   
{quote}
But hang on, now the patch here on 'fault tolerance' fixes issues over in the 
'restore rows' issue, -HBASE-19568?-

I can see how [~appy] might arrive at his assessment.

On the 'declarations', the first offers options free of context or explanation.

This one I find interesting:

 # Use procedure framework:  Short answer - no. I will wait until procv2 
becomes more mature and robust. I do not want to build new feature on a 
foundation of a new feature. Too risky in my opinion. NO

when we are talking about a hbase3 (possibly) feature and when there is no 
alternative.

Anyway, keeping it short.

 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344287#comment-16344287
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}

he had random urge to delete all previous 9 patches from this jira

{quote}

No conspiracy here as well. I was not able to submit patch v10 due to some 
Apache Jira issues and had to remove all previous patches to be able to submit 
v10. 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344272#comment-16344272
 ] 

Appy commented on HBASE-17852:
--

bq. There was nothing malicious intending to happen here
I'll can't believe that because I can't believe that
-  he started fixing the other jira from clean slate and somehow mysteriously 
ended up with exact same diff as was here, and which we all were against.
- he had random urge to delete all previous 9 patches from this jira, but not 
from phase1 jira HBASE-14030 or phase2 jira HBASE-14123, which both have like 
40 patches each


> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344258#comment-16344258
 ] 

Appy commented on HBASE-17852:
--

bq. Nope, it turned out that this patch (HBASE-17852) also fixes the issue 
raised in HBASE-19568, that is why it was committed (with refactoring code 
stripped down).
Not a justification!
Did you not use the patch in this jira to fix HBASE-19568?
Wasn't the said patch objected against committing by multiple members of the 
community?
Did you brought to anyone's attention, who raised the objections 
(me/stack/andrew/[~mdrob]), the fact that you were committing these changes.

bq. No conspiracy here.  Besides this, I thought that we have agreed on pushing 
this to the master branch and continue working on a critical changes after 
that? 
You really think that'd work? People can match timestamps, you committed 4 days 
before i even replied back!


> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344231#comment-16344231
 ] 

Josh Elser commented on HBASE-17852:


{quote}
HBASE-19568 had basically everything that was objected in the reviews here, why 
wasn't it brought to the attention of people who raised objections? The 
title/reason of that jira reason doesn't matter.
I see it as a really sly move - going behind community and committed changes 
which were heavily objected against, by using separate jira.
{quote}

[~appy], let's take a step back, please. I called this out to your attention -- 
I was under the impression that, based on your earlier comment 
([here|https://issues.apache.org/jira/browse/HBASE-17852?focusedCommentId=16327774=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16327774])
 that you were OK of this implementation landing in master as-is.

HBASE-19568 was used to commit to master (with what I thought was your 
blessing) while we continue to use this JIRA issue to flesh out design because 
of all of the discussion that has happened. If I misunderstood you or poorly 
asked you the question, let's take that over to HBASE-19568 and get a revert in 
place. There was nothing malicious intending to happen here.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344228#comment-16344228
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

Nope, it turned out that this patch (HBASE-17852) also fixes the issue raised 
in HBASE-19568, that is why it was committed (with refactoring code stripped 
down). No conspiracy here.   

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344219#comment-16344219
 ] 

Appy commented on HBASE-17852:
--

Forget all the design discussion, that's not important anymore.

HBASE-19568 had basically everything that was objected in the reviews here, why 
wasn't it brought to the attention of people who raised objections?  The 
title/reason of that jira reason doesn't matter.
I see it as a really sly move - going behind community and committed changes 
which were heavily objected against, by using separate jira.

Ping reviewers of other jira: [~elserj] [~tedyu] 
Ping [~stack] [~apurtell]

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344163#comment-16344163
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

I will quote myself

{quote}

I will rebase patch to the current master. The majority of this code (but not 
all) went into master in HBASE-19568 btw.

{quote}

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-29 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344160#comment-16344160
 ] 

Appy commented on HBASE-17852:
--

I see only patch v10 in the attached files, and all it's doing is changing name 
of BackupSystemTable to BackupMetaTable. It's far from what the title says - 
"Add Fault Tolerance". What am i missing?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-25 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339615#comment-16339615
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

[~appy] we have fully functional module already, but you suggest rewriting 
20%-40% of code.  That is why my response is so strong. As for procv2, I have 
heard a lot from other developers who worked on procv2-related  bugs. 

Backup is not like table create, truncate, split etc - it is in its own league. 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-25 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339026#comment-16339026
 ] 

Appy commented on HBASE-17852:
--

Man(lightly shaking head side-to-side)...such strong responses when we are 
trying to scope out needed work/design changes for a better B in 2.1. Please 
work with me here..smile.

Why do you believe procv2 is new feature? It's being used for core HBase 
functionality - create, delete tables, etc since 1.2 release.
What would make it mature & robust enough for B in your opinion?


> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-24 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338635#comment-16338635
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

[~appy] wrote:

{quote}
Of the top of my head, I think the main areas to touch upon are:
 - Make backups concurrent
 - Use procedure framework: Long-standing request. The procv2 framework has 
features like locking, queuing operations, etc. Replication is already moving 
to it. I don't see a reason why backup can't too.
 - Can't use CP hooks for incremental backup. Backup should/will become first 
class feature - more important and critical than Coprocessor.
 - There should be some basic access control, if only, limiting everything to 
ADMIN (like RS group recently did in HBASE-19483)

{quote}

OK, 
h4. Concurrent backups

It is doable, but ...
 # Will require transaction management support - it complicates implementations 
a lot. We will need to provide full isolation of operations and complex 
conflict resolutions on commit. And rollback?
 # Complicates testing, as well - a lot. Imagine all different possible 
collisions between create, merge, delete sessions

What I suggest is a slightly different approach:
 # Make restore operations concurrent
 # Implement fair queuing for *create-merge-delete* sessions
 # *create-merge-restore* executions will be serialized (one-by-one), but from 
user's point of view they will run, kind of, in parallel. 

YES/NO
h4. Use procedure framework 

Short answer - no. I will wait until procv2 becomes more mature and robust. I 
do not want to build new feature on a foundation of a new feature. Too risky in 
my opinion. NO
h4. Can't use CP hooks for incremental backup

Currently backup lives in a separate module and we would like to keep it there. 
There is no need for the tight integration of a HBase core and backup and 
therefore, CP is the only our option here. NO
h4. Access control

Currently, only ADMIN can run backups/restore/delete/merge operations, but we 
do not enforce this explicitly, so we should probably, do the access right 
check *before* starting critical operation. YES.

 

[~appy], [~elserj] - comments?

 

 

 

 

 

 

 

 

 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-23 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336001#comment-16336001
 ] 

Josh Elser commented on HBASE-17852:


{quote}I'm fine with this landing in master.
 I'll try to take a thorough look at the code after 2.0 release (If i miss 
that, i'll consider myself ineligible for casting any +/- 1).
{quote}
Thanks Appy. Your input is appreciated. I think the direction you're proposing 
makes sense, but it might be premature to push this forward right now. I've 
been seeing some funkiness in branch-2 work around procv2. Letting it burn in 
on branch-2 first is probably a good idea. I'm glad we can help Vlad move 
forward now and revisit this later.

I'm +1 on this one. Committing it now to master.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331092#comment-16331092
 ] 

Hadoop QA commented on HBASE-17852:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  4m 
56s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
27s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} hbase-backup: The patch generated 0 new + 162 
unchanged - 2 fixed = 162 total (was 164) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} The patch hbase-it passed checkstyle {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
57s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
24m 53s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m  
8s{color} | {color:green} hbase-backup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-17852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12906516/HBASE-17852-v10.patch 
|
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 550f5f3def54 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 09ffbb5b68 |
| maven | version: Apache Maven 3.5.2 

[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329767#comment-16329767
 ] 

Hadoop QA commented on HBASE-17852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HBASE-17852 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.6.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-17852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12887458/HBASE-17852-v1.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11096/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically generated.



> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v10.patch, 
> HBASE-17852-v2.patch, HBASE-17852-v3.patch, HBASE-17852-v4.patch, 
> HBASE-17852-v5.patch, HBASE-17852-v6.patch, HBASE-17852-v7.patch, 
> HBASE-17852-v8.patch, HBASE-17852-v9.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329648#comment-16329648
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}

For now, what do you think are the biggest blockers for making procv2 + backup 
happen [~vrodionov]?

{quote}

If we could do procv2 implementation w/o getting into server <- backup 
dependency, then no blockers.  

 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329642#comment-16329642
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

I will rebase patch to the current master. The majority of this code (but not 
all) went into master in HBASE-19568 btw.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329597#comment-16329597
 ] 

Hadoop QA commented on HBASE-17852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} HBASE-17852 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.6.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-17852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12887458/HBASE-17852-v1.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11094/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically generated.



> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-17 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329577#comment-16329577
 ] 

Appy commented on HBASE-17852:
--

I said hbase-backup --> hbase-server above because backup needs snapshot. Our 
dependencies are in a state of orgy right now, otherwise following would have 
been perfect shape to be in.
 !screenshot-1.png! 
That said, we should still be able to do procv2+backup without all the other 
refactoring.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch, screenshot-1.png
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-17 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329538#comment-16329538
 ] 

Appy commented on HBASE-17852:
--

Adding more, so the likely dependencies will end up being:
hbase-backup --> hbase-server
hbase-backup --> hbase-procedure

B's functionalities will be implementations of 
Procedure/StateMachineProcedure and use 
masterServices.getMasterProcedureExecutor().submitProcedure() to get stuff 
done. I do see some deps issues, but we can come with solutions. One thing we 
should definitely try to stay away from is, merging the code back in 
hbase-server module.
For now, what do you think are the biggest blockers for making procv2 + backup 
happen [~vrodionov]? You're right, we should definitely discuss concrete 
design/problems/solutions before staring with the refactoring. Can help with 
design review. 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-17 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329494#comment-16329494
 ] 

Appy commented on HBASE-17852:
--

Replication is doing it, but it's already in hbase-server module so it's 
definitely not the ideal example. But I think its possible to do procv2 + 
backup without tight integration with hbase-server i.e. while keeping things in 
separate module. Won't be surprised if it requires some refactoring/small 
design improvements in proc2 code itself, but that'll be all for good. Maybe 
backup module become the poster face for "Building features with procv2" and we 
make replication do the same.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329466#comment-16329466
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

So, we are returning back to procV2 and tight integration with hbase-server? 
[~appy], we used to have this before, but had to move everything from 
hbase-server more than a year ago by request from [~stack]. Therefore, I need 
[~stack] +1 on this plan before I start working on refactoring again.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-16 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327774#comment-16327774
 ] 

Appy commented on HBASE-17852:
--

I'm fine with this landing in master.
I'll try to take a thorough look at the code after 2.0 release (If i miss that, 
i'll consider myself ineligible for casting any +/- 1).
Of the top of my head, I think the main areas to touch upon are:
- Make backups concurrent
- Use procedure framework: Long-standing request. The procv2 framework has 
features like locking, queuing operations, etc. Replication is already moving 
to it. I don't see a reason why backup can't too.
- Can't use CP hooks for incremental backup. Backup should/will become first 
class feature - more important and critical than Coprocessor.
- There should be some basic access control, if only, limiting everything to 
ADMIN (like RS group recently did in HBASE-19483)

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-12 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324439#comment-16324439
 ] 

Josh Elser commented on HBASE-17852:


bq. I'd prefer to see this land in master, then we take the concept back to the 
drawing board and, with all of your help, we revisit this and come up with a 
design and implementation that works for concurrent backup sessions (as Vlad 
has this on the Phase4 roadmap already).

Ping [~appy].

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-12-06 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280521#comment-16280521
 ] 

Josh Elser commented on HBASE-17852:


bq. Yea, retry would be good. File a JIRA?

HBASE-19441. Tagged in the "Phase 4" umbrella.

{quote}
So, I'd say, there are many things implicitly broken with current design.

Strong -1 on shipping it unless they are fixed.
{quote}

[~appy], while I appreciate the keen eye you're applying here, how can we move 
forward here? I know it's very frustrating for Vlad to have something that he's 
already built+tested be taken back to the drawing board abruptly. Do you truly 
feel like this feature is harmful as compared to what the current 
implementation is?

I'd prefer to see this land in master, then we take the concept back to the 
drawing board and, with all of your help, we revisit this and come up with a 
design and implementation that works for concurrent backup sessions (as Vlad 
has this on the Phase4 roadmap already).

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-12-01 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275203#comment-16275203
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

[~appy],

# Only Admin user can run backups, therefore, there is no need to run multiple 
backups in parallel. Admin can run them in a single backup command.
# Restore can be run in parallel with other commands. That is artificial 
limitation and can be removed easily. It means Admin can run backups session 
and multiple restore sessions in parallel. I personally, do not see or 
anticipate strong request to allow multiple backup sessions in parallel. I 
advise you to go through doc and you fill find and easy to work-around parallel 
sessions by combining them into single one, [~appy]
# There is no issues with cross - RPC in backup case, because RPC call is a 
single hop and, hence, deadlock - free
# Failure of BackupObserver to record bulk loaded file with result in bulk load 
failure - yes. *But I do not see an alternative here*, do you? We need to 
record *all bulk loaded file names and store them persistently before bulk load 
operation completes*. Do you have an idea, how can this be achieved, w/o 
failing bulk load itself and w/o touching hbase core code? 

The only thing I agree here is support for parallel deletes, merges and if we 
will introduce this support we can easily add multiple backup session support 
for free.

I personally, was very impressed by you, guys, you spent so much time looking 
for design and implementation flaws, when time was running out literally, 
during this week. Good job. Why haven't you done this couple months before? 




> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-12-01 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275107#comment-16275107
 ] 

Appy commented on HBASE-17852:
--

bq. To try to move the conversation forward, I tend to agree with Vlad that I 
don't seen an inherent problem with the rollback-via-snapshot implementation

The inherent problem with rollback-via-snapshot approach is - one operation is 
taking "exclusive lock" on the backup meta table, and that too in a very weird 
way.
It's weird because:
1) It behaves like exclusive lock in certain cases. (We only restore on 
failure, i.e. exclusion kicks in only on failures. That leads to waterfall of 
issues mentioned below.)
2) Some other operations on that table are following "exclusion" semantics (via 
locking a row), while others not.

As a result of which we see so many problems:
1) Different table for incremental backup data: The problem is not that there's 
a different table, that's fine, but the reason which led to it.
2) You can't run any other command in parallel! No restores (data loss, 
services are down, everything is on fire, oh but there's a cron job taking 
backup, so i can't do zilch!?), no merges, no deletes (prod cluster, running 
out of space, i have to wait for backup before i can free up space?). That's 
just absurd.
3) Other successful commands are rolled back silently. If an operator 
add/remove/delete sets, they are gone if a totally different thing fails!
4) During restore, backup table goes offline, cron job attempts backup and 
fails.

Others:
- And then the issues around cross RS RPC from observer during bulk load. Was 
the alternative suggested yesterday considered in the design? Was there any 
alternative that was considered?
- (Ref: Bulk loads) Backups are very important. But more important is user 
being able to load their data and use it. Preventing user to work with their 
data by putting backup in load path and failing everything if backup doesn't 
work is plain wrong. Find a different way to backup bulk load data without 
affecting core read/write paths.

So, I'd say, there are many things implicitly broken with current design.

Strong -1 on shipping it unless they are fixed.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-12-01 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274762#comment-16274762
 ] 

Mike Drob commented on HBASE-17852:
---

bq. This is much easier to fix, than concurrent backup sessions support, 
because restore does not access meta table.
Restore doesn't need to update the set of backup files (to remove references to 
no longer referenced files?) If we backup, add data, incremental backup, add 
data, restore to first backup, add data, incremental backup this will all work 
correctly without the Restore having needed to update any backup state? Where 
do I look for how this works?

bq. No, this is client -side operation. Can someone queue hbck?
It's client-side... kind of. We're encouraging folks to automate these 
operations, comparing to hbck isn't the same.

bq. "manual cleanup" is only running the hbase backup repair command. I don't 
feel like that is too onerous and goes back to my original feelings (acceptable 
limitation to get this in the hands of users).
Yea, this is probably ok. I thought we still had a pretty hairy situation here.

bq. Specifically, the client does a checkAndPut to specifics coordinates in the 
backup table and throws an exception when that fails. Remember that backups are 
client driven (per some design review from a long time ago), so queuing is 
tough to reason about (we have no "centralized" execution system to use). At a 
glance, it seems pretty straightforward to add some retry/backoff semantics to 
BackupSystemTable#startBackupExclusiveOperation(). Isn't exactly a "queue", but 
it would ease the pain you allude to.
Yea, retry would be good. File a JIRA?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-12-01 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274744#comment-16274744
 ] 

Josh Elser commented on HBASE-17852:


bq. The current options appear to be wait until the backup finishes (maybe ok, 
depending on sizes/bandwidth/etc...) or cancel the nightly backup (very bad, 
especially if we have to do manual cleanup of things).

"manual cleanup" is only running the {{hbase backup repair}} command. I don't 
feel like that is too onerous and goes back to my original feelings (acceptable 
limitation to get this in the hands of users).

bq. Can an admin queue sessions? That would help the user experience quite a 
bit until we get parallel sessions. (Not that I'm suggesting that this is 
either necessary or sufficient; I would much rather see effort towards a proper 
solution rather than temporary workaround after workaround, but queued 
operations may be useful in other contexts.)

Specifically, the client does a checkAndPut to specifics coordinates in the 
backup table and throws an exception when that fails. Remember that backups are 
client driven (per some design review from a long time ago), so queuing is 
tough to reason about (we have no "centralized" execution system to use). At a 
glance, it seems pretty straightforward to add some retry/backoff semantics to 
{{BackupSystemTable#startBackupExclusiveOperation()}}. Isn't exactly a "queue", 
but it would ease the pain you allude to.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-12-01 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274742#comment-16274742
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
The problem is that backups and restores cannot occur simultaneously.
{quote}
This is much easier to fix, than concurrent backup sessions support, because 
restore does not access meta table.  
{quote}
Can an admin queue sessions? That would help the user experience quite a bit 
until we get parallel sessions. 
{quote}
No, this is client -side operation. Can someone queue *hbck*?



> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-12-01 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274706#comment-16274706
 ] 

Mike Drob commented on HBASE-17852:
---

bq. because backup sessions always update shared records.
This sounds like a design flaw.

bq. what is the use case when Admin starts two sessions in parallel if he can 
run them serially
Can an admin queue sessions? That would help the user experience quite a bit 
until we get parallel sessions. (Not that I'm suggesting that this is either 
necessary or sufficient; I would much rather see effort towards a proper 
solution rather than temporary workaround after workaround, but queued 
operations may be useful in other contexts.)

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-12-01 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274700#comment-16274700
 ] 

Mike Drob commented on HBASE-17852:
---

The problem is that backups and restores cannot occur simultaneously.

Let's say that we have a hypothetical system set to backup nightly (via cron or 
some other non-interactive mechanism). While this full system backup is 
running, some problem is detected with a single table and it is determined that 
the correct course of action is to restore that table. Given that we base 
backup and restore operations on snapshots, this should be straightforward - 
the large backup can continue to run while a restore of the specific table (to 
the last known good state) is put in place without waiting for the backup to 
complete.

The current options appear to be wait until the backup finishes (maybe ok, 
depending on sizes/bandwidth/etc...) or cancel the nightly backup (very bad, 
especially  if we have to do manual cleanup of things). I think the position 
that I'm slowly arriving to is that we shouldn't be recommending nightly 
backups at all to folks - this is probably a use case better served by 
replication and having a wider variety of sinks available instead of only 
another HBase cluster (HBASE-18846 might help with this?). That said we would 
still need some kind of bulk restore wrappers. Let me think on this more...

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-12-01 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274659#comment-16274659
 ] 

Josh Elser commented on HBASE-17852:


bq. Hmm... I don't think we can publish backup/restore without HBASE-16391 in a 
2.0 release. I'd like to have confidence that the feature is rock solid before 
telling users that it's ok to use, parallel operations seems like a major 
shortcoming to me.

Let's dig in on this some more, [~mdrob]. B is much more of an 
"administrative function" as opposed to a "client feature". My general 
expectation would be that, most aggressively, HBase admins (a couple of people) 
would run incremental backups on the order of "hours", e.g. incremental backup 
every 8 hours . I could see the extremely paranoid wanting to do incremental 
backups every hour over some collection of tables which _could_ cause issues if 
we can only execute one backup operation at a time (I'm thinking along the 
lines of 3 backup sets, incremental backups every hour, merging of those 
backups every few hours, full backup every day, etc).

As such, my opinion differs in that I don't see the lack of concurrent backup 
operations being a major impediment for "most" users. I completely agree with 
you that there will be some users in which this limitation would be problematic 
on what they want to use it, but, even for these edge cases, B without this 
would still have value to them. I think getting this feature into the hands of 
users (with the extremely clear caveats on current implementation) would 
actually better serve the feature than letting it fester more on JIRA. Thoughts?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274008#comment-16274008
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
When we switch away from restore-via-snapshot and have proper transactions, 
does that mean this extra table will go away?
{quote}

Yes, but you need to understand, that proper Tx management is a hard task in 
this case. It is even harder than classic Tx management. DB Tx got rollbacked 
automatically in case of a collision (updates to the same record), but we have 
to merge these updates correctly, because backup sessions always update shared 
records. Is it worth doing? Only Admin can run backups and what is the use case 
when Admin starts two sessions in parallel if he can run them serially?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273958#comment-16273958
 ] 

Mike Drob commented on HBASE-17852:
---

When we switch away from restore-via-snapshot and have proper transactions, 
does that mean this extra table will go away?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273955#comment-16273955
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
There were concerns above on cross RS rpc to write the paths, I was trying to 
think of easiest way of avoiding that. How about returning the map as part of 
response here and then issue rpc to master from client side. It's easy and 
safer to retry from client side if remote resource isn't available.
I'd suggest going extra step, an easy one though - collect all paths on client 
side and do single put request. That'll give two benefits:
Will make it transactional incremental backup
If put fails repeatedly, you can either fail bulk load altogether, or throw 
error to user telling that these bulk loaded files failed to backup and that 
only full backup will include them.
{quote}

I will think about this and will get back to you, [~appy] shortly. Thanks, for 
suggestion.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273953#comment-16273953
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
What happens if during an ongoing backup, i create some backup sets, but then 
the backup fails? Snapshot restore will remove my backup sets?
{quote}

Yes. Any modifications to backup meta table *during* backup create/merge/delete 
session, which *fails* will be lost. It is the limitation currently. As a 
simple workaround, any updates (*backup sets operations only*) to backup meta 
table can be disabled during these sessions. 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273901#comment-16273901
 ] 

Appy commented on HBASE-17852:
--

Few questions:
Pardon me if my high level analysis of design is off. Is following correct 
description of current design?
Start bulkload from client -> each RS gets its RPC for prepare and then do the 
actual bulkload --> Internally when bulk load is 
done,BackupObserver#postBulkLoadHFile writes paths to backup table.
And to avoid full backup failures from affecting incremental backups (due to 
snapshot restore), you are putting bulk loaded paths data in a separate table, 
right?

There were concerns above on cross RS rpc to write the paths, I was trying to 
think of easiest way of avoiding that. How about returning the [map as part of 
response 
here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L2251]
 and then issue rpc to master from client side. It's easy and safer to retry 
from client side if remote resource isn't available.
I'd suggest going extra step, an easy one though -  collect all paths on client 
side and do single put request. That'll give two benefits:
- Will make it transactional incremental backup
- If put fails repeatedly, you can either fail bulk load altogether, or throw 
error to user telling that these bulk loaded files failed to backup and that 
only full backup will include them. 

What happens if during an ongoing backup, i create some backup sets, but then 
the backup fails? Snapshot restore will remove my backup sets?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273856#comment-16273856
 ] 

Mike Drob commented on HBASE-17852:
---

Hmm... I don't think we can publish backup/restore without HBASE-16391 in a 2.0 
release. I'd like to have confidence that the feature is rock solid before 
telling users that it's ok to use, parallel operations seems like a major 
shortcoming to me.

Maybe this isn't the right JIRA to discuss this, apologies for stepping into 
the crossfire here. I left a few comments on the RB, will continue to look 
after reading more of the general design.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273661#comment-16273661
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

Yes, concurrent backup support will require some code rewrite. Rollback - via 
-snapshot won't work in this case probably, but this is internal implementation 
details and they won't affect users - backward compatibility is a must here.

We do not have any specific timeouts for backups - only those, low level HBase 
timeouts for RPC and distributed procedure ops. If they time out - backup 
fails.  

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273568#comment-16273568
 ] 

Mike Drob commented on HBASE-17852:
---

Thank you for pointing at the parent ticket, I missed that there was design doc 
in there.

I'm worried about our current design choices for future concurrent backup 
design. Please correct my gaps in understanding here:

Current approach, which is limited to single backup operation involves snapshot 
the backup state table (what you refer to as backup system table, but I think 
state is more appropriate term) and then if failure then we restore the state? 
In future are we going to have multiple tables in backup namespace for each 
table to be backed up so that we can have concurrent approaches? Otherwise the 
concurrent backup solution will be a complete rewrite I expect.

Do backup operations have timeouts? I don't see them in the code, but could be 
looking at the wrong place.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273529#comment-16273529
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

The approach described in this JIRA description and in the parent ticket, 
[~mdrob].

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273491#comment-16273491
 ] 

Mike Drob commented on HBASE-17852:
---

bq. not a generic questions: Why have you chosen this design approach, 
especially when this approach has been discussed many times with other 
developers before
Can you point me at a design document that covers this?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273426#comment-16273426
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
I was about to write that I thought it was a no-brainer to blindly run a repair 
as a part of the BackupDriver, but now I wonder about the following:
Take two administrators running backups, unaware of each other. Admin1 starts a 
backup on Table1. Before Admin1's backup finishes, Admin2 tries to do a backup 
on Table2. Could Admin2 preempt/fail Admin1's backup by running a hbase backup 
repair while Admin1 is using the system?
In other words: does hbase backup repair have the ability to differentiate 
between "user is currently executing a backup" and "stale state exists in the 
table from an aborted/unfinished operation"?
{quote}

All operations are serialized. Admin 2 will fail and will be waiting until 
first operation is complete (successfully or not). Multiple parallel backup 
sessions support is on roadmap for 2.1 release: 
https://issues.apache.org/jira/browse/HBASE-16391.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273414#comment-16273414
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
I'm not sure what this is intended to prove. Sometimes I get patches right on 
the first try, sometimes it takes twenty tries.
{quote}

Nothing, actually except that when contributor submit a patch he expects 
comments/questions related to the code of a patch not a generic questions: Why 
have you chosen this design approach, especially when this approach has been 
discussed many times with other developers before. It is very hard and time 
consuming to explain everything from a  very beginning for  a person who wants 
to participate in a review, but is not familiar with the code. I have two 
committers on the feature [~te...@apache.org] and [~elserj] who have spent a 
lot of time working on the code. I trust them and although I appreciate help 
from other developers, I expect them to spend some time digging into the full 
feature code, before trying to review a particular patch (one of more than 100 
already). This requires some commitment. 

Any question on the patch itself? 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273399#comment-16273399
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
It seems like the feature is being moved out because it's incomplete...
{quote}
Really, what is missing? Have you read the doc? Everything described in the B 
documentation have been implemented and tested.  I am running integration tests 
now on a scale, this is probably the last what developer is supposed to do 
before declaring feature fully complete? We are still at alpha stage, plenty 
time to harden the feature before beta-1 or 2.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273376#comment-16273376
 ] 

Josh Elser commented on HBASE-17852:


bq. Operators who'd rather avoid reading logs and having to run repair tools 
are 'lazy'.

bq. Do not we still have hbck for this reason? Repair \[...\] which happens 
periodically in HBase cluster.

Let me also expand on this: I would consider "lazy" as a virtue for operators. 
The system should automatically handle as much as possible. There's a 
fundamental difference between what hbck is and what `hbase backup repair` is: 
HBCK is fixing things that inadvertently happen server-side (hopefully, only 
around bugs which has since been fixed) whereas hbase-backup are completely 
client-driven. For example, something as benign as a user ctrl-C'ing a backup 
because they mis-typed the backup name or table being backed up would cause the 
backup table to need a repair.

bq. This is the question actually, should we do repair automatically or we need 
to inform user, that there was abnormal failure of a last backup/merge/delete 
command and user need to run repair.

I was about to write that I thought it was a no-brainer to blindly run a repair 
as a part of the BackupDriver, but now I wonder about the following:

Take two administrators running backups, unaware of each other. Admin1 starts a 
backup on Table1. Before Admin1's backup finishes, Admin2 tries to do a backup 
on Table2. Could Admin2 preempt/fail Admin1's backup by running a {{hbase 
backup repair}} while Admin1 is using the system?

In other words: does {{hbase backup repair}} have the ability to differentiate 
between "user is currently executing a backup" and "stale state exists in the 
table from an aborted/unfinished operation"?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273371#comment-16273371
 ] 

Mike Drob commented on HBASE-17852:
---

bq. The reason is the simplicity of the implementation. Is not this obvious?
It's not obvious, hence the need for clarifying questions. We're all 
collaborators here, Vlad, not adversaries. I haven't reviewed the code, so in 
this instance I'm a messenger and attempted mediator.

bq. Should I have spent time trying to implement Tx management instead?
Maybe!

bq. Did I answer original question? I thought that we are technical guys and we 
need technical answers. It seems that I was wrong.
I'm reminded of advice I got early in my software engineer career - it's easy 
to write code, it's less easy to write correct code, and it is actively hard to 
know which code to write.

The technical answer may have been obvious like you assert, but it's not a 
complete answer. Understanding how the operators will need to use this feature 
and how they will interact with it is important in building something that is 
useful to them.

bq. User intervention is required only if user kills backup process or it dies 
on a client side, for some other reason.
There's lots of reasons that a process might die on the client side. Seems we 
may disagree on the frequency here.

bq. Do not we still have hbck for this reason?
Sure, we can extend hbck to take care of these failures as well. Does it 
currently do so? I have no idea. Probably not, given that I don't think hbck 
works with hbase-2.0 due to AMv2.

bq. Moving feature out of beta-1 only because someone does not like attitude of 
a contributor
It seems like the feature is being moved out because it's incomplete...

And some earlier comments:
bq. But for lazy operators...
Lazy operators are the best kind. They are the ones that automate things, the 
ones that prepare and test for failure so that they don't get called in the 
middle of the night, the ones that actually make sure that the ship stays 
sailing.

bq. The patch is no 8 already
I'm not sure what this is intended to prove. Sometimes I get patches right on 
the first try, sometimes it takes twenty tries.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273329#comment-16273329
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

The reason is the simplicity of the implementation. Is not this obvious? Should 
I have spent time trying to implement Tx management instead? I doubt. Did I 
answer original question? I thought that we are technical guys and we need 
technical answers. It seems that I was wrong. 

User intervention is required only if user kills backup process or it dies on a 
client side, for some other reason. All cluster side failures get repaired 
automatically. I see nothing painful for users here, [~mdrob], especially when 
I will implement auto-repair feature. This is the question actually, should we 
do repair automatically or we need to inform user, that there was abnormal 
failure of a last backup/merge/delete command and user need to run repair.

Do not we still have *hbck* for this reason? Repair all the s**t which happens 
periodically in HBase cluster.

Moving feature out of beta-1 only because someone does not like *attitude of a 
contributor* means that something is not going well in HBase community.  

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273167#comment-16273167
 ] 

Mike Drob commented on HBASE-17852:
---

[~vrodionov] - I think Josh clarified Stack's question to explain why he posits 
that you "didn't answer the question"

bq. That's the technical explanation for why it is implemented as such, but I 
think the spirit of the question is more: "what are the reasons for making this 
choice and is there something that could be done to make this less painful for 
users?"

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273141#comment-16273141
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

My attitude? Nice. Maybe yours? I tried several times to explain you obvious 
things, but you still not getting them. 




> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272169#comment-16272169
 ] 

stack commented on HBASE-17852:
---

Moving out of beta-1.  I ask questions and get rubbish back.

Contributor has wrong attitude. Operators who'd rather avoid reading logs and 
having to run repair tools are 'lazy'.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272106#comment-16272106
 ] 

Hadoop QA commented on HBASE-17852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
41s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
17s{color} | {color:red} hbase-backup: The patch generated 4 new + 179 
unchanged - 18 fixed = 183 total (was 197) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 670 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m 
16s{color} | {color:red} The patch 384 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
17s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
54m 25s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
16s{color} | {color:green} hbase-backup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
28s{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-17852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899932/HBASE-17852-v9.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux b272d49628c9 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271978#comment-16271978
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
You don't answer the question
{quote}

What question? What does "corrupt" mean? Why do I need to restore meta table? I 
am afraid, I can't add anything else to my answers above.

{quote}
Don't follow. An operator sets up a cron job. Works great for a few days. Then 
it stops. Operator needs to figure that he has to run a repair. Operator sets 
up two cron jobs? Or cron probes first for breakage...
{quote}

Stops means fails. If cron job fails, operator will need to intervene, read 
logs, manuals and figure out that repair is required. Not a big deal, imo. We 
clearly log message, that repair tool has to be run. But for lazy operators I 
will add auto-repir mode of execution (see above ticket).

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271903#comment-16271903
 ] 

stack commented on HBASE-17852:
---

bq. Can you post your comments on RB, Stack?

Traditionally it is the contributors' job keeping the feedback in order and 
making sure it all addressed whether in JIRA or RB. Not addressing reviewers 
feedback or dropping it w/o comment is a total no-no.

bq. I have explained this many times already ... 

You don't answer the question. You just make asserts that we have to rollback 
w/o justification other than backups 'become corrupt' or a backup is only 
'safe' if it completes? Sounds like it needs to be 'transactional' but you 
don't describe the transaction (correct me if I'm wrong). I don't get why a 
completed backup can't just write a completion marker to the backup table. W/o 
it the backup is corrupt/incomplete and we just move on.

bq. Running backup repair automatically in case of a backup failure won't hurt 
and can be incorporated into cron job

Don't follow. An operator sets up a cron job. Works great for a few days. Then 
it stops. Operator needs to figure that he has to run a repair. Operator sets 
up two cron jobs? Or cron probes first for breakage...



> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271873#comment-16271873
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
f the standard-procedures would be to run a repair blindly, why can't this be 
encapsulated in BackupDriver? Making the user's life easier is certainly 
beneficial.
{quote}

I can add auto-repair mode of execution for create/merge/delete.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271843#comment-16271843
 ] 

Josh Elser commented on HBASE-17852:


I think the only outstanding code-review comment from [~stack] was 
consolidation of two log messages into one (other questions were "why the bulk 
load backup table" which I think we better understand now and the use of 
TableDescriptorBuilder which I had already dinged and Vlad has fixed).

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271835#comment-16271835
 ] 

Josh Elser commented on HBASE-17852:


{quote}
bq.Why do this? Why not just mark the backup as corrupt and move on? (Why 
does an incomplete back-up freeze all backups – which you say above  I'm 
trying to understand).

I have explained this many times already ... Restoring meta table in case of a 
backup failure is a necessary step to make future backups possible. We write 
some data during backup create, which is safe only of backup succeeds, such as 
last WAL roll timestamp per table-per RS. If backup fails, this data becomes 
corrupt w/o restoring meta table from snapshot. 
{quote}

That's the technical explanation for why it is implemented as such, but I think 
the spirit of the question is more: "what are the reasons for making this 
choice and is there something that could be done to make this less painful for 
users?"

{quote}
bq.  What if its a cron job? Does this inability at moving on past failure make 
it so backup cannot be cron'd?

Running backup repair automatically in case of a backup failure won't hurt and 
can be incorporated into cron job
{quote}

If the standard-procedures would be to run a repair blindly, why can't this be 
encapsulated in BackupDriver? Making the user's life easier is certainly 
beneficial.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-28 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269756#comment-16269756
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
Why do this? Why not just mark the backup as corrupt and move on? (Why does an 
incomplete back-up freeze all backups – which you say above  I'm trying to 
understand).
{quote}

I have explained this many times already ... Restoring meta table in case of a 
backup failure is a necessary step to make future backups possible. We write 
some data during backup create, which is safe only of backup succeeds, such as 
last WAL roll timestamp per table-per RS. If backup fails, this data becomes 
corrupt w/o restoring meta table from snapshot. 

{quote}
What if its a cron job? Does this inability at moving on past failure make it 
so backup cannot be cron'd?
{quote}

Running backup repair automatically in case of a  backup failure won't hurt and 
can be incorporated into cron job

{quote}
If we weren't snapshotting/restoring the backup table, we wouldn't have to make 
a separate table to hold bulkloaded files? Is that so? (I'm not asking for a 
rewrite...).
{quote}

Yes, correct.

{quote}
I am asking questions to try and understand what is going on in here. When the 
response is terse or lean on info, I'm going to ask another question... and so 
on. As to whether snapshot/restore of the meta backup table is 'bad' or not, 
I'm still trying to understand why we would go to the extreme of offlining a 
whole table – even though rare when in error and then it seems, this offlining 
is making it so we have to add yet another table just to hold bulk loaded 
files... Pardon my being slow.
{quote}

Yes, the second table has been added long after the initial implementation was 
complete as a result of hardening bulk load support feature. You may consider 
this a s work-around, but it is pretty lightweight work-around. W/o snapshots, 
we have to make all the changes to meta table fully transactional ones. I think 
it is much harder.






> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-28 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269733#comment-16269733
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
What of my review comments are addressed in latest patch?
{quote}

Can you post your comments on RB, Stack? 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-28 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269677#comment-16269677
 ] 

stack commented on HBASE-17852:
---

Write-up helps. Some questions.

bq. When operation fails on a server side, we handle this failure by cleaning 
up partial data in backup destination, followed by restoring backup meta-table 
from a snapshot.

Why do this? Why not just mark the backup as corrupt and move on? (Why does an 
incomplete back-up freeze all backups -- which you say above  I'm trying to 
understand).

bq. When operation fails on a client side (abnormal termination, for example), 
next time user will try create/merge/delete he(she) will see error message, 
that system is in inconsistent state and repair is required, he(she) will need 
to run backup repair tool.

What if its a cron job? Does this inability at moving on past failure make it 
so backup cannot be cron'd?

bq. To avoid multiple writers to the backup system table (backup client and 
BackupObserver's) we introduce small table ONLY to keep listing of bulk loaded 
files.

If we weren't snapshotting/restoring the backup table, we wouldn't have to make 
a separate table to hold bulkloaded files? Is that so? (I'm not asking for a 
rewrite...).

bq. Your are the only one who is objecting snapshot-based approach, but I am 
still waiting for a single argument why is this bad?

I am asking questions to try and understand what is going on in here. When the 
response is terse or lean on info, I'm going to ask another question... and so 
on. As to whether snapshot/restore of the meta backup table is 'bad' or not, 
I'm still trying to understand why we would go to the extreme of offlining a 
whole table -- even though rare when in error and then it seems, this offlining 
is making it so we have to add yet another table just to hold bulk loaded 
files... Pardon my being slow.

What of my review comments are addressed in latest patch?

Thanks.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-24 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265492#comment-16265492
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
Let me just assume this stuff is handled, but a walk through of what happens 
when the backup table goes away in different scenarios would be good.
Is the above answered? (Copied from earlier in this dialog).
{quote}

When backup meta table goes away, bulk load will continue because bul load 
observers do not write to main meta table. When second table (for bulk loaded 
files) gets offlined - bulk loading fails.

  

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before backup create/delete/merge starts we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because meta 
> table is small, usually should fit a single region.
> # When operation fails on a server side, we handle this failure by cleaning 
> up partial data in backup destination, followed by restoring backup 
> meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), 
> next time user will try create/merge/delete he(she) will see error message, 
> that system is in inconsistent state and repair is required, he(she) will 
> need to run backup repair tool.
> # To avoid multiple writers to the backup system table (backup client and 
> BackupObserver's) we introduce small table ONLY to keep listing of bulk 
> loaded files. All backup observers will work only with this new tables. The 
> reason: in case of a failure during backup create/delete/merge/restore, when 
> system performs automatic rollback, some data written by backup observers 
> during failed operation may be lost. This is what we try to avoid.
> # Second table keeps only bulk load related references. We do not care about 
> consistency of this table, because bulk load is idempotent operation and can 
> be repeated after failure. Partially written data in second table does not 
> affect on BackupHFileCleaner plugin, because this data (list of bulk loaded 
> files) correspond to a files which have not been loaded yet successfully and, 
> hence - are not visible to the system 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261687#comment-16261687
 ] 

Hadoop QA commented on HBASE-17852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 6s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
16s{color} | {color:red} hbase-backup: The patch generated 3 new + 179 
unchanged - 18 fixed = 182 total (was 197) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
26s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
47m  8s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
28s{color} | {color:green} hbase-backup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
23s{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-17852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898743/HBASE-17852-v8.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux bd11d8af32a0 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 3b2b22b5fa |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 

[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261499#comment-16261499
 ] 

Hadoop QA commented on HBASE-17852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
26s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
15s{color} | {color:red} hbase-backup: The patch generated 6 new + 187 
unchanged - 10 fixed = 193 total (was 197) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
52s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
52m  1s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
59s{color} | {color:green} hbase-backup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
26s{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 80m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-17852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898716/HBASE-17852-v7.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 13b57d82c6f1 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 
15:49:21 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 984e0ecfc4 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 

[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-20 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260212#comment-16260212
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
My gut reaction is that the number of backups which would need to be retained 
in the system (e.g. rows in the hbase backup "system" table) would have to be 
quite large to even grow beyond a single region (many thousands to millions). 
As such, the snapshot restore isn't much more than grabbing the write lock and 
replacing some one data file and some Region metadata. This is on my list today 
to investigate confirm.
{quote}

Yes, [~elserj], you are right. Backup system table for vast majority of 
deployments will fit a single region. It is a metadata - not a data. Therefore, 
 creation of snapshot and restoring from snapshot is a very lightweight 
operation. That is was a major reason I have chosen rollback-via-snapshot 
approach. 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-20 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259911#comment-16259911
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
2) recover from client side failure (and, probably, implicitly meant to include 
un-handled server-side failure conditions too).
{quote}

For 2. we have backup repair tool, client will be asked to run repair tool next 
time he/she will try to run backup/restore/merge/delete

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-20 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259504#comment-16259504
 ] 

Josh Elser commented on HBASE-17852:


bq. I am not asking for any particular implementation, to be clear. I'm just 
trying to understand and am having trouble digesting full restore of a meta 
table whatever the size or traffic on error. It strikes me as whack (You seem 
to at least agree it 'overkill')

Got it. To clarify my previous message, by "overkill" I only mean "non-ideal". 
As in, there is likely a more complicated solution that could accomplish the 
same net-effect with less computation+time required. I didn't mean to say that 
I believed using a snapshot and table-restore is invalid or wrong. My gut 
reaction is that the number of backups which would need to be retained in the 
system (e.g. rows in the hbase backup "system" table) would have to be quite 
large to even grow beyond a single region (many thousands to millions). As 
such, the snapshot restore isn't much more than grabbing the write lock and 
replacing some one data file and some Region metadata. This is on my list today 
to investigate confirm.

To try to move the conversation forward, I tend to agree with Vlad that I don't 
seen an inherent problem with the rollback-via-snapshot implementation. 
Architecturally, Vlad is using the snapshot feature exactly how it was intended 
to be used (shallow copy and restore of a table).

bq. the idea to offline a system table and then restore from a snapshot on 
error with clients 'advised' to stop writing as some-sort of 2PC

Let's revisit this again: in the parent JIRA issue, Vlad outlined two-cases. 1) 
Recover from a "server-side" failure and 2) recover from client side failure 
(and, probably, implicitly meant to include un-handled server-side failure 
conditions too).

For #1, clients don't need to do anything special (specifically mentioned on 
the parent issue). Mutual exclusion is already built in to manage the 
serialized state in the backup "system" table. So, we're just looking at the 
cost of these steps. Offline+snapshot+online should be one of these rock-solid 
features of the system.

For #2, we're in this situation that you outline. Per the concerns you raised 
about "coordination" (the special handshake, to use another metaphor), this 
seems mitigate-able via return code of the {{hbase backup}} and a prominent 
error message in this case. I don't know if either presently exist (Could you 
comment, [~vrodionov]?).

Both of these are predicated on the mutual exclusion of multiple clients at a 
higher level. Obviously, a finer grain exclusion strategy is desirable for 
multiple reasons, but, given my current understanding, I don't see any 
fundamental problem with this approach.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257974#comment-16257974
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
This isn't helpful and, likely, directly harmful :\
{quote}

What about "what the fuck Vlad", Josh? Is it harmful?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257972#comment-16257972
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

There is nothing wrong with snapshot approach until someone proves it is wrong. 
I am waiting for your arguments, Stack. 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257968#comment-16257968
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
Vlad, you seem to be doing your utmost to sabotage the delivery of this 
feature. 
{quote}

Do you really believe in that? Josh is just too polite, in my opinion. He is 
trying to be good with you. I am just who I am. I am too straightforward, 
Stack. The only person who sabotage this feature here is you.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257930#comment-16257930
 ] 

stack commented on HBASE-17852:
---

bq. Stack, are you essentially asking why this isn't implemented on top of 
ProcV2?
bq.  I think at this point, it would be more productive if we can say more 
"there is something implicitly broken with this approach" instead of "there is 
a more elegant implementation to be had".

I am not asking for any particular implementation, to be clear. I'm just trying 
to understand and am having trouble digesting full restore of a meta table 
whatever the size or traffic on error. It strikes me as whack (You seem to at 
least agree it 'overkill'). There seems to be no write-up on the approach here 
ahead of piecemeal code drops (w/o overarching description of what all is 
entailed) so only way to figure it as best as I can ascertain, is via this 
really pleasant back and forth w the author.

Vlad, you seem to be doing your utmost to sabotage the delivery of this 
feature. The sort of answers you give us reviewers is one thing. Will operators 
who run into issues w/ this feature get the same treatment?



> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257920#comment-16257920
 ] 

Josh Elser commented on HBASE-17852:


{code}
In hbase2, we have builders for the below instead...

1381 HTableDescriptor tableDesc = new 
HTableDescriptor(getTableNameForBulkLoadedData(conf));
{code}

I had left a similar comment on RB. This was fixed in v6 (patchset 5 on RB). I 
think the majority of other changes were suggestions I had left on RB -- have 
not explicitly checked, just going off of the "issues" being resolved.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257919#comment-16257919
 ] 

Josh Elser commented on HBASE-17852:


bq. This is going to confuse. 'system' tables have a particular meaning in 
hbase.

Should be easy enough to rename with your IDE of choice, right Vlad? Avoiding 
overloading terminology is always a good idea. "BackupMetadata" and 
"BackupBulkLoadFiles"? (just pitching ideas)

bq. The snapshot/restore of a whole system table strikes me as a bunch of 
moving parts. I have to ask why we got such an extreme? 2PC is tough-enough w/o 
offlining/restore of whole meta table. During restore, all clients are frozen 
out or something so they can't pollute the restored version? Restore is not 
atomic, right? We couldn't have something like a row-per-backup with a success 
tag if all went well (I've not been following closely – pardon all the 
questions).

Stack, are you essentially asking why this isn't implemented on top of ProcV2? 
I'm trying to read between the lines but am not sure if I'm inventing something 
that isn't there. There are definitely areas of the code in which the 
acknowledgement has already been made about a better implementation can be 
done. For example, clients _are_ "frozen out" right now from concurrent 
operations (a nod that backups, merges, and restores could be done 
concurrently). I think at this point, it would be more productive if we can say 
more "there is something implicitly broken with this approach" instead of 
"there is a more elegant implementation to be had". I don't think anyone is 
arguing against that.

Yes, rolling back the entire backup "system" table is overkill (for what may 
sometimes be deleting a single row/column -- the ACTIVE_SNAPSHOT as mentioned 
in the parent) and would take much longer that it could necessarily need to.

bq. You suggest I review code. I have been reviewing code. Thats how we got 
here.

And thank you for that. I know your intentions are good. We're all ultimately 
working towards a common goal here.

bq. Sure, you can start from very beginning, Stack. Go ahead.

This isn't helpful and, likely, directly harmful :\

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257737#comment-16257737
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
So, the idea to offline a system table and then restore from a snapshot on 
error with clients 'advised' to stop writing as some-sort of 2PC got buy-in 
from others? This is 'fault-tolerance'? Is there a write-up somewhere that 
explains why we have to offline and then restore a whole table (whatever its 
size) just because a particular op failed and how it is more simple and elegant 
than other soluntions (what others?), I'd like to read it. Otherwise, I just 
don't get it (neither will the operator whose cron job failed because backup 
table was gone when it ran).
{quote}

Stack, you just out of context right now, but I appreciate you want to spend so 
much time digging into my code once again. Thanks.
Your are the only one who is objecting snapshot-based approach, but I am still 
waiting for a single argument why is this bad?


> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257729#comment-16257729
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
You suggest I review code. I have been reviewing code. Thats how we got here.
{quote}

Sure, you can start from very beginning, Stack. Go ahead.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257706#comment-16257706
 ] 

Hadoop QA commented on HBASE-17852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
 7s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
25s{color} | {color:red} hbase-backup: The patch generated 4 new + 69 unchanged 
- 6 fixed = 73 total (was 75) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
23s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
64m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
16s{color} | {color:green} hbase-backup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-17852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898274/HBASE-17852-v6.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 5973dfc868e3 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ca74ec7740 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9901/artifact/patchprocess/diff-checkstyle-hbase-backup.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9901/testReport/ |
| modules | C: hbase-backup U: hbase-backup |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9901/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically 

[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257621#comment-16257621
 ] 

stack commented on HBASE-17852:
---

My comments above are not in RB. Were they addressed?

Patches should include description. Helps reviewers and those trying to 
follow-behind. Yours have none.

You don't use the suggested patch-making tool either in-spite of an earlier 
request.

So, the idea to offline a system table and then restore from a snapshot on 
error with clients 'advised' to stop writing as some-sort of 2PC got buy-in 
from others? This is 'fault-tolerance'? Is there a write-up somewhere that 
explains why we have to offline and then restore a whole table (whatever its 
size) just because a particular op failed and how it is more simple and elegant 
than other soluntions (what others?), I'd like to read it. Otherwise, I just 
don't get it (neither will the operator whose cron job failed because backup 
table was gone when it ran).

You suggest I review code. I have been reviewing code. Thats how we got here.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257590#comment-16257590
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

You can find them as fixed on RB:
https://reviews.apache.org/r/63155/

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257580#comment-16257580
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
The snapshot/restore of a whole system table strikes me as a bunch of moving 
parts. 
{quote}

That is only one backup system table. 

{quote}
I have to ask why we got such an extreme?
{quote}

What is so extreme here? Snapshot of a system table? I consider this approach 
much more simple and elegant than 
others?

{quote}
During restore, all clients are frozen out or something so they can't pollute 
the restored version?
{quote}

Yes. During table restore operation, all clients (of this table) must  be 
stopped.  In theory, this is not a hard requirement - it is just an advice. But,
we truncate table, before restore and this, definitely, may affect unexpectedly 
incoming writes. Any database system, which allows writes to a table during 
restore of a table? 

Stack, if you have doubts in the implementation, I suggest you to go over code 
and find places where you think the code has issues.



> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257572#comment-16257572
 ] 

stack commented on HBASE-17852:
---

bq. v6 addresses some of the RB comments

Which comments were addressed?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257561#comment-16257561
 ] 

stack commented on HBASE-17852:
---

bq. That is why we take snapshot of a backup system table and restore this 
table from snapshot, previously taken, in case of a command 
(create/delete/merge) failure.

Was this written up somewhere previously and the design shopped before others 
with buy-in?

The snapshot/restore of a whole system table strikes me as a bunch of moving 
parts. I have to ask why we got such an extreme? 2PC is tough-enough w/o 
offlining/restore of whole meta table. During restore, all clients are frozen 
out or something so they can't pollute the restored version? Restore is not 
atomic, right? We couldn't have something like a row-per-backup with a success 
tag if all went well (I've not been following closely -- pardon all the 
questions).



> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257502#comment-16257502
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

Backup/Delete/Merge operations must be executed in a transactional manner. 
Backup system table keeps data (meta-data) which allows to run backups and 
others commands. During backup create, delete or merge, we update backup system 
table multiple times and do not want these updates to be partial ones (when 
operation fails), because *it will prevent further backups/deletes/merges after 
a failure*.

That is why we take snapshot of a backup system table and restore this table 
from snapshot, previously taken, in case of a command (create/delete/merge) 
failure.

By consistency of data I mean - no partial updates should be visible to a user 
after operation completes (either successfully or not). Partial updates in a 
backup system tables == corruption of a system table and MUST be avoided. When 
corruption happens - the only way to restore backup system is to truncate 
backup system table and re-run all backups in full mode.




> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256202#comment-16256202
 ] 

stack commented on HBASE-17852:
---

Thanks for the pointer. I'd not read it previous. It does not answer my 
question though, "Why would you restore a backup system table from a snapshot 
when a 'backup' fails? Backups are of user-space tables. How does this impinge 
on the backup 'system' table?"

"in case if operation fails to restore meta - data consistency in a backup 
system table..."

Yeah, which operation? Which meta? A backup meta? What consistency needs to be 
maintained in the backup table?

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-16 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256193#comment-16256193
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

Please, refer to a parent ticket for description what we perform in case of a 
failure
https://issues.apache.org/jira/browse/HBASE-15227

In a few words, we take backup system table snapshot before 
backup/merge/delete/ and restore this table from snapshot back
in case if operation fails to restore meta - data consistency in a backup 
system table

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256183#comment-16256183
 ] 

stack commented on HBASE-17852:
---

bq. When backup fails, we restore backup system table from snapshot.

Why would you restore a backup system table from a snapshot when a 'backup' 
fails? Backups are of user-space tables. How does this impinge on the backup 
'system' table?

bq. If Observers write to the same table as general backup operation, some data 
from Observers may be lost when we restore table from snapshot. I thought, I 
explained that.

Where?

Is there a writeup on how this all works? (It is not in the user-guide)

bq. They are system from the point of view of a user. checkSystemTable checks 
backup system table.

This is going to confuse. 'system' tables have a particular meaning in hbase.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-16 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256126#comment-16256126
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
In the patch, it is called a ' Backup System table name for bulk loaded files' 
... but it is not a system table? Is that so? And this is different from the 
BackupSystemTable which also is not a system table, right?
BackupSystemTable does a checkSystemTable(); It is checking system tables, or 
not?
{quote}

They are system from the point of view of a user. checkSystemTable checks 
backup system table. 

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2017-11-16 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256124#comment-16256124
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

{quote}
How do general backup ops and bulk loaded files effect each other?
{quote}

When backup fails, we restore backup system table from snapshot. If Observers 
write to the same table as general backup operation, some data from Observers 
may be lost when we restore table from snapshot. I thought, I explained that.

Second table keeps only bulk load related references. We do not care about 
consistency of this table, because bulk load is idempotent operation and can be 
repeated after failure. Partially written data in second table does not affect 
on BackupHFileCleaner plugin, because this data (list of bulk loaded files) 
correspond to a files which are *have not been loaded yet successfully* and, 
hence - are not visible to the system





> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


  1   2   >