[jira] [Commented] (HBASE-27238) Backport Backup/Restore to 2.x
[ https://issues.apache.org/jira/browse/HBASE-27238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680071#comment-17680071 ] Mallikarjun commented on HBASE-27238: - [~bbeaudreault] Thanks for taking the time to review this patch. > Backport Backup/Restore to 2.x > -- > > Key: HBASE-27238 > URL: https://issues.apache.org/jira/browse/HBASE-27238 > Project: HBase > Issue Type: New Feature > Components: backport, backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 2.6.0 > > > Backport backup/restore to 2.x branch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HBASE-27582) Errorprone cleanup in hbase-backup
[ https://issues.apache.org/jira/browse/HBASE-27582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun reassigned HBASE-27582: --- Assignee: Mallikarjun > Errorprone cleanup in hbase-backup > -- > > Key: HBASE-27582 > URL: https://issues.apache.org/jira/browse/HBASE-27582 > Project: HBase > Issue Type: Task >Reporter: Bryan Beaudreault >Assignee: Mallikarjun >Priority: Minor > > I noticed a bunch of javac warnings while backporting the backups feature to > branch-2. The same problems exist in the master branch. Let's clean up errorprone > warnings in both branches once the backport lands. > See > [https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4770/10/artifact/yetus-general-check/output/diff-compile-javac-root.txt] > for the initial set to fix. Mostly in tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HBASE-27582) Errorprone cleanup in hbase-backup
[ https://issues.apache.org/jira/browse/HBASE-27582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-27582 started by Mallikarjun. --- > Errorprone cleanup in hbase-backup > -- > > Key: HBASE-27582 > URL: https://issues.apache.org/jira/browse/HBASE-27582 > Project: HBase > Issue Type: Task >Reporter: Bryan Beaudreault >Assignee: Mallikarjun >Priority: Minor > > I noticed a bunch of javac warnings while backporting the backups feature to > branch-2. The same problems exist in the master branch. Let's clean up errorprone > warnings in both branches once the backport lands. > See > [https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4770/10/artifact/yetus-general-check/output/diff-compile-javac-root.txt] > for the initial set to fix. Mostly in tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HBASE-27238) Backport Backup/Restore to 2.x
[ https://issues.apache.org/jira/browse/HBASE-27238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-27238 started by Mallikarjun. --- > Backport Backup/Restore to 2.x > -- > > Key: HBASE-27238 > URL: https://issues.apache.org/jira/browse/HBASE-27238 > Project: HBase > Issue Type: New Feature > Components: backport, backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > > Backport backup/restore to 2.x branch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-27238) Backport Backup/Restore to 2.x
[ https://issues.apache.org/jira/browse/HBASE-27238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-27238: Description: Backport backup/restore to 2.x branch. > Backport Backup/Restore to 2.x > -- > > Key: HBASE-27238 > URL: https://issues.apache.org/jira/browse/HBASE-27238 > Project: HBase > Issue Type: New Feature > Components: backport, backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > > Backport backup/restore to 2.x branch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27238) Backport Backup/Restore to 2.x
Mallikarjun created HBASE-27238: --- Summary: Backport Backup/Restore to 2.x Key: HBASE-27238 URL: https://issues.apache.org/jira/browse/HBASE-27238 Project: HBase Issue Type: New Feature Components: backport, backuprestore Reporter: Mallikarjun Assignee: Mallikarjun -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558902#comment-17558902 ] Mallikarjun edited comment on HBASE-26322 at 6/26/22 4:11 PM: -- #1. This is something I did not think of. Even now I don't see a way to solve this problem. Do you have any suggestions? #2. Even during backup, the rsgroup is considered and the backup taken accordingly; #1 was about this. Sorry, can you elaborate on what you are confused about? was (Author: rda3mon): #1. This is something I did not think of. Even now I don't see a way to solve this problem. Do you have any suggestions? #2. Even during backup, the rsgroup is considered and the backup taken accordingly. This was the point of #1. Sorry, can you elaborate on what you are confused about? > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-4 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558902#comment-17558902 ] Mallikarjun commented on HBASE-26322: - #1. This is something I did not think of. Even now I don't see a way to solve this problem. Do you have any suggestions? #2. Even during backup, the rsgroup is considered and the backup taken accordingly. This was the point of #1. Sorry, can you elaborate on what you are confused about? > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-4 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558900#comment-17558900 ] Mallikarjun edited comment on HBASE-26034 at 6/26/22 3:55 PM: -- That part I did not solve for. Each backup would take all the WAL files of those regionservers, resulting in more data than necessary. This problem exists only for incremental backups, since they depend on WAL files. was (Author: rda3mon): That part I could not solve for. Each backup would take all the WAL files of those regionservers, resulting in more data than necessary. This problem exists only for incremental backups, since they depend on WAL files. > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-4 > > Attachments: existing_design.png, proposed_design.png > > > *Existing Design:* > !existing_design.png|width=632,height=1238! > *Proposed Changes:* > *!proposed_design.png|width=637,height=1300!* -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558900#comment-17558900 ] Mallikarjun commented on HBASE-26034: - That part I could not solve for. Each backup would take all the WAL files of those regionservers, resulting in more data than necessary. This problem exists only for incremental backups, since they depend on WAL files. > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-4 > > Attachments: existing_design.png, proposed_design.png > > > *Existing Design:* > !existing_design.png|width=632,height=1238! > *Proposed Changes:* > *!proposed_design.png|width=637,height=1300!* -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555619#comment-17555619 ] Mallikarjun edited comment on HBASE-26034 at 6/17/22 1:44 PM: -- In the existing implementation, one can take only a single backup at a time, which takes an exclusive system-wide lock, resulting in some problems, especially if you have rsgroup enabled with multiple tenants wanting to take backups at different intervals. Following is the list of changes in the PR. # Remove the exclusive system-wide lock and replace it with table-level locks using checkAndPut. Repairs handle abruptly dead jobs. This helps in taking parallel table backups and configuring independent RPOs # Taking a snapshot of the backup table at the beginning of a backup and restoring the snapshot at the end was unnecessary. This is removed, as it serves no purpose, simplifying the logic. These are the 2 changes. Because of the possibility of multiple backups happening at any point in time, BackupId had to be changed to a List while handling sessions. These are the changes in this PR. [~zhangduo] I have listed the changes above. If you want any other information, please ask. P.S: Thank you very much for taking the time to look into this. was (Author: rda3mon): One can take only a single backup at a time, which takes an exclusive system-wide lock, resulting in the following problems, especially if you have rsgroup enabled with multiple tenants wanting to take backups at different intervals. Following is the list of changes in the PR. # Remove the exclusive system-wide lock and replace it with table-level locks using checkAndPut. Repairs handle abruptly dead jobs. This helps in taking parallel backups and configuring independent RPOs # Taking a snapshot of the backup table at the beginning of a backup and restoring the snapshot at the end was unnecessary. This is removed, as it serves no purpose. These are the 2 changes. Because of the possibility of multiple backups happening at any point in time, BackupId had to be changed to a List while handling sessions. These are the changes in this PR. [~zhangduo] I have listed the changes above. If you want any other information, please ask. P.S: Thank you very much for taking the time to look into this. > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-4 > > Attachments: existing_design.png, proposed_design.png > > > *Existing Design:* > !existing_design.png|width=632,height=1238! > *Proposed Changes:* > *!proposed_design.png|width=637,height=1300!* -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555619#comment-17555619 ] Mallikarjun commented on HBASE-26034: - One can take only a single backup at a time, which takes an exclusive system-wide lock, resulting in the following problems, especially if you have rsgroup enabled with multiple tenants wanting to take backups at different intervals. Following is the list of changes in the PR. # Remove the exclusive system-wide lock and replace it with table-level locks using checkAndPut. Repairs handle abruptly dead jobs. This helps in taking parallel backups and configuring independent RPOs # Taking a snapshot of the backup table at the beginning of a backup and restoring the snapshot at the end was unnecessary. This is removed, as it serves no purpose. These are the 2 changes. Because of the possibility of multiple backups happening at any point in time, BackupId had to be changed to a List while handling sessions. These are the changes in this PR. [~zhangduo] I have listed the changes above. If you want any other information, please ask. P.S: Thank you very much for taking the time to look into this. > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-4 > > Attachments: existing_design.png, proposed_design.png > > > *Existing Design:* > !existing_design.png|width=632,height=1238! > *Proposed Changes:* > *!proposed_design.png|width=637,height=1300!* -- This message was sent by Atlassian Jira (v8.20.7#820007)
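The table-level lock described in the two comments above can be sketched concretely. The following is a minimal illustration of the checkAndPut idea, not the actual HBASE-26034 patch: the lock row key, the meta:lock column, and the acquireTableLock helper are assumptions invented for this example.

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.CheckAndMutate;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BackupTableLockSketch {
  private static final byte[] FAMILY = Bytes.toBytes("meta");    // assumed column family
  private static final byte[] QUALIFIER = Bytes.toBytes("lock"); // assumed lock qualifier

  /**
   * Try to acquire a per-table backup lock. The atomic check-and-put succeeds
   * only if no lock cell exists for this table yet, so two concurrent backups
   * of the same table cannot both proceed, while backups of different tables
   * run in parallel with no system-wide exclusive lock.
   */
  public static boolean acquireTableLock(Connection conn, String tableToBackUp,
      String backupId) throws IOException {
    try (Table system = conn.getTable(TableName.valueOf("backup:system"))) {
      byte[] row = Bytes.toBytes("lock:" + tableToBackUp); // hypothetical row key layout
      Put put = new Put(row).addColumn(FAMILY, QUALIFIER, Bytes.toBytes(backupId));
      CheckAndMutate lock = CheckAndMutate.newBuilder(row)
          .ifNotExists(FAMILY, QUALIFIER) // succeed only if nobody holds the lock
          .build(put);
      return system.checkAndMutate(lock).isSuccess();
    }
  }
}
{code}

A job that dies abruptly would leave its lock cell behind, which is why the comment pairs the lock with repair handling; a real implementation would record enough state alongside the lock (a timestamp, for example) to detect and reclaim stale locks.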
[jira] [Comment Edited] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1795#comment-1795 ] Mallikarjun edited comment on HBASE-26322 at 6/17/22 1:26 PM: -- Backup currently doesn't understand rsgroups, which results in 2 problems. Say there are 2 rsgroups, RsgroupA and RsgroupB. tableA is part of RsgroupA and tableB is part of RsgroupB. Regionservers rs1A, rs2A, rs3A are part of RsgroupA. Regionservers rs1B, rs2B, rs3B are part of RsgroupB. Problem 1: When you enable backup on tableA, only rs1A, rs2A, rs3A should participate in the backup (the WAL's of these regionservers are backed up). Since backup doesn't understand rsgroups, all regionservers participate in the backup: rs1A, rs2A, rs3A, rs1B, rs2B, rs3B. This means you need to plan for the additional capacity required for the additional WAL's, among other problems (because WAL's are retained until the next successful backup is completed). Problem 2: BackupLogCleaner also doesn't understand rsgroups with incremental backup enabled. This can result in a big problem. In the above example, say backup is configured for only tableA. Hence BackupLogCleaner cleans up the WAL's of only rs1A, rs2A, rs3A once a backup is completed. The WAL's of rs1B, rs2B, rs3B are never cleaned up because there is no table backup configured for them, and the ever-growing WAL's will fill up the disk easily (also, WAL's are not compressed, resulting in faster disk fill-up). [~zhangduo] Hope this is enough detail. Please ask about anything you did not understand. P.S: Thank you very much for taking the time to look into this. was (Author: rda3mon): Backup currently doesn't understand rsgroups, which results in 2 problems. Say there are 2 rsgroups, RsgroupA and RsgroupB. tableA is part of RsgroupA and tableB is part of RsgroupB. Regionservers rs1A, rs2A, rs3A are part of RsgroupA. Regionservers rs1B, rs2B, rs3B are part of RsgroupB. Problem 1: When you enable backup on tableA, only rs1A, rs2A, rs3A should participate in the backup (the WAL's of these regionservers are backed up). Since backup doesn't understand rsgroups, all regionservers participate in the backup: rs1A, rs2A, rs3A, rs1B, rs2B, rs3B. This means you need to plan for the additional capacity required for the additional WAL's, among other problems (because WAL's are retained until the next successful backup is completed). Problem 2: BackupLogCleaner also doesn't understand rsgroups with incremental backup enabled. This can result in a big problem. In the above example, say backup is configured for only tableA. Hence BackupLogCleaner cleans up the WAL's of only rs1A, rs2A, rs3A once a backup is completed. The WAL's of rs1B, rs2B, rs3B are never cleaned up because there is no table backup configured for them, and the ever-growing WAL's will fill up the disk easily (since WAL's are not compressed). [~zhangduo] Hope this is enough detail. Please ask about anything you did not understand. P.S: Thank you very much for taking the time to look into this. > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-4 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1795#comment-1795 ] Mallikarjun edited comment on HBASE-26322 at 6/17/22 1:26 PM: -- Backup currently doesn't understand rsgroups, which results in 2 problems. Say there are 2 rsgroups, RsgroupA and RsgroupB. tableA is part of RsgroupA and tableB is part of RsgroupB. Regionservers rs1A, rs2A, rs3A are part of RsgroupA. Regionservers rs1B, rs2B, rs3B are part of RsgroupB. Problem 1: When you enable backup on tableA, only rs1A, rs2A, rs3A should participate in the backup (the WAL's of these regionservers are backed up). Since backup doesn't understand rsgroups, all regionservers participate in the backup: rs1A, rs2A, rs3A, rs1B, rs2B, rs3B. This means you need to plan for the additional capacity required for the additional WAL's, among other problems (because WAL's are retained until the next successful backup is completed). Problem 2: BackupLogCleaner also doesn't understand rsgroups with incremental backup enabled. This can result in a big problem. In the above example, say backup is configured for only tableA. Hence BackupLogCleaner cleans up the WAL's of only rs1A, rs2A, rs3A once a backup is completed. The WAL's of rs1B, rs2B, rs3B are never cleaned up because there is no table backup configured for them, and the ever-growing WAL's will fill up the disk easily (since WAL's are not compressed). [~zhangduo] Hope this is enough detail. Please ask about anything you did not understand. P.S: Thank you very much for taking the time to look into this. was (Author: rda3mon): Backup currently doesn't understand rsgroups, which results in 2 problems. Say there are 2 rsgroups, RsgroupA and RsgroupB. tableA is part of RsgroupA and tableB is part of RsgroupB. Regionservers rs1A, rs2A, rs3A are part of RsgroupA. Regionservers rs1B, rs2B, rs3B are part of RsgroupB. Problem 1: When you enable backup on tableA, only rs1A, rs2A, rs3A should participate in the backup (the WAL's of these regionservers are backed up). Since backup doesn't understand rsgroups, all regionservers participate in the backup: rs1A, rs2A, rs3A, rs1B, rs2B, rs3B. This means you need to plan for the additional capacity required for the additional WAL's, among other problems (because WAL's are retained until the next successful backup is completed). Problem 2: BackupLogCleaner also doesn't understand rsgroups with incremental backup enabled. This can result in a big problem. In the above example, say backup is configured for only tableA. Hence BackupLogCleaner cleans up the WAL's of only rs1A, rs2A, rs3A once a backup is completed. The WAL's of rs1B, rs2B, rs3B are never cleaned up because there is no table backup configured for them, and every growing WAL's will fill up the disk easily (since WAL's are not compressed). [~zhangduo] Hope this is enough detail. Please ask about anything you did not understand. P.S: Thank you very much for taking the time to look into this. > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-4 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1795#comment-1795 ] Mallikarjun edited comment on HBASE-26322 at 6/17/22 1:25 PM: -- Backup currently doesn't understand rsgroups, which results in 2 problems. Say there are 2 rsgroups, RsgroupA and RsgroupB. tableA is part of RsgroupA and tableB is part of RsgroupB. Regionservers rs1A, rs2A, rs3A are part of RsgroupA. Regionservers rs1B, rs2B, rs3B are part of RsgroupB. Problem 1: When you enable backup on tableA, only rs1A, rs2A, rs3A should participate in the backup (the WAL's of these regionservers are backed up). Since backup doesn't understand rsgroups, all regionservers participate in the backup: rs1A, rs2A, rs3A, rs1B, rs2B, rs3B. This means you need to plan for the additional capacity required for the additional WAL's, among other problems (because WAL's are retained until the next successful backup is completed). Problem 2: BackupLogCleaner also doesn't understand rsgroups with incremental backup enabled. This can result in a big problem. In the above example, say backup is configured for only tableA. Hence BackupLogCleaner cleans up the WAL's of only rs1A, rs2A, rs3A once a backup is completed. The WAL's of rs1B, rs2B, rs3B are never cleaned up because there is no table backup configured for them, and the ever-growing WAL's will fill up the disk easily (since WAL's are not compressed). [~zhangduo] Hope this is enough detail. Please ask about anything you did not understand. P.S: Thank you very much for taking the time to look into this. was (Author: rda3mon): Backup currently doesn't understand rsgroups, which results in 2 problems. Say there are 2 rsgroups, RsgroupA and RsgroupB. tableA is part of RsgroupA and tableB is part of RsgroupB. Regionservers rs1A, rs2A, rs3A are part of RsgroupA. Regionservers rs1B, rs2B, rs3B are part of RsgroupB. Problem 1: When you enable backup on tableA, only rs1A, rs2A, rs3A should participate in the backup (the WAL's of these regionservers are backed up). Since backup doesn't understand rsgroups, all regionservers participate in the backup: rs1A, rs2A, rs3A, rs1B, rs2B, rs3B. This means you need to plan for the additional capacity required for the additional WAL's, and other problems. Problem 2: BackupLogCleaner also doesn't understand rsgroups with incremental backup enabled. This can result in a big problem. In the above example, say backup is configured for only tableA. Hence BackupLogCleaner cleans up the WAL's of only rs1A, rs2A, rs3A once a backup is completed. The WAL's of rs1B, rs2B, rs3B are never cleaned up because there is no table backup configured for them, and the ever-growing WAL's will fill up the disk easily (since WAL's are not compressed). [~zhangduo] Hope this is enough detail. Please ask about anything you did not understand. P.S: Thank you very much for taking the time to look into this. > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-4 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1795#comment-1795 ] Mallikarjun edited comment on HBASE-26322 at 6/17/22 1:24 PM: -- Backup currently doesn't understand rsgroups, which results in 2 problems. Say there are 2 rsgroups, RsgroupA and RsgroupB. tableA is part of RsgroupA and tableB is part of RsgroupB. Regionservers rs1A, rs2A, rs3A are part of RsgroupA. Regionservers rs1B, rs2B, rs3B are part of RsgroupB. Problem 1: When you enable backup on tableA, only rs1A, rs2A, rs3A should participate in the backup (the WAL's of these regionservers are backed up). Since backup doesn't understand rsgroups, all regionservers participate in the backup: rs1A, rs2A, rs3A, rs1B, rs2B, rs3B. This means you need to plan for the additional capacity required for the additional WAL's, and other problems. Problem 2: BackupLogCleaner also doesn't understand rsgroups with incremental backup enabled. This can result in a big problem. In the above example, say backup is configured for only tableA. Hence BackupLogCleaner cleans up the WAL's of only rs1A, rs2A, rs3A once a backup is completed. The WAL's of rs1B, rs2B, rs3B are never cleaned up because there is no table backup configured for them, and the ever-growing WAL's will fill up the disk easily (since WAL's are not compressed). [~zhangduo] Hope this is enough detail. Please ask about anything you did not understand. P.S: Thank you very much for taking the time to look into this. was (Author: rda3mon): Backup currently doesn't understand rsgroups, which results in 2 problems. Say there are 2 rsgroups, RsgroupA and RsgroupB. tableA is part of RsgroupA and tableB is part of RsgroupB. rs1A, rs2A, rs3A are part of RsgroupA. rs1B, rs2B, rs3B are part of RsgroupB. Problem 1: When you enable backup on tableA, only rs1A, rs2A, rs3A should participate in the backup (the WAL's of these regionservers are backed up). Since backup doesn't understand rsgroups, all regionservers participate in the backup: rs1A, rs2A, rs3A, rs1B, rs2B, rs3B. This means you need to plan for the additional capacity required for the additional WAL's, and other problems. Problem 2: BackupLogCleaner also doesn't understand rsgroups with incremental backup enabled. This can result in a big problem. In the above example, say backup is configured for only tableA. Hence BackupLogCleaner cleans up the WAL's of only rs1A, rs2A, rs3A once a backup is completed. The WAL's of rs1B, rs2B, rs3B are never cleaned up because there is no table backup configured for them, and the ever-growing WAL's will fill up the disk easily (since WAL's are not compressed). [~zhangduo] Hope this is enough detail. Please ask about anything you did not understand. P.S: Thank you very much for taking the time to look into this. > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-4 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1795#comment-1795 ] Mallikarjun commented on HBASE-26322: - Backup currently doesn't understand rsgroups, which results in 2 problems. Say there are 2 rsgroups, RsgroupA and RsgroupB. tableA is part of RsgroupA and tableB is part of RsgroupB. rs1A, rs2A, rs3A are part of RsgroupA. rs1B, rs2B, rs3B are part of RsgroupB. Problem 1: When you enable backup on tableA, only rs1A, rs2A, rs3A should participate in the backup (the WAL's of these regionservers are backed up). Since backup doesn't understand rsgroups, all regionservers participate in the backup: rs1A, rs2A, rs3A, rs1B, rs2B, rs3B. This means you need to plan for the additional capacity required for the additional WAL's, and other problems. Problem 2: BackupLogCleaner also doesn't understand rsgroups with incremental backup enabled. This can result in a big problem. In the above example, say backup is configured for only tableA. Hence BackupLogCleaner cleans up the WAL's of only rs1A, rs2A, rs3A once a backup is completed. The WAL's of rs1B, rs2B, rs3B are never cleaned up because there is no table backup configured for them, and the ever-growing WAL's will fill up the disk easily (since WAL's are not compressed). [~zhangduo] Hope this is enough detail. Please ask about anything you did not understand. P.S: Thank you very much for taking the time to look into this. > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-4 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.20.7#820007)
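Problem 2 above amounts to a set-membership filter on the cleaner side. The sketch below illustrates only that filter and is not the HBASE-26322 patch: the backupGroupServers input and the WAL-name parsing helper are assumptions for this example (real code would resolve group membership through the rsgroup admin API), though the WAL naming follows the pattern visible in the HBASE-25891 rows later in this thread.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.fs.FileStatus;

public class RsgroupAwareWalCleanerSketch {
  /**
   * A WAL is immediately deletable when it was written by a regionserver
   * whose rsgroup hosts no backed-up table: no incremental backup will ever
   * need it. This is the missing check that lets the WAL's of rs1B, rs2B,
   * rs3B grow without bound in the example above.
   */
  public static List<FileStatus> deletableWals(List<FileStatus> candidates,
      Set<String> backupGroupServers /* servers of rsgroups hosting backed-up tables */) {
    List<FileStatus> deletable = new ArrayList<>();
    for (FileStatus wal : candidates) {
      if (!backupGroupServers.contains(serverOfWal(wal.getPath().getName()))) {
        deletable.add(wal);
      }
    }
    return deletable;
  }

  /** Hypothetical parse: WAL names look like "host%2C16020%2C1614844389000.1621996160175". */
  private static String serverOfWal(String walName) {
    int dot = walName.indexOf('.');
    String encoded = dot < 0 ? walName : walName.substring(0, dot);
    return encoded.replace("%2C", ","); // decode the URL-encoded server name
  }
}
{code}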
[jira] [Updated] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26322: Affects Version/s: 3.0.0-alpha-2 (was: 3.0.0-alpha-1) > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-4 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452402#comment-17452402 ] Mallikarjun edited comment on HBASE-26322 at 12/2/21, 1:16 PM: --- [~zhangduo] [~stack] [~anoop.hbase] Kindly help me with this review when you can find some free time. Thanks was (Author: rda3mon): [~zhangduo] [~stack] [~anoop.hbase] Kindly help me with this review when you can find some free time. > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-2 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452404#comment-17452404 ] Mallikarjun commented on HBASE-26034: - [~zhangduo] [~stack] [~anoop.hbase] Kindly help me with this review when you can find some free time. > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > Attachments: existing_design.png, proposed_design.png > > > *Existing Design:* > !existing_design.png|width=632,height=1238! > *Proposed Changes:* > *!proposed_design.png|width=637,height=1300!* -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452404#comment-17452404 ] Mallikarjun edited comment on HBASE-26034 at 12/2/21, 1:16 PM: --- [~zhangduo] [~stack] [~anoop.hbase] Kindly help me with this review when you can find some free time. Thanks was (Author: rda3mon): [~zhangduo] [~stack] [~anoop.hbase] Kindly help me with this review when you can find some free time. > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > Attachments: existing_design.png, proposed_design.png > > > *Existing Design:* > !existing_design.png|width=632,height=1238! > *Proposed Changes:* > *!proposed_design.png|width=637,height=1300!* -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452402#comment-17452402 ] Mallikarjun commented on HBASE-26322: - [~zhangduo] [~stack] [~anoop.hbase] Kindly help me with this review when you can find some free time. > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-2 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430482#comment-17430482 ] Mallikarjun edited comment on HBASE-26034 at 10/19/21, 11:31 AM: - [~zhangduo] [~stack] [~anoop.hbase] The patch for this is ready for review. Please have a look when you have some time. was (Author: rda3mon): [~zhangduo] [~stack] [~anoop.hbase] The patch for this is ready for review. Please have a look when you have some time. [|https://issues.apache.org/jira/secure/AddComment!default.jspa?id=13404591] > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > Attachments: existing_design.png, proposed_design.png > > > *Existing Design:* > !existing_design.png|width=632,height=1238! > *Proposed Changes:* > *!proposed_design.png|width=637,height=1300!* -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430482#comment-17430482 ] Mallikarjun commented on HBASE-26034: - [~zhangduo] [~stack] [~anoop.hbase] The patch for this is ready for review. Please have a look when you have some time. > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > Attachments: existing_design.png, proposed_design.png > > > *Existing Design:* > !existing_design.png|width=632,height=1238! > *Proposed Changes:* > *!proposed_design.png|width=637,height=1300!* -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26343) Extend RSGroup to support data isolation to achieve true multitenancy in Hbase
[ https://issues.apache.org/jira/browse/HBASE-26343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26343: Description: RSGroups currently provide isolation only at the serving layer, not at the data layer. There is a need to provide data isolation between rsgroups to achieve true multitenancy in HBase, allowing individual rsgroups to be scaled independently as needed. Some of the aspects to be covered in this umbrella project are # Provide data isolation between different RSGroups # Add balancer support to understand this construct while performing balancer activity # Extend support to various ancillary services such as export snapshot, cluster replication, etc was: RSGroups currently provide isolation only at the serving layer, not at the data layer. There is a need to provide data isolation between rsgroups to achieve true multitenancy in HBase, allowing individual rsgroups to be scaled independently as needed. Some of the aspects to be covered in this umbrella project are # Provide data isolation between different RSGroups # Add balancer support to understand this construct to perform various balancing activities # Extend support to various ancillary services such as export snapshot, cluster replication, etc > Extend RSGroup to support data isolation to achieve true multitenancy in Hbase > -- > > Key: HBASE-26343 > URL: https://issues.apache.org/jira/browse/HBASE-26343 > Project: HBase > Issue Type: Umbrella > Components: rsgroup >Reporter: Mallikarjun >Priority: Major > > RSGroups currently provide isolation only at the serving layer, not at the > data layer. There is a need to provide data isolation between rsgroups > to achieve true multitenancy in HBase, allowing individual rsgroups to be > scaled independently as needed. Some of the aspects to be covered in this > umbrella project are > # Provide data isolation between different RSGroups > # Add balancer support to understand this construct while performing > balancer activity > # Extend support to various ancillary services such as export snapshot, > cluster replication, etc -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26343) Extend RSGroup to support data isolation to achieve true multitenancy in Hbase
[ https://issues.apache.org/jira/browse/HBASE-26343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26343: Description: RSGroups currently provide isolation only at the serving layer, not at the data layer. There is a need to provide data isolation between rsgroups to achieve true multitenancy in HBase, allowing individual rsgroups to be scaled independently as needed. Some of the aspects to be covered in this umbrella project are # Provide data isolation between different RSGroups # Add balancer support to understand this construct to perform various balancing activities # Extend support to various ancillary services such as export snapshot, cluster replication, etc was: RSGroups currently provide isolation only at the serving layer, not at the data layer. There is a need to provide data isolation between rsgroups to achieve true multitenancy in HBase, allowing individual rsgroups to be scaled independently as needed. Some of the aspects to be covered in this umbrella project are # Provide data isolation between different RSGroups # Add balancer support to understand this construct in various balancing activities # Extend support to various ancillary services such as export snapshot, cluster replication, etc > Extend RSGroup to support data isolation to achieve true multitenancy in Hbase > -- > > Key: HBASE-26343 > URL: https://issues.apache.org/jira/browse/HBASE-26343 > Project: HBase > Issue Type: Umbrella > Components: rsgroup >Reporter: Mallikarjun >Priority: Major > > RSGroups currently provide isolation only at the serving layer, not at the > data layer. There is a need to provide data isolation between rsgroups > to achieve true multitenancy in HBase, allowing individual rsgroups to be > scaled independently as needed. Some of the aspects to be covered in this > umbrella project are > # Provide data isolation between different RSGroups > # Add balancer support to understand this construct to perform various > balancing activities > # Extend support to various ancillary services such as export snapshot, > cluster replication, etc -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26346) Design support for rsgroup data isolation
[ https://issues.apache.org/jira/browse/HBASE-26346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26346: Description: Put down design for changes required to support rsgroup data isolation. (was: TODO) > Design support for rsgroup data isolation > -- > > Key: HBASE-26346 > URL: https://issues.apache.org/jira/browse/HBASE-26346 > Project: HBase > Issue Type: New Feature > Components: rsgroup >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > > Put down design for changes required to support rsgroup data isolation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HBASE-26346) Design support for rsgroup data isolation
[ https://issues.apache.org/jira/browse/HBASE-26346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-26346 started by Mallikarjun. --- > Design support for rsgroup data isolation > -- > > Key: HBASE-26346 > URL: https://issues.apache.org/jira/browse/HBASE-26346 > Project: HBase > Issue Type: New Feature > Components: rsgroup >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > > Put down design for changes required to support rsgroup data isolation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26346) Design support for rsgroup data isolation
Mallikarjun created HBASE-26346: --- Summary: Design support for rsgroup data isolation Key: HBASE-26346 URL: https://issues.apache.org/jira/browse/HBASE-26346 Project: HBase Issue Type: New Feature Components: rsgroup Reporter: Mallikarjun Assignee: Mallikarjun TODO -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26343) Extend RSGroup to support data isolation to achieve true multitenancy in Hbase
Mallikarjun created HBASE-26343: --- Summary: Extend RSGroup to support data isolation to achieve true multitenancy in Hbase Key: HBASE-26343 URL: https://issues.apache.org/jira/browse/HBASE-26343 Project: HBase Issue Type: Umbrella Components: rsgroup Reporter: Mallikarjun RSGroups currently provide isolation only at the serving layer, not at the data layer. There is a need to provide data isolation between rsgroups to achieve true multitenancy in HBase, allowing individual rsgroups to be scaled independently as needed. Some of the aspects to be covered in this umbrella project are # Provide data isolation between different RSGroups # Add balancer support to understand this construct in various balancing activities # Extend support to various ancillary services such as export snapshot, cluster replication, etc -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Release Note: * Remove dependence on storing WAL filenames for backup in the backup:system meta table > Remove dependence on storing WAL filenames for backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently, WAL log references are stored in the `backup:system` meta table: > {code:java} > // per-WAL rows: owning backup id, backed-up file location, and backup root > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before > taking the backup and stores the timestamp at which the log roll was > performed, per regionserver per backup, using the following format: > {code:java} > // per-regionserver log roll timestamps recorded for one backup > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301,value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > There are 2 cases in which the WAL log references stored in `backup:system` > are used. > *Use Case 1.* > *Existing Design:* Clean up WAL's for which a backup has already been taken, via > `BackupLogCleaner`, which uses these references to clean up backed-up logs. > *New Design:* > Since the log roll timestamp is stored per regionserver as part of each backup, we can > check all previous successful backups and then identify which logs are to > be retained and which ones are to be cleaned up, as follows: > * Identify the latest successful backups performed per table. > * Per backup identified above, identify the oldest log roll > timestamp recorded per regionserver per table. > * All WAL's older than the oldest log roll timestamp recorded > for any backed-up table can be removed by `BackupLogCleaner` > > *Use Case 2.* > *Existing Design:* During incremental backup, check the system table for > duplicate WAL's so that they are not backed up again. > *New Design:* > * Incremental backup already identifies which WAL's are to be backed up > using the `rslogts:` entries mentioned above. > * Additionally it checks `wals:` to ensure no logs are backed up a second > time. This is redundant, with no observed extra benefit. -- This message was sent by Atlassian Jira (v8.3.4#803005)
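The three-step retention rule in Use Case 1's new design reduces to one comparison per WAL. Here is a minimal sketch of that decision, assuming the `rslogts:` rows above have already been aggregated into a per-server map of the oldest log roll timestamp across the latest successful backups of all tables; oldestRollTs is a hypothetical input, and the WAL timestamp is the numeric suffix of the WAL file name (e.g. 1621996160175 in the rows above).

{code:java}
import java.util.Map;

public class WalRetentionSketch {
  /**
   * A WAL can be cleaned once every backed-up table has a successful backup
   * whose log roll on this server happened after the WAL was written, i.e.
   * the WAL's timestamp is strictly older than the oldest roll timestamp
   * recorded for the server that wrote it.
   */
  public static boolean isDeletable(String server, long walTs,
      Map<String, Long> oldestRollTs) {
    Long oldest = oldestRollTs.get(server);
    if (oldest == null) {
      return false; // unknown server: keep the WAL, deletion must stay safe
    }
    return walTs < oldest;
  }
}
{code}

Keeping WAL's for unknown servers errs on the side of safety, matching the requirement that `BackupLogCleaner` may only remove a log once no future incremental backup of any table could still need it.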
[jira] [Commented] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424838#comment-17424838 ] Mallikarjun commented on HBASE-26322: - [~zhangduo] [~stack] [~anoop.hbase] The patch for this is ready for review. Please have a look when you have some time. > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-2 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-26322 started by Mallikarjun. --- > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-2 > > > There are some places where backup needs some changes with respect to > rsgroup. Some of them being addressed here are > # Incremental backup WAL backup should happen only for servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration > # BackupLogCleaner should keep references only from those servers which are > part of a particular rsgroup under which namespace is configured for table > backup under consideration -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26322) Add rsgroup support for Backup
[ https://issues.apache.org/jira/browse/HBASE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26322: Description: There are some places where backup needs changes with respect to rsgroup. Those being addressed here are # Incremental WAL backup should happen only for servers that are part of the rsgroup under which the namespace of the table under backup is configured # BackupLogCleaner should keep WAL references only from servers that are part of the rsgroup under which the namespace of the table under backup is configured > Add rsgroup support for Backup > -- > > Key: HBASE-26322 > URL: https://issues.apache.org/jira/browse/HBASE-26322 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Minor > Fix For: 3.0.0-alpha-2 > > > There are some places where backup needs changes with respect to rsgroup. Those being addressed here are > # Incremental WAL backup should happen only for servers that are part of the rsgroup under which the namespace of the table under backup is configured > # BackupLogCleaner should keep WAL references only from servers that are part of the rsgroup under which the namespace of the table under backup is configured -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26322) Add rsgroup support for Backup
Mallikarjun created HBASE-26322: --- Summary: Add rsgroup support for Backup Key: HBASE-26322 URL: https://issues.apache.org/jira/browse/HBASE-26322 Project: HBase Issue Type: Improvement Components: backuprestore Affects Versions: 3.0.0-alpha-1 Reporter: Mallikarjun Assignee: Mallikarjun Fix For: 3.0.0-alpha-2 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-26301) Backport backup/restore to branch-2
[ https://issues.apache.org/jira/browse/HBASE-26301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422497#comment-17422497 ] Mallikarjun edited comment on HBASE-26301 at 9/30/21, 2:10 AM: --- [~bbeaudreault] I have a couple of features in the pipeline for the master branch. After that I will backport to 2.x. Assigned this ticket to myself. was (Author: rda3mon): [~bbeaudreault] I have a couple of features in the pipeline for the master branch. After that I will backport to 2.x. Assigned this ticket to me. > Backport backup/restore to branch-2 > --- > > Key: HBASE-26301 > URL: https://issues.apache.org/jira/browse/HBASE-26301 > Project: HBase > Issue Type: New Feature >Reporter: Bryan Beaudreault >Assignee: Mallikarjun >Priority: Major > > I was discussing this great feature with [~rda3mon] on Slack. His company is using this on their fork of hbase 2.1. We're working on upgrading to 2.4 now, and have our own home-grown backup/restore system which is not as sophisticated as the native solution. If this solution were backported to branch-2, we would strongly consider adopting it as we finish up our upgrade. > It looks like this was originally cut from 2.0 due to release timeline pressures: https://issues.apache.org/jira/browse/HBASE-19407, and now suffers from a lack of community support. This might make sense since it only exists in 3.x, which is not yet released. > It would be great to backport this to branch-2 so that it reaches a wider audience and gains adoption -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-26301) Backport backup/restore to branch-2
[ https://issues.apache.org/jira/browse/HBASE-26301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422497#comment-17422497 ] Mallikarjun edited comment on HBASE-26301 at 9/30/21, 2:10 AM: --- [~bbeaudreault] I have a couple of features in the pipeline for the master branch. After that I will backport to 2.x. Assigned this ticket to me. was (Author: rda3mon): [~bbeaudreault] I have a couple of features in the pipeline for the master branch. After that I will backport to 2.x. > Backport backup/restore to branch-2 > --- > > Key: HBASE-26301 > URL: https://issues.apache.org/jira/browse/HBASE-26301 > Project: HBase > Issue Type: New Feature >Reporter: Bryan Beaudreault >Assignee: Mallikarjun >Priority: Major > > I was discussing this great feature with [~rda3mon] on Slack. His company is using this on their fork of hbase 2.1. We're working on upgrading to 2.4 now, and have our own home-grown backup/restore system which is not as sophisticated as the native solution. If this solution were backported to branch-2, we would strongly consider adopting it as we finish up our upgrade. > It looks like this was originally cut from 2.0 due to release timeline pressures: https://issues.apache.org/jira/browse/HBASE-19407, and now suffers from a lack of community support. This might make sense since it only exists in 3.x, which is not yet released. > It would be great to backport this to branch-2 so that it reaches a wider audience and gains adoption -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26301) Backport backup/restore to branch-2
[ https://issues.apache.org/jira/browse/HBASE-26301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422497#comment-17422497 ] Mallikarjun commented on HBASE-26301: - [~bbeaudreault] I have a couple of features in the pipeline for the master branch. After that I will backport to 2.x. > Backport backup/restore to branch-2 > --- > > Key: HBASE-26301 > URL: https://issues.apache.org/jira/browse/HBASE-26301 > Project: HBase > Issue Type: New Feature >Reporter: Bryan Beaudreault >Assignee: Mallikarjun >Priority: Major > > I was discussing this great feature with [~rda3mon] on Slack. His company is using this on their fork of hbase 2.1. We're working on upgrading to 2.4 now, and have our own home-grown backup/restore system which is not as sophisticated as the native solution. If this solution were backported to branch-2, we would strongly consider adopting it as we finish up our upgrade. > It looks like this was originally cut from 2.0 due to release timeline pressures: https://issues.apache.org/jira/browse/HBASE-19407, and now suffers from a lack of community support. This might make sense since it only exists in 3.x, which is not yet released. > It would be great to backport this to branch-2 so that it reaches a wider audience and gains adoption -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-26301) Backport backup/restore to branch-2
[ https://issues.apache.org/jira/browse/HBASE-26301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun reassigned HBASE-26301: --- Assignee: Mallikarjun > Backport backup/restore to branch-2 > --- > > Key: HBASE-26301 > URL: https://issues.apache.org/jira/browse/HBASE-26301 > Project: HBase > Issue Type: New Feature >Reporter: Bryan Beaudreault >Assignee: Mallikarjun >Priority: Major > > I was discussing this great feature with [~rda3mon] on Slack. His company is using this on their fork of hbase 2.1. We're working on upgrading to 2.4 now, and have our own home-grown backup/restore system which is not as sophisticated as the native solution. If this solution were backported to branch-2, we would strongly consider adopting it as we finish up our upgrade. > It looks like this was originally cut from 2.0 due to release timeline pressures: https://issues.apache.org/jira/browse/HBASE-19407, and now suffers from a lack of community support. This might make sense since it only exists in 3.x, which is not yet released. > It would be great to backport this to branch-2 so that it reaches a wider audience and gains adoption -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413942#comment-17413942 ] Mallikarjun commented on HBASE-25891: - I was under the impression it was somewhere else and not Jira :) > Remove dependence on storing WAL filenames for backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently WAL logs are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before taking the backup and stores the timestamp at which the log roll was performed, per regionserver per backup, in the following format. > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > There are 2 cases in which the WAL references stored in `backup:system` are used. > *Use Case 1.* > *Existing Design:* Clean up WALs whose backup has already been taken, via `BackupLogCleaner`, which uses these references to remove backed-up logs. > *New Design:* > Since the log roll timestamp is stored per regionserver as part of each backup, we can check all previous successful backups and identify which logs to retain and which to clean up, as follows: > * Identify the latest successful backup performed per table. > * Per backup identified above, find the oldest log roll timestamp performed per regionserver per table. > * All WALs older than the oldest log roll timestamp performed for any backed-up table can be removed by `BackupLogCleaner`. > > *Use Case 2.* > *Existing Design:* During incremental backup, check the system table for duplicate WALs that have already been backed up. > *New Design:* > * Incremental backup already identifies which WALs to back up using the `rslogts:` rows mentioned above. > * Additionally it checks `wals:` to ensure no log is backed up a second time. This is redundant, and no extra benefit has been observed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413844#comment-17413844 ] Mallikarjun edited comment on HBASE-25891 at 9/13/21, 1:57 AM: --- Thanks [~zhangduo] [~stack] for reviewing the code. Thanks [~anoop.hbase] for assisting on these changes. [~zhangduo] Where do I fill in the release notes? Any pointers? was (Author: rda3mon): Thanks [~zhangduo] [~stack] for reviewing the code. Thanks [~anoop.hbase] for assisting on this PR. [~zhangduo] Where do I fill in the release notes? Any pointers? > Remove dependence on storing WAL filenames for backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently WAL logs are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before taking the backup and stores the timestamp at which the log roll was performed, per regionserver per backup, in the following format. > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > There are 2 cases in which the WAL references stored in `backup:system` are used. > *Use Case 1.* > *Existing Design:* Clean up WALs whose backup has already been taken, via `BackupLogCleaner`, which uses these references to remove backed-up logs. > *New Design:* > Since the log roll timestamp is stored per regionserver as part of each backup, we can check all previous successful backups and identify which logs to retain and which to clean up, as follows: > * Identify the latest successful backup performed per table. > * Per backup identified above, find the oldest log roll timestamp performed per regionserver per table. > * All WALs older than the oldest log roll timestamp performed for any backed-up table can be removed by `BackupLogCleaner`. > > *Use Case 2.* > *Existing Design:* During incremental backup, check the system table for duplicate WALs that have already been backed up. > *New Design:* > * Incremental backup already identifies which WALs to back up using the `rslogts:` rows mentioned above. > * Additionally it checks `wals:` to ensure no log is backed up a second time. This is redundant, and no extra benefit has been observed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413844#comment-17413844 ] Mallikarjun commented on HBASE-25891: - Thanks [~zhangduo] [~stack] for reviewing the code. Thanks [~anoop.hbase] for assisting on this PR. [~zhangduo] Where do I fill in the release notes? Any pointers? > Remove dependence on storing WAL filenames for backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently WAL logs are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before taking the backup and stores the timestamp at which the log roll was performed, per regionserver per backup, in the following format. > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > There are 2 cases in which the WAL references stored in `backup:system` are used. > *Use Case 1.* > *Existing Design:* Clean up WALs whose backup has already been taken, via `BackupLogCleaner`, which uses these references to remove backed-up logs. > *New Design:* > Since the log roll timestamp is stored per regionserver as part of each backup, we can check all previous successful backups and identify which logs to retain and which to clean up, as follows: > * Identify the latest successful backup performed per table. > * Per backup identified above, find the oldest log roll timestamp performed per regionserver per table. > * All WALs older than the oldest log roll timestamp performed for any backed-up table can be removed by `BackupLogCleaner`. > > *Use Case 2.* > *Existing Design:* During incremental backup, check the system table for duplicate WALs that have already been backed up. > *New Design:* > * Incremental backup already identifies which WALs to back up using the `rslogts:` rows mentioned above. > * Additionally it checks `wals:` to ensure no log is backed up a second time. This is redundant, and no extra benefit has been observed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26279) Merger of backup:system table with hbase:meta table
Mallikarjun created HBASE-26279: --- Summary: Merger of backup:system table with hbase:meta table Key: HBASE-26279 URL: https://issues.apache.org/jira/browse/HBASE-26279 Project: HBase Issue Type: Improvement Components: backuprestore Reporter: Mallikarjun Assignee: Mallikarjun To Be filled -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26034: Description: *Existing Design:* !existing_design.png|width=632,height=1238! *Proposed Changes:* *!proposed_design.png|width=637,height=1300!* was: *Existing Design:* !existing_design.png|width=637,height=1248! Changes: !proposed_design.png|width=626,height=1277! > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > Attachments: existing_design.png, proposed_design.png > > > *Existing Design:* > !existing_design.png|width=632,height=1238! > *Proposed Changes:* > *!proposed_design.png|width=637,height=1300!* -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26034: Attachment: proposed_design.png > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > Attachments: existing_design.png, proposed_design.png > > > *Existing Design:* > !existing_design.png|width=637,height=1248! > Changes: > !proposed_design.png|width=626,height=1277! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26034: Attachment: existing_design.png > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > Attachments: existing_design.png > > > *Existing Design:* > !existing_design.png|width=637,height=1248! > Changes: > !proposed_design.png|width=626,height=1277! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-26034 started by Mallikarjun. --- > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > *Existing Design:* > !existing_design.png|width=637,height=1248! > Changes: > !proposed_design.png|width=626,height=1277! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26034: Description: *Existing Design:* !existing_design.png|width=637,height=1248! Changes: !proposed_design.png|width=626,height=1277! was:Details to be filled. > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > *Existing Design:* > !existing_design.png|width=637,height=1248! > Changes: > !proposed_design.png|width=626,height=1277! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Description: *Existing Design* !existing_design.png|width=851,height=1667! *Problem 1:* With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. Example: Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO (see the worked example below). *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. *Problem 2:* With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. *Proposed Solution:* Same as the previous proposal. *Problem 3:* Incremental backup works on WALs, and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. *Proposed Design.* !proposed_design.png|width=865,height=1766! was: *Existing Design* *Problem 1:* With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. Example: Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. *Problem 2:* With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. *Proposed Solution:* Same as the previous proposal. *Problem 3:* Incremental backup works on WALs, and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. *Proposed Design.*
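To make the Problem 1 numbers concrete, a quick back-of-the-envelope check (a sketch only; the "roughly 25 hours" in the description is the same order of magnitude, presumably assuming compression or a slightly higher effective rate):

{code:java}
public class BackupDurationSketch {
  public static void main(String[] args) {
    double tableBytes = 10e12;      // 10 TB table
    double linkBitsPerSec = 800e6;  // 800 Mbps shipping budget
    double hours = tableBytes * 8 / linkBitsPerSec / 3600;
    // Prints ~27.8 hours: far beyond a 60-minute RPO, so incremental
    // backups must be able to run while a full backup is in flight.
    System.out.printf("Full backup ship time: %.1f hours%n", hours);
  }
}
{code}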
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Attachment: proposed_design.png > Support for Parallel Backups enabling multi tenancy with rsgroups > - > > Key: HBASE-25784 > URL: https://issues.apache.org/jira/browse/HBASE-25784 > Project: HBase > Issue Type: Umbrella > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Labels: backup > Attachments: existing_design.png, proposed_design.png > > > *Existing Design* > > *Problem 1:* > With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. > > Example: > Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. > > Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. > > *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. > > *Problem 2:* > With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: > * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup > * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. > *Proposed Solution:* Same as the previous proposal. > > *Problem 3:* > Incremental backup works on WALs, and > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: > * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. > *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. > > *Proposed Design.* > !image-2021-06-03-16-34-34-957.png|width=324,height=416! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Attachment: existing_design.png > Support for Parallel Backups enabling multi tenancy with rsgroups > - > > Key: HBASE-25784 > URL: https://issues.apache.org/jira/browse/HBASE-25784 > Project: HBase > Issue Type: Umbrella > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Labels: backup > Attachments: existing_design.png, proposed_design.png > > > *Existing Design* > > *Problem 1:* > With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. > > Example: > Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. > > Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. > > *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. > > *Problem 2:* > With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: > * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup > * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. > *Proposed Solution:* Same as the previous proposal. > > *Problem 3:* > Incremental backup works on WALs, and > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: > * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. > *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. > > *Proposed Design.* > !image-2021-06-03-16-34-34-957.png|width=324,height=416! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Attachment: (was: image-2021-06-03-16-34-34-957.png) > Support for Parallel Backups enabling multi tenancy with rsgroups > - > > Key: HBASE-25784 > URL: https://issues.apache.org/jira/browse/HBASE-25784 > Project: HBase > Issue Type: Umbrella > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Labels: backup > > *Existing Design* > > *Problem 1:* > With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. > > Example: > Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. > > Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. > > *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. > > *Problem 2:* > With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: > * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup > * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. > *Proposed Solution:* Same as the previous proposal. > > *Problem 3:* > Incremental backup works on WALs, and > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: > * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. > *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. > > *Proposed Design.* > !image-2021-06-03-16-34-34-957.png|width=324,height=416! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Attachment: (was: image-2021-06-03-16-33-59-282.png) > Support for Parallel Backups enabling multi tenancy with rsgroups > - > > Key: HBASE-25784 > URL: https://issues.apache.org/jira/browse/HBASE-25784 > Project: HBase > Issue Type: Umbrella > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Labels: backup > > *Existing Design* > > *Problem 1:* > With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. > > Example: > Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. > > Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. > > *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. > > *Problem 2:* > With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: > * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup > * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. > *Proposed Solution:* Same as the previous proposal. > > *Problem 3:* > Incremental backup works on WALs, and > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: > * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. > *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. > > *Proposed Design.* > !image-2021-06-03-16-34-34-957.png|width=324,height=416! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Description: *Existing Design* *Problem 1:* With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. Example: Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. *Problem 2:* With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. *Proposed Solution:* Same as the previous proposal. *Problem 3:* Incremental backup works on WALs, and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. *Proposed Design.* !image-2021-06-03-16-34-34-957.png|width=324,height=416! was: *Existing Design* !Backup Flow Chart.png|width=825,height=1617! *Problem 1:* With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. Example: Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. *Problem 2:* With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. *Proposed Solution:* Same as the previous proposal. *Problem 3:* Incremental backup works on WALs, and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. *Proposed Design.*
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Attachment: (was: Backup Flow Chart.png) > Support for Parallel Backups enabling multi tenancy with rsgroups > - > > Key: HBASE-25784 > URL: https://issues.apache.org/jira/browse/HBASE-25784 > Project: HBase > Issue Type: Umbrella > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Labels: backup > > *Existing Design* > > *Problem 1:* > With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. > > Example: > Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. > > Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. > > *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. > > *Problem 2:* > With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: > * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup > * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. > *Proposed Solution:* Same as the previous proposal. > > *Problem 3:* > Incremental backup works on WALs, and > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: > * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. > *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. > > *Proposed Design.* > !image-2021-06-03-16-34-34-957.png|width=324,height=416! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Attachment: Backup Flow Chart.png > Support for Parallel Backups enabling multi tenancy with rsgroups > - > > Key: HBASE-25784 > URL: https://issues.apache.org/jira/browse/HBASE-25784 > Project: HBase > Issue Type: Umbrella > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Labels: backup > Attachments: Backup Flow Chart.png, > image-2021-06-03-16-33-59-282.png, image-2021-06-03-16-34-34-957.png > > > *Existing Design* > *!image-2021-06-03-16-33-59-282.png|width=292,height=408!* > *Problem 1:* > With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. > > Example: > Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. > > Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. > > *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. > > *Problem 2:* > With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: > * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup > * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. > *Proposed Solution:* Same as the previous proposal. > > *Problem 3:* > Incremental backup works on WALs, and > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: > * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. > *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. > > *Proposed Design.* > !image-2021-06-03-16-34-34-957.png|width=324,height=416! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Description: *Existing Design* !Backup Flow Chart.png|width=825,height=1617! *Problem 1:* With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. Example: Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. *Problem 2:* With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. *Proposed Solution:* Same as the previous proposal. *Problem 3:* Incremental backup works on WALs, and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above. *Proposed Design.* !image-2021-06-03-16-34-34-957.png|width=324,height=416! was: *Existing Design* *!image-2021-06-03-16-33-59-282.png|width=292,height=408!* *Problem 1:* With this design, incremental and full backups can't be run in parallel, leading to degraded RPOs when the full backup takes a long time, especially for large tables. Example: Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you are allowed to ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. Shortcoming: With the above design, one can't run parallel backups, and whenever a full backup is running (which takes roughly 25 hours) you are not allowed to take incremental backups, which is a breach of your RPO. *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in meta tables, the rest can happen in parallel. This leaves incremental backups able to run based on older successful full / incremental backups; the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community. *Problem 2:* With one backup at a time, the scheme fails easily for a multi-tenant system. This poses the following problems: * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants present in the system, as one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard. *Proposed Solution:* Same as the previous proposal. *Problem 3:* Incremental backup works on WALs, and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken. This poses the following problem: * WALs can grow unbounded if there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds. *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reasons, not necessarily mentioned above.
[jira] [Updated] (HBASE-26203) Minor cleanups to reduce checkstyle warnings on backup code
[ https://issues.apache.org/jira/browse/HBASE-26203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26203: Description: As suggested in this PR --> [https://github.com/apache/hbase/pull/3359#pullrequestreview-716511415] Created this issue to clean up Backup classes to reduce checkstyle warnings. was:`WALProcedureStore` stands deprecated. Review its usage in Backup/Restore > Minor cleanups to reduce checkstyle warnings on backup code > --- > > Key: HBASE-26203 > URL: https://issues.apache.org/jira/browse/HBASE-26203 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Trivial > Fix For: 3.0.0-alpha-2 > > > As suggested in this PR --> > [https://github.com/apache/hbase/pull/3359#pullrequestreview-716511415] > Created this issue to clean up Backup classes to reduce checkstyle warnings. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26203) Minor cleanups to reduce checkstyle warnings on backup code
Mallikarjun created HBASE-26203: --- Summary: Minor cleanups to reduce checkstyle warnings on backup code Key: HBASE-26203 URL: https://issues.apache.org/jira/browse/HBASE-26203 Project: HBase Issue Type: Improvement Components: backuprestore Affects Versions: 3.0.0-alpha-2 Reporter: Mallikarjun Assignee: Mallikarjun Fix For: 3.0.0-alpha-2 `WALProcedureStore` stands deprecated. Review its usage in Backup/Restore -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26202) Review deprecated WALProcedureStore usage in Backup
[ https://issues.apache.org/jira/browse/HBASE-26202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26202: Description: `WALProcedureStore` stands deprecated. Review its usage in Backup/Restore > Review deprecated WALProcedureStore usage in Backup > --- > > Key: HBASE-26202 > URL: https://issues.apache.org/jira/browse/HBASE-26202 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Trivial > Fix For: 3.0.0-alpha-2 > > > `WALProcedureStore` stands deprecated. Review its usage in Backup/Restore -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26202) Review deprecated WALProcedureStore usage in Backup
Mallikarjun created HBASE-26202: --- Summary: Review deprecated WALProcedureStore usage in Backup Key: HBASE-26202 URL: https://issues.apache.org/jira/browse/HBASE-26202 Project: HBase Issue Type: Improvement Components: backuprestore Affects Versions: 3.0.0-alpha-2 Reporter: Mallikarjun Assignee: Mallikarjun Fix For: 3.0.0-alpha-2 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-26147) Add dry run mode to hbase balancer
[ https://issues.apache.org/jira/browse/HBASE-26147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389189#comment-17389189 ] Mallikarjun edited comment on HBASE-26147 at 7/29/21, 2:41 AM: --- [~bbeaudreault] This feature is going to be super useful. This comment is not directly about the feature as such. # The command name could be in line with the existing balancer command. Maybe `balancer 'dry_run'` or something similar to `force`. # Rsgroup support should also be considered in this PR. was (Author: rda3mon): [~bbeaudreault] This comment is not directly about the feature as such. # The command name could be in line with the existing balancer command. Maybe `balancer 'dry_run'` or something similar to `force`. # Rsgroup support should also be considered in this PR. > Add dry run mode to hbase balancer > -- > > Key: HBASE-26147 > URL: https://issues.apache.org/jira/browse/HBASE-26147 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Bryan Beaudreault >Assignee: Bryan Beaudreault >Priority: Major > > It's often rather hard to know how the cost function changes you're making > will affect the balance of the cluster, and currently the only way to know is > to run it. If the cost decisions are not good, you may have just moved many > regions towards a non-ideal balance. Region moves themselves are not free for > clients, and the resulting balance may cause a regression. > We should add a mode to the balancer so that it can be invoked without > actually executing any plans. This will allow an administrator to iterate on > their cost functions and use the balancer's logging to see how their changes > would affect the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26147) Add dry run mode to hbase balancer
[ https://issues.apache.org/jira/browse/HBASE-26147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389189#comment-17389189 ] Mallikarjun commented on HBASE-26147: - [~bbeaudreault] This comment is not directly about the feature as such. # The command name could be in line with the existing balancer command. Maybe `balancer 'dry_run'` or something similar to `force`. # Rsgroup support should also be considered in this PR. > Add dry run mode to hbase balancer > -- > > Key: HBASE-26147 > URL: https://issues.apache.org/jira/browse/HBASE-26147 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Bryan Beaudreault >Assignee: Bryan Beaudreault >Priority: Major > > It's often rather hard to know how the cost function changes you're making > will affect the balance of the cluster, and currently the only way to know is > to run it. If the cost decisions are not good, you may have just moved many > regions towards a non-ideal balance. Region moves themselves are not free for > clients, and the resulting balance may cause a regression. > We should add a mode to the balancer so that it can be invoked without > actually executing any plans. This will allow an administrator to iterate on > their cost functions and use the balancer's logging to see how their changes > would affect the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386921#comment-17386921 ] Mallikarjun edited comment on HBASE-25891 at 7/25/21, 5:08 PM: --- [~zhangduo] Corrected the description (agreed, it was confusing). Let me know if there is any specific part that requires clarification. Summarizing the changes: # Every regionserver's WAL names are stored in `*_backup:system_*` as a reference used by BackupLogCleaner. This is unnecessary, as every backup's meta information stores the timestamp at which the backup was initiated (the backup WAL roll), and BackupLogCleaner can make use of it to clean backed-up WAL logs. (I have given an example of this in the description.) Also, the list grows huge for large clusters; for example, we have a 300-node cluster, and incremental backups can be performed often # The `_*tableSetTimestampMap*_` field is present in `_*BackupInfo*_` but was missed out while storing in `_*backup:system*_`. It is useful for some scenarios like BackupLogCleaner, so I added it to `Backup.proto` was (Author: rda3mon): [~zhangduo] # Every regionserver's WAL names are stored in `*_backup:system_*` as a reference used by BackupLogCleaner. This is unnecessary, as every backup's meta information stores the timestamp at which the backup was initiated (the backup WAL roll), and BackupLogCleaner can make use of it to clean backed-up WAL logs. (I have given an example of this in the description.) Also, the list grows huge for large clusters; for example, we have a 300-node cluster, and incremental backups can be performed often # The `_*tableSetTimestampMap*_` field is present in `_*BackupInfo*_` but was missed out while storing in `_*backup:system*_`. It is useful for some scenarios like BackupLogCleaner, so I added it to `Backup.proto` Correcting the description. Let me know if there is any specific part that requires clarification > Remove dependence on storing WAL filenames for backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently WAL file references are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before > taking the backup and stores the timestamp at which the log roll was > performed, per regionserver per backup, in the following format.
> {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > There are 2 cases for which the WAL references stored in `backup:system` > are being used. > *Use Case 1.* > *Existing Design:* To clean up WALs for which a backup has already been taken, using > `BackupLogCleaner`, which uses these references to clean up backed-up logs. > *New Design:* > Since the log roll timestamp is stored as part of each backup per regionserver, we > can check all previous successful backups and then identify which logs are to > be retained and which ones are to be cleaned up, as follows: > * Identify the latest successful backups performed per table. > * Per backup identified above, identify the oldest log roll > timestamp recorded per regionserver per table. > * All those WALs which are older than the oldest
[jira] [Updated] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Description: Context: Currently WAL file references are stored in the `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} Also, every backup (incremental and full) performs a log roll just before taking the backup and stores the timestamp at which the log roll was performed, per regionserver per backup, in the following format. {code:java} // code placeholder rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 column=meta:rs-log-ts, timestamp=1622887363275, value=\x00\x00\x01y\xDB\x81\x85 {code} There are 2 cases for which the WAL references stored in `backup:system` are being used. *Use Case 1.* *Existing Design:* To clean up WALs for which a backup has already been taken, using `BackupLogCleaner`, which uses these references to clean up backed-up logs. *New Design:* Since the log roll timestamp is stored as part of each backup per regionserver, we can check all previous successful backups and then identify which logs are to be retained and which ones are to be cleaned up, as follows: * Identify the latest successful backups performed per table. * Per backup identified above, identify the oldest log roll timestamp recorded per regionserver per table. * All those WALs which are older than the oldest log roll timestamp recorded for any backed-up table can be removed by `BackupLogCleaner`. *Use Case 2.* *Existing Design:* During incremental backup, to check the system table if there are any duplicate WALs for which a backup would be taken again. *New Design:* * Incremental backup already identifies which WALs are to be backed up using the `rslogts:` entries mentioned above. * Additionally it checks `wals:` to ensure no logs are backed up a second time. This is redundant, with no extra benefit observed.
was: Context: Currently WAL file references are stored in the `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} Also, every backup (incremental and full) performs a log roll just before taking the backup and stores the timestamp at which the log roll was performed, per regionserver per backup, in the following format. {code:java} // code placeholder rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 column=meta:rs-log-ts, timestamp=1622887363275, value=\x00\x00\x01y\xDB\x81\x85 {code} There are 2 cases for which the WAL
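As a rough illustration of the Use Case 1 rule above, here is a minimal sketch in plain Java (hypothetical helper names, not the actual BackupLogCleaner code): the deletion cutoff is the oldest `rslogts:` checkpoint across all tables' latest successful backups, and any WAL older than that cutoff can no longer be needed by a future incremental backup.

{code:java}
import java.util.Map;

// Hypothetical sketch of the proposed timestamp-based retention rule.
public class LogCleanerSketch {

  /**
   * tableToRsLogRollTs: for each table's latest successful backup, the log
   * roll timestamp recorded per regionserver (the rslogts: checkpoint).
   * Returns the cutoff below which no WAL is needed by any future
   * incremental backup of any table.
   */
  static long retentionCutoff(Map<String, Map<String, Long>> tableToRsLogRollTs) {
    long cutoff = Long.MAX_VALUE;
    for (Map<String, Long> rsTimestamps : tableToRsLogRollTs.values()) {
      for (long ts : rsTimestamps.values()) {
        cutoff = Math.min(cutoff, ts); // oldest roll across tables and regionservers
      }
    }
    return cutoff;
  }

  /** A WAL created before the cutoff is safe for the cleaner to delete. */
  static boolean isDeletable(long walCreationTs, long cutoff) {
    return walCreationTs < cutoff;
  }
}
{code}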
[jira] [Comment Edited] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386921#comment-17386921 ] Mallikarjun edited comment on HBASE-25891 at 7/25/21, 5:02 PM: --- [~zhangduo] # Every regionserver's WAL names are stored in `*_backup:system_*` as a reference used by BackupLogCleaner. This is unnecessary, as every backup's meta information stores the timestamp at which the backup was initiated (the backup WAL roll), and BackupLogCleaner can make use of it to clean backed-up WAL logs. (I have given an example of this in the description.) Also, the list grows huge for large clusters; for example, we have a 300-node cluster, and incremental backups can be performed often # The `_*tableSetTimestampMap*_` field is present in `_*BackupInfo*_` but was missed out while storing in `_*backup:system*_`. It is useful for some scenarios like BackupLogCleaner, so I added it to `Backup.proto` Correcting the description. Let me know if there is any specific part that requires clarification was (Author: rda3mon): [~zhangduo] # Every regionserver's WAL names are stored in `*_backup:system_*` as a reference used by BackupLogCleaner. This is unnecessary, as every backup's meta information stores the timestamp at which the backup was initiated (the backup WAL roll), and BackupLogCleaner can make use of it to clean backed-up WAL logs. (I have given an example of this in the description.) Also, the list grows huge for large clusters; for example, we have a 300-node cluster, and incremental backups can be performed often # The `_*tableSetTimestampMap*_` field is present in `_*BackupInfo*_` but was missed out while storing in `_*backup:system*_`. It is useful for some scenarios like BackupLogCleaner, so I added it to `Backup.proto` Correcting the title. Let me know if there is any specific part that requires clarification > Remove dependence on storing WAL filenames for backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently WAL file references are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before > taking the backup and stores the timestamp at which the log roll was > performed, per regionserver per backup, in the following format.
> > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > There are 2 cases for which the WAL references stored in `backup:system` > are being used. > 1. To clean up WALs for which a backup has already been taken, using > `BackupLogCleaner`. > Since the log roll timestamp is stored as part of each backup per regionserver, we > can check all previous successful backups and then identify which logs are to > be retained and which ones are to be cleaned up, as follows: > * Identify the latest successful backups performed per table. > * Per backup identified above, identify the oldest log roll > timestamp recorded per regionserver per table. > * All those WALs which are older than the oldest log roll timestamp recorded > for any backed-up table can be removed by `BackupLogCleaner`. > > 2. During incremental backup, to check the system table if there
[jira] [Comment Edited] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386921#comment-17386921 ] Mallikarjun edited comment on HBASE-25891 at 7/25/21, 5:01 PM: --- [~zhangduo] # Every regionserver's WAL names are stored in `*_backup:system_*` as a reference used by BackupLogCleaner. This is unnecessary, as every backup's meta information stores the timestamp at which the backup was initiated (the backup WAL roll), and BackupLogCleaner can make use of it to clean backed-up WAL logs. (I have given an example of this in the description.) Also, the list grows huge for large clusters; for example, we have a 300-node cluster, and incremental backups can be performed often # The `_*tableSetTimestampMap*_` field is present in `_*BackupInfo*_` but was missed out while storing in `_*backup:system*_`. It is useful for some scenarios like BackupLogCleaner, so I added it to `Backup.proto` Correcting the title. Let me know if there is any specific part that requires clarification was (Author: rda3mon): [~zhangduo] # Every regionserver's WAL names are stored in `*_backup:system_*` as a reference used by BackupLogCleaner. This is unnecessary, as every backup's meta information stores the timestamp at which the backup was initiated (the backup WAL roll), and BackupLogCleaner can make use of it to clean backed-up WAL logs. (I have given an example of this in the description.) Also, the list grows huge for large clusters; for example, we have a 300-node cluster, and incremental backups can be performed often # The `_*tableSetTimestampMap*_` field is present in `_*BackupInfo*_` but was missed out while storing in `_*backup:system*_`. It is useful for some scenarios like BackupLogCleaner, so I added it to `Backup.proto` Correcting the title. > Remove dependence on storing WAL filenames for backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently WAL file references are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before > taking the backup and stores the timestamp at which the log roll was > performed, per regionserver per backup, in the following format.
> > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > There are 2 cases for which the WAL references stored in `backup:system` > are being used. > 1. To clean up WALs for which a backup has already been taken, using > `BackupLogCleaner`. > Since the log roll timestamp is stored as part of each backup per regionserver, we > can check all previous successful backups and then identify which logs are to > be retained and which ones are to be cleaned up, as follows: > * Identify the latest successful backups performed per table. > * Per backup identified above, identify the oldest log roll > timestamp recorded per regionserver per table. > * All those WALs which are older than the oldest log roll timestamp recorded > for any backed-up table can be removed by `BackupLogCleaner`. > > 2. During incremental backup, to check the system table if there are any > duplicate WALs for which a backup would be taken again. > *
[jira] [Comment Edited] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386921#comment-17386921 ] Mallikarjun edited comment on HBASE-25891 at 7/25/21, 5:00 PM: --- [~zhangduo] # Every regionserver's WAL names are stored in `*_backup:system_*` as a reference used by BackupLogCleaner. This is unnecessary, as every backup's meta information stores the timestamp at which the backup was initiated (the backup WAL roll), and BackupLogCleaner can make use of it to clean backed-up WAL logs. (I have given an example of this in the description.) Also, the list grows huge for large clusters; for example, we have a 300-node cluster, and incremental backups can be performed often # The `_*tableSetTimestampMap*_` field is present in `_*BackupInfo*_` but was missed out while storing in `_*backup:system*_`. It is useful for some scenarios like BackupLogCleaner, so I added it to `Backup.proto` Correcting the title. was (Author: rda3mon): [~zhangduo] # Every regionserver's WAL names are stored in `*_backup:system_*` as a reference used by BackupLogCleaner. This is unnecessary, as every backup's meta information stores the timestamp at which the backup was initiated (the backup WAL roll), and BackupLogCleaner can make use of it to clean backed-up WAL logs. (I have given an example of this in the description.) # The `_*tableSetTimestampMap*_` field is present in `_*BackupInfo*_` but was missed out while storing in `_*backup:system*_`. It is useful for some scenarios like BackupLogCleaner, so I added it to `Backup.proto` Correcting the title. > Remove dependence on storing WAL filenames for backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently WAL file references are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before > taking the backup and stores the timestamp at which the log roll was > performed, per regionserver per backup, in the following format.
> > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > There are 2 cases for which the WAL references stored in `backup:system` > are being used. > 1. To clean up WALs for which a backup has already been taken, using > `BackupLogCleaner`. > Since the log roll timestamp is stored as part of each backup per regionserver, we > can check all previous successful backups and then identify which logs are to > be retained and which ones are to be cleaned up, as follows: > * Identify the latest successful backups performed per table. > * Per backup identified above, identify the oldest log roll > timestamp recorded per regionserver per table. > * All those WALs which are older than the oldest log roll timestamp recorded > for any backed-up table can be removed by `BackupLogCleaner`. > > 2. During incremental backup, to check the system table if there are any > duplicate WALs for which a backup would be taken again. > * Incremental backup already identifies which WALs are to be backed up > using the `rslogts:` entries mentioned above. > * Additionally it checks `wals:` to ensure no logs are backed up a second > time.
[jira] [Commented] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386921#comment-17386921 ] Mallikarjun commented on HBASE-25891: - [~zhangduo] # Every regionserver's WAL names are stored in `*_backup:system_*` as a reference used by BackupLogCleaner. This is unnecessary, as every backup's meta information stores the timestamp at which the backup was initiated (the backup WAL roll), and BackupLogCleaner can make use of it to clean backed-up WAL logs. (I have given an example of this in the description.) # The `_*tableSetTimestampMap*_` field is present in `_*BackupInfo*_` but was missed out while storing in `_*backup:system*_`. It is useful for some scenarios like BackupLogCleaner, so I added it to `Backup.proto` Correcting the title. > Remove dependence on storing WAL filenames for backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently WAL file references are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before > taking the backup and stores the timestamp at which the log roll was > performed, per regionserver per backup, in the following format. > > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > There are 2 cases for which the WAL references stored in `backup:system` > are being used. > 1. To clean up WALs for which a backup has already been taken, using > `BackupLogCleaner`. > Since the log roll timestamp is stored as part of each backup per regionserver, we > can check all previous successful backups and then identify which logs are to > be retained and which ones are to be cleaned up, as follows: > * Identify the latest successful backups performed per table. > * Per backup identified above, identify the oldest log roll > timestamp recorded per regionserver per table.
> * All those WALs which are older than the oldest log roll timestamp recorded > for any backed-up table can be removed by `BackupLogCleaner`. > > 2. During incremental backup, to check the system table if there are any > duplicate WALs for which a backup would be taken again. > * Incremental backup already identifies which WALs are to be backed up > using the `rslogts:` entries mentioned above. > * Additionally it checks `wals:` to ensure no logs are backed up a second > time. This is redundant, with no extra benefit observed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
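The `meta:rs-log-ts` values shown in the scan output above look like 8-byte big-endian longs (epoch milliseconds from mid-2021 start with the \x00\x00\x01y prefix seen there). Assuming they are written with HBase's standard Bytes utility, a round trip would look like this sketch:

{code:java}
import org.apache.hadoop.hbase.util.Bytes;

public class RsLogTsRoundTrip {
  public static void main(String[] args) {
    long rollTs = 1622887363301L;             // an example log roll time
    byte[] cellValue = Bytes.toBytes(rollTs); // what a rslogts: row would store
    long decoded = Bytes.toLong(cellValue);   // what a reader gets back
    System.out.println(decoded == rollTs);    // true
  }
}
{code}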
[jira] [Updated] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Summary: Remove dependence on storing WAL filenames for backup (was: Remove dependence storing WAL filenames for backup) > Remove dependence on storing WAL filenames for backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently WAL file references are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before > taking the backup and stores the timestamp at which the log roll was > performed, per regionserver per backup, in the following format. > > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > > There are 2 cases for which the WAL references stored in `backup:system` > are being used. > 1. To clean up WALs for which a backup has already been taken, using > `BackupLogCleaner`. > Since the log roll timestamp is stored as part of each backup per regionserver, we > can check all previous successful backups and then identify which logs are to > be retained and which ones are to be cleaned up, as follows: > * Identify the latest successful backups performed per table. > * Per backup identified above, identify the oldest log roll > timestamp recorded per regionserver per table. > * All those WALs which are older than the oldest log roll timestamp recorded > for any backed-up table can be removed by `BackupLogCleaner`. > > 2. During incremental backup, to check the system table if there are any > duplicate WALs for which a backup would be taken again. > * Incremental backup already identifies which WALs are to be backed up > using the `rslogts:` entries mentioned above. > * Additionally it checks `wals:` to ensure no logs are backed up a second > time. This is redundant, with no extra benefit observed.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25891) Remove dependence on storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Description: Context: Currently WAL file references are stored in the `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} Also, every backup (incremental and full) performs a log roll just before taking the backup and stores the timestamp at which the log roll was performed, per regionserver per backup, in the following format. {code:java} // code placeholder rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 column=meta:rs-log-ts, timestamp=1622887363275, value=\x00\x00\x01y\xDB\x81\x85 {code} There are 2 cases for which the WAL references stored in `backup:system` are being used. 1. To clean up WALs for which a backup has already been taken, using `BackupLogCleaner`. Since the log roll timestamp is stored as part of each backup per regionserver, we can check all previous successful backups and then identify which logs are to be retained and which ones are to be cleaned up, as follows: * Identify the latest successful backups performed per table. * Per backup identified above, identify the oldest log roll timestamp recorded per regionserver per table. * All those WALs which are older than the oldest log roll timestamp recorded for any backed-up table can be removed by `BackupLogCleaner`. 2. During incremental backup, to check the system table if there are any duplicate WALs for which a backup would be taken again. * Incremental backup already identifies which WALs are to be backed up using the `rslogts:` entries mentioned above. * Additionally it checks `wals:` to ensure no logs are backed up a second time. This is redundant, with no extra benefit observed.
was: Context: Currently WAL file references are stored in the `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} Also, every backup (incremental and full) performs a log roll just before taking the backup and stores the timestamp at which the log roll was performed, per regionserver per backup, in the following format. {code:java} // code placeholder rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 column=meta:rs-log-ts, timestamp=1622887363275, value=\x00\x00\x01y\xDB\x81\x85 {code} There are 2 cases for which the WAL references stored in `backup:system` are being used. 1. To clean up WALs for which a backup has already been taken, using `BackupLogCleaner`. Since the log
[jira] [Commented] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382994#comment-17382994 ] Mallikarjun commented on HBASE-25891: - This will be a much-needed enhancement in terms of being able to extend this functionality to rsgroups and future work for HBase backup/restore, especially before HBase 3.0 goes live, as this changes the information stored in meta. I would like someone to spend time reviewing this. [~anoop.hbase] [~zhangduo] > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Context: > Currently WAL file references are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before > taking the backup and stores the timestamp at which the log roll was > performed, per regionserver per backup, in the following format. > > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > > There are 2 cases for which the WAL references stored in `backup:system` > are being used. > 1. To clean up WALs for which a backup has already been taken, using > `BackupLogCleaner`. > Since the log roll timestamp is stored as part of each backup per regionserver, we > can check all previous successful backups and then identify which logs are to > be retained and which ones are to be cleaned up, as follows: > * Identify the latest successful backups performed per table. > * Per backup identified above, identify the oldest log roll > timestamp recorded per regionserver per table. > * All those WALs which are older than the oldest log roll timestamp recorded > for any backed-up table can be removed by `BackupLogCleaner`. > > 2. During incremental backup, to check the system table if there are any > duplicate WALs for which a backup would be taken again.
> * Incremental backup already identifies which WALs are to be backed up > using the `rslogts:` entries mentioned above. > * Additionally it checks `wals:` to ensure no logs are backed up a second > time. This is redundant, with no extra benefit observed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26034: Description: Details to be filled. (was: TODO:) > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > Details to be filled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26034) Add support to take parallel backups
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26034: Summary: Add support to take parallel backups (was: Add support to take multiple parallel backup) > Add support to take parallel backups > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > TODO: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370227#comment-17370227 ] Mallikarjun commented on HBASE-25891: - [~anoop.hbase] [~stack] [~zhangduo] Can someone help me get this reviewed, please? > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Context: > Currently WAL file references are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before > taking the backup and stores the timestamp at which the log roll was > performed, per regionserver per backup, in the following format. > > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > > There are 2 cases for which the WAL references stored in `backup:system` > are being used. > 1. To clean up WALs for which a backup has already been taken, using > `BackupLogCleaner`. > Since the log roll timestamp is stored as part of each backup per regionserver, we > can check all previous successful backups and then identify which logs are to > be retained and which ones are to be cleaned up, as follows: > * Identify the latest successful backups performed per table. > * Per backup identified above, identify the oldest log roll > timestamp recorded per regionserver per table. > * All those WALs which are older than the oldest log roll timestamp recorded > for any backed-up table can be removed by `BackupLogCleaner`. > > 2. During incremental backup, to check the system table if there are any > duplicate WALs for which a backup would be taken again. > * Incremental backup already identifies which WALs are to be backed up > using the `rslogts:` entries mentioned above. > * Additionally it checks `wals:` to ensure no logs are backed up a second > time. This is redundant, with no extra benefit observed.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26034) Add support to take multiple parallel backup
[ https://issues.apache.org/jira/browse/HBASE-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-26034: Description: TODO: > Add support to take multiple parallel backup > > > Key: HBASE-26034 > URL: https://issues.apache.org/jira/browse/HBASE-26034 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-2 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-2 > > > TODO: -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26034) Add support to take multiple parallel backup
Mallikarjun created HBASE-26034: --- Summary: Add support to take multiple parallel backup Key: HBASE-26034 URL: https://issues.apache.org/jira/browse/HBASE-26034 Project: HBase Issue Type: Improvement Components: backuprestore Affects Versions: 3.0.0-alpha-2 Reporter: Mallikarjun Assignee: Mallikarjun Fix For: 3.0.0-alpha-2 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359302#comment-17359302 ] Mallikarjun commented on HBASE-25891: - [~anoop.hbase] Did you get a chance to look at it? > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Context: > Currently WAL file references are stored in the `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, every backup (incremental and full) performs a log roll just before > taking the backup and stores the timestamp at which the log roll was > performed, per regionserver per backup, in the following format. > > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > > There are 2 cases for which the WAL references stored in `backup:system` > are being used. > 1. To clean up WALs for which a backup has already been taken, using > `BackupLogCleaner`. > Since the log roll timestamp is stored as part of each backup per regionserver, we > can check all previous successful backups and then identify which logs are to > be retained and which ones are to be cleaned up, as follows: > * Identify the latest successful backups performed per table. > * Per backup identified above, identify the oldest log roll > timestamp recorded per regionserver per table. > * All those WALs which are older than the oldest log roll timestamp recorded > for any backed-up table can be removed by `BackupLogCleaner`. > > 2. During incremental backup, to check the system table if there are any > duplicate WALs for which a backup would be taken again. > * Incremental backup already identifies which WALs are to be backed up > using the `rslogts:` entries mentioned above. > * Additionally it checks `wals:` to ensure no logs are backed up a second > time. This is redundant, with no extra benefit observed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358007#comment-17358007 ] Mallikarjun commented on HBASE-25891: - I have updated the description above. Hopefully it can answer your questions [~anoop.hbase]. Adding details specific to questions here. {quote}Means the WAL files will get renamed with this prefix? When those files become eligible for deletion then? {quote} No. They are cleaned up by cleanup chore. Similar to `TimeToLiveLogCleaner` {quote}Now that we dont have this systen table at all, what happens when taking a full/incremental snapshot? {quote} Full backup does snapshot and export. There is no dependence on WAL files. Incremental backup continues to check `rslogts:` to see which regionserver was backed up until what timestamp and based on which WAL files are generated to be backed up. {quote}How WAL files been retained when backup refers to it? When that become eligible for deletion? (Backup deleted/ another full backup came?) And how we make sure we allow WAL deletion then? {quote} We don't need to store list of WAL files for that. We have checkpoints until what point WAL's are read for backup and all those WAL files created beyond that timestamp are eligable for backup automatically. and those created before that timestamp can be cleaned up. > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Context: > Currently WAL logs are stored in `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Also, Every backup (Incremental and Full) performs a log roll just before > taking backup and stores what was the timestamp at which log roll was > performed per regionserver per backup using following format. 
> > {code:java} > // code placeholder > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 > column=meta:rs-log-ts, timestamp=1622887363301,value=\x00\x00\x01y\xDB\x81ar > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 > column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP > rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 > column=meta:rs-log-ts, timestamp=1622887363275, > value=\x00\x00\x01y\xDB\x81\x85 > {code} > > > There are 2 cases for which WAL log refrences stored in `backup:system` and > are being used. > 1. To cleanup WAL's for which backup is already taken using > `BackupLogCleaner` > Since log roll timestamp is stored as part of backup per regionserver. We can > check all previous successfull backup's and then identify which logs are to > be retained and which ones are to be cleaned up as follows > * Identify which are the latest successful backups performed per table. > * Per backup identified above, identify what is the oldest log rolled > timestamp perfomed per regionserver per table. > * All those WAL's which are older than oldest log rolled timestamp perfomed > for any table backed can be removed by `BackupLogCleaner` > > 2. During incremental backup, to check system table if there are any > duplicate WAL's for which backup is taken again. > * Incremental backup already identifies which all WAL's to be backed up > using `rslogts:` mentioned above. > * Additionally it checks `wals:` to ensure no logs are backuped for second > time. And this is redundant and not seen any extra benefit. -- This message was sent by Atlassian Jira (v8.3.4#803005)
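As a rough illustration of the checkpoint answer above, and assuming the oldWALs naming shown in the description (the suffix after the last dot is the roll timestamp), incremental WAL selection needs only the `rslogts:` checkpoint and never a stored file list. The helper names below are invented for the sketch.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: select WALs for an incremental backup from the
// rslogts: checkpoint alone, without consulting any stored wals: rows.
public class IncrementalWalSelectionSketch {

  /** Roll timestamp parsed from a name like
   *  "preprod-dn-1%2C16020%2C1614844389000.1621996160175". */
  static long rollTimestamp(String walFileName) {
    return Long.parseLong(walFileName.substring(walFileName.lastIndexOf('.') + 1));
  }

  /** WALs rolled after the server's checkpoint are not yet backed up. */
  static List<String> walsToBackup(List<String> walFileNames, long rsLogTsCheckpoint) {
    List<String> selected = new ArrayList<>();
    for (String wal : walFileNames) {
      if (rollTimestamp(wal) > rsLogTsCheckpoint) {
        selected.add(wal);
      }
    }
    return selected;
  }
}
{code}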
[jira] [Updated] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Description: Context: Currently WAL logs are stored in the `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} Also, every backup (incremental and full) performs a log roll just before taking the backup, and stores the timestamp at which the log roll was performed, per regionserver per backup, in the following format. {code:java} // code placeholder rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 column=meta:rs-log-ts, timestamp=1622887363301,value=\x00\x00\x01y\xDB\x81ar rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 column=meta:rs-log-ts, timestamp=1622887363275, value=\x00\x00\x01y\xDB\x81\x85 {code} There are 2 cases in which the WAL log references stored in `backup:system` are used. 1. To clean up WALs for which a backup has already been taken, using `BackupLogCleaner`. Since the log roll timestamp is stored per regionserver as part of each backup, we can check all previous successful backups and then identify which logs are to be retained and which are to be cleaned up, as follows: * Identify the latest successful backups performed per table. * Per backup identified above, find the oldest log roll timestamp recorded per regionserver per table. * All WALs older than the oldest log roll timestamp recorded for any backed-up table can be removed by `BackupLogCleaner`. 2. During incremental backup, to check the system table for duplicate WALs that would otherwise be backed up again. * Incremental backup already identifies which WALs to back up using the `rslogts:` entries mentioned above. * Additionally it checks `wals:` to ensure no log is backed up a second time. This is redundant, and we have not seen any extra benefit from it. 
was: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} Primarily used for following * WAL's are stored in meta table to check if a particular log has been backed up or not. * Check during incremental backup if a particular WAL is being backed up was covered during previous incremental backup or not. Changes for above 2 use cases. * Since log roll during incremental or full backup is stored with prefix `trslm:`. Can be used to identify which log files can be cleaned up * Check during incremental backup if a particular WAL is being backed up or not is redundant. No such a check is required > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >
[jira] [Commented] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357912#comment-17357912 ] Mallikarjun commented on HBASE-25891: - [~anoop.hbase] I have made certain changes to this PR (not relating to the multi-tenancy scope planned earlier) and updated the description accordingly to describe what the changes are. Please let me know if this is sufficient. > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Currently WAL logs are stored in `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Primarily used for following > * WAL's are stored in meta table to check if a particular log has been > backed up or not. > * Check during incremental backup if a particular WAL is being backed up was > covered during previous incremental backup or not. > Changes for above 2 use cases. > * Since log roll during incremental or full backup is stored with prefix > `trslm:`. Can be used to identify which log files can be cleaned up > * Check during incremental backup if a particular WAL is being backed up or > not is redundant. No such a check is required > -- This message was sent by Atlassian Jira (v8.3.4#803005)
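For readers decoding the examples in these descriptions: the `meta:rs-log-ts` values appear to be plain 8-byte big-endian longs holding epoch milliseconds. That is an assumption based on the printed escape sequences, not something confirmed from the patch. A small sketch using the standard HBase `Bytes` helper:
{code:java}
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch: decode a meta:rs-log-ts cell value such as
// \x00\x00\x01y\xDB\x81ar ('y' = 0x79, 'a' = 0x61, 'r' = 0x72), assumed to be
// an 8-byte big-endian long, into the log roll timestamp in epoch millis.
public class RsLogTsDecodeSketch {
  public static void main(String[] args) {
    byte[] value =
        { 0x00, 0x00, 0x01, 0x79, (byte) 0xDB, (byte) 0x81, 0x61, 0x72 };
    long logRollTs = Bytes.toLong(value);
    System.out.println(logRollTs); // an epoch-millis timestamp from June 2021
  }
}
{code}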
[jira] [Updated] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Description: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} Primarily used for following * WAL's are stored in meta table to check if a particular log has been backed up or not. * Check during incremental backup if a particular WAL is being backed up was covered during previous incremental backup or not. Changes for above 2 use cases. * Since log roll during incremental or full backup is stored with prefix `trslm:`. Can be used to identify which log files can be cleaned up * Check during incremental backup if a particular WAL is being backed up or not is redundant. No such a check is required was: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} This has several problems # Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. # Unnecessary to have wal log listed for performing incremental backup or for performing logcleaner. # No support for rsgroup. Hence tables belonging to rsgroups which doesn't have backup enabled also have to retain wals' and forever. 
Proposed Solution: > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Currently WAL logs are stored in `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > Primarily used for following > * WAL's are stored in meta table to check if a particular log has been > backed up or not. > * Check during incremental backup if a particular WAL is being backed up was > covered
[jira] [Updated] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Description: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} This has several problems # Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. # Unnecessary to have wal log listed for performing incremental backup or for performing logcleaner. # No support for rsgroup. Hence tables belonging to rsgroups which doesn't have backup enabled also have to retain wals' and forever. Proposed Solution: was: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} This has several problems # Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. # Unnecessary to have wal log listed for performing incremental backup or for performing logcleaner. # No support for rsgroup. Hence all rsgroups which doesn't have backup enabled tables, WAL's are retained forever. 
> Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Currently WAL logs are stored in `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > > This has several problems > # Ever growing rows of wal's sourced for incremental backup is maintained > and never cleaned up. > # Unnecessary to have wal log listed for performing incremental backup or > for performing logcleaner. > # No support for rsgroup. Hence tables belonging to rsgroups which doesn't > have backup enabled also have to retain wals' and forever. > > Proposed Solution: > >
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Description: *Existing Design* *!image-2021-06-03-16-33-59-282.png|width=292,height=408!* *Problem 1:* With this design, incremental and full backups can't run in parallel, leading to degraded RPOs when a full backup runs long, especially for large tables. Example: Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you can ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. Shortcoming: With the above design, one can't run parallel backups, so whenever a full backup is running (which takes roughly 25 hours) no incremental backup can be taken, and that would be a breach of your RPO. *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in the meta table, everything else can happen in parallel. Incremental backups should be able to run against older successful full/incremental backups, and the completion time of a backup should be used instead of its start time for ordering (see the sketch following this update). I have not worked on the full redesign, and will be doing so if this proposal seems acceptable to the community. *Problem 2:* With one backup at a time, the system fails easily for multi-tenant use. This poses the following problems: * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants in the system, as one tenant has no control over other tenants' table sizes and hence backup durations. * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very high. *Proposed Solution:* Same as the previous proposal. *Problem 3:* Incremental backup works on WALs, and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full/incremental) is taken. This poses the following problem: * WALs can grow unbounded if there are transient problems, such as the backup site facing issues, until the next scheduled backup succeeds. *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever reason, not necessarily those mentioned above. Proposed Design. !https://i.ibb.co/vVV1BTs/Backup-Activity-Diagram.png|width=322,height=414! was: *Problem 1:* With this design, incremental and full backups can't run in parallel, leading to degraded RPOs when a full backup runs long, especially for large tables. Example: Expectation: Say you have a big table of 10 TB, your RPO is 60 minutes, and you can ship the remote backup at 800 Mbps. You are allowed to take full backups once a week, and the rest should be incremental backups. Shortcoming: With the above design, one can't run parallel backups, so whenever a full backup is running (which takes roughly 25 hours) no incremental backup can be taken, and that would be a breach of your RPO. *Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in the meta table, everything else can happen in parallel. Incremental backups should be able to run against older successful full/incremental backups, and the completion time of a backup should be used instead of its start time for ordering. I have not worked on the full redesign, and will be doing so if this proposal seems acceptable to the community. *Problem 2:* With one backup at a time, the system fails easily for multi-tenant use. This poses the following problems: * Admins will not be able to achieve the required RPOs for their tables because of dependence on other tenants in the system, as one tenant has no control over other tenants' table sizes and hence backup durations. * The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very high. *Proposed Solution:* Same as the previous proposal. *Problem 3:* Incremental backup works on WALs, and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full/incremental) is taken. This poses the following problem: * WALs can grow unbounded if there are transient problems, such as the backup site facing issues, until the next scheduled backup succeeds. *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever reason, not necessarily those mentioned above. Proposed Design.
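A minimal sketch of the ordering change the proposal above implies: with parallel backups, the base for the next incremental backup should be the most recent successfully *completed* backup, ordered by completion time rather than start time. `BackupRecord` here is a hypothetical holder, not an HBase class.
{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative sketch only: order backup history by completion time, not
// start time, when picking the base for the next incremental backup.
public class BackupOrderingSketch {

  // Hypothetical holder for one row of backup history.
  static class BackupRecord {
    final String backupId;
    final long startTs;
    final long completeTs;
    final boolean succeeded;

    BackupRecord(String backupId, long startTs, long completeTs, boolean succeeded) {
      this.backupId = backupId;
      this.startTs = startTs;
      this.completeTs = completeTs;
      this.succeeded = succeeded;
    }
  }

  /** Latest successful backup by completion time; a backup that started
   *  earlier but finished later correctly wins the ordering. */
  static Optional<BackupRecord> baseForNextIncremental(List<BackupRecord> history) {
    return history.stream()
        .filter(r -> r.succeeded)
        .max(Comparator.comparingLong(r -> r.completeTs));
  }
}
{code}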
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Description: *Existing Design* *!image-2021-06-03-16-33-59-282.png|width=292,height=408!* *Problem 1:* With this design, Incremental and Full backup can't be run in parallel and leading to degraded RPO's in case Full backup is of longer duration esp for large tables. Example: Expectation: Say you have a big table with 10 TB and your RPO is 60 minutes and you are allowed to ship the remote backup with 800 Mbps. And you are allowed to take Full Backups once in a week and rest of them should be incremental backups Shortcoming: With the above design, one can't run parallel backups and whenever there is a full backup running (which takes roughly 25 hours) you are not allowed to take incremental backups and that would be a breach in your RPO. *Proposed Solution:* Barring some critical sections such as modifying state of the backup on meta tables, others can happen parallelly. Leaving incremental backups to be able to run based on older successful full / incremental backups and completion time of backup should be used instead of start time of backup for ordering. I have not worked on the full redesign, and will be doing so if this proposal seems acceptable for the community. *Problem 2:* With one backup at a time, it fails easily for a multi-tenant system. This poses following problems * Admins will not be able to achieve required RPO's for their tables because of dependence on other tenants present in the system. As one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup * Management overhead of setting up a right sequence to achieve required RPO's for different tenants could be very hard. *Proposed Solution:* Same as previous proposal *Problem 3:* Incremental backup works on WAL's and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WAL's are never cleaned up until the next backup (Full / Incremental) is taken. This poses following problem * WAL's can grow unbounded in case there are transient problems like backup site facing issues or anything else until next backup scheduled goes successful *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force full backup if required WAL files are missing for whatever other reasons not necessarily mentioned above. *Proposed Design.* !image-2021-06-03-16-34-34-957.png|width=324,height=416! was: *Existing Design* *!image-2021-06-03-16-33-59-282.png|width=292,height=408!* *Problem 1:* With this design, Incremental and Full backup can't be run in parallel and leading to degraded RPO's in case Full backup is of longer duration esp for large tables. Example: Expectation: Say you have a big table with 10 TB and your RPO is 60 minutes and you are allowed to ship the remote backup with 800 Mbps. And you are allowed to take Full Backups once in a week and rest of them should be incremental backups Shortcoming: With the above design, one can't run parallel backups and whenever there is a full backup running (which takes roughly 25 hours) you are not allowed to take incremental backups and that would be a breach in your RPO. *Proposed Solution:* Barring some critical sections such as modifying state of the backup on meta tables, others can happen parallelly. 
Leaving incremental backups to be able to run based on older successful full / incremental backups and completion time of backup should be used instead of start time of backup for ordering. I have not worked on the full redesign, and will be doing so if this proposal seems acceptable for the community. *Problem 2:* With one backup at a time, it fails easily for a multi-tenant system. This poses following problems * Admins will not be able to achieve required RPO's for their tables because of dependence on other tenants present in the system. As one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup * Management overhead of setting up a right sequence to achieve required RPO's for different tenants could be very hard. *Proposed Solution:* Same as previous proposal *Problem 3:* Incremental backup works on WAL's and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WAL's are never cleaned up until the next backup (Full / Incremental) is taken. This poses following problem * WAL's can grow unbounded in case there are transient problems like backup site facing issues or anything else until next backup scheduled goes successful *Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force full backup if required WAL files are missing for whatever other reasons not necessarily
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Attachment: image-2021-06-03-16-34-34-957.png > Support for Parallel Backups enabling multi tenancy with rsgroups > - > > Key: HBASE-25784 > URL: https://issues.apache.org/jira/browse/HBASE-25784 > Project: HBase > Issue Type: Umbrella > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Labels: backup > Attachments: image-2021-06-03-16-33-59-282.png, > image-2021-06-03-16-34-34-957.png > > > *Existing Design* > *!image-2021-06-03-16-33-59-282.png|width=292,height=408!* > *Problem 1:* > With this design, Incremental and Full backup can't be run in parallel and > leading to degraded RPO's in case Full backup is of longer duration esp for > large tables. > > Example: > Expectation: Say you have a big table with 10 TB and your RPO is 60 minutes > and you are allowed to ship the remote backup with 800 Mbps. And you are > allowed to take Full Backups once in a week and rest of them should be > incremental backups > > Shortcoming: With the above design, one can't run parallel backups and > whenever there is a full backup running (which takes roughly 25 hours) you > are not allowed to take incremental backups and that would be a breach in > your RPO. > > *Proposed Solution:* Barring some critical sections such as modifying state > of the backup on meta tables, others can happen parallelly. Leaving > incremental backups to be able to run based on older successful full / > incremental backups and completion time of backup should be used instead of > start time of backup for ordering. I have not worked on the full redesign, > and will be doing so if this proposal seems acceptable for the community. > > *Problem 2:* > With one backup at a time, it fails easily for a multi-tenant system. This > poses following problems > * Admins will not be able to achieve required RPO's for their tables because > of dependence on other tenants present in the system. As one tenant doesn't > have control over other tenants' table sizes and hence the duration of the > backup > * Management overhead of setting up a right sequence to achieve required > RPO's for different tenants could be very hard. > *Proposed Solution:* Same as previous proposal > > *Problem 3:* > Incremental backup works on WAL's and > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WAL's are > never cleaned up until the next backup (Full / Incremental) is taken. This > poses following problem > * WAL's can grow unbounded in case there are transient problems like backup > site facing issues or anything else until next backup scheduled goes > successful > *Proposed Solution:* I can't think of anything better, but I see this can be > a potential problem. Also, one can force full backup if required WAL files > are missing for whatever other reasons not necessarily mentioned above. > > Proposed Design. > !https://i.ibb.co/vVV1BTs/Backup-Activity-Diagram.png|width=322,height=414! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups
[ https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25784: Attachment: image-2021-06-03-16-33-59-282.png > Support for Parallel Backups enabling multi tenancy with rsgroups > - > > Key: HBASE-25784 > URL: https://issues.apache.org/jira/browse/HBASE-25784 > Project: HBase > Issue Type: Umbrella > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Labels: backup > Attachments: image-2021-06-03-16-33-59-282.png, > image-2021-06-03-16-34-34-957.png > > > *Problem 1:* > With this design, Incremental and Full backup can't be run in parallel and > leading to degraded RPO's in case Full backup is of longer duration esp for > large tables. > > Example: > Expectation: Say you have a big table with 10 TB and your RPO is 60 minutes > and you are allowed to ship the remote backup with 800 Mbps. And you are > allowed to take Full Backups once in a week and rest of them should be > incremental backups > > Shortcoming: With the above design, one can't run parallel backups and > whenever there is a full backup running (which takes roughly 25 hours) you > are not allowed to take incremental backups and that would be a breach in > your RPO. > > *Proposed Solution:* Barring some critical sections such as modifying state > of the backup on meta tables, others can happen parallelly. Leaving > incremental backups to be able to run based on older successful full / > incremental backups and completion time of backup should be used instead of > start time of backup for ordering. I have not worked on the full redesign, > and will be doing so if this proposal seems acceptable for the community. > > *Problem 2:* > With one backup at a time, it fails easily for a multi-tenant system. This > poses following problems > * Admins will not be able to achieve required RPO's for their tables because > of dependence on other tenants present in the system. As one tenant doesn't > have control over other tenants' table sizes and hence the duration of the > backup > * Management overhead of setting up a right sequence to achieve required > RPO's for different tenants could be very hard. > *Proposed Solution:* Same as previous proposal > > *Problem 3:* > Incremental backup works on WAL's and > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WAL's are > never cleaned up until the next backup (Full / Incremental) is taken. This > poses following problem > * WAL's can grow unbounded in case there are transient problems like backup > site facing issues or anything else until next backup scheduled goes > successful > *Proposed Solution:* I can't think of anything better, but I see this can be > a potential problem. Also, one can force full backup if required WAL files > are missing for whatever other reasons not necessarily mentioned above. > > Proposed Design. > !https://i.ibb.co/vVV1BTs/Backup-Activity-Diagram.png|width=322,height=414! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354430#comment-17354430 ] Mallikarjun commented on HBASE-25891: - [~anoop.hbase] Not there completely. Let me put down the details and share it. > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Currently WAL logs are stored in `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > > This has several problems > # Ever growing rows of wal's sourced for incremental backup is maintained > and never cleaned up. > # Unnecessary to have wal log listed for performing incremental backup or > for performing logcleaner. > # No support for rsgroup. Hence all rsgroups which doesn't have backup > enabled tables, WAL's are retained forever. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Fix Version/s: 3.0.0-alpha-1 > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Currently WAL logs are stored in `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > > This has several problems > # Ever growing rows of wal's sourced for incremental backup is maintained > and never cleaned up. > # Unnecessary to have wal log listed for performing incremental backup or > for performing logcleaner. > # No support for rsgroup. Hence all rsgroups which doesn't have backup > enabled tables, WAL's are retained forever. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Affects Version/s: 3.0.0-alpha-1 > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Affects Versions: 3.0.0-alpha-1 >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Currently WAL logs are stored in `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > > This has several problems > # Ever growing rows of wal's sourced for incremental backup is maintained > and never cleaned up. > # Unnecessary to have wal log listed for performing incremental backup or > for performing logcleaner. > # No support for rsgroup. Hence all rsgroups which doesn't have backup > enabled tables, WAL's are retained forever. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Description: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} This has several problems # Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. # Unnecessary to have wal log listed for performing incremental backup or for performing logcleaner. # No support for rsgroup. Hence all rsgroups which doesn't have backup enabled tables, WAL's are retained forever. was: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} This has several problems # Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. # Unnecessary to have wal log listed for performing incremental backup or for performing logcleaner. # No support for rsgroup. Hence all rsgroups which doesn't have backup enabled, WAL's are retained forever. 
> Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > > Currently WAL logs are stored in `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > > This has several problems > # Ever growing rows of wal's sourced for incremental backup is maintained > and never cleaned up. > # Unnecessary to have wal log listed for performing incremental backup or > for performing logcleaner. > # No support for rsgroup. Hence all rsgroups which doesn't have backup > enabled tables, WAL's are retained forever. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25891) Remove dependence storing WAL filenames for backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Summary: Remove dependence storing WAL filenames for backup (was: Remove the dependence of storing WAL filenames for incremental backup) > Remove dependence storing WAL filenames for backup > -- > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > > Currently WAL logs are stored in `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > > This has several problems > # Ever growing rows of wal's sourced for incremental backup is maintained > and never cleaned up. > # Unnecessary to have wal log listed for performing incremental backup or > for performing logcleaner. > # No support for rsgroup. Hence all rsgroups which doesn't have backup > enabled, WAL's are retained forever. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25891) Remove the dependence of storing WAL filenames for incremental backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Description: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} This has several problems # Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. # Unnecessary to have wal log listed for performing incremental backup or for performing logcleaner. # No support for rsgroup. Hence all rsgroups which doesn't have backup enabled, WAL's are retained forever. was: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} # # Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. 
# Unnecessary to have wal log listed for performing incremental backup or log cle > Remove the dependence of storing WAL filenames for incremental backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > > Currently WAL logs are stored in `backup:system` meta table > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > > This has several problems > # Ever growing rows of wal's sourced for incremental backup is maintained > and never cleaned up. > # Unnecessary to have wal log listed for performing incremental backup or > for performing logcleaner. > # No support for rsgroup. Hence all rsgroups which doesn't have backup > enabled, WAL's are retained forever. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25891) Remove the dependence of storing WAL filenames for incremental backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Description: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 {code} # # Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. # Unnecessary to have wal log listed for performing incremental backup or log cle was: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder {code} # # Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. # Unnecessary to have wal log listed for performing incremental backup or log cle > Remove the dependence of storing WAL filenames for incremental backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > > Currently WAL logs are stored in `backup:system` meta table > > {code:java} > // code placeholder > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175 > wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, > timestamp=1622003479895, value=backup_1622003358258 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, > timestamp=1622003479895, > value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280 > wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, > timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1 > {code} > > > > # > # Ever growing rows of wal's sourced for incremental backup is maintained > and never cleaned up. > # Unnecessary to have wal log listed for performing incremental backup or > log cle > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25891) Remove the dependence of storing WAL filenames for incremental backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Description: Currently WAL logs are stored in `backup:system` meta table {code:java} // code placeholder {code} # # Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. # Unnecessary to have wal log listed for performing incremental backup or log cle was: Here are some of the problems identified. 1. Ever growing rows of wal's sourced for incremental backup is maintained and never cleaned up. 2. Backup sessions, Backup sets, wal logs, active sessions, merges, etc are all identified by prefix of row key. Which doesn't seem to be very intuitive > Remove the dependence of storing WAL filenames for incremental backup > - > > Key: HBASE-25891 > URL: https://issues.apache.org/jira/browse/HBASE-25891 > Project: HBase > Issue Type: Improvement > Components: backuprestore >Reporter: Mallikarjun >Assignee: Mallikarjun >Priority: Major > > Currently WAL logs are stored in `backup:system` meta table > > {code:java} > // code placeholder > {code} > > > > # > # Ever growing rows of wal's sourced for incremental backup is maintained > and never cleaned up. > # Unnecessary to have wal log listed for performing incremental backup or > log cle > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25891) Remove the dependence of storing WAL filenames for incremental backup
[ https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallikarjun updated HBASE-25891: Summary: Remove the dependence of storing WAL filenames for incremental backup (was: Simplify backup table to be able to maintain it better)

> Remove the dependence of storing WAL filenames for incremental backup
> -
>
>                 Key: HBASE-25891
>                 URL: https://issues.apache.org/jira/browse/HBASE-25891
>             Project: HBase
>          Issue Type: Improvement
>          Components: backuprestore
>            Reporter: Mallikarjun
>            Assignee: Mallikarjun
>            Priority: Major
>
> Here are some of the problems identified:
> 1. An ever-growing set of rows for WALs sourced for incremental backup is maintained and never cleaned up.
> 2. Backup sessions, backup sets, WAL logs, active sessions, merges, etc. are all identified by a prefix of the row key, which is not very intuitive.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
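To make the second point concrete: every kind of entity lives in the single `backup:system` table and is distinguished only by a string prefix on its row key. A hypothetical sketch of that layout follows; only the "wals:" prefix appears in the issue text, and the other prefixes and names here are illustrative assumptions:
{code:java}
import org.apache.hadoop.hbase.util.Bytes;

public final class BackupSystemRowKeys {
  // Only "wals:" is confirmed by the issue description; the other
  // prefixes are hypothetical examples of the same pattern.
  static final String WALS_PREFIX = "wals:";
  static final String SESSION_PREFIX = "session:";
  static final String SET_PREFIX = "set:";

  private BackupSystemRowKeys() {
  }

  // The entity type is encoded in the key itself rather than in a
  // separate table or column, so readers must know every prefix to
  // interpret a row -- the maintainability problem described above.
  static byte[] walRowKey(String walFileName) {
    return Bytes.toBytes(WALS_PREFIX + walFileName);
  }

  static boolean isWalRow(byte[] rowKey) {
    return Bytes.toString(rowKey).startsWith(WALS_PREFIX);
  }
}
{code}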
[jira] [Commented] (HBASE-25888) Backup tests are categorically flakey
[ https://issues.apache.org/jira/browse/HBASE-25888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347587#comment-17347587 ] Mallikarjun commented on HBASE-25888: - [~ndimiduk] You can have a look at the PR whenever you have time.

> Backup tests are categorically flakey
> -
>
>                 Key: HBASE-25888
>                 URL: https://issues.apache.org/jira/browse/HBASE-25888
>             Project: HBase
>          Issue Type: Bug
>          Components: backuprestore, test
>            Reporter: Nick Dimiduk
>            Assignee: Mallikarjun
>            Priority: Major
>             Fix For: 3.0.0-alpha-1
>
>         Attachments: TEST-org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.xml.gz,
>           TEST-org.apache.hadoop.hbase.backup.TestBackupMerge.xml.gz,
>           TEST-org.apache.hadoop.hbase.backup.TestFullBackupSet.xml.gz,
>           TEST-org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures.xml.gz,
>           TEST-org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.xml.gz,
>           TEST-org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.xml.gz,
>           org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.txt,
>           org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.txt.gz,
>           org.apache.hadoop.hbase.backup.TestBackupMerge-output.txt.gz,
>           org.apache.hadoop.hbase.backup.TestBackupMerge.txt.gz,
>           org.apache.hadoop.hbase.backup.TestFullBackupSet-output.txt.gz,
>           org.apache.hadoop.hbase.backup.TestFullBackupSet.txt.gz,
>           org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures-output.txt.gz,
>           org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures.txt.gz,
>           org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad-output.txt.gz,
>           org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.txt.gz,
>           org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests-output.txt.gz,
>           org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.txt.gz
>
> Here are some logs from a PR build vs. master that suffered a significant number of failures in the backup tests. I suspect that a single improvement could make all of these tests more robust.
> {noformat}
> Test Name                                                                                                                                       Duration      Age
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.testBackupDeleteRestore                   6 min 23 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.(?)                                       1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupMerge.TestIncBackupMergeRestore                         5 min 3 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupMerge.(?)                                               1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestFullBackupSet.testFullBackupSetExist                          6 min 16 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestFullBackupSet.(?)                                             1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures.TestIncBackupMergeRestore  5 min 55 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures.(?)                        1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.TestIncBackupDeleteTable        5 min 56 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.(?)                             1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.testFullRestoreSingleEmpty               6 min 5 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.testFullRestoreMultipleEmpty             0.17 sec      1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.(?)
> {noformat}
> https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3249/4/testReport/
--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25888) Backup tests are categorically flakey
[ https://issues.apache.org/jira/browse/HBASE-25888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347230#comment-17347230 ] Mallikarjun commented on HBASE-25888: - It is as simple as a `setUp` and `tearDown` problem. The tests run fine when run individually but fail as a suite. I have fixed most of the tests; a few are still pending, and I am looking into them.

> Backup tests are categorically flakey
> -
>
>                 Key: HBASE-25888
>                 URL: https://issues.apache.org/jira/browse/HBASE-25888
>             Project: HBase
>          Issue Type: Bug
>          Components: backuprestore, test
>            Reporter: Nick Dimiduk
>            Assignee: Mallikarjun
>            Priority: Major
>             Fix For: 3.0.0-alpha-1
>
>         Attachments: TEST-org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.xml.gz,
>           TEST-org.apache.hadoop.hbase.backup.TestBackupMerge.xml.gz,
>           TEST-org.apache.hadoop.hbase.backup.TestFullBackupSet.xml.gz,
>           TEST-org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures.xml.gz,
>           TEST-org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.xml.gz,
>           TEST-org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.xml.gz,
>           org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.txt,
>           org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.txt.gz,
>           org.apache.hadoop.hbase.backup.TestBackupMerge-output.txt.gz,
>           org.apache.hadoop.hbase.backup.TestBackupMerge.txt.gz,
>           org.apache.hadoop.hbase.backup.TestFullBackupSet-output.txt.gz,
>           org.apache.hadoop.hbase.backup.TestFullBackupSet.txt.gz,
>           org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures-output.txt.gz,
>           org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures.txt.gz,
>           org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad-output.txt.gz,
>           org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.txt.gz,
>           org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests-output.txt.gz,
>           org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.txt.gz
>
> Here are some logs from a PR build vs. master that suffered a significant number of failures in the backup tests. I suspect that a single improvement could make all of these tests more robust.
> {noformat}
> Test Name                                                                                                                                       Duration      Age
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.testBackupDeleteRestore                   6 min 23 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.(?)                                       1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupMerge.TestIncBackupMergeRestore                         5 min 3 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupMerge.(?)                                               1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestFullBackupSet.testFullBackupSetExist                          6 min 16 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestFullBackupSet.(?)                                             1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures.TestIncBackupMergeRestore  5 min 55 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures.(?)                        1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.TestIncBackupDeleteTable        5 min 56 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.(?)                             1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.testFullRestoreSingleEmpty               6 min 5 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.testFullRestoreMultipleEmpty             0.17 sec      1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.(?)
> {noformat}
> https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3249/4/testReport/
--
This message was sent by Atlassian Jira (v8.3.4#803005)
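The setUp/tearDown fix pattern described in the comment above, as a minimal JUnit 4 sketch: expensive cluster state is shared per class, while per-test state is created and torn down around every test so that suite ordering cannot leak state between tests. The class, table, and family names here are hypothetical and this is not the actual patch:
{code:java}
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.TableName;
import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestBackupIsolationExample {
  private static final HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
  private static final TableName TABLE = TableName.valueOf("test_backup_isolation");

  @BeforeClass
  public static void setUpCluster() throws Exception {
    // Expensive shared state: start the mini cluster once per class.
    TEST_UTIL.startMiniCluster();
  }

  @AfterClass
  public static void tearDownCluster() throws Exception {
    TEST_UTIL.shutdownMiniCluster();
  }

  @Before
  public void setUp() throws Exception {
    // Per-test state: a fresh table, so earlier tests in the suite
    // cannot leak rows or schema changes into this one.
    TEST_UTIL.createTable(TABLE, "cf");
  }

  @After
  public void tearDown() throws Exception {
    // Clean up even when the test fails, so the next test in the
    // suite starts from a known-empty cluster.
    TEST_UTIL.deleteTableIfAny(TABLE);
  }

  @Test
  public void testBackupRoundTrip() throws Exception {
    // ... test body elided; the point is the lifecycle above ...
  }
}
{code}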
[jira] [Work started] (HBASE-25888) Backup tests are categorically flakey
[ https://issues.apache.org/jira/browse/HBASE-25888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-25888 started by Mallikarjun. ---

> Backup tests are categorically flakey
> -
>
>                 Key: HBASE-25888
>                 URL: https://issues.apache.org/jira/browse/HBASE-25888
>             Project: HBase
>          Issue Type: Bug
>          Components: backuprestore, test
>            Reporter: Nick Dimiduk
>            Assignee: Mallikarjun
>            Priority: Major
>
>         Attachments: TEST-org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.xml.gz,
>           org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.txt,
>           org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.txt.gz
>
> Here are some logs from a PR build vs. master that suffered a significant number of failures in the backup tests. I suspect that a single improvement could make all of these tests more robust.
> {noformat}
> Test Name                                                                                                                                       Duration      Age
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.testBackupDeleteRestore                   6 min 23 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupDeleteRestore.(?)                                       1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupMerge.TestIncBackupMergeRestore                         5 min 3 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestBackupMerge.(?)                                               1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestFullBackupSet.testFullBackupSetExist                          6 min 16 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestFullBackupSet.(?)                                             1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures.TestIncBackupMergeRestore  5 min 55 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithFailures.(?)                        1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.TestIncBackupDeleteTable        5 min 56 sec  1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.(?)                             1 min 6 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.testFullRestoreSingleEmpty               6 min 5 sec   1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.testFullRestoreMultipleEmpty             0.17 sec      1
> precommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests.(?)
> {noformat}
> https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3249/4/testReport/
--
This message was sent by Atlassian Jira (v8.3.4#803005)