[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762016#comment-16762016 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

By the way, do you know how to update the email address in jira? It seems that it is still associated with the old email address, which is invalid.

> adding an option to restore_snapshot to move mob files from archive dir to
> working dir
> --------------------------------------------------------------------------
>
> Key: HBASE-18693
> URL: https://issues.apache.org/jira/browse/HBASE-18693
> Project: HBase
> Issue Type: Improvement
> Components: mob
> Affects Versions: 2.0.0-alpha-2
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Priority: Major
> Attachments: HBASE-18693.master.001.patch,
> HBASE-18693.master.002.patch, HBASE-18693.master.003.patch
>
>
> Today, there is a single mob region where mob files for all user regions are
> saved. There could be many files (one million) in a single mob directory.
> When one mob table is restored or cloned from snapshot, links are created for
> these mob files. This creates a scaling issue for mob compaction. In mob
> compaction's select() logic, for each hFileLink, it needs to call NN's
> getFileStatus() to get the size of the linked hfile. Assume that one such
> call takes 20ms: 20ms * 1,000,000 files = 20,000 seconds, or roughly 6 hours.
> To avoid this overhead, we want to add an option so that restore_snapshot can
> move mob files from archive dir to working dir. clone_snapshot is more
> complicated as it can clone a snapshot to a different table, so moving the
> files can destroy the snapshot. No option will be added for clone_snapshot.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
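The cost estimate in the issue description can be checked with a few lines of Ruby. This is only a back-of-the-envelope sketch: it assumes the one million mob files and the 20 ms per getFileStatus() call quoted in the description, and the variable names are illustrative rather than taken from HBase.

```ruby
# Figures quoted in the issue description; the variable names are illustrative.
mob_file_count = 1_000_000 # "one million" files in a single mob directory
call_latency_ms = 20       # assumed cost of one NameNode getFileStatus() call

# One call per linked file, made sequentially.
total_seconds = mob_file_count * call_latency_ms / 1000.0
total_hours = total_seconds / 3600.0

puts format('%.0f seconds, about %.1f hours', total_seconds, total_hours)
```

With these inputs the total is 20,000 seconds, about 5.6 hours, which matches the "6 hours" order of magnitude given in the description.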
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762013#comment-16762013 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

Sorry [~pankaj2461], can you take over this jira? I do not have bandwidth now. Thanks.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754754#comment-16754754 ]

Hadoop QA commented on HBASE-18693:
-----------------------------------

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 7s | HBASE-18693 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-18693 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895096/HBASE-18693.master.003.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/15765/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754752#comment-16754752 ]

Pankaj Kumar commented on HBASE-18693:
--------------------------------------

Ping [~huaxiang], any plan to commit this change?
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313764#comment-16313764 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

Thanks [~jingcheng.du] for the review.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312387#comment-16312387 ]

Jingcheng Du commented on HBASE-18693:
--------------------------------------

Thanks [~huaxiang]! I am +1 to V3.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311744#comment-16311744 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

Hi [~jingcheng.du], just want to follow up on the review status, thanks.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295419#comment-16295419 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

Hi [~dujin...@gmail.com], ping for review, thanks.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289742#comment-16289742 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

Hi [~dujin...@gmail.com], v3 is up to date. The only difference is

    diff --git a/hbase-shell/src/main/ruby/shell/commands/restore_snapshot.rb b/hbase-shell/src/main/rub
    diff --git hbase-shell/src/main/ruby/hbase_constants.rb hbase-shell/src/main/ruby/hbase_constants.r
    index ebaae78..12df9ff 100644
    --- hbase-shell/src/main/ruby/hbase_constants.rb
    +++ hbase-shell/src/main/ruby/hbase_constants.rb
    @@ -84,6 +84,7 @@ module HBaseConstants
       SERVER_NAME = 'SERVER_NAME'.freeze
       LOCALITY_THRESHOLD = 'LOCALITY_THRESHOLD'.freeze
       RESTORE_ACL = 'RESTORE_ACL'.freeze
    +  MOVE_MOB_FILES_FROM_ARCHIVE_TO_WORKDIR = 'MOVE_MOB_FILES_FROM_ARCHIVE_TO_WORKDIR'.freeze
       FORMATTER = 'FORMATTER'.freeze
       FORMATTER_CLASS = 'FORMATTER_CLASS'.freeze

Which is to address the TestShell failure. Can you review the v2 in review board? Thanks.
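To illustrate how a shell constant like the one added in the diff is typically consumed, here is a minimal, self-contained Ruby sketch. The two constants are copied from the diff; everything else is an assumption made for illustration. In particular, parse_restore_options is a hypothetical helper and not the actual restore_snapshot.rb code from the patch, which does not appear in this thread.

```ruby
# Constants as they appear in the hbase_constants.rb diff quoted in this thread.
module HBaseConstants
  RESTORE_ACL = 'RESTORE_ACL'.freeze
  MOVE_MOB_FILES_FROM_ARCHIVE_TO_WORKDIR = 'MOVE_MOB_FILES_FROM_ARCHIVE_TO_WORKDIR'.freeze
end

# Hypothetical helper (not HBase code): reads boolean options from a hash
# keyed by the shell constants, defaulting each option to false.
def parse_restore_options(args = {})
  {
    restore_acl: args.fetch(HBaseConstants::RESTORE_ACL, false),
    move_mob_files: args.fetch(HBaseConstants::MOVE_MOB_FILES_FROM_ARCHIVE_TO_WORKDIR, false)
  }
end

opts = parse_restore_options({ HBaseConstants::MOVE_MOB_FILES_FROM_ARCHIVE_TO_WORKDIR => true })
puts opts.inspect
```

If a command referred to HBaseConstants::MOVE_MOB_FILES_FROM_ARCHIVE_TO_WORKDIR before the constant was defined, Ruby would raise a NameError. That is one plausible reading of the TestShell failure the comment mentions, though the thread does not confirm the cause.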
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288010#comment-16288010 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

Thanks [~dujin...@gmail.com] for the reminder. I think I have the latest code, which I will polish and post.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287449#comment-16287449 ]

Jingcheng Du commented on HBASE-18693:
--------------------------------------

Thanks [~huaxiang] for the patch! Have you already posted the v3 patch to RB?
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233801#comment-16233801 ]

Hadoop QA commented on HBASE-18693:
-----------------------------------

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 4m 29s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 30s | Maven dependency ordering for branch |
| +1 | mvninstall | 5m 7s | master passed |
| +1 | compile | 1m 56s | master passed |
| +1 | checkstyle | 1m 58s | master passed |
| +1 | shadedjars | 4m 41s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 1m 12s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 5m 2s | the patch passed |
| +1 | compile | 1m 48s | the patch passed |
| +1 | cc | 1m 48s | the patch passed |
| +1 | javac | 1m 48s | the patch passed |
| -1 | checkstyle | 0m 31s | hbase-client: The patch generated 2 new + 228 unchanged - 0 fixed = 230 total (was 228) |
| -1 | checkstyle | 1m 8s | hbase-server: The patch generated 24 new + 270 unchanged - 0 fixed = 294 total (was 270) |
| -1 | rubocop | 0m 11s | The patch generated 4 new + 340 unchanged - 1 fixed = 344 total (was 341) |
| -1 | ruby-lint | 0m 5s | The patch generated 2 new + 835 unchanged - 0 fixed = 837 total (was 835) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 54s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 49m 23s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | hbaseprotoc | 1m 44s | the patch passed |
| +1 | javadoc | 1m 7s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 29s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 2m 31s | hbase-client in the patch passed. |
| +1 | unit | 89m 37s | hbase-server in the patch passed. |
| +1 | unit | 7m 16s | hbase-shell
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206319#comment-16206319 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

I am checking these failed unit tests locally and will do another QA run after local verification, thanks.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206314#comment-16206314 ]

Ted Yu commented on HBASE-18693:
--------------------------------

Can you get a clean QA run? See if the 3 failed tests can be reproduced locally.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206304#comment-16206304 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

@tedyu and [~jingcheng.du], I posted v2 at the review board, any comments for v2? Thanks.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199628#comment-16199628 ]

Hadoop QA commented on HBASE-18693:
-----------------------------------

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| 0 | mvndep | 0m 37s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 19s | master passed |
| +1 | compile | 1m 48s | master passed |
| +1 | checkstyle | 0m 36s | master passed |
| +1 | mvneclipse | 0m 55s | master passed |
| +1 | shadedjars | 4m 10s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 6m 6s | master passed |
| +1 | javadoc | 1m 19s | master passed |
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 19s | the patch passed |
| +1 | compile | 2m 18s | the patch passed |
| +1 | cc | 2m 18s | the patch passed |
| +1 | javac | 2m 18s | the patch passed |
| +1 | checkstyle | 0m 35s | the patch passed |
| +1 | mvneclipse | 1m 0s | the patch passed |
| -1 | rubocop | 0m 9s | The patch generated 3 new + 334 unchanged - 1 fixed = 337 total (was 335) |
| -1 | ruby-lint | 0m 5s | The patch generated 1 new + 743 unchanged - 0 fixed = 744 total (was 743) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 46s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 60m 34s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | hbaseprotoc | 2m 43s | the patch passed |
| +1 | findbugs | 11m 42s | the patch passed |
| +1 | javadoc | 2m 28s | the patch passed |
| +1 | unit | 0m 57s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 4m 15s | hbase-client in the patch passed. |
| -1 | unit | 124m 16s | hbase-server in the patch failed. |
| -1 | unit | 7m 43s | hbase-shell in the patch failed. |
| +1 | asflicense | 1m
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199017#comment-16199017 ] huaxiang sun commented on HBASE-18693: -- Thanks [~dujin...@gmail.com] and [~mdrob], I will take care of the comments. > adding an option to restore_snapshot to move mob files from archive dir to > working dir > -- > > Key: HBASE-18693 > URL: https://issues.apache.org/jira/browse/HBASE-18693 > Project: HBase > Issue Type: Improvement > Components: mob >Affects Versions: 2.0.0-alpha-2 >Reporter: huaxiang sun >Assignee: huaxiang sun > Attachments: HBASE-18693.master.001.patch > > > Today, there is a single mob region where mob files for all user regions are > saved. There could be many files (one million) in a single mob directory. > When one mob table is restored or cloned from snapshot, links are created for > these mob files. This creates a scaling issue for mob compaction. In mob > compaction's select() logic, for each hFileLink, it needs to call NN's > getFileStatus() to get the size of the linked hfile. Assume that one such > call takes 20ms; for one million files, 20ms * 1,000,000 = 20,000 seconds, or roughly 6 hours. > To avoid this overhead, we want to add an option so that restore_snapshot can > move mob files from archive dir to working dir. clone_snapshot is more > complicated as it can clone a snapshot to a different table so moving that > can destroy the snapshot. No option will be added for clone_snapshot. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
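The cost estimate in the issue description can be checked with a few lines of Java. This is only a sketch of the arithmetic: the class and method names are hypothetical, and the inputs (one million files, 20 ms per call) are the figures assumed in the description, not measurements. With those inputs the total is 20,000 seconds, a little over five and a half hours of sequential NameNode calls.

```java
import java.time.Duration;

// Hypothetical helper, not part of HBase: estimates the time spent when
// select() issues one sequential metadata call per linked mob file.
class MobSelectCost {
    /** Total time for fileCount sequential calls that each take millisPerCall. */
    static Duration totalSelectTime(long fileCount, long millisPerCall) {
        return Duration.ofMillis(Math.multiplyExact(fileCount, millisPerCall));
    }

    public static void main(String[] args) {
        // Figures assumed in the issue description: one million files, 20 ms per call.
        Duration total = totalSelectTime(1_000_000L, 20L);
        System.out.println(total.getSeconds() + " seconds, about "
            + total.toHours() + " full hours");
    }
}
```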
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198767#comment-16198767 ] Mike Drob commented on HBASE-18693: --- The rubocop & ruby-lint warnings look fine to ignore for now. We'll need to do a major cleanup pass later on line length anyway.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198114#comment-16198114 ] Jingcheng Du commented on HBASE-18693: -- Thanks a lot [~huaxiang], I've updated the comments in RB.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198008#comment-16198008 ] Jingcheng Du commented on HBASE-18693: -- Sure, [~huaxiang]. I am looking at it. It may take a few days. Thanks.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197198#comment-16197198 ] huaxiang sun commented on HBASE-18693: -- [~dujin...@gmail.com], can you help to look at the patch? Thanks.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195558#comment-16195558 ] Hadoop QA commented on HBASE-18693: ---
| (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| 0 | mvndep | 0m 32s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 41s | master passed |
| +1 | compile | 1m 28s | master passed |
| +1 | checkstyle | 0m 31s | master passed |
| +1 | mvneclipse | 0m 49s | master passed |
| +1 | shadedjars | 3m 34s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 4m 47s | master passed |
| +1 | javadoc | 1m 4s | master passed |
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 42s | the patch passed |
| +1 | compile | 1m 32s | the patch passed |
| +1 | cc | 1m 32s | the patch passed |
| +1 | javac | 1m 32s | the patch passed |
| +1 | checkstyle | 0m 36s | the patch passed |
| +1 | mvneclipse | 0m 48s | the patch passed |
| -1 | rubocop | 0m 8s | The patch generated 3 new + 332 unchanged - 1 fixed = 335 total (was 333) |
| -1 | ruby-lint | 0m 3s | The patch generated 1 new + 731 unchanged - 0 fixed = 732 total (was 731) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 3m 35s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 35m 18s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | hbaseprotoc | 1m 14s | the patch passed |
| +1 | findbugs | 5m 16s | the patch passed |
| +1 | javadoc | 1m 1s | the patch passed |
| +1 | unit | 0m 28s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 2m 43s | hbase-client in the patch passed. |
| +1 | unit | 95m 45s | hbase-server in the patch passed. |
| -1 | unit | 7m 10s | hbase-shell in the patch failed. |
| +1 | asflicense |
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149413#comment-16149413 ] huaxiang sun commented on HBASE-18693: -- Thanks [~dujin...@gmail.com]! I will upload a patch.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149191#comment-16149191 ] Jingcheng Du commented on HBASE-18693: --
bq. I got what was your concern. restore_snapshot always restores to the same table, that is why I add an option here. clone_snapshot is a different story, it can be cloned to different tables. If the option is added to clone_snapshot, it will corrupt the snapshot.
You are right. I am +1 on this option. Thanks!
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147677#comment-16147677 ] huaxiang sun commented on HBASE-18693: -- Hi Jingcheng,
{quote} Restoring a snapshot to the same table is okay. What if we try to restore the snapshot in another table? The same MOB file can be in different locations? No, right? {quote}
I see your concern. restore_snapshot always restores to the same table, which is why I am adding the option there. clone_snapshot is a different story: it can clone to a different table, so if the option were added to clone_snapshot, it would corrupt the snapshot.
{quote} You are right, this is a problem. How about select files with multiple threads, each thread handle part of the files selection? Thanks. {quote}
HBASE-17043 has been created for this effort. I think that is not enough, and it adds overhead (pressure on the NN). We need to give the user an option in this case. If this option looks good to you, I am going to post a patch. Thanks
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147396#comment-16147396 ] Jingcheng Du commented on HBASE-18693: -- Thanks Huaxiang.
bq. The snapshot itself is not destroyed after moving mob files from archive to working directory. I do not see an issue to restore a snapshot twice here. Can you share more details?
Restoring a snapshot to the same table is okay. What if we try to restore the snapshot to another table? Can the same MOB file be in different locations? No, right?
bq. For one of our use cases, user exported a snapshot with millions of mob files and restored the table at a remote cluster. The select() took more than one day to complete before actual compaction happened. We did the hack to skip hfile links so compaction could happen within several minutes. Even compacting links in a longer interval, this is still a huge overhead. What do you think?
You are right, this is a problem. How about selecting files with multiple threads, with each thread handling part of the file selection? Thanks.
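The multi-threaded selection suggested in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not HBase code: the class name, the method, and the `sizeOf` callback (a stand-in for the NameNode getFileStatus() call) are all hypothetical. Each worker handles one contiguous slice of the file list, and results come back in input order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical sketch, not part of HBase.
class ParallelSizeLookup {
    /** Looks up the size of every file, splitting the list across `threads` workers. */
    static List<Long> sizes(List<String> files, int threads, ToLongFunction<String> sizeOf)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Ceiling division so that every file lands in exactly one slice.
            int chunk = Math.max(1, (files.size() + threads - 1) / threads);
            List<Future<List<Long>>> parts = new ArrayList<>();
            for (int start = 0; start < files.size(); start += chunk) {
                List<String> slice = files.subList(start, Math.min(start + chunk, files.size()));
                parts.add(pool.submit(() -> {
                    List<Long> out = new ArrayList<>(slice.size());
                    for (String file : slice) {
                        out.add(sizeOf.applyAsLong(file)); // stand-in for one metadata call
                    }
                    return out;
                }));
            }
            // Collect the slices in submission order to preserve input order.
            List<Long> all = new ArrayList<>(files.size());
            for (Future<List<Long>> part : parts) {
                all.addAll(part.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

Splitting the work this way shortens wall-clock time but still issues one call per file, so the total load on the NameNode is unchanged.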
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146177#comment-16146177 ] huaxiang sun commented on HBASE-18693: -- I ran one manual test of restore_snapshot with the mob files moved to the working directory. The snapshot is still there and I can run restore_snapshot/clone_snapshot with it.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145814#comment-16145814 ] huaxiang sun commented on HBASE-18693: -- Hi [~dujin...@gmail.com],
{quote} My concern is if we restore a snapshot twice which is possible, how to handle such operations? {quote}
The snapshot itself is not destroyed after moving mob files from the archive to the working directory. I do not see an issue with restoring a snapshot twice here. Can you share more details?
{quote} Or we can skip the hfile links in most of MOB compaction, and compact the links in a longer interval (like a month)? {quote}
For one of our use cases, a user exported a snapshot with millions of mob files and restored the table at a remote cluster. The select() took more than one day to complete before actual compaction happened. We hacked it to skip hfile links so that compaction could happen within several minutes. Even if links are compacted at a longer interval, this is still a huge overhead. What do you think? Thanks.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144864#comment-16144864 ] Jingcheng Du commented on HBASE-18693: -- Thanks Huaxiang. You are right that an HDFS move doesn't copy the data; it is supposed to be a rename operation. My concern is how to handle restoring a snapshot twice, which is possible. In HBase, we compact the hfile links in compaction, so I think compacting hfile links in MOB compaction is reasonable too. Or we can skip the hfile links in most MOB compactions and compact the links at a longer interval (like a month)? I prefer the 1st option. What's your idea? Thanks.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144865#comment-16144865 ] Jingcheng Du commented on HBASE-18693: -- You are right that an HDFS move doesn't copy the data; it is supposed to be a rename operation. My concern is how to handle restoring a snapshot twice, which is possible. In HBase, we compact the hfile links in compaction, so I think compacting hfile links in MOB compaction is reasonable too. Or we can skip the hfile links in most MOB compactions and compact the links at a longer interval (like a month)? I prefer the 1st option. What's your idea? Thanks.
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144813#comment-16144813 ] huaxiang sun commented on HBASE-18693: -- Thanks [~jingcheng.du]. hdfs move() is a very light operation. I browsed the source code and also looked at https://stackoverflow.com/questions/34512596/how-does-hdfs-mv-command-work. As far as I understand, it only involves the name node and no real data operation is performed. Skipping hFileLinks may not work, as some mob hFileLinks need to be compacted to reduce the number of files in the directory. If these hFileLinks are always compacted, this will cause lots of unnecessary IO as new mob files will be created. Please share your thoughts, thanks.
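The point that a move is a rename rather than a data copy can be illustrated with a local-filesystem analogue. The sketch below deliberately uses java.nio on a local directory tree, not the Hadoop FileSystem API, and the class, method, and directory names are hypothetical. Requesting ATOMIC_MOVE makes the call fail instead of silently falling back to copy-and-delete, which is the property the comment relies on.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical local-filesystem analogue, not HBase or HDFS code.
class RenameSketch {
    /** Moves a file from an "archive" directory into a "working" directory via rename. */
    static Path moveToWorkingDir(Path archived, Path workingDir) throws IOException {
        Files.createDirectories(workingDir);
        Path target = workingDir.resolve(archived.getFileName());
        // ATOMIC_MOVE requests a rename; an AtomicMoveNotSupportedException is
        // thrown if the file system cannot do it without copying.
        return Files.move(archived, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("mob-sketch");
        Path archive = Files.createDirectories(root.resolve("archive"));
        Path file = Files.write(archive.resolve("mobfile"),
            "payload".getBytes(StandardCharsets.UTF_8));
        Path moved = moveToWorkingDir(file, root.resolve("working"));
        System.out.println("source exists: " + Files.exists(file)
            + ", target exists: " + Files.exists(moved));
    }
}
```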
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144792#comment-16144792 ]

Jingcheng Du commented on HBASE-18693:
--------------------------------------

Thanks [~huangxiangang]. The restore operation should be a fast one, and I don't think moving data as part of it is appropriate. Could we instead skip the HFileLinks during compaction, or compact them regardless of their size? How about this?
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143157#comment-16143157 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

With the proposed option, fs.move() is expected to be called once for each mob file. Right now, restore has to create two files per mob file for the HFileLink. So I expect moving files from the archive dir to the working dir to be as efficient as the current implementation.
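The comparison in the comment above can be made concrete with a rough count of filesystem calls. This is a sketch under stated assumptions, not a measurement: the class and method names are hypothetical, each file creation and each move is counted as a single call, and the relative cost of a create versus a move is not modeled.

```java
public class RestoreOpCount {

    /** Current approach, per the comment: two files are created for each mob file's HFileLink. */
    static long linkBasedCreates(long mobFiles) {
        return 2L * mobFiles;
    }

    /** Proposed approach, per the comment: one move call for each mob file. */
    static long moveBasedMoves(long mobFiles) {
        return mobFiles;
    }

    public static void main(String[] args) {
        long mobFiles = 1_000_000L; // the "one million" figure from the issue description
        System.out.println("link-based file creations: " + linkBasedCreates(mobFiles));
        System.out.println("move-based move calls:     " + moveBasedMoves(mobFiles));
    }
}
```

Under these assumptions the move-based restore issues half as many calls as the link-based one, which is consistent with the expectation that it is at least as efficient.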
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143051#comment-16143051 ]

Anoop Sam John commented on HBASE-18693:
----------------------------------------

bq. we want to add an option so that restore_snapshot can move mob files from archive dir to working dir

How efficient will this operation be? I assume it won't involve moving actual data, just rename operations. What happens if there are a million files under the MOB region?
[jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142402#comment-16142402 ]

huaxiang sun commented on HBASE-18693:
--------------------------------------

Ping [~jingcheng.du] and [~anoop.hbase]: any comments before I go ahead and implement the proposed option? Thanks.