[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-09-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170830#comment-16170830
 ] 

Ted Yu commented on HBASE-14417:


Vlad:
All the patches on this JIRA were from me.

This was resolved almost half year ago.

To add DistCp support, open new JIRA.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Blocker
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v11.txt, 
> 14417-tbl-ext.v14.txt, 14417-tbl-ext.v18.txt, 14417-tbl-ext.v19.txt, 
> 14417-tbl-ext.v20.txt, 14417-tbl-ext.v21.txt, 14417-tbl-ext.v22.txt, 
> 14417-tbl-ext.v23.txt, 14417-tbl-ext.v24.txt, 14417-tbl-ext.v9.txt, 
> 14417.v11.txt, 14417.v13.txt, 14417.v1.txt, 14417.v21.txt, 14417.v23.txt, 
> 14417.v24.txt, 14417.v25.txt, 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Here is the review board (out of date):
> https://reviews.apache.org/r/54258/
> In order not to miss the hfiles which are loaded into region directories in a 
> situation where postBulkLoadHFile() hook is not called (bulk load being 
> interrupted), we record hfile names thru preCommitStoreFile() hook.
> At time of incremental backup, we check the presence of such hfiles. If they 
> are present, they become part of the incremental backup image.
> Here is review board:
> https://reviews.apache.org/r/57790/
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-09-18 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170820#comment-16170820
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

Reopened to add DistCp support.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Blocker
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v11.txt, 
> 14417-tbl-ext.v14.txt, 14417-tbl-ext.v18.txt, 14417-tbl-ext.v19.txt, 
> 14417-tbl-ext.v20.txt, 14417-tbl-ext.v21.txt, 14417-tbl-ext.v22.txt, 
> 14417-tbl-ext.v23.txt, 14417-tbl-ext.v24.txt, 14417-tbl-ext.v9.txt, 
> 14417.v11.txt, 14417.v13.txt, 14417.v1.txt, 14417.v21.txt, 14417.v23.txt, 
> 14417.v24.txt, 14417.v25.txt, 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Here is the review board (out of date):
> https://reviews.apache.org/r/54258/
> In order not to miss the hfiles which are loaded into region directories in a 
> situation where postBulkLoadHFile() hook is not called (bulk load being 
> interrupted), we record hfile names thru preCommitStoreFile() hook.
> At time of incremental backup, we check the presence of such hfiles. If they 
> are present, they become part of the incremental backup image.
> Here is review board:
> https://reviews.apache.org/r/57790/
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990397#comment-15990397
 ] 

Hudson commented on HBASE-14417:


ABORTED: Integrated in Jenkins build HBase-HBASE-14614 #190 (See 
[https://builds.apache.org/job/HBase-HBASE-14614/190/])
HBASE-14417 Incremental backup and bulk loading (tedyu: rev 
0345fc87759a7d44ecc385327ebb586fc632fb65)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/mapreduce/MapReduceRestoreJob.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/backup/TestIncrementalBackupWithBulkLoad.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/BackupHFileCleaner.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupManager.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/backup/TestBackupBase.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/backup/TestBackupHFileCleaner.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupAdminImpl.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/BackupObserver.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/util/HFileArchiveUtil.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/RestoreTablesClient.java


> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Blocker
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v11.txt, 
> 14417-tbl-ext.v14.txt, 14417-tbl-ext.v18.txt, 14417-tbl-ext.v19.txt, 
> 14417-tbl-ext.v20.txt, 14417-tbl-ext.v21.txt, 14417-tbl-ext.v22.txt, 
> 14417-tbl-ext.v23.txt, 14417-tbl-ext.v24.txt, 14417-tbl-ext.v9.txt, 
> 14417.v11.txt, 14417.v13.txt, 14417.v1.txt, 14417.v21.txt, 14417.v23.txt, 
> 14417.v24.txt, 14417.v25.txt, 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Here is the review board (out of date):
> https://reviews.apache.org/r/54258/
> In order not to miss the hfiles which are loaded into region directories in a 
> situation where postBulkLoadHFile() hook is not called (bulk load being 
> interrupted), we record hfile names thru preCommitStoreFile() hook.
> At time of incremental backup, we check the presence of such hfiles. If they 
> are present, they become part of the incremental backup image.
> Here is review board:
> https://reviews.apache.org/r/57790/
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946469#comment-15946469
 ] 

Hudson commented on HBASE-14417:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2758 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2758/])
HBASE-14417 Incremental backup and bulk loading (tedyu: rev 
0345fc87759a7d44ecc385327ebb586fc632fb65)
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/BackupObserver.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupManager.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/backup/TestIncrementalBackupWithBulkLoad.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/backup/TestBackupBase.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupAdminImpl.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/RestoreTablesClient.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/util/HFileArchiveUtil.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/backup/TestBackupHFileCleaner.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/mapreduce/MapReduceRestoreJob.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/BackupHFileCleaner.java


> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Blocker
>  Labels: backup
> Fix For: 2.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v11.txt, 
> 14417-tbl-ext.v14.txt, 14417-tbl-ext.v18.txt, 14417-tbl-ext.v19.txt, 
> 14417-tbl-ext.v20.txt, 14417-tbl-ext.v21.txt, 14417-tbl-ext.v22.txt, 
> 14417-tbl-ext.v23.txt, 14417-tbl-ext.v24.txt, 14417-tbl-ext.v9.txt, 
> 14417.v11.txt, 14417.v13.txt, 14417.v1.txt, 14417.v21.txt, 14417.v23.txt, 
> 14417.v24.txt, 14417.v25.txt, 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Here is the review board (out of date):
> https://reviews.apache.org/r/54258/
> In order not to miss the hfiles which are loaded into region directories in a 
> situation where postBulkLoadHFile() hook is not called (bulk load being 
> interrupted), we record hfile names thru preCommitStoreFile() hook.
> At time of incremental backup, we check the presence of such hfiles. If they 
> are present, they become part of the incremental backup image.
> Here is review board:
> https://reviews.apache.org/r/57790/
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946136#comment-15946136
 ] 

Hadoop QA commented on HBASE-14417:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 46s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 4s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
51s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
47s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
37s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
26m 11s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 103m 5s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
14s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 37s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12860929/14417-tbl-ext.v24.txt 
|
| JIRA Issue | HBASE-14417 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 649abde09dfb 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / cb4fac1 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6246/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6246/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6246/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Incremental backup and bulk loading
> ---

[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-27 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943830#comment-15943830
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

Yes, Let me do one more round today.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Blocker
>  Labels: backup
> Fix For: 2.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v11.txt, 
> 14417-tbl-ext.v14.txt, 14417-tbl-ext.v18.txt, 14417-tbl-ext.v19.txt, 
> 14417-tbl-ext.v20.txt, 14417-tbl-ext.v21.txt, 14417-tbl-ext.v22.txt, 
> 14417-tbl-ext.v23.txt, 14417-tbl-ext.v9.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v1.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 
> 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Here is the review board (out of date):
> https://reviews.apache.org/r/54258/
> In order not to miss the hfiles which are loaded into region directories in a 
> situation where postBulkLoadHFile() hook is not called (bulk load being 
> interrupted), we record hfile names thru preCommitStoreFile() hook.
> At time of incremental backup, we check the presence of such hfiles. If they 
> are present, they become part of the incremental backup image.
> Here is review board:
> https://reviews.apache.org/r/57790/
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943762#comment-15943762
 ] 

Ted Yu commented on HBASE-14417:


[~vrodionov]:
Do you have other comment ?

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Blocker
>  Labels: backup
> Fix For: 2.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v11.txt, 
> 14417-tbl-ext.v14.txt, 14417-tbl-ext.v18.txt, 14417-tbl-ext.v19.txt, 
> 14417-tbl-ext.v20.txt, 14417-tbl-ext.v21.txt, 14417-tbl-ext.v22.txt, 
> 14417-tbl-ext.v23.txt, 14417-tbl-ext.v9.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v1.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 
> 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Here is the review board (out of date):
> https://reviews.apache.org/r/54258/
> In order not to miss the hfiles which are loaded into region directories in a 
> situation where postBulkLoadHFile() hook is not called (bulk load being 
> interrupted), we record hfile names thru preCommitStoreFile() hook.
> At time of incremental backup, we check the presence of such hfiles. If they 
> are present, they become part of the incremental backup image.
> Here is review board:
> https://reviews.apache.org/r/57790/
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941317#comment-15941317
 ] 

Ted Yu commented on HBASE-14417:


TestAcidGuarantees failure was not related to my patch.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Blocker
>  Labels: backup
> Fix For: 2.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v11.txt, 
> 14417-tbl-ext.v14.txt, 14417-tbl-ext.v18.txt, 14417-tbl-ext.v19.txt, 
> 14417-tbl-ext.v20.txt, 14417-tbl-ext.v21.txt, 14417-tbl-ext.v22.txt, 
> 14417-tbl-ext.v23.txt, 14417-tbl-ext.v9.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v1.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 
> 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Here is the review board (out of date):
> https://reviews.apache.org/r/54258/
> In order not to miss the hfiles which are loaded into region directories in a 
> situation where postBulkLoadHFile() hook is not called (bulk load being 
> interrupted), we record hfile names thru preCommitStoreFile() hook.
> At time of incremental backup, we check the presence of such hfiles. If they 
> are present, they become part of the incremental backup image.
> Here is review board:
> https://reviews.apache.org/r/57790/
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941315#comment-15941315
 ] 

Hadoop QA commented on HBASE-14417:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 4s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
6s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
46s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
44s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 7s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 102m 25s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 1s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.TestAcidGuarantees |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12860450/14417-tbl-ext.v23.txt 
|
| JIRA Issue | HBASE-14417 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux f1256c2f7f5c 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / faf81d5 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6215/artifact/patchprocess/whitespace-eol.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6215/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/6215/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/P

[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-23 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938860#comment-15938860
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

[~tedyu], please do not commit until I finish RB. Thanks.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Blocker
>  Labels: backup
> Fix For: 2.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v11.txt, 
> 14417-tbl-ext.v14.txt, 14417-tbl-ext.v18.txt, 14417-tbl-ext.v19.txt, 
> 14417-tbl-ext.v20.txt, 14417-tbl-ext.v21.txt, 14417-tbl-ext.v22.txt, 
> 14417-tbl-ext.v9.txt, 14417.v11.txt, 14417.v13.txt, 14417.v1.txt, 
> 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 14417.v2.txt, 
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Here is the review board (out of date):
> https://reviews.apache.org/r/54258/
> In order not to miss the hfiles which are loaded into region directories in a 
> situation where postBulkLoadHFile() hook is not called (bulk load being 
> interrupted), we record hfile names thru preCommitStoreFile() hook.
> At time of incremental backup, we check the presence of such hfiles. If they 
> are present, they become part of the incremental backup image.
> Here is review board:
> https://reviews.apache.org/r/57790/
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-23 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938772#comment-15938772
 ] 

Josh Elser commented on HBASE-14417:


Passing over the +1 from RB.

Also, mind the whitespace error on commit.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Blocker
>  Labels: backup
> Fix For: 2.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v11.txt, 
> 14417-tbl-ext.v14.txt, 14417-tbl-ext.v18.txt, 14417-tbl-ext.v19.txt, 
> 14417-tbl-ext.v20.txt, 14417-tbl-ext.v21.txt, 14417-tbl-ext.v22.txt, 
> 14417-tbl-ext.v9.txt, 14417.v11.txt, 14417.v13.txt, 14417.v1.txt, 
> 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 14417.v2.txt, 
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Here is the review board (out of date):
> https://reviews.apache.org/r/54258/
> In order not to miss the hfiles which are loaded into region directories in a 
> situation where postBulkLoadHFile() hook is not called (bulk load being 
> interrupted), we record hfile names thru preCommitStoreFile() hook.
> At time of incremental backup, we check the presence of such hfiles. If they 
> are present, they become part of the incremental backup image.
> Here is review board:
> https://reviews.apache.org/r/57790/
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937212#comment-15937212
 ] 

Hadoop QA commented on HBASE-14417:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
22s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
48s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
44s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 22s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 99m 44s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 139m 44s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12859991/14417-tbl-ext.v22.txt 
|
| JIRA Issue | HBASE-14417 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 9e7da8794801 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / f2d1b8d |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6198/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6198/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6198/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Incremental backup and bulk loading
> 

[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-20 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933289#comment-15933289
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

[~te...@apache.org], can you submit the patch to RB?

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Blocker
>  Labels: backup
> Fix For: 2.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v11.txt, 
> 14417-tbl-ext.v14.txt, 14417-tbl-ext.v18.txt, 14417-tbl-ext.v19.txt, 
> 14417-tbl-ext.v20.txt, 14417-tbl-ext.v21.txt, 14417-tbl-ext.v9.txt, 
> 14417.v11.txt, 14417.v13.txt, 14417.v1.txt, 14417.v21.txt, 14417.v23.txt, 
> 14417.v24.txt, 14417.v25.txt, 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Here is the review board (out of date):
> https://reviews.apache.org/r/54258/
> In order not to miss the hfiles which are loaded into region directories in a 
> situation where postBulkLoadHFile() hook is not called (bulk load being 
> interrupted), we record hfile names thru preCommitStoreFile() hook.
> At time of incremental backup, we check the presence of such hfiles. If they 
> are present, they become part of the incremental backup image.
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932201#comment-15932201
 ] 

Hadoop QA commented on HBASE-14417:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 4s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
55s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
55s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
35m 17s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 107m 37s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 159m 5s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12859488/14417-tbl-ext.v21.txt 
|
| JIRA Issue | HBASE-14417 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux a58f6ac38258 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 7c03a21 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6145/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6145/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/6145/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Incremental backup and bulk loading
> 

[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2017-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932085#comment-15932085
 ] 

Hadoop QA commented on HBASE-14417:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 5s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
58s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
52s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
59s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
31m 4s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 12s 
{color} | {color:red} hbase-server generated 3 new + 0 unchanged - 0 fixed = 3 
total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 119m 56s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 166m 4s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  Using .equals to compare two byte[]'s, (equivalent to ==) in 
org.apache.hadoop.hbase.backup.impl.BackupSystemTable.readBulkloadRows(List)  
At BackupSystemTable.java:byte[]'s, (equivalent to ==) in 
org.apache.hadoop.hbase.backup.impl.BackupSystemTable.readBulkloadRows(List)  
At BackupSystemTable.java:[line 416] |
|  |  Boxing/unboxing to parse a primitive 
org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.getTsFromBackupId(String)
  At 
RestoreTablesClient.java:org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.getTsFromBackupId(String)
  At RestoreTablesClient.java:[line 263] |
|  |  Exception is caught when Exception is not thrown in 
org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.restore(HashMap, 
TableName[], TableName[], boolean)  At RestoreTablesClient.java:is not thrown 
in org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.restore(HashMap, 
TableName[], TableName[], boolean)  At RestoreTablesClient.java:[line 249] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL

[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-12-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733635#comment-15733635
 ] 

Ted Yu commented on HBASE-14417:


The final filename is available here in HRegion#bulkLoadHFiles() :
{code}
  Path commitedStoreFile = store.bulkLoadHFile(finalPath, seqId);
{code}
If we add one more hook above which records final filename in hbase:backup 
table, we still depend on postBulkLoadHFile() hook to write final filename one 
more time (with state of completion) - because bulk load event persistence 
(done in finally block) may fail. Meaning BackupHFileCleaner wouldn't have 
enough information whether the bulk load succeeded by simply checking the 
existence of store file(s) in region directory:
{code}
// write a bulk load event when not all hfiles are loaded
try {
  WALProtos.BulkLoadDescriptor loadDescriptor = 
ProtobufUtil.toBulkLoadDescriptor(
  this.getRegionInfo().getTable(),
{code}

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v9.txt, 
> 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 14417.v2.txt, 14417.v21.txt, 
> 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-12-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733183#comment-15733183
 ] 

Ted Yu commented on HBASE-14417:


The sequence Id for bulk loaded hfile is generated inside 
HRegion#bulkLoadHFiles().
Coming out of bulkLoadHFiles() call, postBulkLoadHFile() hook is called with 
actual file names.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v9.txt, 
> 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 14417.v2.txt, 14417.v21.txt, 
> 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-12-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732531#comment-15732531
 ] 

Ted Yu commented on HBASE-14417:


More response to Vlad's review comment w.r.t. fault tolerance in bulk load.

When bulk load fails midway, the user should provide complete set of hfiles 
again because the staging directory is not exposed to end users.
With this in mind, the benefit of using another hook (prior to 
postBulkLoadHFile()) to persist location of bulk loaded hfiles is minimal - 
since in subsequent bulk load attempt(s), the same set of (source) hfiles would 
be loaded again.

Another factor is that the more writes to hbase:backup table, the higher the 
chance of getting (write) failure.

One optimization we can do in the future is to combine writes (performed in 
postBulkLoadHFile()) from several regions on the same region server, provided 
that these writes are sufficiently close (300 ms apart, e.g.). The completion 
of bulk load on a single region server is determined by the slowest 
participating region, so this optimization would keep the response time on par 
with the current implementation (where hbase:backup table is not involed).

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v9.txt, 
> 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 14417.v2.txt, 14417.v21.txt, 
> 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-12-02 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715023#comment-15715023
 ] 

Ashish Singhi commented on HBASE-14417:
---

I don't want to block this jira. I will be on holiday for next two weeks so 
will not be able to check this before that. If any one else is fine pls go 
ahead and commit it. I will review it later.
Thanks

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417-tbl-ext.v9.txt, 14417.v1.txt, 14417.v11.txt, 
> 14417.v13.txt, 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 
> 14417.v25.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-12-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712616#comment-15712616
 ] 

Ted Yu commented on HBASE-14417:


Due to lack of access to gmail, I didn't see the above until several minutes 
ago.

Here is the review board:
https://reviews.apache.org/r/54258/

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417-tbl-ext.v9.txt, 14417.v1.txt, 14417.v11.txt, 
> 14417.v13.txt, 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 
> 14417.v25.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-11-24 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15694965#comment-15694965
 ] 

Ashish Singhi commented on HBASE-14417:
---

Can you upload the patch on RB. It will be easy to review.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417-tbl-ext.v9.txt, 14417.v1.txt, 14417.v11.txt, 
> 14417.v13.txt, 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 
> 14417.v25.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-11-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685093#comment-15685093
 ] 

Ted Yu commented on HBASE-14417:


BackupHFileCleaner should be registered through 
hbase.master.hfilecleaner.plugins . It is responsible for keeping bulk loaded 
hfiles so that incremental backup can pick them up.
BackupObserver should be registered through hbase.coprocessor.region.classes
It is notified when bulk load completes and writes records into hbase:backup 
table.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417-tbl-ext.v9.txt, 14417.v1.txt, 14417.v11.txt, 
> 14417.v13.txt, 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 
> 14417.v25.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-11-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650599#comment-15650599
 ] 

Ted Yu commented on HBASE-14417:


I am proceeding to implement the above proposal.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-11-04 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638081#comment-15638081
 ] 

Devaraj Das commented on HBASE-14417:
-

A summary of some internal discussions on the high-level flow that doesn't use 
ZK...
1. Client updates the hbase:backup table with a set of paths that are to be 
bulkloaded (if the tables in question have been fully backed up at least once 
in the past)
2. Client performs the bulkload of the data. If the client fails before the 
bulkload was fully complete, the cleaner chore in (5) would take care of 
cleaning up the unneeded entries from hbase:backup
3. There is a HFileCleaner that makes sure that paths that came about due to 
(1) are held until the next incremental backup
4. As part of the incremental backup, the hbase:backup table is updated to 
reflect the right location where the earlier bulkloaded file got copied to
5. A chore runs periodically (in the BackupController) that eliminates entries 
from the hbase:backup table if the corresponding paths don't exist in the 
filesystem until after a configured time period (default, say 24 hours; 
bulkload timeout is assumed to be much smaller than this, and hence all 
bulkloads that are meant to successfully complete would complete).
Thoughts?

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-11-02 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629891#comment-15629891
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

[~tedyu], calling remote region in postAppend hook does not look like a good 
idea. How about putting all this logic into RSRpcServices.bulkLoadHFile? Not 
necessary do synchronous call, you can use queue and update backup table on 
bulk load finish.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-11-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629369#comment-15629369
 ] 

Ted Yu commented on HBASE-14417:


I observed this in the TestHRegionServerBulkLoad-output for the version (v11 
and earlier) where bulk load marker is written directly to hbase:backup table 
in postAppend hook:
{code}
2016-09-13 23:10:14,072 DEBUG [B.defaultRpcServer.handler=4,queue=0,port=35667] 
ipc.CallRunner(112): B.defaultRpcServer.handler=4,queue=0,port=35667: callId: 
10646 service: ClientService methodName: Scan size: 264 connection:
172.18.128.12:59780
org.apache.hadoop.hbase.RegionTooBusyException: failed to get a lock in 6 
ms. regionName=atomicBulkLoad,,1473808150804.6b6c67612b01bce3348c144b959b7f0e., 
server=cn012.l42scl.hortonworks.com,35667,1473808145352
  at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:7744)
  at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:7725)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:7634)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2588)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2582)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2569)
  at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33516)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2229)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
  at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:136)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:111)
  at java.lang.Thread.run(Thread.java:745)
{code}
Here was the state of the BulkLoadHandler thread (stuck):
{code}
"RS:0;cn012:36301.append-pool9-t1" #453 prio=5 os_prio=0 tid=0x7fc3945bb000 
nid=0x18ec in Object.wait() [0x7fc30dada000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at 
org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:1727)
  - locked <0x000794750580> (a java.util.concurrent.atomic.AtomicLong)
  at 
org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1756)
  at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:241)
  at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:191)
  - locked <0x000794750048> (a 
org.apache.hadoop.hbase.client.BufferedMutatorImpl)
  at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:949)
  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:569)
  at 
org.apache.hadoop.hbase.backup.impl.BackupSystemTable.writeBulkLoadDesc(BackupSystemTable.java:227)
  at 
org.apache.hadoop.hbase.backup.impl.BulkLoadHandler.postAppend(BulkLoadHandler.java:83)
  at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.postAppend(FSHLog.java:1448)
{code}
Even increasing handler count didn't help:
{code}
diff --git a/hbase-server/src/test/resources/hbase-site.xml 
b/hbase-server/src/test/resources/hbase-site.xml
index bca90a3..829fcc9 100644
--- a/hbase-server/src/test/resources/hbase-site.xml
+++ b/hbase-server/src/test/resources/hbase-site.xml
@@ -30,6 +30,10 @@
 
   
   
+hbase.backup.enable
+true
+  
+  
 hbase.defaults.for.version.skip
 true
   
@@ -48,11 +52,11 @@
   
   
 hbase.regionserver.handler.count
-5
+50
   
   
{code}
Post v11, the data stored in zookeeper is temporary: once an incremental backup 
is run for the table receiving bulk load, data in zookeeper would be stored for 
the backup Id and removed from zookeeper.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-11-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626239#comment-15626239
 ] 

stack commented on HBASE-14417:
---

I like what [~vrodionov] asks here. We are busy elsewhere trying to undo our 
reliance on zk for all but the bare minimum yet here we are dev'ing new 
features on it. 

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-10-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623137#comment-15623137
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

[~te...@apache.org], is patch ready for review? If yes, then please open review 
on RB.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-10-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622809#comment-15622809
 ] 

Ted Yu commented on HBASE-14417:


hbase:backup table is used to retrieve Set of tables which have gone through 
full backup after region server comes up (since the region server may have 
missed prior zk notifications).
Afterwards, BulkLoadHandler would receive notifications from zookeeper and 
doesn't need to poll hbase:backup table.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-10-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622790#comment-15622790
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

{quote}
Patch v24 registers tables being fully backed up in zookeeper so that 
BulkLoadHandler on respective region server can avoid unnecessary post to 
zookeeper.
{quote}

Why can not we use hbase:backup table for that?

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-10-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605864#comment-15605864
 ] 

Ted Yu commented on HBASE-14417:


In IncrementalTableBackupClient, before marking the backup complete, I am 
adding code to copy the bulk loaded hfiles to destination filesystem and 
persist the list of copied files to hbase:backup table.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-10-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605689#comment-15605689
 ] 

Ted Yu commented on HBASE-14417:


When running TestFullRestore, I found an interesting issue: bulk load can 
happen to the table which is fully backed up - when overwrite option is 
specified.

I can think of two ways to omit these bulk loaded files:
1. under backup zookeeper subtree, create znode with tables which are being 
restored to with overwrite option. This allows the postAppend() hook to skip 
these tables for the duration of the restore
2. at the end of bulk load, issue deletes against the znodes which are added 
during the bulk load

I am leaning toward first option.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-10-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589109#comment-15589109
 ] 

Ted Yu commented on HBASE-14417:


Discussed with Enis.
The persistence of bulk loaded hfiles should be synchronous with the bulk load.
Writing to hbase:backup table through postAppend() hook is tricky.

The precedent of HBASE-13153 can be used for the persistence of reference to 
bulk loaded hfiles.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-10-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564243#comment-15564243
 ] 

Ted Yu commented on HBASE-14417:


Constructive suggestion is welcome.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-10-10 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563507#comment-15563507
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

>> One potential approach is to store hfile information in zookeeper.

-1. No Zk, please.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-10-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563463#comment-15563463
 ] 

Ted Yu commented on HBASE-14417:


While working on BackupHFileCleaner, the counterpart to 
ReplicationHFileCleaner, I notice the potential impact on the server hosting 
hbase:backup because we need to have up-to-date information on the hfiles which 
are still referenced by the incremental backup.

One potential approach is to store hfile information in zookeeper.
This would also alleviate the issue mentioned above about reducing writing 
BulkLoadDescriptor's to hbase:backup table.

Any suggestions, [~mbertozzi] [~vrodionov] ?

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v2.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514628#comment-15514628
 ] 

Ted Yu commented on HBASE-14417:


Currently there is one BulkLoadHandler inside each region server which writes 
BulkLoadDescriptor periodically to hbase:backup table.
Ideally only BulkLoadDescriptor's for tables which have gone through full 
backup should be written.

Looking for a way to pass Set of such tables to region servers so that each 
server doesn't have to poll hbase:backup table periodically.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v2.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-09-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511698#comment-15511698
 ] 

Ted Yu commented on HBASE-14417:


I was adding new test which does one more full table backup after the 
incremental restore and verifies that BulkLoadDescriptor's are cleaned up from 
hbase:backup.
It turns out the bulk loaded hfiles were renamed during the incremental restore 
which resulted in FileNotFoundException.

Filed HBASE-16672 so that the bulk loaded hfiles can be used for multiple 
restore destinations.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453795#comment-15453795
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

{quote}
What about multiple restore destinations ?
{quote}
Multiple destination support is tricky, agree, this is why we need to see full 
design doc, step-by-step, to start a productive discussion.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453774#comment-15453774
 ] 

Ted Yu commented on HBASE-14417:


bq. we need ship bulk load only once to backup destination

What about multiple restore destinations ?

I can write up some doc after getting consensus.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453728#comment-15453728
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

{quote}
Suppose there're two backup sets involving common table(s). The bulk loaded 
hfiles for the common table should be shared by the two backups.
{quote}

Its fine. We keep backup data per table (in incremental mode), means, that we 
need ship bulk load only once to backup destination. Design doc, [~tedyu]?  

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453160#comment-15453160
 ] 

Ted Yu commented on HBASE-14417:


HFile references would stay, assisted by BackupHFileCleaner.
Please read the earlier comments.

Suppose there're two backup sets involving common table(s). The bulk loaded 
hfiles for the common table should be shared by the two backups.


> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453063#comment-15453063
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

{quote}
The above is restore, right ?
For backup, we only keep hfile refs.
{quote}

No, for backup. Where are you going to keep bulkloaded files after backup is 
complete? Not sure, I am following you here.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452950#comment-15452950
 ] 

Ted Yu commented on HBASE-14417:


bq. HFiles reached the backup destination

The above is restore, right ?
For backup, we only keep hfile refs.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452938#comment-15452938
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

I thought we need these HFile references in hbase:backup only until we ship 
them (bulk loaded files) to backup destination? What do we need them for after 
backup is complete and HFiles reached the backup destination, [~tedyu]? 

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452824#comment-15452824
 ] 

Ted Yu commented on HBASE-14417:


bq. Delete file ref from hbase:backup after backup completes

Did you mean after restore completes ?
What if the user wants to restore to a different destination afterward ?
The removal of hfile ref from hbase:backup can be coupled with the deletion of 
backup(s).

Restoring incremental backup would ship hfiles along with WAL files to 
destination.

Suppose given this sequence of events where f means full backup, b means bulk 
load and i means incremental backup:
* f
* b1 (with hfile1)
* b2 (with hfile2)
* i1
* b3 (with hfile3)
* i2

The design of hfile ref in hbase:backup should make BackupHFileCleaner 
operation efficient.
If we consolidate hfile ref from b1 and b2 into i1, b3 into i2, 
BackupHFileCleaner needs to search backward (across all outstanding incremental 
backups): i2 -> i1 -> f.
If we consolidate hfile ref from b1 and b2 by merging them and storing b1', 
there is no need to search incremental backup(s).

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452786#comment-15452786
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

Delete file ref from hbase:backup after backup completes - do not keep them 
there forever.

Now we have two different types of files in backup : HFile (snapshot/full), WAL 
(incremental). Bulk load adds third type - HFile (bulk load/incremental)

*Shipping HFiles to backup destination
*New storage layout for incremental backup image, that has bulk loaded files?
*The algorithm for restore incremental backup that has both : WALs and HFiles?  

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452524#comment-15452524
 ] 

Ted Yu commented on HBASE-14417:


ReplicationHFileCleaner retrieves hfile refs from zookeeper in order to check 
for deletable files.
The new BackupHFileCleaner would retrieve hfile refs by scanning hbase:backup 
table.
The hfile refs may be stored separately if no incremental / full backup has 
been performed since the bulk load or, in manifest of some incremental backup.
Since we don't know which incremental backup manifest may contain related hfile 
ref, we need to scan backwards until one incremental backup is found or, one 
full backup is found.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449854#comment-15449854
 ] 

Ted Yu commented on HBASE-14417:


In principal, the above is in line with my earlier comments.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-30 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449794#comment-15449794
 ] 

Enis Soztutar commented on HBASE-14417:
---

We should use a similar design for this issue with "HFile replication" via 
HBASE-13153. Replication of hfiles is conceptually very similar to incremental 
bulk load, and we already have the tools. Please read the design doc and 
corresponding code. 

For this issue, we can mainly do this: 
 - Similar to replication, everytime BL happens, we create a reference per 
incremental backup as a part of the BL process. These references can be saved 
in the hbase:backup table. 
 - A custom hfile cleaner (like the ReplicationHFileCleaner) will monitor the 
bulk loaded hfiles, and makes sure that they are not deleted even after 
compactions, etc if there is still incremental backup to be performed.  
 - Incremental backup will also copy the files that are from BL by referring to 
the references saved in hbase:backup table in the next round. 

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439447#comment-15439447
 ] 

Ted Yu commented on HBASE-14417:


There may be more than one round of bulk load between the full backup and 
incremental backup(s).
For each round, we may use timestamp of completion of bulk load for the loaded 
hfiles (in terms of record in hbase:backup).

When the next incremental backup takes place, we consolidate all the recorded 
bulk loaded hfiles and save the list in manifest of the incremental backup.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437430#comment-15437430
 ] 

Ted Yu commented on HBASE-14417:


During offline discussion, Vladimir suggested that we record the list of bulk 
loaded hfiles into hbase:backup table at the end of bulk load.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414459#comment-15414459
 ] 

Ted Yu commented on HBASE-14417:


I think HBASE-14141 should be done first which would lay foundation for proper 
resolution of this issue.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-09 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414450#comment-15414450
 ] 

Devaraj Das commented on HBASE-14417:
-

The question is should we do this first or HBASE-14141 first. We need both in 
reality. We could put in a short term solution for backing up bulk-loaded data 
but wondering if we should bite the bullet and do HBASE-14141 and then this.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-08-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414447#comment-15414447
 ] 

Ted Yu commented on HBASE-14417:


Discussed with Devaraj.
Quick solution is to detect the presence of bulk loaded hfiles between full 
backup and incremental backup. If bulk load is detected, convert incremental 
backup to full backup.

Better solution, related to HBASE-14141, is to list bulk loaded hfiles during 
incremental backup and put them in their own backup image.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-07-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382805#comment-15382805
 ] 

Ted Yu commented on HBASE-14417:


If bulk loading is scheduled at regular interval, one approach is to align 
incremental backup activity with the bulk load.


> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-07-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375729#comment-15375729
 ] 

Ted Yu commented on HBASE-14417:


After registering these files, would they still participate in the next round 
of major compaction (assuming major compaction comes before the next 
incremental backup) ?

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-07-08 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368619#comment-15368619
 ] 

Vladimir Rodionov commented on HBASE-14417:
---

Bulk loading produces new HFiles, we need to register these files and move them 
to backup destination during next incremental phase. Something like this.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-07-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368585#comment-15368585
 ] 

Ted Yu commented on HBASE-14417:


Looks like bulk data loading should be applied to target table(s) as well.

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)