[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-22 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696462#comment-16696462
 ] 

Duo Zhang commented on HBASE-21154:
---

The patch v6 added a TestMigrateNamespaceTable to confirm that the migration 
could work.

[~psomogyi] please try again?

> Remove hbase:namespace table; fold it into hbase:meta
> -
>
> Key: HBASE-21154
> URL: https://issues.apache.org/jira/browse/HBASE-21154
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21154-v1.patch, HBASE-21154-v2.patch, 
> HBASE-21154-v4.patch, HBASE-21154-v5.patch, HBASE-21154-v6.patch, 
> HBASE-21154.patch
>
>
> Namespace table is a small system table. Usually it has two rows. It must be 
> assigned before user tables but after hbase:meta goes out. Its presence 
> complicates our startup and is a constant source of grief when for whatever 
> reason, it is not up and available. In fact, master startup is predicated on 
> hbase:namespace being assigned and will not make progress unless it is up.
> Lets just add a new 'ns' column family to hbase:meta for namespace.
> Here is a default ns table content:
> {code}
> hbase(main):023:0* scan 'hbase:namespace'
> ROW   
>COLUMN+CELL
>  default  
>column=info:d, timestamp=1526694059106, 
> value=\x0A\x07default
>  hbase
>column=info:d, timestamp=1526694059461, 
> value=\x0A\x05hbase
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-22 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21154:
--
Attachment: HBASE-21154-v6.patch

> Remove hbase:namespace table; fold it into hbase:meta
> -
>
> Key: HBASE-21154
> URL: https://issues.apache.org/jira/browse/HBASE-21154
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21154-v1.patch, HBASE-21154-v2.patch, 
> HBASE-21154-v4.patch, HBASE-21154-v5.patch, HBASE-21154-v6.patch, 
> HBASE-21154.patch
>
>
> Namespace table is a small system table. Usually it has two rows. It must be 
> assigned before user tables but after hbase:meta goes out. Its presence 
> complicates our startup and is a constant source of grief when for whatever 
> reason, it is not up and available. In fact, master startup is predicated on 
> hbase:namespace being assigned and will not make progress unless it is up.
> Lets just add a new 'ns' column family to hbase:meta for namespace.
> Here is a default ns table content:
> {code}
> hbase(main):023:0* scan 'hbase:namespace'
> ROW   
>COLUMN+CELL
>  default  
>column=info:d, timestamp=1526694059106, 
> value=\x0A\x07default
>  hbase
>column=info:d, timestamp=1526694059461, 
> value=\x0A\x05hbase
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-22 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696406#comment-16696406
 ] 

Duo Zhang commented on HBASE-21154:
---

Let me try to write an UT for the migration data.

> Remove hbase:namespace table; fold it into hbase:meta
> -
>
> Key: HBASE-21154
> URL: https://issues.apache.org/jira/browse/HBASE-21154
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21154-v1.patch, HBASE-21154-v2.patch, 
> HBASE-21154-v4.patch, HBASE-21154-v5.patch, HBASE-21154.patch
>
>
> Namespace table is a small system table. Usually it has two rows. It must be 
> assigned before user tables but after hbase:meta goes out. Its presence 
> complicates our startup and is a constant source of grief when for whatever 
> reason, it is not up and available. In fact, master startup is predicated on 
> hbase:namespace being assigned and will not make progress unless it is up.
> Lets just add a new 'ns' column family to hbase:meta for namespace.
> Here is a default ns table content:
> {code}
> hbase(main):023:0* scan 'hbase:namespace'
> ROW   
>COLUMN+CELL
>  default  
>column=info:d, timestamp=1526694059106, 
> value=\x0A\x07default
>  hbase
>column=info:d, timestamp=1526694059461, 
> value=\x0A\x05hbase
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696375#comment-16696375
 ] 

Hudson commented on HBASE-21387:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #1184 (See 
[https://builds.apache.org/job/HBase-1.2-IT/1184/])
HBASE-21387 Race condition surrounding in progress snapshot handling in 
(openinx: rev d9693394f6ff7beb99c64f2dd188aa1bdff00f18)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/EnabledTableSnapshotHandler.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/DisabledTableSnapshotHandler.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestSnapshotWhenChoreCleaning.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/BaseFileCleanerDelegate.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotLogCleaner.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/CleanerChore.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/cleaner/TestSnapshotFromMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotHFileCleaner.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotFileCache.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/snapshot/TestSnapshotFileCache.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/snapshot/TestSnapshotHFileCleaner.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/TakeSnapshotHandler.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/FileCleanerDelegate.java


> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot 

[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696372#comment-16696372
 ] 

Zheng Hu commented on HBASE-21387:
--

Pushed to branch-2 & branch-2.0 & branch-2.1 & master & branch-1 & branch-1.2 & 
branch-1.3 & branch-1.4 , Thanks all. 

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696374#comment-16696374
 ] 

Hudson commented on HBASE-21387:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #503 (See 
[https://builds.apache.org/job/HBase-1.3-IT/503/])
HBASE-21387 Race condition surrounding in progress snapshot handling in 
(openinx: rev 53b5abc6513ac7f716b8011a02685ae0f2dc3eef)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/FileCleanerDelegate.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotFileCache.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/EnabledTableSnapshotHandler.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/TakeSnapshotHandler.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/DisabledTableSnapshotHandler.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotHFileCleaner.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestSnapshotWhenChoreCleaning.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/BaseFileCleanerDelegate.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/CleanerChore.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/snapshot/TestSnapshotFileCache.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/snapshot/TestSnapshotHFileCleaner.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java


> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 




[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21387:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21285) Enhanced TableSnapshotInputFormat to allow a size based splitting

2018-11-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696322#comment-16696322
 ] 

Hadoop QA commented on HBASE-21285:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1.4 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 0s{color} | {color:green} branch-1.4 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} branch-1.4 passed with JDK v1.8.0_192 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} branch-1.4 passed with JDK v1.7.0_201 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} branch-1.4 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
33s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} branch-1.4 passed with JDK v1.8.0_192 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} branch-1.4 passed with JDK v1.7.0_201 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK v1.8.0_192 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed with JDK v1.7.0_201 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
15s{color} | {color:red} hbase-server: The patch generated 2 new + 26 unchanged 
- 1 fixed = 28 total (was 27) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
32s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m  3s{color} | {color:green} Patch does not cause any errors with Hadoop 2.4.1 
2.5.2 2.6.5 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed with JDK v1.8.0_192 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.7.0_201 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}106m 
31s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}151m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:2cf636a |
| JIRA Issue | HBASE-21285 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12949228/HBASE-21285.branch-1.4.004.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux fee21ef15b67 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 

[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696320#comment-16696320
 ] 

Zheng Hu commented on HBASE-21387:
--

Thanks [~elserj] and [~yuzhih...@gmail.com]'s careful reviewing.  Will commit ( 
and fix the minor change) if no other conern.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21285) Enhanced TableSnapshotInputFormat to allow a size based splitting

2018-11-22 Thread Lavinia-Stefania Sirbu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavinia-Stefania Sirbu updated HBASE-21285:
---
Attachment: HBASE-21285.branch-1.4.004.patch
Status: Patch Available  (was: Open)

> Enhanced TableSnapshotInputFormat to allow a size based splitting
> -
>
> Key: HBASE-21285
> URL: https://issues.apache.org/jira/browse/HBASE-21285
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
>Reporter: Lavinia-Stefania Sirbu
>Assignee: Lavinia-Stefania Sirbu
>Priority: Minor
> Attachments: HBASE-21285.branch-1.4.001.patch, 
> HBASE-21285.branch-1.4.002.patch, HBASE-21285.branch-1.4.003.patch, 
> HBASE-21285.branch-1.4.004.patch, HBASE-21285.master.001.patch
>
>
> Currently, all the splits generated by a snapshot are having length 0. Right 
> now, we have a configuration for the number of splits per region, but it's a 
> general one and not very helpful when the sizes for regions are really 
> different. The modification must be done in TableSnapshotInputFormatImpl 
> where the length must be computed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21285) Enhanced TableSnapshotInputFormat to allow a size based splitting

2018-11-22 Thread Lavinia-Stefania Sirbu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavinia-Stefania Sirbu updated HBASE-21285:
---
Status: Open  (was: Patch Available)

> Enhanced TableSnapshotInputFormat to allow a size based splitting
> -
>
> Key: HBASE-21285
> URL: https://issues.apache.org/jira/browse/HBASE-21285
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
>Reporter: Lavinia-Stefania Sirbu
>Assignee: Lavinia-Stefania Sirbu
>Priority: Minor
> Attachments: HBASE-21285.branch-1.4.001.patch, 
> HBASE-21285.branch-1.4.002.patch, HBASE-21285.branch-1.4.003.patch, 
> HBASE-21285.master.001.patch
>
>
> Currently, all the splits generated by a snapshot are having length 0. Right 
> now, we have a configuration for the number of splits per region, but it's a 
> general one and not very helpful when the sizes for regions are really 
> different. The modification must be done in TableSnapshotInputFormatImpl 
> where the length must be computed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21432) [hbase-connectors] Add Apache Yetus integration for hbase-connectors repository

2018-11-22 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696270#comment-16696270
 ] 

Allen Wittenauer commented on HBASE-21432:
--

The Github Pull Request Builder has been mostly replaced by the Github Branch 
Source Plugin (which is what I meant above).  The big gotcha is that gprb is 
for freestyle jobs, gbsp is for pipeline jobs.

Using Jenkins w/a Multibranch Pipeline + Github Branch Source Plugin, it 
enables a tab-based system that allows one to re-run branches, PR, and, if 
enabled, tags on demand from the Jenkins UI.  The Yetus integration that is 
done as part of YETUS-708 (and committed into my dev branch) just makes 
test-patch smarter to know it is running under Jenkins, where the PR is at, 
etc., so that there isn't a need to manually configure or pass parameters for 
details that Jenkins itself shares via environment variables.  This means that 
one stanza in the Pipeline can do both full builds and incremental/PR builds 
with no work required in the Pipeline to figure out what type of run it is.

> [hbase-connectors] Add Apache Yetus integration for hbase-connectors 
> repository 
> 
>
> Key: HBASE-21432
> URL: https://issues.apache.org/jira/browse/HBASE-21432
> Project: HBase
>  Issue Type: Task
>  Components: build, hbase-connectors
>Affects Versions: connector-1.0.0
>Reporter: Peter Somogyi
>Priority: Major
>
> Add automated testing for pull requests and patch files created for 
> hbase-connectors repository. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Hadoop Flags: Reviewed
Release Note: To prevent race condition between in progress snapshot 
(performed by TakeSnapshotHandler) and HFileCleaner which results in data loss, 
this JIRA introduced mutual exclusion between taking snapshot and running 
HFileCleaner. That is, at any given moment, either some snapshot can be taken 
or, HFileCleaner checks hfiles which are not referenced, but not both can be 
running.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696227#comment-16696227
 ] 

Ted Yu commented on HBASE-21387:


Looks like catching FileNotFoundException is not enough to pass the new test.

Let's go with v17.
{code}
+LOG.debug("toDeleteFiles[{}] is: " + deletableFiles.get(i));
{code}
Minor: looks like you intended to provide both index and FileStatus. There was 
only one argument above.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21432) [hbase-connectors] Add Apache Yetus integration for hbase-connectors repository

2018-11-22 Thread Peter Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696211#comment-16696211
 ] 

Peter Somogyi commented on HBASE-21432:
---

I had a discussion with [~busbey] about the Github pull request integration 
that is described on the 
[wiki|https://cwiki.apache.org/confluence/display/INFRA/GitHub+Pull+Request+Builder]
 and our concern is that retriggering a specific execution is not simple. As an 
example he mentioned Accumulo's integration where an additional job was created 
just for rerunning a failed build.

[~aw], would your plugin make reexecution of a certain build possible?

> [hbase-connectors] Add Apache Yetus integration for hbase-connectors 
> repository 
> 
>
> Key: HBASE-21432
> URL: https://issues.apache.org/jira/browse/HBASE-21432
> Project: HBase
>  Issue Type: Task
>  Components: build, hbase-connectors
>Affects Versions: connector-1.0.0
>Reporter: Peter Somogyi
>Priority: Major
>
> Add automated testing for pull requests and patch files created for 
> hbase-connectors repository. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696208#comment-16696208
 ] 

Hadoop QA commented on HBASE-21387:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
2s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
24s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 19s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}124m 30s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m 21s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.snapshot.TestSnapshotWhenChoreCleaning |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21387 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12949212/21387-suggest.txt |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 6ff82b279620 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 
17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5cc845b713 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 

[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-22 Thread Peter Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696204#comment-16696204
 ] 

Peter Somogyi commented on HBASE-21154:
---

Using an existing hbase-root directory fails when migrating the data and 
calling disable on hbase:namespace table. There is a check in disableTable and 
similarly in deleteTable to verify master is initialized at this point. [1]

[1] 
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L2588-L2591

> Remove hbase:namespace table; fold it into hbase:meta
> -
>
> Key: HBASE-21154
> URL: https://issues.apache.org/jira/browse/HBASE-21154
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21154-v1.patch, HBASE-21154-v2.patch, 
> HBASE-21154-v4.patch, HBASE-21154-v5.patch, HBASE-21154.patch
>
>
> Namespace table is a small system table. Usually it has two rows. It must be 
> assigned before user tables but after hbase:meta goes out. Its presence 
> complicates our startup and is a constant source of grief when for whatever 
> reason, it is not up and available. In fact, master startup is predicated on 
> hbase:namespace being assigned and will not make progress unless it is up.
> Lets just add a new 'ns' column family to hbase:meta for namespace.
> Here is a default ns table content:
> {code}
> hbase(main):023:0* scan 'hbase:namespace'
> ROW   
>COLUMN+CELL
>  default  
>column=info:d, timestamp=1526694059106, 
> value=\x0A\x07default
>  hbase
>column=info:d, timestamp=1526694059461, 
> value=\x0A\x05hbase
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.

2018-11-22 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696202#comment-16696202
 ] 

Wellington Chevreuil commented on HBASE-21505:
--

[~tianjingyun], just uploaded a temporary patch. Still working on it, tests are 
needed, but feel free to share your thoughts/concerns.

> Several inconsistencies on information reported for Replication Sources by 
> hbase shell status 'replication' command.
> 
>
> Key: HBASE-21505
> URL: https://issues.apache.org/jira/browse/HBASE-21505
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: 
> 0001-HBASE-21505-initial-version-for-more-detailed-report.patch
>
>
> While reviewing hbase shell status 'replication' command, noticed the 
> following issues related to replication source section:
> 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when 
> no new edits were added to source, so nothing was really shipped. Test steps 
> performed:
> 1.1) Source cluster with only one table targeted to replication;
> 1.2) Added a new row, confirmed the row appeared in Target cluster;
> 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp 
> shows current timestamp T1.
> 1.4) Waited 30 seconds, no new data added to source. Issued status 
> 'replication' command, now shows timestamp T2.
> 2) When replication is stuck due some connectivity issues or target 
> unavailability, if new edits are added in source, reported AgeOfLastShippedOp 
> is wrongly showing same value as "Replication Lag". This is incorrect, 
> AgeOfLastShippedOp should not change until there's indeed another edit 
> shipped to target. Test steps performed:
> 2.1) Source cluster with only one table targeted to replication;
> 2.2) Stopped target cluster RS;
> 2.3) Put a new row on source. Running status 'replication' command does show 
> lag increasing. TimeStampsOfLastShippedOp seems correct also, no further 
> updates as described on bullet #1 above.
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even 
> though there's no new edit shipped to target:
> {noformat}
> ...
>  SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some 
> time before it got finally shipped to target. Test steps performed:
> 3.1) Source cluster with only one table targeted to replication;
> 3.2) Stopped target cluster RS;
> 3.3) Put a new row on source. 
> 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even 
> though there's no new edit shipped to target:
> {noformat}
> T1:
> ...
>  SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> T2:
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3.5) Restart target cluster RS and verified the new row appeared there. No 
> new edit added, but status 'replication' command reports AgeOfLastShippedOp 
> as 0, while it should be the diff between the time it concluded shipping at 
> target and the time it was added in source:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0
> {noformat}
> 4) When replication is stuck due some connectivity issues or target 
> unavailability, if RS is restarted, once recovered queue source is started, 
> TimeStampsOfLastShippedOp is set to initial java date (Thu Jan 01 01:00:00 
> GMT 1970, for example), thus "Replication Lag" also gives a complete 
> inaccurate value. 
> Tests performed:
> 4.1) Source cluster with only one table targeted to replication;
> 4.2) Stopped target cluster RS;
> 4.3) Put a new row on source, restart RS on source, waited a few seconds for 
> recovery queue source to startup, then it gives:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication 
> Lag=9223372036854775807
> {noformat}
> Also, we should report status to all sources running, current output format 
> gives the impression there’s only one, even when there are recovery queues, 
> for instance. 
> Here is a list of ideas on how the command should report under different 

[jira] [Updated] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.

2018-11-22 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-21505:
-
Attachment: 0001-HBASE-21505-initial-version-for-more-detailed-report.patch

> Several inconsistencies on information reported for Replication Sources by 
> hbase shell status 'replication' command.
> 
>
> Key: HBASE-21505
> URL: https://issues.apache.org/jira/browse/HBASE-21505
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: 
> 0001-HBASE-21505-initial-version-for-more-detailed-report.patch
>
>
> While reviewing hbase shell status 'replication' command, noticed the 
> following issues related to replication source section:
> 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when 
> no new edits were added to source, so nothing was really shipped. Test steps 
> performed:
> 1.1) Source cluster with only one table targeted to replication;
> 1.2) Added a new row, confirmed the row appeared in Target cluster;
> 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp 
> shows current timestamp T1.
> 1.4) Waited 30 seconds, no new data added to source. Issued status 
> 'replication' command, now shows timestamp T2.
> 2) When replication is stuck due some connectivity issues or target 
> unavailability, if new edits are added in source, reported AgeOfLastShippedOp 
> is wrongly showing same value as "Replication Lag". This is incorrect, 
> AgeOfLastShippedOp should not change until there's indeed another edit 
> shipped to target. Test steps performed:
> 2.1) Source cluster with only one table targeted to replication;
> 2.2) Stopped target cluster RS;
> 2.3) Put a new row on source. Running status 'replication' command does show 
> lag increasing. TimeStampsOfLastShippedOp seems correct also, no further 
> updates as described on bullet #1 above.
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even 
> though there's no new edit shipped to target:
> {noformat}
> ...
>  SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some 
> time before it got finally shipped to target. Test steps performed:
> 3.1) Source cluster with only one table targeted to replication;
> 3.2) Stopped target cluster RS;
> 3.3) Put a new row on source. 
> 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even 
> though there's no new edit shipped to target:
> {noformat}
> T1:
> ...
>  SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> T2:
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3.5) Restart target cluster RS and verified the new row appeared there. No 
> new edit added, but status 'replication' command reports AgeOfLastShippedOp 
> as 0, while it should be the diff between the time it concluded shipping at 
> target and the time it was added in source:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0
> {noformat}
> 4) When replication is stuck due some connectivity issues or target 
> unavailability, if RS is restarted, once recovered queue source is started, 
> TimeStampsOfLastShippedOp is set to initial java date (Thu Jan 01 01:00:00 
> GMT 1970, for example), thus "Replication Lag" also gives a complete 
> inaccurate value. 
> Tests performed:
> 4.1) Source cluster with only one table targeted to replication;
> 4.2) Stopped target cluster RS;
> 4.3) Put a new row on source, restart RS on source, waited a few seconds for 
> recovery queue source to startup, then it gives:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication 
> Lag=9223372036854775807
> {noformat}
> Also, we should report status to all sources running, current output format 
> gives the impression there’s only one, even when there are recovery queues, 
> for instance. 
> Here is a list of ideas on how the command should report under different 
> states of replication:
> a) Source started, target stopped, no edits arrived on source yet: 
> 

[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696126#comment-16696126
 ] 

Hadoop QA commented on HBASE-21154:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 24 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
51s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
39s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  4m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} The patch passed checkstyle in hbase-protocol-shaded 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} The patch passed checkstyle in hbase-common {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} The patch passed checkstyle in hbase-client {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} hbase-server: The patch generated 0 new + 865 
unchanged - 103 fixed = 865 total (was 968) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} The patch passed checkstyle in hbase-rsgroup {color} 
|
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
49s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 23s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green}  
2m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
43s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
13s{color} | 

[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Attachment: 21387-suggest.txt

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-22 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695987#comment-16695987
 ] 

Allan Yang commented on HBASE-21154:


{quote}
And for safety, maybe we should only disable the namespace table automatically 
in the code, without deleting it. And add a tool in HBCK2 for deleting unused 
system tables?
{quote}
But if the table is not deleted, the migrateNamespaceTable() will run every 
time we start master?

> Remove hbase:namespace table; fold it into hbase:meta
> -
>
> Key: HBASE-21154
> URL: https://issues.apache.org/jira/browse/HBASE-21154
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21154-v1.patch, HBASE-21154-v2.patch, 
> HBASE-21154-v4.patch, HBASE-21154-v5.patch, HBASE-21154.patch
>
>
> Namespace table is a small system table. Usually it has two rows. It must be 
> assigned before user tables but after hbase:meta goes out. Its presence 
> complicates our startup and is a constant source of grief when for whatever 
> reason, it is not up and available. In fact, master startup is predicated on 
> hbase:namespace being assigned and will not make progress unless it is up.
> Lets just add a new 'ns' column family to hbase:meta for namespace.
> Here is a default ns table content:
> {code}
> hbase(main):023:0* scan 'hbase:namespace'
> ROW   
>COLUMN+CELL
>  default  
>column=info:d, timestamp=1526694059106, 
> value=\x0A\x07default
>  hbase
>column=info:d, timestamp=1526694059461, 
> value=\x0A\x05hbase
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21300) Fix the wrong reference file path when restoring snapshots for tables with MOB columns

2018-11-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695984#comment-16695984
 ] 

Hadoop QA commented on HBASE-21300:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
38s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
3s{color} | {color:red} hbase-server: The patch generated 1 new + 42 unchanged 
- 0 fixed = 43 total (was 42) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
50s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 23s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}234m 19s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}269m 56s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.client.TestMobRestoreSnapshotFromClientAfterSplittingRegions |
|   | hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas |
|   | hadoop.hbase.client.TestCloneSnapshotFromClientAfterSplittingRegion |
|   | hadoop.hbase.client.TestMobCloneSnapshotFromClientAfterSplittingRegion |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21300 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12949168/HBASE-21300.master.004.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 4fd8a0128650 4.4.0-137-generic #163-Ubuntu SMP Mon Sep 24 
13:14:43 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5cc845b713 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| 

[jira] [Commented] (HBASE-21504) If enable FIFOCompactionPolicy, a compaction may write a "empty" hfile whose maxTimeStamp is long max. This kind of hfile will never be archived.

2018-11-22 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695972#comment-16695972
 ] 

Allan Yang commented on HBASE-21504:


OK, for FIFOCompactionPolicy it is possible, a UT is needed.

> If enable FIFOCompactionPolicy, a compaction may write a "empty" hfile whose 
> maxTimeStamp is long max. This kind of hfile will never be archived.
> -
>
> Key: HBASE-21504
> URL: https://issues.apache.org/jira/browse/HBASE-21504
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 2.1.0
>Reporter: xuming
>Priority: Critical
> Attachments: 1.patch
>
>
> When i use FIFOCompactionPolicy, and if all hfiles(>1) are TTL expired in a 
> region, once we do a compaction on the region, the compaction policy will 
> select the latest hfile to do compaction.But beacuse the latest hfile is 
> already TTL expired, compactor only write a "empty" hfile(whose entry counter 
> is 0 and maxTimeStamp is Long.MAX_VALUE) finally. Because maxTimeStamp is 
> long max, so the "empty" hfile will never be TTL expired, more seriously we 
> can not archive it by FIFOCompactionPolicy forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21504) If enable FIFOCompactionPolicy, a compaction may write a "empty" hfile whose maxTimeStamp is long max. This kind of hfile will never be archived.

2018-11-22 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695963#comment-16695963
 ] 

Zheng Hu commented on HBASE-21504:
--

I think the problem is: the FIFOCompactionPolicy only choose those expired 
HFiles to compact.  If a HFile with Long.MAX_VALUE timestamp,  the  
FIFOCompactionPolicy will never choose it as a candidate ? 

> If enable FIFOCompactionPolicy, a compaction may write a "empty" hfile whose 
> maxTimeStamp is long max. This kind of hfile will never be archived.
> -
>
> Key: HBASE-21504
> URL: https://issues.apache.org/jira/browse/HBASE-21504
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 2.1.0
>Reporter: xuming
>Priority: Critical
> Attachments: 1.patch
>
>
> When i use FIFOCompactionPolicy, and if all hfiles(>1) are TTL expired in a 
> region, once we do a compaction on the region, the compaction policy will 
> select the latest hfile to do compaction.But beacuse the latest hfile is 
> already TTL expired, compactor only write a "empty" hfile(whose entry counter 
> is 0 and maxTimeStamp is Long.MAX_VALUE) finally. Because maxTimeStamp is 
> long max, so the "empty" hfile will never be TTL expired, more seriously we 
> can not archive it by FIFOCompactionPolicy forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21492) CellCodec Written To WAL Before It's Verified

2018-11-22 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695951#comment-16695951
 ] 

Allan Yang commented on HBASE-21492:


The patch looks good, can you write a UT for this case, I think it is easy to 
reproduce this case.

> CellCodec Written To WAL Before It's Verified
> -
>
> Key: HBASE-21492
> URL: https://issues.apache.org/jira/browse/HBASE-21492
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 1.2.7, 2.0.2
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Critical
> Attachments: HBASE-21492.1.patch
>
>
> The cell codec class name is written into the WAL file, but the cell codec 
> class is not actually verified to exist.  Therefore, users can inadvertently 
> configure an invalid class name and it will be recorded into the WAL file.  
> At that point, the WAL file becomes unreadable and blocks processing of all 
> other WAL files.
> {code:java|title=AbstractProtobufLogWriter.java}
>   private WALHeader buildWALHeader0(Configuration conf, WALHeader.Builder 
> builder) {
> if (!builder.hasWriterClsName()) {
>   builder.setWriterClsName(getWriterClassName());
> }
> if (!builder.hasCellCodecClsName()) {
>   builder.setCellCodecClsName(WALCellCodec.getWALCellCodecClass(conf));
> }
> return builder.build();
>   }
> {code}
> https://github.com/apache/hbase/blob/025ddce868eb06b4072b5152c5ffae5a01e7ae30/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractProtobufLogWriter.java#L78-L86



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21504) If enable FIFOCompactionPolicy, a compaction may write a "empty" hfile whose maxTimeStamp is long max. This kind of hfile will never be archived.

2018-11-22 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695939#comment-16695939
 ] 

Allan Yang commented on HBASE-21504:


I think it is OK, if new KVs are writen to a new hfile, this empty HFile will 
compact with the new one and generate a normal HFile with kvs and the rignt 
time range.

> If enable FIFOCompactionPolicy, a compaction may write a "empty" hfile whose 
> maxTimeStamp is long max. This kind of hfile will never be archived.
> -
>
> Key: HBASE-21504
> URL: https://issues.apache.org/jira/browse/HBASE-21504
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 2.1.0
>Reporter: xuming
>Priority: Critical
> Attachments: 1.patch
>
>
> When i use FIFOCompactionPolicy, and if all hfiles(>1) are TTL expired in a 
> region, once we do a compaction on the region, the compaction policy will 
> select the latest hfile to do compaction.But beacuse the latest hfile is 
> already TTL expired, compactor only write a "empty" hfile(whose entry counter 
> is 0 and maxTimeStamp is Long.MAX_VALUE) finally. Because maxTimeStamp is 
> long max, so the "empty" hfile will never be TTL expired, more seriously we 
> can not archive it by FIFOCompactionPolicy forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.

2018-11-22 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695924#comment-16695924
 ] 

Jingyun Tian commented on HBASE-21505:
--

Yeah. I got your point. These are good for a better understanding. I'm looking 
forward to your patch.

> Several inconsistencies on information reported for Replication Sources by 
> hbase shell status 'replication' command.
> 
>
> Key: HBASE-21505
> URL: https://issues.apache.org/jira/browse/HBASE-21505
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> While reviewing hbase shell status 'replication' command, noticed the 
> following issues related to replication source section:
> 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when 
> no new edits were added to source, so nothing was really shipped. Test steps 
> performed:
> 1.1) Source cluster with only one table targeted to replication;
> 1.2) Added a new row, confirmed the row appeared in Target cluster;
> 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp 
> shows current timestamp T1.
> 1.4) Waited 30 seconds, no new data added to source. Issued status 
> 'replication' command, now shows timestamp T2.
> 2) When replication is stuck due some connectivity issues or target 
> unavailability, if new edits are added in source, reported AgeOfLastShippedOp 
> is wrongly showing same value as "Replication Lag". This is incorrect, 
> AgeOfLastShippedOp should not change until there's indeed another edit 
> shipped to target. Test steps performed:
> 2.1) Source cluster with only one table targeted to replication;
> 2.2) Stopped target cluster RS;
> 2.3) Put a new row on source. Running status 'replication' command does show 
> lag increasing. TimeStampsOfLastShippedOp seems correct also, no further 
> updates as described on bullet #1 above.
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even 
> though there's no new edit shipped to target:
> {noformat}
> ...
>  SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some 
> time before it got finally shipped to target. Test steps performed:
> 3.1) Source cluster with only one table targeted to replication;
> 3.2) Stopped target cluster RS;
> 3.3) Put a new row on source. 
> 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even 
> though there's no new edit shipped to target:
> {noformat}
> T1:
> ...
>  SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> T2:
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3.5) Restart target cluster RS and verified the new row appeared there. No 
> new edit added, but status 'replication' command reports AgeOfLastShippedOp 
> as 0, while it should be the diff between the time it concluded shipping at 
> target and the time it was added in source:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0
> {noformat}
> 4) When replication is stuck due some connectivity issues or target 
> unavailability, if RS is restarted, once recovered queue source is started, 
> TimeStampsOfLastShippedOp is set to initial java date (Thu Jan 01 01:00:00 
> GMT 1970, for example), thus "Replication Lag" also gives a complete 
> inaccurate value. 
> Tests performed:
> 4.1) Source cluster with only one table targeted to replication;
> 4.2) Stopped target cluster RS;
> 4.3) Put a new row on source, restart RS on source, waited a few seconds for 
> recovery queue source to startup, then it gives:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication 
> Lag=9223372036854775807
> {noformat}
> Also, we should report status to all sources running, current output format 
> gives the impression there’s only one, even when there are recovery queues, 
> for instance. 
> Here is a list of ideas on how the command should report under different 
> states of replication:
> a) Source started, target stopped, no edits arrived on source yet: 
> Status replication should not show any lags, 

[jira] [Updated] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-22 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21154:
--
Attachment: HBASE-21154-v5.patch

> Remove hbase:namespace table; fold it into hbase:meta
> -
>
> Key: HBASE-21154
> URL: https://issues.apache.org/jira/browse/HBASE-21154
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21154-v1.patch, HBASE-21154-v2.patch, 
> HBASE-21154-v4.patch, HBASE-21154-v5.patch, HBASE-21154.patch
>
>
> Namespace table is a small system table. Usually it has two rows. It must be 
> assigned before user tables but after hbase:meta goes out. Its presence 
> complicates our startup and is a constant source of grief when for whatever 
> reason, it is not up and available. In fact, master startup is predicated on 
> hbase:namespace being assigned and will not make progress unless it is up.
> Lets just add a new 'ns' column family to hbase:meta for namespace.
> Here is a default ns table content:
> {code}
> hbase(main):023:0* scan 'hbase:namespace'
> ROW   
>COLUMN+CELL
>  default  
>column=info:d, timestamp=1526694059106, 
> value=\x0A\x07default
>  hbase
>column=info:d, timestamp=1526694059461, 
> value=\x0A\x05hbase
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21498) Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a new BlockCache

2018-11-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695879#comment-16695879
 ] 

Hadoop QA commented on HBASE-21498:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
50s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
10s{color} | {color:red} hbase-server: The patch generated 1 new + 216 
unchanged - 6 fixed = 217 total (was 222) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
49s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 22s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  
5s{color} | {color:red} hbase-server generated 1 new + 0 unchanged - 0 fixed = 
1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}127m 47s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}164m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  instanceof will always return true for all non-null values in new 
org.apache.hadoop.hbase.regionserver.HRegionServer(Configuration), since all 
org.apache.hadoop.hbase.regionserver.HRegionServer are instances of 
org.apache.hadoop.hbase.regionserver.HRegionServer  At HRegionServer.java:for 
all non-null values in new 
org.apache.hadoop.hbase.regionserver.HRegionServer(Configuration), since all 
org.apache.hadoop.hbase.regionserver.HRegionServer are instances of 
org.apache.hadoop.hbase.regionserver.HRegionServer  At HRegionServer.java:[line 
601] |
| Failed junit tests | hadoop.hbase.master.TestMasterNotCarryTable |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21498 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12949167/HBASE-21498.master.005.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 373254b6cc57 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 

[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695873#comment-16695873
 ] 

Hadoop QA commented on HBASE-21387:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1.2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 4s{color} | {color:green} branch-1.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} branch-1.2 passed with JDK v1.8.0_192 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} branch-1.2 passed with JDK v1.7.0_201 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} branch-1.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
16s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} branch-1.2 passed with JDK v1.8.0_192 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} branch-1.2 passed with JDK v1.7.0_201 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed with JDK v1.8.0_192 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK v1.7.0_201 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
11s{color} | {color:red} hbase-server: The patch generated 6 new + 242 
unchanged - 2 fixed = 248 total (was 244) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
13s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 49s{color} | {color:green} Patch does not cause any errors with Hadoop 2.4.1 
2.5.2 2.6.5 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed with JDK v1.8.0_192 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed with JDK v1.7.0_201 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 75m 
59s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}107m 20s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:34a9b27 |
| JIRA Issue | HBASE-21387 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12949178/HBASE-21387.branch-1.2.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 9ffddac44f9b 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 

[jira] [Commented] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.

2018-11-22 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695837#comment-16695837
 ] 

Wellington Chevreuil commented on HBASE-21505:
--

Thanks for the comments [~tianjingyun]! 
{quote}these metrics are mainly to indicate if current replication source is 
stuck.That's why we need to update the TimeStampsOfLastShippedOp even there is 
no log replicated
{quote}
Yeah, my proposal here is to make these metrics consistent both on the 
behaviour and name accuracy, currently for example, both 
TimeStampsOfLastShippedOp and AgeOfLastShippedOp are misleading, and it also 
seems to have some bugs on its calculation (see #4 from original description).
{quote}And AgeOfLastShippedOp = current time stamp - timeStampOfLastShippedOp
{quote}
I understand that, and I think this is one the things that makes these metrics 
inconsistent, as sometimes it may not really be the age of last shipped if we 
keep updating *timeStampOfLastShippedOp* even when no edits came in. This will 
cause this metric to sometimes report one thing (*timeStampOfLastShippedOp* was 
really updated with a new edit), other times it will report nothing (when 
*timeStampOfLastShippedOp* just gets updated because of another cycle of the 
source run, but no edits came in).

And things get even worse when recovery queues are in the picture, but we don't 
show those. Have seen lots of confusion from users, and was thinking in 
refining the command, together with new metrics, to give a more detailed and 
intuitive output about replication status.
{quote}Are you planning add more metrics to help calculate replication lag? My 
concern is we need to make sure the replication lag could really indicate the 
status of replication is normal or not.
{quote}
Yes, per my previous comments, idea is to provide a more intuitive "feedback" 
about replication source state under different scenarios. The main metric to 
identify stuck replications would still be *Replication Lag*. A new metric to 
track when a last edit came into source would be added, this would be 
*TimeStampOfLastArrivedInSource*, per my previous comment suggestions.  
*TimeStampOfLastArrivedInSource* would be used to calculate the lag, so that we 
don't need to keep updating *timeStampOfLastShippedOp* (even when no edit was 
really shipped) anymore. It would allow us to consistently report the actual 
time an edit was last shipped to target. *AgeOfLastShippedOp* would also be 
more consistent, as it would show how much time last edit took between when it 
got into source and when it was shipped to target. These would also give 
additional facts on how replication is performing, for example, in bullet #*e* 
from my previous comment, we show there's no lag, and last shipped spent 30 
secs to complete replication. Also, I think it avoids confusion to explicit say 
if no edits were ever shipped or arrived in source since last source RS restart 
than reporting an arbitrary date (in this case, RS restart date/replication 
source last run date).

I'm still working on this, currently need to perform more tests, but can share 
an initial patch for review.

> Several inconsistencies on information reported for Replication Sources by 
> hbase shell status 'replication' command.
> 
>
> Key: HBASE-21505
> URL: https://issues.apache.org/jira/browse/HBASE-21505
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> While reviewing hbase shell status 'replication' command, noticed the 
> following issues related to replication source section:
> 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when 
> no new edits were added to source, so nothing was really shipped. Test steps 
> performed:
> 1.1) Source cluster with only one table targeted to replication;
> 1.2) Added a new row, confirmed the row appeared in Target cluster;
> 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp 
> shows current timestamp T1.
> 1.4) Waited 30 seconds, no new data added to source. Issued status 
> 'replication' command, now shows timestamp T2.
> 2) When replication is stuck due some connectivity issues or target 
> unavailability, if new edits are added in source, reported AgeOfLastShippedOp 
> is wrongly showing same value as "Replication Lag". This is incorrect, 
> AgeOfLastShippedOp should not change until there's indeed another edit 
> shipped to target. Test steps performed:
> 2.1) Source cluster with only one table targeted to replication;
> 2.2) Stopped target cluster RS;
> 2.3) Put a new row on source. Running status 'replication' command does show 
> lag increasing. TimeStampsOfLastShippedOp seems 

[jira] [Commented] (HBASE-21503) Replication normal source can get stuck due potential race conditions between source wal reader and wal provider initialization threads.

2018-11-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695801#comment-16695801
 ] 

Hudson commented on HBASE-21503:


Results for branch master
[build #622 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/622/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/622//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/622//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/622//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Replication normal source can get stuck due potential race conditions between 
> source wal reader and wal provider initialization threads.
> 
>
> Key: HBASE-21503
> URL: https://issues.apache.org/jira/browse/HBASE-21503
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21503-master.001.patch, HBASE-21503.patch
>
>
> Noticed replication sources could get stuck while doing some tests that 
> involved RS restart. On these cases, upon RS restart, the newly created 
> normal source was reaching wal end and not recognising it was open for write, 
> what leads to remove it from source queue. Thus, no new OP get's replicated 
> unless this log goes to a recovery queue.
> Checking this further, my understanding is that, during restart, RS will 
> start replication services, which inits ReplicationSourceManager and 
> ReplicationSources for each wal group id, in below sequence:
> {noformat}
> HRegionServer -> Replication.startReplicationService() -> 
> ReplicationSourceManager.init() -> add ReplicationSource 
> {noformat}
> At this point, ReplicationSources have no paths yet, so WAL reader thread is 
> not running. ReplicationSourceManager is registered as a WAL listener, in 
> order to get notified whenever new wal file is available. During 
> ReplicationSourceManager and ReplicationSource instances creation, a 
> WALFileLengthProvider instance is obtained from WALProvider and cached by 
> both ReplicationSourceManager and ReplicationSource. The default 
> implementation for this WALFileLengthProvider is below, on WALProvider 
> interface:
> {noformat}
> default WALFileLengthProvider getWALFileLengthProvider() {
> return path -> getWALs().stream().map(w -> 
> w.getLogFileSizeIfBeingWritten(path))
> .filter(o -> o.isPresent()).findAny().orElse(OptionalLong.empty());
>   } 
> {noformat}
> Notice that if WALProvider.getWALs returns an empty list, this 
> WALFileLengthProvider instance is always going to return nothing. This is 
> relevant because when ReplicationSource finally starts 
> ReplicationSourceWALReader thread, it passes this WALFileLengthProvider, 
> which is used by WALEntryStream (inside the wal reader) to determine if wal 
> is being written (and should be kept in the queue) here:
> {noformat}
>   private void tryAdvanceEntry() throws IOException {
> if (checkReader()) {
>   boolean beingWritten = readNextEntryAndRecordReaderPosition();
>   LOG.trace("reading wal file {}. Current open for write: {}", 
> this.currentPath, beingWritten);
>   if (currentEntry == null && !beingWritten) {
> // no more entries in this log file, and the file is already closed, 
> i.e, rolled
> // Before dequeueing, we should always get one more attempt at 
> reading.
> // This is in case more entries came in after we opened the reader, 
> and the log is rolled
> // while we were reading. See HBASE-6758
> resetReader();
> readNextEntryAndRecordReaderPosition();
> if (currentEntry == null) {
>   if (checkAllBytesParsed()) { // now we're certain we're done with 
> this log file
> dequeueCurrentLog();
> if (openNextLog()) {
>   readNextEntryAndRecordReaderPosition();
> }
>   }
> }
>   }
> ...
> {noformat}
> Here code snippet for WALEntryStream.readNextEntryAndRecordReaderPosition() 
> method that relies on the WALFileLengthProvider:
> {noformat}
> ...
> 

[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21387:
-
Attachment: HBASE-21387.branch-1.2.patch
HBASE-21387.branch-1.3.patch
HBASE-21387.branch-1.patch

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387.dbg.txt, 21387.v10.txt, 
> 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 
> 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, HBASE-21387.branch-1.2.patch, 
> HBASE-21387.branch-1.3.patch, HBASE-21387.branch-1.patch, 
> HBASE-21387.v13.patch, HBASE-21387.v14.patch, HBASE-21387.v15.patch, 
> HBASE-21387.v16.patch, HBASE-21387.v17.patch, two-pass-cleaner.v4.txt, 
> two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695767#comment-16695767
 ] 

Zheng Hu commented on HBASE-21387:
--

HBASE-16490 did not apply to branch-1.3 & branch-1.2 ,  and branch-1.2 is diff 
with branch-1.3. so need to prepare 3 patches for branch-1/branch-1.4,  
branch-1.3 , branch-1.2: 

* HBASE-21387.branch-1.patch can be applied to branch-1 & branch-1.4 
* HBASE-21387.branch-1.3.patch can be applied to branch-1.3; 
* HBASE-21387.branch-1.2.patch can be applied to branch-1.2

Let's wait the Hadoop QA. 



> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387.dbg.txt, 21387.v10.txt, 
> 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 
> 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, HBASE-21387.v13.patch, 
> HBASE-21387.v14.patch, HBASE-21387.v15.patch, HBASE-21387.v16.patch, 
> HBASE-21387.v17.patch, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21300) Fix the wrong reference file path when restoring snapshots for tables with MOB columns

2018-11-22 Thread Yi Mei (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Mei updated HBASE-21300:
---
Attachment: HBASE-21300.master.004.patch

> Fix the wrong reference file path when restoring snapshots for tables with 
> MOB columns
> --
>
> Key: HBASE-21300
> URL: https://issues.apache.org/jira/browse/HBASE-21300
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Major
> Attachments: HBASE-21300.master.001.patch, 
> HBASE-21300.master.002.patch, HBASE-21300.master.003.patch, 
> HBASE-21300.master.004.patch
>
>
> When restoring snapshots for tables with MOB columns, the reference files for 
> mob region are created under hbase root dir, rather than restore dir.
> Some of the mob reference file paths are as follows:
> {quote}hdfs:/7ae0d109-3ca4-d0e7-7250-62ed234ab247/mobdir/data/ns_testMob/testMob
> hdfs:/7ae0d109-3ca4-d0e7-7250-62ed234ab247/mobdir/data/ns_testMob/testMob/057a856eb65753c6e6bdb168ba58a0b2
> hdfs:/7ae0d109-3ca4-d0e7-7250-62ed234ab247/mobdir/data/ns_testMob/testMob/057a856eb65753c6e6bdb168ba58a0b2/A
> hdfs:/7ae0d109-3ca4-d0e7-7250-62ed234ab247/mobdir/data/ns_testMob/testMob/057a856eb65753c6e6bdb168ba58a0b2/A/d41d8cd98f00b204e9800998ecf8427e201810120fc8e2446f174598a7280a81b1134cee
> hdfs:/7ae0d109-3ca4-d0e7-7250-62ed234ab247/mobdir/data/ns_testMob/testMob/057a856eb65753c6e6bdb168ba58a0b2/A/ns_testMob=testMob=057a856eb65753c6e6bdb168ba58a0b2-d41d8cd98f00b204e9800998ecf8427e201810120fc8e2446f174598a7280a81b1134cee
> {quote}
> The restore dir files are as follows:
> {quote}hdfs://hbase/.tmpdir-to-restore-snapshot/856e06fa-e018-4e95-9647-2cfbd5161e7e
> hdfs://hbase/.tmpdir-to-restore-snapshot/856e06fa-e018-4e95-9647-2cfbd5161e7e/data
> hdfs://hbase/.tmpdir-to-restore-snapshot/856e06fa-e018-4e95-9647-2cfbd5161e7e/data/ns_testMob
> hdfs://hbase/.tmpdir-to-restore-snapshot/856e06fa-e018-4e95-9647-2cfbd5161e7e/data/ns_testMob/testMob
> hdfs://hbase/.tmpdir-to-restore-snapshot/856e06fa-e018-4e95-9647-2cfbd5161e7e/data/ns_testMob/testMob/ecdf66f0d8c09a816faf37336ad262e1
> hdfs://hbase/.tmpdir-to-restore-snapshot/856e06fa-e018-4e95-9647-2cfbd5161e7e/data/ns_testMob/testMob/ecdf66f0d8c09a816faf37336ad262e1/.regioninfo
> hdfs://hbase/.tmpdir-to-restore-snapshot/856e06fa-e018-4e95-9647-2cfbd5161e7e/data/ns_testMob/testMob/ecdf66f0d8c09a816faf37336ad262e1/A
> hdfs://hbase/.tmpdir-to-restore-snapshot/856e06fa-e018-4e95-9647-2cfbd5161e7e/data/ns_testMob/testMob/ecdf66f0d8c09a816faf37336ad262e1/A/ns_testMob=testMob=ecdf66f0d8c09a816faf37336ad262e1-7208172df03b46518370643aa28ffd05
> {quote}
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21498) Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a new BlockCache

2018-11-22 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21498:
---
Attachment: HBASE-21498.master.005.patch

> Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a 
> new BlockCache
> --
>
> Key: HBASE-21498
> URL: https://issues.apache.org/jira/browse/HBASE-21498
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21498.master.001.patch, 
> HBASE-21498.master.002.patch, HBASE-21498.master.003.patch, 
> HBASE-21498.master.004.patch, HBASE-21498.master.005.patch
>
>
> In our cluster, we use a small heap/offheap config for master. After 
> HBASE-21290, master doesn't instantiate BlockCache when it not carry table. 
> But it will new CacheConfig in SplitTableRegionProcedure.splitStoreFiles 
> method. And it will instantiate a new BlockCache if it not initialized before 
> and make master OOM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21387:
-
Fix Version/s: 1.2.10
   2.1.2
   1.4.9
   2.0.3
   2.2.0
   1.3.3
   1.5.0
   3.0.0

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387.dbg.txt, 21387.v10.txt, 
> 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 
> 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, HBASE-21387.v13.patch, 
> HBASE-21387.v14.patch, HBASE-21387.v15.patch, HBASE-21387.v16.patch, 
> HBASE-21387.v17.patch, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21498) Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a new BlockCache

2018-11-22 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695742#comment-16695742
 ] 

Guanghao Zhang commented on HBASE-21498:


{quote}Even if there are n RSs, we will end up creating one BC only and share.

A bit strange this sequence.

Passing around the static singleton object looks a bit odd.
{quote}
Yes... I thought we need a refactor for CacheConfig and BlockCache.

> Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a 
> new BlockCache
> --
>
> Key: HBASE-21498
> URL: https://issues.apache.org/jira/browse/HBASE-21498
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21498.master.001.patch, 
> HBASE-21498.master.002.patch, HBASE-21498.master.003.patch, 
> HBASE-21498.master.004.patch
>
>
> In our cluster, we use a small heap/offheap config for master. After 
> HBASE-21290, master doesn't instantiate BlockCache when it not carry table. 
> But it will new CacheConfig in SplitTableRegionProcedure.splitStoreFiles 
> method. And it will instantiate a new BlockCache if it not initialized before 
> and make master OOM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695739#comment-16695739
 ] 

Hadoop QA commented on HBASE-21154:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 24 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
37s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  4m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
23s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
19s{color} | {color:red} hbase-server: The patch generated 1 new + 865 
unchanged - 103 fixed = 866 total (was 968) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
46s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 15s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green}  
1m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
42s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
15s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}122m 
49s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
21s{color} | {color:green} hbase-rsgroup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
56s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}192m 19s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker 

[jira] [Commented] (HBASE-21401) Sanity check in BaseDecoder#parseCell

2018-11-22 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695666#comment-16695666
 ] 

Zheng Hu commented on HBASE-21401:
--

bq. So now this patch is in master branch only right?

The master has been reverted too, you can see [1].  and the benchmark data can 
be seen in HBASE-21459.  IMO,  the checkKeyValueBytes cost about ~ 10e-4 ms per 
operation , I think it's acceptable..  How do you guys think ? 

1. 
https://github.com/apache/hbase/commit/362b5dd25980af11ed6dde9c046bb10893a20a56

> Sanity check in BaseDecoder#parseCell
> -
>
> Key: HBASE-21401
> URL: https://issues.apache.org/jira/browse/HBASE-21401
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21401.v1.patch, HBASE-21401.v2.patch, 
> HBASE-21401.v3.patch, HBASE-21401.v4.patch, HBASE-21401.v4.patch, 
> HBASE-21401.v5.patch
>
>
> In KeyValueDecoder & ByteBuffKeyValueDecoder,  we pass a byte buffer to 
> initialize the Cell without a sanity check (check each field's offset 
> exceed the byte buffer or not), so ArrayIndexOutOfBoundsException may happen 
> when read the cell's fields, such as HBASE-21379,  it's hard to debug this 
> kind of bug. 
> An earlier check will help to find such kind of bugs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)