[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024308#comment-16024308
 ] 

Hadoop QA commented on HBASE-18099:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 4s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 7s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 45s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 120m 29s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 158m 6s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestAsyncRegionAdminApi |
|   | hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869795/18099.v4.txt |
| JIRA Issue | HBASE-18099 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c266884f4efd 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / dc1065a |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/6939/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/6939/artifact/patchprocess/patch-unit-hbase-server.txt |
|  Test Results | 

[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024208#comment-16024208
 ] 

Jerry He commented on HBASE-18099:
--

+1 on v4.
Region is a LimitedPrivate (coprocessor) interface, so you may need to find out which versions the fix can go into.

> FlushSnapshotSubprocedure should check the return value from Region#flush()
> ---
>
> Key: HBASE-18099
> URL: https://issues.apache.org/jira/browse/HBASE-18099
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Critical
> Attachments: 18099.v1.txt, 18099.v2.txt, 18099.v3.txt, 18099.v4.txt
>
>
> In the following thread:
> http://search-hadoop.com/m/HBase/YGbbMXkeHlI9zo
> Jacob described a scenario where data from certain regions was missing from
> the snapshot.
> Here is the related region server log:
> https://pastebin.com/1ECXjhRp
> He pointed out that the concurrent flush from the MemStoreFlusher.1 thread was
> not initiated from the snapshot thread pool.
> In RegionSnapshotTask#call() method there is this:
> {code}
>   region.flush(true);
> {code}
> The return value is not checked.
> In HRegion#flushcache(), Result.CANNOT_FLUSH may be returned due to:
> {code}
>   String msg = "Not flushing since "
>   + (writestate.flushing ? "already flushing"
>   : "writes not enabled");
> {code}
> This implies that FlushSnapshotSubprocedure may incorrectly skip waiting for 
> the concurrent flush to complete.
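A minimal sketch of the check the issue asks for, written against hypothetical stand-ins (`FlushOutcome`, `FlushableRegion`, `RegionSnapshotTaskSketch`, `RecordingRegion`) rather than HBase's actual `Region` API. It follows the approach Jerry He suggests in this thread: when the flush reports CANNOT_FLUSH, wait for the ongoing flush unconditionally instead of judging success by how long the wait took.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the flush result kinds named in this issue;
// not HBase's Region.FlushResult.Result enum.
enum FlushOutcome { FLUSHED, CANNOT_FLUSH_MEMSTORE_EMPTY, CANNOT_FLUSH }

// Hypothetical minimal view of a region, assumed for this sketch only.
interface FlushableRegion {
  FlushOutcome flush();
  void waitForFlushes(); // blocks until no flush is in progress
}

class RegionSnapshotTaskSketch {
  private final FlushableRegion region;

  RegionSnapshotTaskSketch(FlushableRegion region) {
    this.region = region;
  }

  // Flushes before snapshotting; returns true if it had to wait for a
  // flush that some other thread may have started.
  boolean flushBeforeSnapshot() {
    FlushOutcome res = region.flush();
    if (res == FlushOutcome.CANNOT_FLUSH) {
      // Another flush (e.g. from a MemStoreFlusher thread) may be in progress
      // for this region; wait for it so its data is not missing from the snapshot.
      region.waitForFlushes();
      return true;
    }
    return false;
  }
}

// Test double that records which calls the task made.
class RecordingRegion implements FlushableRegion {
  final FlushOutcome outcome;
  final List<String> calls = new ArrayList<>();

  RecordingRegion(FlushOutcome outcome) {
    this.outcome = outcome;
  }

  @Override
  public FlushOutcome flush() {
    calls.add("flush");
    return outcome;
  }

  @Override
  public void waitForFlushes() {
    calls.add("wait");
  }
}
```

With a `RecordingRegion` that returns CANNOT_FLUSH, the task records `flush` followed by `wait`; with FLUSHED it records only `flush`.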



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024157#comment-16024157
 ] 

Jerry He commented on HBASE-18099:
--

bq. This check acts as safety guard. Normally waitForFlushes() would get into 
loop and return non-zero value.
There is not really a need for this safety guard. What concrete case does it guard against?
It is also a little strange to use a duration/time-lapse as an indicator of whether an op succeeded. The duration can be zero and still be valid.
For example, in the patch:
{code}
99  FlushResult res = region.flush(true);
100 if (res.getResult() == FlushResult.Result.CANNOT_FLUSH) {
101   long duration = region.waitForFlushes();
{code}
Between line 100 and line 101, the third-party flush can finish and set writestate.flushing to false. waitForFlushes() is then a no-op with a duration of zero, and we don't want to fail the snapshot in that case.

We can just make region.waitForFlushes() void and call it on line 101 without checking anything.
Please add a comment above line 100 explaining why line 101 is needed: there may be another ongoing flush for the same region that we want to wait for.
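A minimal sketch of the void `waitForFlushes()` suggested here, assuming a plain Java monitor around a single `flushing` flag. `FlushState`, `tryStartFlush()` and `finishFlush()` are hypothetical names loosely modeled on `writestate.flushing`, not HBase's `HRegion` code. It shows why the race between line 100 and line 101 is harmless: if the other flush has already finished, the wait loop is simply skipped.

```java
// Hypothetical flush-state holder, loosely modeled on the writestate.flushing
// flag discussed in this thread; not HBase's HRegion.
class FlushState {
  private boolean flushing = false;

  // Returns false if a flush is already running (the CANNOT_FLUSH case).
  synchronized boolean tryStartFlush() {
    if (flushing) {
      return false;
    }
    flushing = true;
    return true;
  }

  synchronized void finishFlush() {
    flushing = false;
    notifyAll(); // wake any thread blocked in waitForFlushes()
  }

  // Void, with nothing for the caller to check: if the other flush already
  // finished before this call, the loop is skipped and the method returns
  // immediately, which is fine for the snapshot.
  synchronized void waitForFlushes() {
    while (flushing) {
      try {
        wait();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        return;
      }
    }
  }
}
```

Whether the snapshot thread blocks or returns at once depends only on timing; in both cases `flushing` is false afterwards, so the snapshot proceeds on flushed data.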



[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024142#comment-16024142
 ] 

Hadoop QA commented on HBASE-18099:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 5s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 26m 29s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 122m 49s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 161m 20s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869758/18099.v3.txt |
| JIRA Issue | HBASE-18099 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 57e38a7af78c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 837bb9e |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/6933/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/6933/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023940#comment-16023940
 ] 

Jerry He commented on HBASE-18099:
--

I still don't understand. For example, for a read-only table, writestate.writesEnabled is false, so the region flush returns CANNOT_FLUSH right away.
That snapshot should succeed, but with the patch it will fail.
CANNOT_FLUSH_MEMSTORE_EMPTY is returned only after the memstore size is checked, and in this case we never reach that point.
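A simplified model of the check order described here, assuming (as stated above) that the writestate checks happen before the memstore-size check. `FlushCheckOrder` and `FlushOutcome` are hypothetical names, not HBase's `HRegion#flushcache()`. It shows why a read-only region reports CANNOT_FLUSH rather than CANNOT_FLUSH_MEMSTORE_EMPTY, even when its memstore is empty.

```java
// Hypothetical stand-in for the flush result kinds named in this issue.
enum FlushOutcome { FLUSHED, CANNOT_FLUSH_MEMSTORE_EMPTY, CANNOT_FLUSH }

// Hypothetical, simplified model of the early-return order in a region flush;
// not HBase's HRegion#flushcache().
final class FlushCheckOrder {
  private FlushCheckOrder() {}

  static FlushOutcome flush(boolean flushing, boolean writesEnabled, long memstoreSize) {
    // 1. Writestate comes first: "already flushing" or "writes not enabled".
    if (flushing || !writesEnabled) {
      return FlushOutcome.CANNOT_FLUSH;
    }
    // 2. Only a region that passed step 1 has its memstore size examined.
    if (memstoreSize <= 0) {
      return FlushOutcome.CANNOT_FLUSH_MEMSTORE_EMPTY;
    }
    return FlushOutcome.FLUSHED;
  }
}
```

Under this model, `flush(false, false, 0)` (read-only, empty memstore) yields CANNOT_FLUSH, so a caller that treats CANNOT_FLUSH plus a zero-length wait as failure would wrongly fail that region's snapshot.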



[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023894#comment-16023894
 ] 

Ted Yu commented on HBASE-18099:


bq. Either there is nothing to flush
If there were nothing to flush, FlushResult.Result.CANNOT_FLUSH wouldn't have been returned; CANNOT_FLUSH_MEMSTORE_EMPTY would have been returned instead.

This check acts as a safety guard. Normally waitForFlushes() would enter its wait loop and return a non-zero value.



[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023873#comment-16023873
 ] 

Jerry He commented on HBASE-18099:
--

It is valid. Either there is nothing to flush (a read-only table), or it is a replica region, which is probably not among the snapshot regions anyway.



[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023763#comment-16023763
 ] 

Ted Yu commented on HBASE-18099:


A return value of 0 indicates that no flush took place (e.g. writestate.readOnly is true).
This allows the subprocedure to know whether the flush actually completed.
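To make the zero-duration case concrete, a minimal sketch of a duration-returning wait, assuming a simple monitor-based flag. `TimedFlushState` and `setFlushing()` are hypothetical names, not the code in any of the 18099 patches. A return of 0 means no flush was running, which, as Jerry He argues in this thread, is a legitimate outcome rather than an error; a genuine but very short wait can also round down to 0 ms.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical duration-returning wait, modeled on the patch v2/v3 discussion;
// not HBase's HRegion#waitForFlushes().
class TimedFlushState {
  private boolean flushing = false;

  synchronized void setFlushing(boolean value) {
    flushing = value;
    if (!value) {
      notifyAll(); // wake waiters once the flush is over
    }
  }

  // Returns the milliseconds spent waiting: 0 when no flush was running,
  // though a real but very short wait can also round down to 0.
  synchronized long waitForFlushes() {
    if (!flushing) {
      return 0L;
    }
    long start = System.nanoTime();
    while (flushing) {
      try {
        wait();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        break;
      }
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }
}
```

Because 0 covers both "nothing to wait for" and "the other flush had just finished", the duration alone cannot distinguish success from failure.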



[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023752#comment-16023752
 ] 

Jerry He commented on HBASE-18099:
--

Why do you want to fail the snapshot if the wait time is 0?



[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023717#comment-16023717
 ] 

Ted Yu commented on HBASE-18099:


Ran the timed-out tests above locally with patch v2; all of them passed.



[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023680#comment-16023680
 ] 

Hadoop QA commented on HBASE-18099:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 3s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 30m 36s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 101m 56s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 2s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 148m 1s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.wal.TestWALFiltering |
|   | org.apache.hadoop.hbase.wal.TestWALSplitCompressed |
|   | org.apache.hadoop.hbase.TestAcidGuarantees |
|   | org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot |
|   | org.apache.hadoop.hbase.TestLocalHBaseCluster |
|   | org.apache.hadoop.hbase.wal.TestBoundedRegionGroupingStrategy |
|   | org.apache.hadoop.hbase.wal.TestFSHLogProvider |
|   | org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.10.1 Server=1.10.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869705/18099.v2.txt |
| JIRA Issue | HBASE-18099 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux f0107bc28d64 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |

[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023554#comment-16023554
 ] 

Ted Yu commented on HBASE-18099:


I ran patch v2 through all snapshot unit tests, which passed.

From hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestMobSnapshotFromClient-output.txt:
{code}
2017-05-24 19:05:34,765 DEBUG [MemStoreFlusher.1] 
regionserver.HRegionFileSystem(462): Committing store file 
hdfs://localhost:34968/user/hbase/test-data/f01c3a23-186d-4e09-9b8b-fd9b60840fda/data/default/test/55ad54ef83710bca3ffe6c5bf935abb2/.tmp/fam/22c69e9c6817408f952bba14175fc7c7
 as 
hdfs://localhost:34968/user/hbase/test-data/f01c3a23-186d-4e09-9b8b-fd9b60840fda/data/default/test/55ad54ef83710bca3ffe6c5bf935abb2/fam/22c69e9c6817408f952bba14175fc7c7
2017-05-24 19:05:34,773 INFO  [MemStoreFlusher.1] regionserver.HStore(1010): 
Added 
hdfs://localhost:34968/user/hbase/test-data/f01c3a23-186d-4e09-9b8b-fd9b60840fda/data/default/test/55ad54ef83710bca3ffe6c5bf935abb2/fam/22c69e9c6817408f952bba14175fc7c7,
 entries=2048, sequenceid=27, filesize=245.0 K
2017-05-24 19:05:34,775 INFO  [MemStoreFlusher.1] regionserver.HRegion(2742): 
Finished memstore flush of ~64 KB/65536, currentsize=37.25 KB/38144 for region 
test,5,1495652731733.55ad54ef83710bca3ffe6c5bf935abb2. in 473ms, sequenceid=27, 
compaction requested=false
2017-05-24 19:05:34,775 DEBUG 
[rs(cn012.l42scl.hortonworks.com,35700,1495652693950)-snapshot-pool38-thread-2] 
snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask(107): Waited 120 ms for 
flush to complete
{code}

> FlushSnapshotSubprocedure should check the return value from Region#flush()
> ---
>
> Key: HBASE-18099
> URL: https://issues.apache.org/jira/browse/HBASE-18099
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
> Attachments: 18099.v1.txt, 18099.v2.txt
>
>
> In the following thread:
> http://search-hadoop.com/m/HBase/YGbbMXkeHlI9zo
> Jacob described the scenario where data from certain region were missing in 
> the snapshot.
> Here was related region server log:
> https://pastebin.com/1ECXjhRp
> He pointed out that concurrent flush from MemStoreFlusher.1 thread was not 
> initiated from the thread pool for snapshot.
> In RegionSnapshotTask#call() method there is this:
> {code}
>   region.flush(true);
> {code}
> The return value is not checked.
> In HRegion#flushcache(), Result.CANNOT_FLUSH may be returned due to:
> {code}
>   String msg = "Not flushing since "
>   + (writestate.flushing ? "already flushing"
>   : "writes not enabled");
> {code}
> This implies that FlushSnapshotSubprocedure may incorrectly skip waiting for 
> the concurrent flush to complete.
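A minimal sketch of what "checking the return value" could look like, assuming hypothetical stand-ins: `FlushOutcome` only borrows its constant names from HBase's `Region.FlushResult.Result`, and `SnapshotRegion` / `waitForInProgressFlush()` are illustrative names, not HBase API. The point is simply that CANNOT_FLUSH is no longer silently treated as success.

```java
// Hypothetical enum; constant names are modelled on HBase's Region.FlushResult.Result.
enum FlushOutcome {
  FLUSHED_NO_COMPACTION_NEEDED,
  FLUSHED_COMPACTION_NEEDED,
  CANNOT_FLUSH_MEMSTORE_EMPTY,
  CANNOT_FLUSH
}

// Hypothetical stand-in for the region that RegionSnapshotTask#call() flushes.
interface SnapshotRegion {
  FlushOutcome flush();            // plays the role of region.flush(true)
  void waitForInProgressFlush();   // hypothetical: blocks until a concurrent flush ends
}

final class CheckedSnapshotFlush {
  private CheckedSnapshotFlush() {}

  /**
   * Flushes the region and inspects the result instead of discarding it.
   * Returns true when the memstore was flushed (or was already empty).
   * Returns false after waiting out a concurrent flush; the caller may then
   * need to flush again, since edits that arrived after that flush started
   * are not part of it.
   */
  static boolean flushOrWait(SnapshotRegion region) {
    FlushOutcome outcome = region.flush();
    if (outcome != FlushOutcome.CANNOT_FLUSH) {
      return true;
    }
    // CANNOT_FLUSH: e.g. MemStoreFlusher.1 is already flushing this region.
    region.waitForInProgressFlush();
    return false;
  }
}
```

With the unchecked `region.flush(true)` call, the busy case above would fall straight through to the snapshot step without waiting.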



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023405#comment-16023405
 ] 

Hadoop QA commented on HBASE-18099:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 4s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 57s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 27m 0s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 111m 33s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 150m 21s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869676/18099.v1.txt |
| JIRA Issue | HBASE-18099 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux b91dac468928 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 64c7017 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/6926/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/6926/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023396#comment-16023396
 ] 

Jerry He commented on HBASE-18099:
--

Failing the snapshot should be the last resort. Could we try another approach?
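One possible shape for "try something else before failing", sketched under assumptions: the region is retried a bounded number of times, waiting out any concurrent flush between attempts, and the snapshot is only failed once the attempts are exhausted. `FlushOutcome`, `SnapshotRegion`, `waitForInProgressFlush()` and `maxAttempts` are hypothetical names, and the unchecked `IllegalStateException` stands in for whatever error the real snapshot procedure would raise.

```java
// Hypothetical enum; constant names are modelled on HBase's Region.FlushResult.Result.
enum FlushOutcome {
  FLUSHED_NO_COMPACTION_NEEDED,
  FLUSHED_COMPACTION_NEEDED,
  CANNOT_FLUSH_MEMSTORE_EMPTY,
  CANNOT_FLUSH
}

// Hypothetical stand-in for the region being snapshotted.
interface SnapshotRegion {
  FlushOutcome flush();
  void waitForInProgressFlush();
}

final class RetryingSnapshotFlush {
  private RetryingSnapshotFlush() {}

  /**
   * Returns the number of flush attempts used. Throws only after
   * maxAttempts consecutive CANNOT_FLUSH results.
   */
  static int flushWithRetries(SnapshotRegion region, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (region.flush() != FlushOutcome.CANNOT_FLUSH) {
        return attempt; // flushed, or memstore already empty
      }
      // Another flush is running: wait for it, then flush again so that
      // edits arriving after it started are also written out.
      region.waitForInProgressFlush();
    }
    // Last resort: fail the snapshot rather than silently miss data.
    throw new IllegalStateException(
        "Region could not be flushed after " + maxAttempts + " attempts");
  }
}
```

The retry bound keeps a region that is permanently unflushable from hanging the snapshot, while a transient concurrent flush no longer fails it.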



[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023166#comment-16023166
 ] 

Ted Yu commented on HBASE-18099:


There is Region#waitForFlushesAndCompactions(), but it also waits for
compactions to finish.
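To illustrate the difference, here is a sketch of a flush-only wait next to a flush-and-compaction wait, built on a plain Java monitor (`synchronized`, `notifyAll`, `TimeUnit.timedWait`). `FlushCompactionState` and its methods are hypothetical and are not HBase's `HRegion.WriteState`; the model only shows that a flush-only wait need not be delayed by long-running compactions.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical model of a region's flush/compaction bookkeeping.
final class FlushCompactionState {
  private boolean flushing;
  private int compactionsInProgress;

  /** Returns false if a flush is already running (the "already flushing" case). */
  synchronized boolean tryStartFlush() {
    if (flushing) {
      return false;
    }
    flushing = true;
    return true;
  }

  synchronized void finishFlush() {
    flushing = false;
    notifyAll(); // wake threads blocked in await()
  }

  synchronized void startCompaction() {
    compactionsInProgress++;
  }

  synchronized void finishCompaction() {
    compactionsInProgress--;
    notifyAll();
  }

  /** Waits only for an in-progress flush; running compactions are ignored. */
  synchronized boolean waitForFlushes(long timeoutMs) {
    return await(() -> flushing, timeoutMs);
  }

  /** Waits for flushes and compactions, which can block much longer. */
  synchronized boolean waitForFlushesAndCompactions(long timeoutMs) {
    return await(() -> flushing || compactionsInProgress > 0, timeoutMs);
  }

  // Caller must hold this object's monitor. Returns false on timeout or interrupt.
  private boolean await(BooleanSupplier busy, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (busy.getAsBoolean()) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false;
      }
      try {
        TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        return false;
      }
    }
    return true;
  }
}
```

The `while` loop re-checks the condition after every wake-up, which guards against spurious wake-ups and against notifications meant for the other kind of waiter.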



[jira] [Commented] (HBASE-18099) FlushSnapshotSubprocedure should check the return value from Region#flush()

2017-05-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023129#comment-16023129
 ] 

Ted Yu commented on HBASE-18099:


CANNOT_FLUSH covers two cases:
* writes not enabled
* already flushing

Patch v1 is a tentative fix.
We could introduce another enum value for the already-flushing case, but that
would be an incompatible change.
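A hypothetical sketch of one way to tell the two CANNOT_FLUSH cases apart without adding an enum constant: keep the existing result value and attach a separate reason, so callers that only compare against CANNOT_FLUSH are unaffected. `FlushReport`, `CannotFlushReason` and `shouldWaitAndRetry()` are illustrative names, not HBase API, and the assumption that the writes-disabled case will not clear by waiting is labelled as such in the code.

```java
// Hypothetical enum; constant names are modelled on HBase's Region.FlushResult.Result.
enum FlushOutcome {
  FLUSHED_NO_COMPACTION_NEEDED,
  FLUSHED_COMPACTION_NEEDED,
  CANNOT_FLUSH_MEMSTORE_EMPTY,
  CANNOT_FLUSH
}

// Hypothetical detail carried alongside CANNOT_FLUSH; the outcome enum itself is unchanged.
enum CannotFlushReason { NONE, ALREADY_FLUSHING, WRITES_NOT_ENABLED }

final class FlushReport {
  final FlushOutcome outcome;
  final CannotFlushReason reason;

  private FlushReport(FlushOutcome outcome, CannotFlushReason reason) {
    this.outcome = outcome;
    this.reason = reason;
  }

  static FlushReport of(FlushOutcome outcome) {
    return new FlushReport(outcome, CannotFlushReason.NONE);
  }

  // Mirrors the two branches of the "Not flushing since ..." message.
  static FlushReport cannotFlush(boolean alreadyFlushing) {
    return new FlushReport(FlushOutcome.CANNOT_FLUSH,
        alreadyFlushing ? CannotFlushReason.ALREADY_FLUSHING
                        : CannotFlushReason.WRITES_NOT_ENABLED);
  }

  /**
   * Only the concurrent-flush case is worth waiting for.
   * Assumption: the writes-not-enabled case is not expected to clear by waiting.
   */
  boolean shouldWaitAndRetry() {
    return outcome == FlushOutcome.CANNOT_FLUSH
        && reason == CannotFlushReason.ALREADY_FLUSHING;
  }
}
```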
