[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-04 Thread Josh Elser (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708938#comment-16708938 ]

Josh Elser commented on HBASE-21544:


{quote}in the long term I do think we should remove the hflush check for writing recovered edits
{quote}
Thanks for the input, Duo! Let me spin out a second issue to look at doing 
that. Shouldn't be hard to push down into the WAL writer classes via 
WALFactory. (famous last words...)

Putting up a v2 shortly to address the checkstyle complaints while I hope 
[~reidchan] and/or [~zyork] can make sure I did the original work justice ;)
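
As a hedged sketch of what "pushing it down" via WALFactory might look like: thread a flag through the writer-creation path so recovered.edits writers skip the hflush capability check while live-WAL writers keep it. The class and method names below are illustrative, not the actual HBASE-21544 patch.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: the flag decides whether the stream must claim hflush.
public final class RecoveredEditsWriters {

  // Live WAL: insist on a stream that can hflush.
  public static FSDataOutputStream createWalStream(FileSystem fs, Path path)
      throws IOException {
    return create(fs, path, true);
  }

  // recovered.edits: close()-plus-retry is durable enough, skip the check.
  public static FSDataOutputStream createRecoveredEditsStream(FileSystem fs, Path path)
      throws IOException {
    return create(fs, path, false);
  }

  private static FSDataOutputStream create(FileSystem fs, Path path,
      boolean requireHflush) throws IOException {
    FSDataOutputStream out = fs.create(path);
    // FSDataOutputStream implements StreamCapabilities in Hadoop 2.9+/3.x.
    if (requireHflush && !out.hasCapability("hflush")) {
      out.close();
      throw new IOException("stream for " + path + " lacks capability: hflush");
    }
    return out;
  }
}
{code}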

> WAL writer for recovered.edits file in WalSplitting should not require hflush 
> from filesystem
> -
>
> Key: HBASE-21544
> URL: https://issues.apache.org/jira/browse/HBASE-21544
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 2.0.4
>
> Attachments: HBASE-20734.001.branch-2.0.patch
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from 
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for 
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB, 
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting 
> get into an "infinite" loop where the master keeps resubmitting and the RS 
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem implementations relies on the ability to call hflush for proper operation during component failures, but the current FileSystem does not support doing so. Please check the config value of 'hbase.wal.dir' and ensure it points to a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN  [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: Got while writing log entry to log
> java.io.IOException: cannot get log writer
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
>   at org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
>   at org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
>   at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
>   ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer 
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer 
> class. The WAL writer class thinks it should always have hflush support; 
> however, we don't _actually_ need that for writing out the recovered.edits 
> files. If {{close()}} on the recovered.edits file fails, we'll trash any 
> intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is overbearing and we should not make 
> the check when the ProtobufLogWriter is being used for the recovered.edits 
> file.
> [~zyork], [~busbey] fyi
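
As a minimal sketch of the "close() or trash and rerun" contract the description leans on (a hypothetical helper, not code from any patch): recovered.edits are reproducible from the source WAL, so a successful {{close()}} is the only durability point the file needs.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class RecoveredEditsSketch {
  static void writeRecoveredEdits(FileSystem fs, Path editsFile, byte[] serializedEdits)
      throws IOException {
    try (FSDataOutputStream out = fs.create(editsFile)) {
      out.write(serializedEdits);
      // No hflush() here: a successful close() is the durability point.
    } catch (IOException e) {
      // Trash any intermediate data and let the split task be resubmitted.
      fs.delete(editsFile, false);
      throw e;
    }
  }
}
{code}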





[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Hadoop QA (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708072#comment-16708072 ]

Hadoop QA commented on HBASE-21544:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m  7s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 30s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 43s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  4s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 58s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 46s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 21s{color} | {color:red} hbase-server: The patch generated 2 new + 416 unchanged - 6 fixed = 418 total (was 422) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 58s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  8m 45s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 30s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}118m 53s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}162m 31s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21544 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950464/HBASE-20734.001.branch-2.0.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 80c8f554e13a 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / 8e36aae9d3 |
| maven | version: Apache Maven

[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Duo Zhang (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708012#comment-16708012 ]

Duo Zhang commented on HBASE-21544:
---

Agree that for recovered edits we do not need the FileSystem to support hflush. 
Even though we now think HBASE-20734 is a 'better' solution since S3 is a bit 
slow(?), in the long term I do think we should remove the hflush check for 
writing recovered edits.



[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Josh Elser (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707951#comment-16707951 ]

Josh Elser commented on HBASE-21544:


.001 has a first stab at the backport. The tests modified by this patch are 
passing; I'm letting HadoopQA tell me about the rest of the changes.

Note to self: amend the commit message to have the new Jira issue key.



[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Josh Elser (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707782#comment-16707782 ]

Josh Elser commented on HBASE-21544:


{quote}Sure. Changed my mind. Can push out a 2.0.4 in a few weeks w/ 
HBASE-20734 in it if that helps.
{quote}
Alright, let me put up a patch for QA.

Your normal cadence should be sufficient. The workaround is to just turn off 
the "check" for hflush that MikeD added after HBASE-18784. Can put together 
relnotes too.
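
For anyone landing here from a search, a minimal sketch of that workaround, assuming the knob guarding the capability check is {{hbase.unsafe.stream.capability.enforce}} (normally set in hbase-site.xml); treat the property name as an assumption and verify it against your HBase version.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public final class DisableCapabilityCheck {
  public static Configuration withoutHflushCheck() {
    Configuration conf = HBaseConfiguration.create();
    // Assumed property name; disables hflush/hsync capability enforcement for
    // ALL WAL writers, not just recovered.edits, so use with care.
    conf.setBoolean("hbase.unsafe.stream.capability.enforce", false);
    return conf;
  }
}
{code}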



[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread stack (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707776#comment-16707776 ]

stack commented on HBASE-21544:
---

Sure. Changed my mind, [~elserj]. Can push out a 2.0.4 in a few weeks w/ 
HBASE-20734 in it if that helps.



[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Josh Elser (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707699#comment-16707699 ]

Josh Elser commented on HBASE-21544:


{quote}Does it only do that if the underlying FileSystem supports hflush?
{quote}
I've only taken a cursory glance at the HDFS code – I don't believe this is a 
cross-implementation "guarantee", but rather a "wink-nod" situation where all 
{{close()}} implementations just happen to make this guarantee. In other words, 
it's not that {{close()}} calls {{hflush()}} outwardly; it just makes sure the 
equivalent happens internally. I would have to dig more deeply to give you a 
more informed answer – Enis had mentioned he thought close and hflush give the 
same semantics, and I confirmed with an HDFS dev (Jitendra Pandey).
{quote}Could we just backport the fix for HBASE-20734 to branch-2.0 and call it 
a day?
{quote}
That would be another way to do it. [~stack], you had originally said "no" to 
HBASE-20734 for branch-2.0. In light of the error described in this Jira issue, 
might you change your mind?


[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Sean Busbey (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707696#comment-16707696 ]

Sean Busbey commented on HBASE-21544:
-

bq. FSDataOutputStream (assuming that's what you meant by FileSystem.close()) 
doesn't say anything in terms of Javadoc, but the implementation is such that 
close() makes the same guarantees as hflush().

Does it only do that if the underlying FileSystem supports hflush?

{quote}
bq. I thought recovered edits now go to the same FileSystem as the WAL? 
wouldn't that imply that hflush should be present?

Ah, this didn't land on 2.0.x. Yes, that would have precluded the need for such 
a change.

Semantics-wise, it would be good to make sure that we aren't over-requiring 
from our filesystem, but you are correct that this is less of a concern in 
newer versions, since the durability required of the FS by WALs is more than 
that needed for recovered.edits
{quote}

Sure. I just worry about too many configuration knobs. Could we just backport 
the fix for HBASE-20734 to branch-2.0 and call it a day?


[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Josh Elser (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707692#comment-16707692 ]

Josh Elser commented on HBASE-21544:


{quote}Edit: I see you're talking about WASB. Does that support hflush?
{quote}
Yeah, all of their stuff does (if you have it configured a certain way, at 
least). This was observed when recovered.edits were going to a part of the 
FileSystem that didn't support hflush.
{quote}I think HBASE-20734 should fix this case since HDFS will have hflush 
capability.

I thought recovered edits now go to the same FileSystem as the WAL? wouldn't 
that imply that hflush should be present?
{quote}
Ah, this didn't land on 2.0.x. Yes, that would have precluded the need for such 
a change.

Semantics-wise, it would be good to make sure that we aren't over-requiring 
from our filesystem, but you are correct that this is less of a concern in 
newer versions, since the durability required of the FS by WALs is more than 
that needed for recovered.edits :)
{quote}what does the contract for FileSystem.close say about data persistence?
{quote}
FSDataOutputStream (assuming that's what you meant by {{FileSystem.close()}}) 
doesn't say anything in terms of Javadoc, but the implementation is such that 
{{close()}} makes the same guarantees as {{hflush()}}.
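
For anyone who wants to check what their mount answers, a small probe along these lines should work, assuming Hadoop 2.9+/3.x where FSDataOutputStream implements StreamCapabilities; the class name and probe path below are made up for the example.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class HflushProbe {
  public static boolean streamClaimsHflush(Configuration conf, Path dir)
      throws IOException {
    FileSystem fs = dir.getFileSystem(conf);
    Path probe = new Path(dir, ".hflush-probe");
    try (FSDataOutputStream out = fs.create(probe, true)) {
      return out.hasCapability("hflush"); // WASB/ABFS mounts may answer false
    } finally {
      fs.delete(probe, false);
    }
  }
}
{code}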


[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Sean Busbey (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707657#comment-16707657 ]

Sean Busbey commented on HBASE-21544:
-

what does the contract for FileSystem.close say about data persistence?

I thought recovered edits now go to the same FileSystem as the WAL? wouldn't 
that imply that hflush should be present?



[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Zach York (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707642#comment-16707642 ]

Zach York commented on HBASE-21544:
---

[~elserj] are you guys storing the WAL on HDFS? If so, I think HBASE-20734 
should fix this case, since HDFS has the hflush capability.
