[jira] [Comment Edited] (RATIS-692) RaftStorageDirectory.tryLock throws a very deep IOException

2019-09-28 Thread Clay B. (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940046#comment-16940046
 ] 

Clay B. edited comment on RATIS-692 at 9/28/19 2:56 PM:


Thanks [~szetszwo]; this solved the issue in my tests. Also, the code is very 
clean and makes it obvious what it is doing.


was (Author: clayb):
Thanks [~szetszwo]; this solved this issue in my tests.

> RaftStorageDirectory.tryLock throws a very deep IOException
> ---
>
> Key: RATIS-692
> URL: https://issues.apache.org/jira/browse/RATIS-692
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Clay B.
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: r692_20190928.patch
>
>
> While working with our Namazu infrastructure, the first issue I hit when 
> dialing up the faulty-I/O injection rate was the following:
> {code}
> 2019-09-27 14:13:45 ERROR RaftStorageDirectory:336 - Failed to acquire lock on /home/vagrant/test_data/data0_slowed/64656d6f-5261-6674-4772-6f7570313233/in_use.lock. If this storage directory is mounted via NFS, ensure that the appropriate nfs lock services are running.
> java.io.IOException: Input/output error
> at java.io.RandomAccessFile.writeBytes(Native Method)
> at java.io.RandomAccessFile.write(RandomAccessFile.java:512)
> at org.apache.ratis.server.storage.RaftStorageDirectory.tryLock(RaftStorageDirectory.java:327)
> at org.apache.ratis.server.storage.RaftStorageDirectory.lock(RaftStorageDirectory.java:291)
> at org.apache.ratis.server.storage.RaftStorageDirectory.analyzeStorage(RaftStorageDirectory.java:264)
> at org.apache.ratis.server.storage.RaftStorage.analyzeAndRecoverStorage(RaftStorage.java:100)
> at org.apache.ratis.server.storage.RaftStorage.<init>(RaftStorage.java:63)
> at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:109)
> at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110)
> at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Exception in thread "main" java.io.IOException: Input/output error
> at java.io.RandomAccessFile.writeBytes(Native Method)
> at java.io.RandomAccessFile.write(RandomAccessFile.java:512)
> at org.apache.ratis.server.storage.RaftStorageDirectory.tryLock(RaftStorageDirectory.java:327)
> at org.apache.ratis.server.storage.RaftStorageDirectory.lock(RaftStorageDirectory.java:291)
> at org.apache.ratis.server.storage.RaftStorageDirectory.analyzeStorage(RaftStorageDirectory.java:264)
> at org.apache.ratis.server.storage.RaftStorage.analyzeAndRecoverStorage(RaftStorage.java:100)
> at org.apache.ratis.server.storage.RaftStorage.<init>(RaftStorage.java:63)
> at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:109)
> at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110)
> at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> However, it looks like the call chain does not retry anywhere.
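The report observes that no layer between `RaftServerProxy` and `RaftStorageDirectory.tryLock` retries the failed write. As an illustration only (this is not the attached r692_20190928.patch, whose contents are not shown here), a bounded retry around a lock attempt might look like this sketch. `LockRetry`, `IoAction`, and `withRetries` are hypothetical names, and whether retrying an `EIO` on the lock file is preferable to failing fast is a design choice the issue leaves open.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: retries an I/O action a bounded number of times. */
public final class LockRetry {

  /** An I/O step that may fail with IOException, e.g. writing the lock file. */
  @FunctionalInterface
  public interface IoAction<T> {
    T run() throws IOException;
  }

  /**
   * Runs {@code action} up to {@code maxAttempts} times, sleeping
   * {@code sleepMillis} between attempts. If every attempt fails, the last
   * IOException is rethrown with the previous one attached as suppressed.
   */
  public static <T> T withRetries(IoAction<T> action, int maxAttempts, long sleepMillis)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1: " + maxAttempts);
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.run();
      } catch (IOException e) {
        // Keep the earlier failure visible; skip if the same instance is rethrown,
        // since Throwable.addSuppressed rejects self-suppression.
        if (last != null && last != e) {
          e.addSuppressed(last);
        }
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis);
        }
      }
    }
    throw last; // non-null: the loop ran at least once and every attempt failed
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger calls = new AtomicInteger();
    // Simulated flaky step: fails twice with an EIO-style error, then succeeds.
    String result = withRetries(() -> {
      if (calls.incrementAndGet() < 3) {
        throw new IOException("Input/output error");
      }
      return "locked";
    }, 5, 10);
    System.out.println(result + " after " + calls.get() + " attempts");
  }
}
```

Each failed attempt's exception is attached to the next one as a suppressed exception, so if all attempts fail the final `IOException` still carries the earlier failures.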



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (RATIS-692) RaftStorageDirectory.tryLock throws a very deep IOException

2019-09-28 Thread Clay B. (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940046#comment-16940046
 ] 

Clay B. commented on RATIS-692:
---

Thanks [~szetszwo]; this solved this issue in my tests.






[jira] [Updated] (RATIS-696) RaftStorageDirectory.getLogSegmentFiles throws a deep IOException

2019-09-28 Thread Clay B. (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clay B. updated RATIS-696:
--
Description: 
In {{RaftStorageDirectory.getLogSegmentFiles()}}, creating a 
{{Files.newDirectoryStream}} can fail with an {{IOException}}; when it does, 
the server simply hangs at this point.


{code:java}
Exception in thread "main" java.nio.file.FileSystemException: /home/vagrant/test_data/data2_slowed/64656d6f-5261-6674-4772-6f7570313233/current: Input/output error
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at org.apache.ratis.server.storage.RaftStorageDirectory.getLogSegmentFiles(RaftStorageDirectory.java:200)
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:223)
at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204)
at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247)
at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:191)
at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:121)
at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110)
at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

  was:In {{RaftStorageDirectory.getLogSegmentFiles()}}, creating a 
{{Files.newDirectoryStream}} can fail with an {{IOException}}; when it does, 
the server simply hangs at this point.


> RaftStorageDirectory.getLogSegmentFiles throws a deep IOException
> -
>
> Key: RATIS-696
> URL: https://issues.apache.org/jira/browse/RATIS-696
> Project: Ratis
>  Issue Type: Sub-task
>Reporter: Clay B.
>Priority: Major
>





[jira] [Updated] (RATIS-696) RaftStorageDirectory.getLogSegmentFiles throws a deep IOException

2019-09-28 Thread Clay B. (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clay B. updated RATIS-696:
--
Summary: RaftStorageDirectory.getLogSegmentFiles throws a deep IOException  
(was: RaftStorageDirectory.getLogSegmentFiles)






[jira] [Updated] (RATIS-692) RaftStorageDirectory.tryLock throws a very deep IOException

2019-09-28 Thread Clay B. (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clay B. updated RATIS-692:
--
Summary: RaftStorageDirectory.tryLock throws a very deep IOException  (was: 
RaftStorageDirectory.tryLock throws a very deep re-tried IOException)






[jira] [Updated] (RATIS-695) Improve running in the face of flakey disks

2019-09-28 Thread Clay B. (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clay B. updated RATIS-695:
--
Description: In testing with 
[Namazu|https://github.com/apache/incubator-ratis/blob/35838f032a4096d78843130fa1435bcddf5ce961/dev-support/vagrant/README.md#ratis-hdd-slowdown-vm],
disk code paths that fail in the face of {{IOException}}s were found. This 
umbrella JIRA tracks the code paths found that need hardening. These code 
paths appear to be fatal to the Ratis server's operation, yet they do not 
cause the server to abort.  (was: In testing with 
[Namazu|https://github.com/apache/incubator-ratis/blob/35838f032a4096d78843130fa1435bcddf5ce961/dev-support/vagrant/README.md#ratis-hdd-slowdown-vm],
disk code paths that fail in the face of {{IOException}}s were found. This 
umbrella JIRA tracks the code paths found that need hardening.)

> Improve running in the face of flakey disks
> ---
>
> Key: RATIS-695
> URL: https://issues.apache.org/jira/browse/RATIS-695
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Clay B.
>Priority: Minor





[jira] [Created] (RATIS-696) RaftStorageDirectory.getLogSegmentFiles

2019-09-28 Thread Clay B. (Jira)
Clay B. created RATIS-696:
-

 Summary: RaftStorageDirectory.getLogSegmentFiles
 Key: RATIS-696
 URL: https://issues.apache.org/jira/browse/RATIS-696
 Project: Ratis
  Issue Type: Sub-task
Reporter: Clay B.


In {{RaftStorageDirectory.getLogSegmentFiles()}}, creating a 
{{Files.newDirectoryStream}} can fail with an {{IOException}}; when it does, 
the server simply hangs at this point.
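The stack trace added to RATIS-696 shows `Files.newDirectoryStream` throwing a `FileSystemException` from `getLogSegmentFiles`. A minimal sketch of listing a directory so that such errors always propagate to the caller, and the stream is always closed, is given here. `SegmentListing`, `listSegments`, and the `log_*` glob are hypothetical and are not taken from the Ratis source.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class SegmentListing {

  /**
   * Lists regular files in {@code dir} whose names match {@code glob},
   * sorted lexicographically. Any IOException (e.g. EIO while opening the
   * directory) propagates to the caller; the stream is closed either way.
   */
  public static List<Path> listSegments(Path dir, String glob) throws IOException {
    List<Path> result = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
      for (Path p : stream) {
        if (Files.isRegularFile(p)) {
          result.add(p);
        }
      }
    } catch (DirectoryIteratorException e) {
      // Errors during iteration arrive wrapped; rethrow the underlying IOException.
      throw e.getCause();
    }
    Collections.sort(result);
    return result;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("segments");
    Files.createFile(dir.resolve("log_0-9"));
    Files.createFile(dir.resolve("log_inprogress_10"));
    Files.createFile(dir.resolve("notes.txt"));
    for (Path p : listSegments(dir, "log_*")) {
      System.out.println(p.getFileName());
    }
  }
}
```

`Files.newDirectoryStream` reports failures at open time as an `IOException`, while failures during iteration are wrapped in `DirectoryIteratorException`; the sketch unwraps the latter so callers only need to handle `IOException`.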





[jira] [Updated] (RATIS-692) RaftStorageDirectory.tryLock throws a very deep re-tried IOException

2019-09-28 Thread Clay B. (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clay B. updated RATIS-692:
--
Parent: RATIS-695
Issue Type: Sub-task  (was: Bug)

> RaftStorageDirectory.tryLock throws a very deep re-tried IOException
> 
>
> Key: RATIS-692
> URL: https://issues.apache.org/jira/browse/RATIS-692
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Clay B.
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: r692_20190928.patch





[jira] [Created] (RATIS-695) Improve running in the face of flakey disks

2019-09-28 Thread Clay B. (Jira)
Clay B. created RATIS-695:
-

 Summary: Improve running in the face of flakey disks
 Key: RATIS-695
 URL: https://issues.apache.org/jira/browse/RATIS-695
 Project: Ratis
  Issue Type: Improvement
  Components: server
Reporter: Clay B.


In testing with 
[Namazu|https://github.com/apache/incubator-ratis/blob/35838f032a4096d78843130fa1435bcddf5ce961/dev-support/vagrant/README.md#ratis-hdd-slowdown-vm],
disk code paths that fail in the face of {{IOException}}s were found. This 
umbrella JIRA tracks the code paths found that need hardening.
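The concern in this umbrella issue is that such failures leave the server stuck instead of aborting. The RATIS-692 and RATIS-696 traces show server construction running inside a `CompletableFuture` async task (`CompletableFuture$AsyncSupply.run`), and an exception thrown in such a task is only visible to code that observes the future. What follows is a minimal sketch of a fail-fast pattern using standard `CompletableFuture` semantics, assuming a hypothetical `openStorage` step that stands in for the disk access; it is not how Ratis itself handles the problem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public final class FailFastStartup {

  /** Hypothetical stand-in for a server constructor that touches the disk. */
  static String openStorage(boolean diskFails) {
    if (diskFails) {
      // Lambdas passed to supplyAsync cannot throw checked exceptions, so wrap.
      throw new UncheckedIOException(new IOException("Input/output error"));
    }
    return "storage-ok";
  }

  /**
   * Runs construction asynchronously and waits for it. A failure inside the
   * task is rethrown here as the original IOException rather than being left
   * unobserved inside the future.
   */
  public static String startOrFail(boolean diskFails) throws IOException {
    CompletableFuture<String> f = CompletableFuture.supplyAsync(() -> openStorage(diskFails));
    try {
      return f.join();
    } catch (CompletionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof UncheckedIOException) {
        throw ((UncheckedIOException) cause).getCause();
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    try {
      System.out.println(startOrFail(false));
      startOrFail(true);
    } catch (IOException e) {
      // A real server would log and exit here (e.g. System.exit(1)) instead of hanging.
      System.err.println("Startup failed: " + e.getMessage());
    }
  }
}
```

`join()` wraps the task's exception in a `CompletionException`; the sketch unwraps it back to the original `IOException` so the caller can log it and terminate with a non-zero status rather than wait indefinitely.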





[jira] [Commented] (RATIS-692) RaftStorageDirectory.tryLock throws a very deep re-tried IOException

2019-09-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939883#comment-16939883
 ] 

Hadoop QA commented on RATIS-692:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 38m  4s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 48s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.logservice.TestLogServiceWithNetty |
|   | ratis.logservice.TestLogServiceWithGrpc |
|   | ratis.examples.filestore.TestFileStoreWithNetty |
|   | ratis.server.simulation.TestLeaderElectionWithSimulatedRpc |
|   | ratis.server.simulation.TestRaftWithSimulatedRpc |
|   | ratis.grpc.TestRaftSnapshotWithGrpc |
|   | ratis.grpc.TestWatchRequestWithGrpc |
|   | ratis.netty.TestRaftStateMachineExceptionWithNetty |
|   | ratis.grpc.TestRaftWithGrpc |
|   | ratis.server.raftlog.TestRaftLogMetrics |
|   | ratis.grpc.TestRaftStateMachineExceptionWithGrpc |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/ratis:date2019-09-28 |
| JIRA Issue | RATIS-692 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981624/r692_20190928.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  checkstyle  compile  |
| uname | Linux fb3bf45a51f5 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / cff085c |
| maven | version: Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T15:06:16Z) |
| Default Java | 1.8.0_222 |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/1008/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/1008/testReport/ |
| Max. process+thread count | 1155 (vs. ulimit of 5000) |
| modules | C: ratis-server U: ratis-server |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/1008/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RaftStorageDirectory.tryLock throws a very deep re-tried IOException
> 

[jira] [Commented] (RATIS-603) Add a logStringSupplier for RaftServerImpl to optionally print SmLogEntry on errors

2019-09-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939877#comment-16939877
 ] 

Hadoop QA commented on RATIS-603:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m  9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 43s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 47s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 30s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | ratis.grpc.TestServerRestartWithGrpc |
|   | ratis.grpc.TestWatchRequestWithGrpc |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/ratis:date2019-09-28 |
| JIRA Issue | RATIS-603 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981583/RATIS-603.006.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  checkstyle  compile  |
| uname | Linux d0c8755bbbc7 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build@2/yetus-personality.sh |
| git revision | master / cff085c |
| maven | version: Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T15:06:16Z) |
| Default Java | 1.8.0_222 |
| unit | https://builds.apache.org/job/PreCommit-RATIS-Build/1007/artifact/out/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/1007/testReport/ |
| Max. process+thread count | 1833 (vs. ulimit of 5000) |
| modules | C: ratis-server U: ratis-server |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/1007/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add a logStringSupplier for RaftServerImpl to optionally print SmLogEntry on errors
> ---
>
> Key: RATIS-603
> URL: https://issues.apache.org/jira/browse/RATIS-603
> Project: Ratis
>  Issue Type: New Feature
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-603.001.patch, 

[jira] [Commented] (RATIS-680) Fix LICENSE file issues

2019-09-28 Thread Hadoop QA (Jira)


[ https://issues.apache.org/jira/browse/RATIS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939872#comment-16939872 ]

Hadoop QA commented on RATIS-680:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} shellcheck {color} | {color:blue}  0m  0s{color} | {color:blue} Shellcheck was not available. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m  0s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  2m  4s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/ratis:date2019-09-28 |
| JIRA Issue | RATIS-680 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981617/RATIS-680.02.patch |
| Optional Tests |  dupname  asflicense  shellcheck  shelldocs  |
| uname | Linux b6038b5e3341 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / cff085c |
| maven | version: Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T15:06:16Z) |
| Max. process+thread count | 47 (vs. ulimit of 5000) |
| modules | C: . U: . |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/1006/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |





> Fix LICENSE file issues
> ---
>
> Key: RATIS-680
> URL: https://issues.apache.org/jira/browse/RATIS-680
> Project: Ratis
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-680.01.patch, RATIS-680.02.patch
>
>
> Fix Ratis LICENSE file issues raised by Justin here:
> https://mail-archives.apache.org/mod_mbox/incubator-general/201909.mbox/%3C573A4F4D-8303-418D-8133-03AAC8085708%40me.com%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)