Hi Riguz,

Thanks a lot for the emails! It is great that you have resolved the problem.
Tsz-Wo

On Wed, Jun 29, 2022 at 2:00 AM Riguz Lee <[email protected]> wrote:

> Hi,
>
> After our diagnosis, we've found that there's a wrong configuration in the
> Ceph file system layer, and also some bugs in the Kubernetes CSI driver.
>
> Now we believe that the exception is not caused by Ratis.
>
> ------------------------------
> Riguz Lee
> [email protected]
>
> Original Email
>
> Sender: "Riguz Lee" <[email protected]>;
> Sent Time: 2022/6/27 18:28
> To: "user" <[email protected]>;
> Subject: Ratis start failed due to "OverlappingFileLockException"
>
> Hi there,
>
> I get an error when trying to start a raft node, which is deployed inside
> a Kubernetes cluster. Here's the error info:
>
> Caused by: java.io.IOException: Failed to lock storage /data/ratis-data/dynamic-service-2.dynamic-service-gcek/43dea5d8-f076-11ec-8ea0-0242ac120002. The directory is already locked
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.tryLock(RaftStorageDirectoryImpl.java:236) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lambda$lock$0(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:166) ~[ratis-common-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.util.FileUtils.attempt(FileUtils.java:40) ~[ratis-common-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lock(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:153) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:97) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:67) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:52) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:116) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:201) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$5(RaftServerProxy.java:274) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.nio.channels.OverlappingFileLockException
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.tryLock(RaftStorageDirectoryImpl.java:227) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lambda$lock$0(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:166) ~[ratis-common-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.util.FileUtils.attempt(FileUtils.java:40) ~[ratis-common-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lock(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:153) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:97) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:67) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:52) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:116) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:201) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$5(RaftServerProxy.java:274) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
>
> I've tried recreating the raft directory (by recreating the PVC) and
> restarting the pod, but I still get the same issue.
>
> Each pod has its own data storage, so there's no reason it would be locked
> by two Ratis processes.
>
> So I guess it might be some kind of bug? I found a JIRA bug here:
> https://issues.apache.org/jira/browse/RATIS-538, which is almost the same.
>
> Any ideas how to fix it?
>
> Thanks,
>
> Riguz Lee
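
For anyone finding this thread through the exception name: the JDK documents that `FileChannel.tryLock` throws `OverlappingFileLockException` when the same Java virtual machine already holds a lock on an overlapping region of the file. The sketch below only illustrates that documented behaviour. It is not Ratis code and it does not reproduce the Ceph/CSI problem reported above; the class name `LockDemo`, the method `secondLockOverlaps`, and the file name `demo.lock` are made up for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockDemo {

    /**
     * Locks the given file through one channel, then tries to lock it again
     * through a second channel in the same JVM. Returns true if the second
     * attempt fails with OverlappingFileLockException.
     */
    static boolean secondLockOverlaps(Path lockFile) throws IOException {
        try (FileChannel first = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {

            // tryLock() returns null when another program holds the lock.
            FileLock held = first.tryLock();
            if (held == null) {
                throw new IOException("Lock is held by another program: " + lockFile);
            }
            try {
                // Same JVM, same file, overlapping region: the JDK rejects this.
                FileLock again = second.tryLock();
                if (again != null) {
                    again.release();
                }
                return false;
            } catch (OverlappingFileLockException e) {
                return true;
            } finally {
                held.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical scratch location, unrelated to any Ratis storage directory.
        Path dir = Files.createTempDirectory("lock-demo");
        Path lockFile = dir.resolve("demo.lock");
        System.out.println("second lock overlaps: " + secondLockOverlaps(lockFile));
    }
}
```

Note the distinction the API makes: a lock held by a different process shows up as a `null` return from `tryLock()`, while `OverlappingFileLockException` refers to a lock already tracked inside the current JVM.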
