Hi Riguz,

Thanks a lot for the emails!  It is great that you have resolved the
problem.

Tsz-Wo


On Wed, Jun 29, 2022 at 2:00 AM Riguz Lee <[email protected]> wrote:

> Hi,
>
>
> After diagnosing the issue, we found a misconfiguration in the Ceph file
> system layer, as well as some bugs in the Kubernetes CSI driver.
>
>
> Now we believe that the exception is not caused by Ratis.
>
>
> ------------------------------
>
> Riguz Lee
> [email protected]
>
>
> Original Email
>
> Sender:"Riguz Lee"< [email protected] >;
>
> Sent Time:2022/6/27 18:28
>
> To:"user"< [email protected] >;
>
> Subject:Ratis start failed due to "OverlappingFileLockException"
>
>
> Hi there,
>
> I get an error when trying to start a Raft node that is deployed inside
> a Kubernetes cluster. Here is the error:
>
>
> Caused by: java.io.IOException: Failed to lock storage /data/ratis-data/dynamic-service-2.dynamic-service-gcek/43dea5d8-f076-11ec-8ea0-0242ac120002. The directory is already locked
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.tryLock(RaftStorageDirectoryImpl.java:236) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lambda$lock$0(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:166) ~[ratis-common-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.util.FileUtils.attempt(FileUtils.java:40) ~[ratis-common-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lock(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:153) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:97) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:67) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:52) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:116) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:201) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$5(RaftServerProxy.java:274) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.nio.channels.OverlappingFileLockException
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.tryLock(RaftStorageDirectoryImpl.java:227) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lambda$lock$0(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:166) ~[ratis-common-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.util.FileUtils.attempt(FileUtils.java:40) ~[ratis-common-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lock(RaftStorageDirectoryImpl.java:194) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:153) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:97) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:67) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:52) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:116) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:201) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$5(RaftServerProxy.java:274) ~[ratis-server-2.3.0.jar!/:2.3.0]
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
>
>
> I've tried recreating the Raft directory (by recreating the PVC) and
> restarting the pod, but I still get the same error.
>
>
> Each pod has its own data storage, so there is no reason for the directory
> to be locked by two Ratis processes.
>
> So I guess it might be some kind of bug? I found a JIRA issue that looks
> almost the same: https://issues.apache.org/jira/browse/RATIS-538
>
>
> Any ideas how to fix it?
>
>
> Thanks,
>
>
> Riguz Lee
>
>
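
A note for readers who land on this thread with the same stack trace: the JDK documents OverlappingFileLockException as the exception FileChannel.tryLock() throws when an overlapping lock is already held by the same Java virtual machine (or when another thread in it is already blocked waiting for one). The sketch below only illustrates that JDK behaviour with plain java.nio. It is not Ratis code and says nothing about what happened in the Ceph/CSI setup described above. The class name LockDemo and the file name demo.lock are made up for the example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockDemo {

  /**
   * Locks the file through one channel, then tries to lock it again through a
   * second channel opened in the same JVM. Returns true if the second attempt
   * throws OverlappingFileLockException.
   */
  static boolean secondLockOverlaps(Path lockFile) throws IOException {
    try (FileChannel first = FileChannel.open(
             lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
         FileChannel second = FileChannel.open(
             lockFile, StandardOpenOption.WRITE)) {
      FileLock held = first.tryLock();
      if (held == null) {
        // Some other process holds the lock; that is a different situation.
        return false;
      }
      try {
        FileLock again = second.tryLock();
        if (again != null) {
          again.release();
        }
        return false;
      } catch (OverlappingFileLockException e) {
        // The JVM already holds a lock covering this region of the file.
        return true;
      } finally {
        held.release();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical scratch location, unrelated to any Ratis storage directory.
    Path dir = Files.createTempDirectory("lock-demo");
    Path lockFile = dir.resolve("demo.lock");
    System.out.println("second lock overlaps: " + secondLockOverlaps(lockFile));
  }
}
```

When a lock file is held by a different process, tryLock() normally returns null instead of throwing, which is why the sketch treats that case separately.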
