[ https://issues.apache.org/jira/browse/HDDS-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099505#comment-17099505 ]
Mukul Kumar Singh commented on HDDS-3382:
-----------------------------------------
[~hanishakoneru], I think this issue can happen even without an OM restart. The
root cause, I believe, is inside Ratis, which assumes that there can be only one
open segment at any time.
When writes arrive at a high rate, multiple segment files are rolled over via
rollLogSegment. However, because SegmentedRaftLogWorker runs on a single thread,
multiple renames can be pending on that thread at any time.
If such a segment is evicted from the segment cache and the applyTransaction
thread then tries to read it, the file is still waiting to be renamed, and hence
the read fails with a FileNotFoundException (FNFE).
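
To make the suspected race concrete, here is a minimal standalone sketch. It is
not Ratis code: the class name PendingRenameRace, the method readBeforeRename
and the latch that keeps the worker busy are all hypothetical, and only the two
file names are borrowed from the log below. It assumes the reader resolves the
final (closed) segment name while the rename is still queued on a
single-threaded worker.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration only; this does not use any Ratis classes. */
public class PendingRenameRace {

  /**
   * Queues a rename behind a blocked task on a single-threaded worker, then
   * tries to read the file under its final name before the rename has run.
   * Returns true if that read failed with FileNotFoundException and the file
   * appeared under the final name once the worker caught up.
   */
  static boolean readBeforeRename(Path dir) throws Exception {
    Path inProgress = dir.resolve("log_inprogress_20626");
    Path closed = dir.resolve("log_20626-20675");
    Files.write(inProgress, new byte[] {1, 2, 3});

    ExecutorService worker = Executors.newSingleThreadExecutor();
    CountDownLatch busy = new CountDownLatch(1);
    try {
      // Stand-in for earlier work that keeps the single worker thread occupied.
      worker.submit(() -> {
        busy.await();
        return null;
      });
      // The rename is accepted but stays pending behind the task above.
      Future<Path> rename = worker.submit(() -> {
        return Files.move(inProgress, closed);
      });

      // The reader already uses the final name, as if the roll had completed.
      boolean sawFnfe = false;
      try (FileInputStream in = new FileInputStream(closed.toFile())) {
        in.read();
      } catch (FileNotFoundException e) {
        sawFnfe = true;
      }

      // Let the worker drain its queue; the rename now happens.
      busy.countDown();
      rename.get();
      return sawFnfe && Files.exists(closed);
    } finally {
      worker.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("segments");
    System.out.println("FNFE before rename ran: " + readBeforeRename(dir));
  }
}
```

In this sketch the read fails only because the reader and the worker disagree
about the file's current name. Whether the actual fix belongs in the cache
loader or in the worker is not something the sketch tries to answer.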
> OzoneManager fails to apply log index because of FNFE
> -----------------------------------------------------
>
> Key: HDDS-3382
> URL: https://issues.apache.org/jira/browse/HDDS-3382
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Mukul Kumar Singh
> Priority: Major
> Labels: MiniOzoneChaosCluster
> Attachments: log.zip
>
>
> OzoneManager fails to apply log index because of FNFE
> It fails because of the following exception.
> {code}
> 2020-04-12 21:58:41,019 [omNode-1@group-D62218D261DE-SegmentedRaftLogWorker]
> INFO segmented.SegmentedRaftLogWorker
> (SegmentedRaftLogWorker.java:execute(541)) -
> omNode-1@group-D62218D261DE-SegmentedRaftLogWorker:
> Rolled log segment from
> /tmp/chaos-2020-04-12-21-57-56-IST/MiniOzoneClusterImpl-e8cfabca-aa87-41d3-91fd-5530e06ac6ad/omNode-1/ratis/b870c9eb-edfb-36b5-b758-d62218d261de/current/log_inprogress_20626
> to
> /tmp/chaos-2020-04-12-21-57-56-IST/MiniOzoneClusterImpl-e8cfabca-aa87-41d3-91fd-5530e06ac6ad/omNode-1/ratis/b870c9eb-edfb-36b5-b758-d62218d261de/current/log_20626-20675
> 2020-04-12 21:58:41,019 [omNode-1@group-D62218D261DE-StateMachineUpdater]
> ERROR segmented.SegmentedRaftLogInputStream
> (SegmentedRaftLogInputStream.java:nextEntry(122)) - caught exception
> initializing log_20626-20675
> java.io.FileNotFoundException:
> /tmp/chaos-2020-04-12-21-57-56-IST/MiniOzoneClusterImpl-e8cfabca-aa87-41d3-91fd-5530e06ac6ad/omNode-1/ratis/b870c9eb-edfb-36b5-b758-d62218d261de/current/log_20626-20675
> (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.<init>(SegmentedRaftLogReader.java:140)
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.init(SegmentedRaftLogInputStream.java:94)
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:120)
> at
> org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:98)
> at
> org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:202)
> at
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:309)
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:292)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:219)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:168)
> at java.lang.Thread.run(Thread.java:748)
> 2020-04-12 21:58:41,020 [pool-105-thread-2] INFO utils.LoadBucket
> (LoadBucket.java:execute(135)) - Going to opType=Filesystem:DirectoryOp
> keyName=DirectoryLoadGenerator_694945963 :mkdir
> 2020-04-12 21:58:41,018 [OMDoubleBufferFlushThread] INFO
> file.OMDirectoryCreateResponse
> (OMDirectoryCreateResponse.java:addToDBBatch(66)) -
> resp:DirectoryLoadGenerator_2101090747/
> 2020-04-12 21:58:41,018 [grpc-default-executor-2] INFO
> segmented.SegmentedRaftLogWorker
> (SegmentedRaftLogWorker.java:rollLogSegment(396)) -
> omNode-2@group-D62218D261DE-SegmentedRaftLogWorker: Rolling segment
> log-20626_20675 to index:20675
> 2020-04-12 21:58:41,018 [OMDoubleBufferFlushThread] INFO
> file.OMDirectoryCreateResponse
> (OMDirectoryCreateResponse.java:addToDBBatch(66)) -
> resp:DirectoryLoadGenerator_1172041571/
> 2020-04-12 21:58:41,020 [IPC Server handler 13 on 13988] INFO
> file.OMDirectoryCreateRequest (OMDirectoryCreateRequest.java:<init>(104)) -
> req:DirectoryLoadGenerator_694945963
> 2020-04-12 21:58:41,020 [OMDoubleBufferFlushThread] INFO
> file.OMDirectoryCreateResponse
> (OMDirectoryCreateResponse.java:addToDBBatch(66)) -
> resp:DirectoryLoadGenerator_1868267720/
> 2020-04-12 21:58:41,020 [omNode-1@group-D62218D261DE-SegmentedRaftLogWorker]
> INFO segmented.SegmentedRaftLogWorker
> (SegmentedRaftLogWorker.java:execute(583)) -
> omNode-1@group-D62218D261DE-SegmentedRaftLogWorker:
> created new log segment
> /tmp/chaos-2020-04-12-21-57-56-IST/MiniOzoneClusterImpl-e8cfabca-aa87-41d3-91fd-5530e06ac6ad/omNode-1/ratis/b870c9eb-edfb-36b5-b758-d62218d261de/current/log_inprogress_20676
> 2020-04-12 21:58:41,020 [omNode-1@group-D62218D261DE-StateMachineUpdater]
> ERROR impl.StateMachineUpdater (StateMachineUpdater.java:run(185)) -
> omNode-1@group-D62218D261DE-StateMachineUpdater: the
> StateMachineUpdater hits Throwable
> org.apache.ratis.server.raftlog.RaftLogIOException:
> java.io.FileNotFoundException:
> /tmp/chaos-2020-04-12-21-57-56-IST/MiniOzoneClusterImpl-e8cfabca-aa87-41d3-91fd-5530e06ac6ad/omNode-1/ratis/b870c9eb-edfb-36b5-b758-d62218d261de/current/log_20626-20675
> (No such file or directory)
> at
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:311)
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:292)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:219)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:168)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException:
> /tmp/chaos-2020-04-12-21-57-56-IST/MiniOzoneClusterImpl-e8cfabca-aa87-41d3-91fd-5530e06ac6ad/omNode-1/ratis/b870c9eb-edfb-36b5-b758-d62218d261de/current/log_20626-20675
> (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.<init>(SegmentedRaftLogReader.java:140)
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.init(SegmentedRaftLogInputStream.java:94)
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:120)
> {code}