761417898 opened a new pull request, #15676:
URL: https://github.com/apache/iotdb/pull/15676

   ## Description  
   
   ### Key Changes  
   - **Added disk directory failure detection and recovery**:  
     - When a disk directory is found to be inaccessible on initial access, the system now marks the directory as abnormal and automatically fetches another available directory (a minimal call-site sketch follows this list).  
     - This resolves the issue caused by a single disk directory failing in a multi-disk environment.  
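
   For context, here is a minimal sketch of the retry flow described above. `getNextFolder` is the real entry point named at the end of this PR; `reportAbnormalFolder`, `FolderManagerLike`, and the accessibility probe are illustrative assumptions, not the actual implementation.

   ```java
   // Hypothetical call-site sketch of the failover described in this PR.
   import java.io.File;
   import java.io.IOException;

   public class WalDirAllocationSketch {

     /** Assumed minimal surface of FolderManager for this sketch. */
     interface FolderManagerLike {
       String getNextFolder() throws IOException; // real method name from this PR
       void reportAbnormalFolder(String folder);  // assumed name
     }

     /** Keeps asking for folders until one is actually accessible. */
     static File allocateUsableFolder(FolderManagerLike folderManager) throws IOException {
       while (true) {
         // May throw once every configured folder has been marked abnormal.
         String candidate = folderManager.getNextFolder();
         File dir = new File(candidate);
         if (dir.isDirectory() || dir.mkdirs()) {
           return dir; // initial access succeeded, hand the folder out
         }
         // Initial access failed (e.g. the disk is gone): report the folder
         // as abnormal so it is not handed out again, then try the next one.
         folderManager.reportAbnormalFolder(candidate);
       }
     }
   }
   ```

   The key point is that the accessibility check happens on first access, so an unhealthy directory is skipped before any WAL node is bound to it.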
   
   ### Verification  
   - **Tested in a 3N3C environment** (refer to the [Feishu Doc](https://timechor.feishu.cn/docx/Cgi1dMLhfovBs9xqK0dc1P0VnVe)).  
   - **Behavior**:  
     - IoTDB now logs directory access failures (see example logs below) and 
successfully retries with a healthy directory.  
   
   ### Example Log Output  
   ```plaintext
   (Now changed to warning-level logging)
   2025-06-09 10:13:34,500 [pool-33-IoTDB-DataNodeInternalRPC-Processor-24] ERROR o.a.i.d.s.d.w.a.AbstractNodeAllocationStrategy:72 - Meet exception when creating wal node
   java.io.FileNotFoundException: /root/apache-iotdb-2.0.4-SNAPSHOT-all-bin/data/datanode/wal3/root.db1.g_0-7/_0.checkpoint (No such file or directory)
           at java.io.FileOutputStream.open0(Native Method)
           at java.io.FileOutputStream.open(FileOutputStream.java:270)
           at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
           at org.apache.iotdb.db.storageengine.dataregion.wal.io.LogWriter.<init>(LogWriter.java:70)
           at org.apache.iotdb.db.storageengine.dataregion.wal.io.CheckpointWriter.<init>(CheckpointWriter.java:30)
           at org.apache.iotdb.db.storageengine.dataregion.wal.checkpoint.CheckpointManager.<init>(CheckpointManager.java:90)
           at org.apache.iotdb.db.storageengine.dataregion.wal.node.WALNode.<init>(WALNode.java:129)
           at org.apache.iotdb.db.storageengine.dataregion.wal.node.WALNode.<init>(WALNode.java:118)
           at org.apache.iotdb.db.storageengine.dataregion.wal.allocation.AbstractNodeAllocationStrategy.createWALNode(AbstractNodeAllocationStrategy.java:70)
           at org.apache.iotdb.db.storageengine.dataregion.wal.allocation.FirstCreateStrategy.applyForWALNode(FirstCreateStrategy.java:56)
           at org.apache.iotdb.db.storageengine.dataregion.wal.WALManager.applyForWALNode(WALManager.java:100)
           at org.apache.iotdb.db.storageengine.dataregion.DataRegion.getWALNode(DataRegion.java:3840)
           at org.apache.iotdb.db.consensus.statemachine.dataregion.DataRegionStateMachine.read(DataRegionStateMachine.java:242)
           at org.apache.iotdb.consensus.iot.IoTConsensusServerImpl.<init>(IoTConsensusServerImpl.java:150)
           at org.apache.iotdb.consensus.iot.IoTConsensus.lambda$createLocalPeer$8(IoTConsensus.java:286)
           at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
           at org.apache.iotdb.consensus.iot.IoTConsensus.createLocalPeer(IoTConsensus.java:269)
           at org.apache.iotdb.db.protocol.thrift.impl.DataNodeRegionManager.createDataRegion(DataNodeRegionManager.java:157)
           at org.apache.iotdb.db.protocol.thrift.impl.DataNodeInternalRPCServiceImpl.createDataRegion(DataNodeInternalRPCServiceImpl.java:562)
           at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$createDataRegion.getResult(IDataNodeRPCService.java:6511)
           at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$createDataRegion.getResult(IDataNodeRPCService.java:6491)
           at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
           at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
           at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   2025-06-09 10:13:34,735 [pool-33-IoTDB-DataNodeInternalRPC-Processor-12] WARN  o.a.i.d.s.d.t.g.TsFileNameGenerator:120 - Failed to process folder [tierLevel=0, sequence=true, baseDir=/root/apache-iotdb-2.0.4-SNAPSHOT-all-bin/data/datanode/data3/sequence], state set to ABNORMAL
   2025-06-09 10:13:34,735 [pool-33-IoTDB-DataNodeInternalRPC-Processor-13] WARN  o.a.i.d.s.d.t.g.TsFileNameGenerator:120 - Failed to process folder [tierLevel=0, sequence=true, baseDir=/root/apache-iotdb-2.0.4-SNAPSHOT-all-bin/data/datanode/data3/sequence], state set to ABNORMAL
   2025-06-09 10:13:34,735 [pool-33-IoTDB-DataNodeInternalRPC-Processor-7] WARN  o.a.i.d.s.d.t.g.TsFileNameGenerator:120 - Failed to process folder [tierLevel=0, sequence=true, baseDir=/root/apache-iotdb-2.0.4-SNAPSHOT-all-bin/data/datanode/data3/sequence], state set to ABNORMAL
   (Now adjusted to: "{} is above the warning threshold, or not accessible, free space {}, total space {}")
   2025-06-09 10:13:35,215 [pool-24-IoTDB-IoTConsensusRPC-Processor-2] WARN  o.a.i.d.s.r.d.s.SequenceStrategy:70 - /root/apache-iotdb-2.0.4-SNAPSHOT-all-bin/data/datanode/data3/sequence is above the warning threshold, free space 118933000192, total space 243640324096
   2025-06-09 10:13:55,833 [pool-24-IoTDB-IoTConsensusRPC-Processor-7] WARN  o.a.i.d.s.d.t.g.TsFileNameGenerator:120 - Failed to process folder [tierLevel=0, sequence=false, baseDir=/root/apache-iotdb-2.0.4-SNAPSHOT-all-bin/data/datanode/data3/unsequence], state set to ABNORMAL
   ```  
   
   ---  
   
   ### Key Changed Classes/Packages  
   - `FolderManager` (core logic for directory failover)  
   - The call sites of `org.apache.iotdb.db.storageengine.rescon.disk.FolderManager#getNextFolder` were updated accordingly (an illustrative manager-side sketch follows this list)
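
   For illustration, here is a minimal sketch of the bookkeeping such a failover implies: per-folder state plus a selection loop that skips abnormal folders. The enum, field names, and the round-robin strategy are assumptions for the sketch, not the actual `FolderManager` internals.

   ```java
   import java.io.IOException;
   import java.util.List;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.atomic.AtomicInteger;

   /** Illustrative manager: round-robin over folders, skipping ABNORMAL ones. */
   public class FolderManagerSketch {
     enum FolderState { HEALTHY, ABNORMAL }

     private final List<String> folders;
     private final Map<String, FolderState> states = new ConcurrentHashMap<>();
     private final AtomicInteger cursor = new AtomicInteger();

     public FolderManagerSketch(List<String> folders) {
       this.folders = folders;
       folders.forEach(f -> states.put(f, FolderState.HEALTHY));
     }

     /** Returns the next healthy folder; fails once every folder is ABNORMAL. */
     public String getNextFolder() throws IOException {
       for (int i = 0; i < folders.size(); i++) {
         String f = folders.get(Math.floorMod(cursor.getAndIncrement(), folders.size()));
         if (states.get(f) == FolderState.HEALTHY) {
           return f;
         }
       }
       throw new IOException("All candidate folders are in ABNORMAL state");
     }

     /** Called by a client when initial access to the folder fails. */
     public void reportAbnormalFolder(String folder) {
       states.put(folder, FolderState.ABNORMAL);
     }
   }
   ```

   A single marking suffices here because, per the description above, an abnormal directory should not be handed out again; whether IoTDB later re-probes such directories is outside the scope of this sketch.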
   
   ---  

