JackieTien97 opened a new pull request, #17794:
URL: https://github.com/apache/iotdb/pull/17794

   ## Problem
   
   Per [the root-cause 
analysis](https://timechor.feishu.cn/docx/UPU1dVSN8ocBNDx27c8cWnaLnYc), 
`FragmentInstanceDispatcherImpl.dispatchRemote` retries the **same** 
`FragmentInstance` (FI) once after a `TException`. A `TException` only means 
the client did not receive the response; the server may already have executed 
the FI. When the first execution finishes, it runs `releaseResource()` (which 
sets `dataRegion = null`) and the `instanceExecution` entry is removed, but its 
`FragmentInstanceContext` stays cached in 
`FragmentInstanceManager.instanceContext` for about 5 minutes. The retry hits 
`instanceContext.computeIfAbsent`, **reuses the released context**, and a fresh 
(ALIVE) driver dereferences the null `dataRegion` in `init()`, causing an 
**NPE**. The existing single-execution guards don't help because the context is 
reused across two separate executions.
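
   The failure sequence above can be reproduced in miniature. The sketch below is not IoTDB code: `SketchContext` and `StaleContextSketch` are hypothetical stand-ins for `FragmentInstanceContext` and `FragmentInstanceManager`, and the only behavior borrowed from the description is that the context outlives the first execution in a `computeIfAbsent` cache after its `dataRegion` has been nulled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for FragmentInstanceContext (not the real class).
final class SketchContext {
    Object dataRegion = new Object();

    // Mirrors the described effect of releaseResource(): dataRegion = null.
    void releaseResource() {
        dataRegion = null;
    }
}

// Hypothetical stand-in for the manager that caches contexts by instance id.
final class StaleContextSketch {
    static final Map<String, SketchContext> instanceContext = new ConcurrentHashMap<>();

    // One "execution": look up or create the context, touch dataRegion,
    // then release it. The context is deliberately left in the cache.
    static int execute(String instanceId) {
        SketchContext ctx =
            instanceContext.computeIfAbsent(instanceId, id -> new SketchContext());
        int hash = ctx.dataRegion.hashCode(); // throws NPE on a released context
        ctx.releaseResource();
        return hash;
    }

    // Runs the same instance id twice, as a retry after a lost response would.
    static boolean retryHitsNpe(String instanceId) {
        execute(instanceId);
        try {
            execute(instanceId);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

   Under these assumptions, calling `StaleContextSketch.retryHitsNpe("fi-1")` takes the second path through `computeIfAbsent`, gets the already-released context back, and fails on the null `dataRegion`.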
   
   ## Changes
   
   - **`TSStatusCode`**: add `REPEATED_RPC_CALL(723)` (intentionally not in 
`NEED_RETRY`).
   - **`FragmentInstanceManager`** (data + schema paths): when 
`instanceContext.computeIfAbsent` would reuse an existing context for the same 
`instanceId`, throw `IoTDBRuntimeException(REPEATED_RPC_CALL)` **before** the 
planning `try` block — so it propagates up cleanly without invoking 
`clearFIRelatedResources`/`createFailedInstanceInfo` on the first execution's 
cached resources.
   - **`RegionReadExecutor`**: in both `catch` blocks, propagate the status 
code of an `IoTDBRuntimeException` into the response so that 
`REPEATED_RPC_CALL` reaches the dispatcher (instead of being downgraded to 
`EXECUTE_STATEMENT_ERROR`); `needRetryHelper` keeps it non-retryable.
   - **`FragmentInstanceDispatcherImpl`**: before retrying 
`dispatchRemoteHelper`, if the query has already timed out, fail fast with a 
`QUERY_TIMEOUT` status wrapped in `FragmentInstanceDispatchException` instead 
of re-dispatching.
   - **`ErrorHandlingUtils`**: map `QueryTimeoutRuntimeException` to 
`QUERY_TIMEOUT`.
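
   As a rough illustration of the first three changes, the sketch below detects a reused context and surfaces a dedicated status instead of a generic error. It is a simplified model, not the PR's implementation: `SketchStatus`, `SketchStatusException`, and `RepeatedCallGuardSketch` are hypothetical names, the `AtomicBoolean` flag is just one possible way to tell whether `computeIfAbsent` created the entry, and only the `REPEATED_RPC_CALL` name (code 723 in the PR) comes from the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical status enum; REPEATED_RPC_CALL corresponds to code 723 in the PR.
enum SketchStatus { SUCCESS, EXECUTE_STATEMENT_ERROR, REPEATED_RPC_CALL }

// Hypothetical stand-in for an exception that carries a status code.
final class SketchStatusException extends RuntimeException {
    final SketchStatus status;

    SketchStatusException(SketchStatus status) {
        super(status.name());
        this.status = status;
    }
}

final class RepeatedCallGuardSketch {
    private final Map<String, Object> instanceContext = new ConcurrentHashMap<>();

    // Manager side: reject a second call for the same instance id.
    void execute(String instanceId) {
        AtomicBoolean created = new AtomicBoolean(false);
        instanceContext.computeIfAbsent(instanceId, id -> {
            created.set(true); // only runs when no context was cached
            return new Object();
        });
        if (!created.get()) {
            // Thrown before the try block, so the cleanup below never
            // touches resources that belong to the first execution.
            throw new SketchStatusException(SketchStatus.REPEATED_RPC_CALL);
        }
        try {
            // Planning and execution would happen here.
        } finally {
            // Per-execution cleanup would happen here.
        }
    }

    // Executor side: carry a known status back instead of downgrading it.
    SketchStatus dispatch(String instanceId) {
        try {
            execute(instanceId);
            return SketchStatus.SUCCESS;
        } catch (SketchStatusException e) {
            return e.status;
        } catch (RuntimeException e) {
            return SketchStatus.EXECUTE_STATEMENT_ERROR;
        }
    }
}
```

   In this model, the first `dispatch("fi-1")` succeeds and a second `dispatch("fi-1")` on the same object returns `REPEATED_RPC_CALL`, which the caller can then treat as non-retryable.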
   
   ## Test
   
   - New `RegionReadExecutorTest#testRepeatedRpcCall` covers both the 
consensus-read and VirtualDataRegion paths, asserting the response carries 
`REPEATED_RPC_CALL` and `readNeedRetry == false`.
   - `mvn test -pl iotdb-core/datanode -Dtest=RegionReadExecutorTest` → 6 
passed.
   - `mvn compile -pl iotdb-core/datanode` (incl. spotless:check) → BUILD 
SUCCESS.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

