[ https://issues.apache.org/jira/browse/IGNITE-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vyacheslav Koptilin reassigned IGNITE-21307: -------------------------------------------- Assignee: Vyacheslav Koptilin > Call failure handler in case of failure in WatchProcessor > --------------------------------------------------------- > > Key: IGNITE-21307 > URL: https://issues.apache.org/jira/browse/IGNITE-21307 > Project: Ignite > Issue Type: Improvement > Reporter: Denis Chudov > Assignee: Vyacheslav Koptilin > Priority: Major > Labels: ignite-3 > > For the linearized watch processing, we have > WatchProcessor#notificationFuture that is rewritten for each revision > processing and meta storage safe time advance. If some watch processor > completes exceptionally, this means that no further updates will be > processed, because they need the previous updates to be processed > successfully. This is implemented in futures chaining like this: > > {code:java} > notificationFuture = notificationFuture > .thenRunAsync(() -> revisionCallback.onSafeTimeAdvanced(time), > watchExecutor) > .whenComplete((ignored, e) -> { > if (e != null) { > LOG.error("Error occurred when notifying safe time advanced > callback", e); > } > }); {code} > For now, we dont have any failure handing of exceptionally completed > notification future. It leads to the endless log records with the same > exception's stack trace, caused by meta storage safe time advances: > > {code:java} > [2024-01-16T21:42:35,515][ERROR][%isot_n_0%JRaft-FSMCaller-Disruptor-metastorage-_stripe_0-0][WatchProcessor] > Error occurred when notifying safe time advanced callback > java.util.concurrent.CompletionException: > org.apache.ignite.internal.lang.IgniteInternalException: IGN-CMN-65535 > TraceId:3877e098-6a1b-4f30-88a8-a4c13411d573 Peers are not ready > [groupId=5_part_0] > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) > ~[?:?] > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) > ~[?:?] > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1081) > ~[?:?] > at > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > Caused by: org.apache.ignite.internal.lang.IgniteInternalException: Peers are > not ready [groupId=5_part_0] > at > org.apache.ignite.internal.raft.RaftGroupServiceImpl.randomNode(RaftGroupServiceImpl.java:725) > ~[ignite-raft-9.0.127-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.raft.RaftGroupServiceImpl.randomNode(RaftGroupServiceImpl.java:709) > ~[ignite-raft-9.0.127-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.raft.RaftGroupServiceImpl.refreshLeader(RaftGroupServiceImpl.java:234) > ~[ignite-raft-9.0.127-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.raft.RaftGroupServiceImpl.start(RaftGroupServiceImpl.java:190) > ~[ignite-raft-9.0.127-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.raft.client.TopologyAwareRaftGroupService.start(TopologyAwareRaftGroupService.java:187) > ~[ignite-replicator-9.0.127-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.raft.client.TopologyAwareRaftGroupServiceFactory.startRaftGroupService(TopologyAwareRaftGroupServiceFactory.java:73) > ~[ignite-replicator-9.0.127-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.raft.Loza.startRaftGroupService(Loza.java:350) > ~[ignite-raft-9.0.127-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.table.distributed.TableManager.lambda$startPartitionAndStartClient$27(TableManager.java:917) > ~[ignite-table-9.0.127-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:827) > ~[ignite-core-9.0.127-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.table.distributed.TableManager.lambda$startPartitionAndStartClient$28(TableManager.java:913) > ~[ignite-table-9.0.127-SNAPSHOT.jar:?] > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) > ~[?:?] > ... 4 more {code} > So, the node can't operate properly and just produces tons of logs. Such > nodes should be halted. > UPD: > We decided to just add invocation of {{failureProcessor.process}} in all > places in {{org.apache.ignite.internal.metastorage.server.WatchProcessor}} > where exceptions happen, like > {code:java} > notifyWatchFuture = watchAndEvents.watch.onUpdate(event) > .whenComplete((v, e) -> { > if (e != null) { > if (e instanceof CompletionException) { > e = e.getCause(); > } > if (!(e instanceof NodeStoppingException)) { > // TODO: IGNITE-14693 Implement Meta > storage exception handling > LOG.error("Error occurred when processing > a watch event", e); > } > watchAndEvents.watch.onError(e); > } > }); > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)