As a follow-up: you can disregard what I said about nimbus crashing, but I'm still interested in fixing these noisy errors in the logs.
@Rui thanks. I did check ZK and did not see refs to the old versions in there at least?

On Mon, Oct 25, 2021 at 11:31 AM Rui Abreu <[email protected]> wrote:

> Hi Andrew,
>
> Not sure how much this helps, but in version 1.x, state was on the
> following znodes:
>
> /$storm-znode/storms
> /$storm-znode/assignments
> /$storm-znode/blobstore
>
> Deleting all references (with rm or deleteall, depending on Zookeeper's
> version), followed by a Nimbus's rolling restart should suffice.
>
> On Mon, Oct 25, 2021, 18:49 Andrew Neilson <[email protected]> wrote:
>
>> Hi,
>>
>> We're running a v2.2.0 cluster with two nimbus hosts and recently noticed
>> storm-nimbus on the leader is effectively in a restart loop.
>>
>> When I look at nimbus.log on that host it is full of log entries related
>> to old versions of topologies we're running. There are the two types of
>> exceptions I am seeing
>>
>> 1. get blob meta exception:
>>
>> For *topology-A* for example, we're currently on topology-A-25:
>>
>> 2021-10-25 13:39:51.064 o.a.s.d.n.Nimbus pool-29-thread-62 [WARN] Exception when getting heartbeat timeout.
>> 2021-10-25 13:39:51.075 o.a.s.d.n.Nimbus pool-29-thread-16 [WARN] get blob meta exception.
>> org.apache.storm.utils.WrappedKeyNotFoundException: topology-A-5-1633368551-stormjar.jar
>>
>> For *topology-B*, we're on topology-B-24:
>>
>> 2021-10-25 13:38:51.106 o.a.s.d.n.Nimbus pool-29-thread-21 [WARN] get blob meta exception.
>> org.apache.storm.utils.WrappedKeyNotFoundException: topology-B-11-1632770137-stormcode.ser
>>
>> 2. Send HB exception:
>>
>> 2021-10-25 13:39:51.745 o.a.s.d.n.Nimbus pool-29-thread-36 [WARN] Exception when getting heartbeat timeout.
>> 2021-10-25 13:39:51.760 o.a.s.d.n.Nimbus pool-29-thread-37 [WARN] Send HB exception. (topology id='topology-A-10-1632769783')
>> org.apache.storm.utils.WrappedNotAliveException: topology-A-10-1632769783
>>
>> This seems isolated to two versions of "topology-A" and one version of
>> "topology-B".
>>
>> I'm not seeing references to these topology versions in Zookeeper. Does
>> anyone know how to safely clear out this old state? If not, any suggestions
>> on how to debug this? Further, is this related to any known bug?
>>
>> Thanks,
>> Andrew
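In case it helps with debugging, here is a rough sketch (not a Storm tool) of how the stale topology IDs could be pulled out of nimbus.log and counted, so they can be compared against what is actually running. It assumes the exception lines look like the ones quoted above and that blob keys end in -stormjar.jar, -stormcode.ser or -stormconf.ser; the function name and the ACTIVE_IDS placeholders are made up for illustration.

```python
import re
from collections import Counter

# Matches the two exception types quoted in this thread and captures
# the key that follows the colon (possibly on the next line).
EXC_RE = re.compile(r"Wrapped(?:KeyNotFound|NotAlive)Exception:\s*(\S+)")

# Assumption: blob keys are "<topology-id><suffix>" with one of these suffixes.
BLOB_SUFFIXES = ("-stormjar.jar", "-stormcode.ser", "-stormconf.ser")


def stale_topology_ids(log_text, active_ids):
    """Count exception hits per topology ID that is not in active_ids."""
    counts = Counter()
    for key in EXC_RE.findall(log_text):
        # Reduce a blob key to its topology ID.
        for suffix in BLOB_SUFFIXES:
            if key.endswith(suffix):
                key = key[: -len(suffix)]
                break
        if key not in active_ids:
            counts[key] += 1
    return counts


# Log lines copied from the messages above.
SAMPLE = """\
2021-10-25 13:39:51.075 o.a.s.d.n.Nimbus pool-29-thread-16 [WARN] get blob meta exception.
org.apache.storm.utils.WrappedKeyNotFoundException: topology-A-5-1633368551-stormjar.jar
2021-10-25 13:38:51.106 o.a.s.d.n.Nimbus pool-29-thread-21 [WARN] get blob meta exception.
org.apache.storm.utils.WrappedKeyNotFoundException: topology-B-11-1632770137-stormcode.ser
2021-10-25 13:39:51.760 o.a.s.d.n.Nimbus pool-29-thread-37 [WARN] Send HB exception. (topology id='topology-A-10-1632769783')
org.apache.storm.utils.WrappedNotAliveException: topology-A-10-1632769783
"""

# Placeholders only: substitute the full IDs of the running topologies.
ACTIVE_IDS = {"topology-A-25-<timestamp>", "topology-B-24-<timestamp>"}

if __name__ == "__main__":
    for topo_id, hits in sorted(stale_topology_ids(SAMPLE, ACTIVE_IDS).items()):
        print(topo_id, hits)
```

To run it against a real log, read the file contents into a string and pass that instead of SAMPLE. Any ID it reports can then be looked up under the znodes Rui listed before deciding what to delete.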
