[Contd.] BTW, I'm running Storm 2.1.0 on OpenJDK 11, and I can reproduce this issue every time during a pacemaker rolling bounce. Today I downgraded to Storm 2.0.0 and ran into the same issue. Has anyone else faced this? Let me know if you need more details.
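In case it helps to picture the symptom, here is a minimal sketch of slot-based request/response matching as I understand it from the logs quoted below. It is not the real PacemakerClient code: `SlotTable`, `put`, `complete` and `reset` are hypothetical names, and `reset()` stands in for whatever might drop pending requests (for example a reconnect), which is only my assumption about a possible trigger.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical illustration only; this is not the real PacemakerClient.
class SlotTable {
    private final AtomicReferenceArray<String> pending;

    SlotTable(int maxPending) {
        pending = new AtomicReferenceArray<>(maxPending);
    }

    // Store a request in the first free slot and return the slot number,
    // which travels with the request as its message_id. Returns -1 if full.
    int put(String request) {
        for (int i = 0; i < pending.length(); i++) {
            if (pending.compareAndSet(i, null, request)) {
                return i;
            }
        }
        return -1;
    }

    // Called when a response arrives: look up and free the slot named by the
    // response's message_id. An empty result is the "No message for slot" case.
    Optional<String> complete(int messageId) {
        if (messageId < 0 || messageId >= pending.length()) {
            return Optional.empty();
        }
        return Optional.ofNullable(pending.getAndSet(messageId, null));
    }

    // Assumed trigger: something drops all pending requests (e.g. a reconnect).
    void reset() {
        for (int i = 0; i < pending.length(); i++) {
            pending.set(i, null);
        }
    }

    public static void main(String[] args) {
        SlotTable table = new SlotTable(4);

        // Normal case: the response finds the request it belongs to.
        int id = table.put("SEND_PULSE #1");
        System.out.println("normal: " + table.complete(id).isPresent());

        // Failure case: the slot is emptied before the response arrives.
        int id2 = table.put("SEND_PULSE #2");
        table.reset();
        System.out.println("after reset: " + table.complete(id2).isPresent());
    }
}
```

Running `main` prints `normal: true` followed by `after reset: false`. The second case looks like what I see after the bounce: the response carries the right message_id, but the client no longer has a request stored under it.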
On Sun, Nov 17, 2019 at 7:48 PM Sharath Raghavan <[email protected]> wrote:

> Hello everyone,
> I'm working on upgrading Storm from 1.2.1 to 2.1.0. While performing some
> fault tolerance testing, I noticed some weird behavior.
>
> *Scenario:*
> I submitted a topology and everything works fine. Now, if I bounce the
> pacemaker server that the pacemaker client is connected to, the client
> fails to heartbeat and gets stuck in a retry loop forever.
>
> From what I understand, the pacemaker server receives the SEND_PULSE
> messages and responds correctly, and the client receives the response
> (SEND_PULSE_RESPONSE) with the right message_id. However, the client fails
> while looking up the message it sent previously. The log says:
> "No message for slot: <message_id>".
>
> Any idea why this could be?
>
> Detailed logs below:
>
> 2019-11-18 03:04:19,023+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:159 - Sending pacemaker message to host0979.com: HBMessage(type:SEND_PULSE, data:<HBMessageData pulse:HBPulse(id:/workerbeats/if-usmsc-streams-p1-19_7-storm2-SNAPSHOT-3-1574046188/c40d6ed2-c0f7-414b-8452-6af03036caef-<ip>-6700, details:1F 8B 08 00 00 00 00 00 00 00 E5 5D 0B 7C 96 B5 D5 0F 50 04 A1 A5 17 0A 82 C2 A8 80 DC 0B 45 CA 55 11 54 2E 0A 8A 5C 3A 01 07 02 52 44 94 4B A1 45 C1 21 A2 02 DE 50 51 D0 79 C1 FD EA 74 8A 8A 13 1D 2A 6E BA 5F 99 A8 38 71 63 EA 26 4A 51 E6 07 53 3F DD 07 9B B8 39 71 FB BE F2 A4 4F C2 DB E4 9C E4 24 EF DE 2D EF D7 1F 6E 6E E7 FF 27 49 93 7F 9E 93 9C E4 A4 29 AB C7 18 EB 3B E7 D2 C2...)>)
> 2019-11-18 03:04:19,024+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:165 - Put message in slot: 1 for host0979.com
> 2019-11-18 03:04:19,028+0000 [host0979.com-pm-1] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:1)
> 2019-11-18 03:04:19,028+0000 [host0979.com-pm-1] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:1)
> 2019-11-18 03:04:19,028+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:179 - Got Response: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:1)
>
> 2019-11-18 03:05:19,041+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:159 - Sending pacemaker message to host0979.com: HBMessage(type:SEND_PULSE, data:<HBMessageData pulse:HBPulse(id:/workerbeats/if-usmsc-streams-p1-19_7-storm2-SNAPSHOT-3-1574046188/c40d6ed2-c0f7-414b-8452-6af03036caef-<ip>-6700, details:1F 8B 08 00 00 00 00 00 00 00 E5 9D 09 98 56 C5 99 EF 5F 36 6D 43 A3 88 A8 08 74 D3 3B 2D C8 BE 35 8A 01 14 15 E2 86 82 0A 04 04 64 8F EC B4 BB D1 06 41 8D 41 41 24 88 09 3E 83 A6 EF C4 05 13 50 16 31 10 35 B6 19 46 A3 62 42 AE A2 E8 70 91 64 D0 90 E8 B8 E4 D1 C8 24 43 9F E2 54 F1 7D 55 EF 5B F5 56 9D 7C 93 D3 D7 49 66 BC B7 FE FF AE 53 5F D5 AF 4E AD EF 69 0E 8D 00 A0 CF 8C A9 5D...)>)
> 2019-11-18 03:05:19,041+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:165 - Put message in slot: 2 for host0979.com
> 2019-11-18 03:05:19,044+0000 [host0979.com-pm-1] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:2)
> 2019-11-18 03:05:19,044+0000 [host0979.com-pm-1] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:2)
> 2019-11-18 03:05:19,044+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:179 - Got Response: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:2)
>
> *------- <Bounced pacemaker server @ host0979.com> -------*
>
> 2019-11-18 03:06:19,054+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:159 - Sending pacemaker message to host0979.com: HBMessage(type:SEND_PULSE, data:<HBMessageData pulse:HBPulse(id:/workerbeats/if-usmsc-streams-p1-19_7-storm2-SNAPSHOT-3-1574046188/c40d6ed2-c0f7-414b-8452-6af03036caef-<ip>-6700, details:1F 8B 08 00 00 00 00 00 00 00 ED 7D 0B 94 56 C5 95 EE A6 01 45 40 40 A3 23 30 18 10 45 50 FA F1 F7 FB 41 2B 60 30 20 22 0F 41 45 14 BA 91 97 08 34 84 97 20 2A 18 11 11 05 89 BC BA D5 44 54 12 8C 90 2C B8 A2 92 51 07 34 4C 24 2B 4C 02 09 C9 90 84 49 48 F4 5E 88 71 46 46 E5 C6 18 35 B7 39 C5 A9 E2 FF AB F6 AE DA 75 CE FC 97 D3 F7 F6 02 69 D7 FE BE 53 A7 4E D5 57 67 9F AA 5D BB DA 40...)>)
> *2019-11-18 03:06:19,055+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:165 - Put message in slot: 3 for host0979.com*
> 2019-11-18 03:06:20,056+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 9 more attempts for host0979.com.
> 2019-11-18 03:06:21,056+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 8 more attempts for host0979.com.
> 2019-11-18 03:06:22,057+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 7 more attempts for host0979.com.
> 2019-11-18 03:06:23,057+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 6 more attempts for host0979.com.
> 2019-11-18 03:06:24,057+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 5 more attempts for host0979.com.
> 2019-11-18 03:06:25,058+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 4 more attempts for host0979.com.
> 2019-11-18 03:06:26,058+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 3 more attempts for host0979.com.
> 2019-11-18 03:06:27,059+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 2 more attempts for host0979.com.
> 2019-11-18 03:06:28,059+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 1 more attempts for host0979.com.
> 2019-11-18 03:06:29,059+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 0 more attempts for host0979.com.
> 2019-11-18 03:06:32,275+0000 [executor-heartbeat-timer] ERROR rejectedExecution:770 - Failed to submit a listener notification task. Event loop shut down?
> java.util.concurrent.RejectedExecutionException: event executor terminated
>     at org.apache.storm.shade.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:855) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:328) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:321) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:778) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:768) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:432) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:112) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:89) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1010) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:610) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:465) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.DefaultChannelPipeline.close(DefaultChannelPipeline.java:1003) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.AbstractChannel.close(AbstractChannel.java:238) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.pacemaker.PacemakerClient.close_channel(PacemakerClient.java:260) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.pacemaker.PacemakerClient.close(PacemakerClient.java:267) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.pacemaker.PacemakerClientPool.rotateClients(PacemakerClientPool.java:92) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.pacemaker.PacemakerClientPool.send(PacemakerClientPool.java:54) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.cluster.PaceMakerStateStorage.set_worker_hb(PaceMakerStateStorage.java:127) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.cluster.StormClusterStateImpl.workerHeartbeat(StormClusterStateImpl.java:509) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.daemon.worker.Worker.doExecutorHeartbeats(Worker.java:372) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.StormTimer$1.run(StormTimer.java:110) [storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:226) [storm-client-2.1.0.jar:2.1.0]
> 2019-11-18 03:06:32,276+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:261 - channel host0979.com/<ip>:6699 closed
> 2019-11-18 03:06:32,276+0000 [executor-heartbeat-timer] ERROR PaceMakerStateStorage:138 - couldn't get response after 10 attempts. Failed to set_worker_hb. Will make 9 more attempts.
> 2019-11-18 03:06:32,278+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:159 - Sending pacemaker message to host0977.com: HBMessage(type:SEND_PULSE, data:<HBMessageData pulse:HBPulse(id:/workerbeats/if-usmsc-streams-p1-19_7-storm2-SNAPSHOT-3-1574046188/c40d6ed2-c0f7-414b-8452-6af03036caef-<ip>-6700, details:1F 8B 08 00 00 00 00 00 00 00 ED 7D 0B 94 56 C5 95 EE A6 01 45 40 40 A3 23 30 18 10 45 50 FA F1 F7 FB 41 2B 60 30 20 22 0F 41 45 14 BA 91 97 08 34 84 97 20 2A 18 11 11 05 89 BC BA D5 44 54 12 8C 90 2C B8 A2 92 51 07 34 4C 24 2B 4C 02 09 C9 90 84 49 48 F4 5E 88 71 46 46 E5 C6 18 35 B7 39 C5 A9 E2 FF AB F6 AE DA 75 CE FC 97 D3 F7 F6 02 69 D7 FE BE 53 A7 4E D5 57 67 9F AA 5D BB DA 40...)>)
> 2019-11-18 03:06:32,280+0000 [host0977.com-pm-1] DEBUG PacemakerClient:143 - Channel is ready: [id: 0xa4d6db58]
> *2019-11-18 03:06:32,282+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:165 - Put message in slot: 0 for host0977.com*
> 2019-11-18 03:06:32,294+0000 [host0977.com-pm-1] ERROR PacemakerClientHandler:60 - Exception occurred in Pacemaker.
> java.nio.channels.NotYetConnectedException: null
>     at org.apache.storm.shade.io.netty.channel.AbstractChannel$AbstractUnsafe.flush0()(Unknown Source) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
> 2019-11-18 03:06:32,296+0000 [host0977.com-pm-1] INFO PacemakerClientHandler:37 - Connection established from /<ip>:58206 to host0977.com/<ip>:6699
> 2019-11-18 03:06:32,397+0000 [Timer-0] INFO PacemakerClient:246 - reconnecting to host0977.com
> 2019-11-18 03:06:32,397+0000 [Timer-0] DEBUG PacemakerClient:261 - channel host0977.com/<ip>:6699 closed
> 2019-11-18 03:06:32,404+0000 [host0977.com-pm-2] DEBUG PacemakerClient:143 - Channel is ready: [id: 0xcf170c9f]
> 2019-11-18 03:06:32,409+0000 [host0977.com-pm-2] INFO PacemakerClientHandler:37 - Connection established from /<ip>:58212 to host0977.com/<ip>:6699
> 2019-11-18 03:06:33,284+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 9 more attempts for host0977.com.
> 2019-11-18 03:06:33,305+0000 [host0977.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:33,306+0000 [host0977.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> *2019-11-18 03:06:33,306+0000 [host0977.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0*
> 2019-11-18 03:06:34,284+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 8 more attempts for host0977.com.
> 2019-11-18 03:06:34,285+0000 [host0977.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:34,286+0000 [host0977.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:34,286+0000 [host0977.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0
> 2019-11-18 03:06:35,284+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 7 more attempts for host0977.com.
> 2019-11-18 03:06:35,286+0000 [host0977.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:35,286+0000 [host0977.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:35,286+0000 [host0977.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0
> 2019-11-18 03:06:36,285+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 6 more attempts for host0977.com.
> 2019-11-18 03:06:36,287+0000 [host0977.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:36,287+0000 [host0977.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:36,287+0000 [host0977.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0
> 2019-11-18 03:06:37,285+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 5 more attempts for host0977.com.
> 2019-11-18 03:06:37,287+0000 [host0977.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:37,287+0000 [host0977.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:37,287+0000 [host0977.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0
> 2019-11-18 03:06:38,286+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 4 more attempts for host0977.com.
> 2019-11-18 03:06:38,288+0000 [host0977.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:38,288+0000 [host0977.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:38,288+0000 [host0977.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0
> 2019-11-18 03:06:39,286+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 3 more attempts for host0977.com.
> 2019-11-18 03:06:39,289+0000 [host0977.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:39,289+0000 [host0977.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:39,289+0000 [host0977.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0
> 2019-11-18 03:06:40,287+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 2 more attempts for host0977.com.
> 2019-11-18 03:06:40,289+0000 [host0977.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:40,289+0000 [host0977.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:40,289+0000 [host0977.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0
> 2019-11-18 03:06:41,288+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 1 more attempts for host0977.com.
> 2019-11-18 03:06:41,289+0000 [host0977.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:41,289+0000 [host0977.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:41,289+0000 [host0977.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0
> 2019-11-18 03:06:42,288+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 0 more attempts for host0977.com.
> 2019-11-18 03:06:42,289+0000 [host0977.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:42,289+0000 [host0977.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:42,290+0000 [host0977.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0
> 2019-11-18 03:06:45,527+0000 [executor-heartbeat-timer] ERROR rejectedExecution:770 - Failed to submit a listener notification task. Event loop shut down?
> java.util.concurrent.RejectedExecutionException: event executor terminated
>     at org.apache.storm.shade.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:855) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:328) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:321) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:778) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:768) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:432) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:112) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:89) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1010) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:610) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:465) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.DefaultChannelPipeline.close(DefaultChannelPipeline.java:1003) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.shade.io.netty.channel.AbstractChannel.close(AbstractChannel.java:238) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
>     at org.apache.storm.pacemaker.PacemakerClient.close_channel(PacemakerClient.java:260) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.pacemaker.PacemakerClient.close(PacemakerClient.java:267) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.pacemaker.PacemakerClientPool.rotateClients(PacemakerClientPool.java:92) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.pacemaker.PacemakerClientPool.send(PacemakerClientPool.java:54) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.cluster.PaceMakerStateStorage.set_worker_hb(PaceMakerStateStorage.java:127) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.cluster.StormClusterStateImpl.workerHeartbeat(StormClusterStateImpl.java:509) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.daemon.worker.Worker.doExecutorHeartbeats(Worker.java:372) ~[storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.StormTimer$1.run(StormTimer.java:110) [storm-client-2.1.0.jar:2.1.0]
>     at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:226) [storm-client-2.1.0.jar:2.1.0]
> 2019-11-18 03:06:45,528+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:261 - channel host0977.com/<ip>:6699 closed
> 2019-11-18 03:06:45,529+0000 [executor-heartbeat-timer] ERROR PaceMakerStateStorage:138 - couldn't get response after 10 attempts. Failed to set_worker_hb. Will make 8 more attempts.
> 2019-11-18 03:06:45,532+0000 [host0978.com-pm-1] DEBUG PacemakerClient:143 - Channel is ready: [id: 0x94bfcb44]
> 2019-11-18 03:06:45,530+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:159 - Sending pacemaker message to host0978.com: HBMessage(type:SEND_PULSE, data:<HBMessageData pulse:HBPulse(id:/workerbeats/if-usmsc-streams-p1-19_7-storm2-SNAPSHOT-3-1574046188/c40d6ed2-c0f7-414b-8452-6af03036caef-<ip>-6700, details:1F 8B 08 00 00 00 00 00 00 00 ED 7D 0B 94 56 C5 95 EE A6 01 45 40 40 A3 23 30 18 10 45 50 FA F1 F7 FB 41 2B 60 30 20 22 0F 41 45 14 BA 91 97 08 34 84 97 20 2A 18 11 11 05 89 BC BA D5 44 54 12 8C 90 2C B8 A2 92 51 07 34 4C 24 2B 4C 02 09 C9 90 84 49 48 F4 5E 88 71 46 46 E5 C6 18 35 B7 39 C5 A9 E2 FF AB F6 AE DA 75 CE FC 97 D3 F7 F6 02 69 D7 FE BE 53 A7 4E D5 57 67 9F AA 5D BB DA 40...)>)
> 2019-11-18 03:06:45,535+0000 [executor-heartbeat-timer] DEBUG PacemakerClient:165 - Put message in slot: 0 for host0978.com
> 2019-11-18 03:06:45,536+0000 [host0978.com-pm-1] ERROR PacemakerClientHandler:60 - Exception occurred in Pacemaker.
> java.nio.channels.NotYetConnectedException: null
>     at org.apache.storm.shade.io.netty.channel.AbstractChannel$AbstractUnsafe.flush0()(Unknown Source) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
> 2019-11-18 03:06:45,537+0000 [host0978.com-pm-1] INFO PacemakerClientHandler:37 - Connection established from /<ip>:49534 to host0978.com/<ip>:6699
> 2019-11-18 03:06:45,637+0000 [Timer-0] INFO PacemakerClient:246 - reconnecting to host0978.com
> 2019-11-18 03:06:45,638+0000 [Timer-0] DEBUG PacemakerClient:261 - channel host0978.com/<ip>:6699 closed
> 2019-11-18 03:06:45,644+0000 [host0978.com-pm-2] DEBUG PacemakerClient:143 - Channel is ready: [id: 0x58e0cd56]
> 2019-11-18 03:06:45,649+0000 [host0978.com-pm-2] INFO PacemakerClientHandler:37 - Connection established from /<ip>:49536 to host0978.com/<ip>:6699
> 2019-11-18 03:06:46,535+0000 [executor-heartbeat-timer] WARN PacemakerClient:192 - Not getting response or getting null response. Making 9 more attempts for host0978.com.
> 2019-11-18 03:06:46,556+0000 [host0978.com-pm-2] DEBUG PacemakerClientHandler:43 - Got Message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:46,556+0000 [host0978.com-pm-2] DEBUG PacemakerClient:216 - Pacemaker client got message: HBMessage(type:SEND_PULSE_RESPONSE, data:null, message_id:0)
> 2019-11-18 03:06:46,556+0000 [host0978.com-pm-2] DEBUG PacemakerClient:220 - No message for slot: 0
>
> --
> Thanks
> *Sharath*

--
Thanks
*Sharath*
