> Thanks Tung for your response. I verified the kernel logs in our setup.
> There is no memory error as in your case.
>
> I do see the pattern below, though.
> 1. There are multiple "Failed to remove local publication" messages, but
>    TIPC does not enter the infinite loop immediately.
>    Jul 16 13:23:31 in-debbld39-pd kernel: [90527.681211] Failed to remove local publication {2,1359237139,1359237139}/0
> 2. There are some "Illegal FSM event" errors, after which the infinite
>    loop is triggered.
>    Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396954] Illegal FSM event fa110ede in state d0000 on link 13500451:INTERCC_BEARER-135004a1:INTERCC_BEARER
> I am not sure whether the above error is related to hitting the "Unable to
> remove publication from failed node" infinite loop issue.
>
> From the kernel.log file:
>
> /**********************************************************************/
> Jul 16 13:23:31 in-debbld39-pd kernel: [90527.681211] Failed to remove local publication {2,1359237139,1359237139}/0
> Jul 16 13:23:37 in-debbld39-pd kernel: [90533.911632] [UFW BLOCK] IN=eno1 OUT= MAC=5c:b9:01:fe:f6:d0:40:a8:f0:3b:ce:40:08:00 SRC=10.220.82.25 DST=10.220.82.39 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=45831 DF PROTO=TCP SPT=46905 DPT=111 WINDOW=64240 RES=0x00 SYN URGP=0
> Jul 16 13:23:37 in-debbld39-pd kernel: [90533.913988] [UFW BLOCK] IN=eno1 OUT= MAC=5c:b9:01:fe:f6:d0:d4:f5:ef:70:c9:10:08:00 SRC=10.220.82.110 DST=10.220.82.39 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=64852 DF PROTO=TCP SPT=48891 DPT=111 WINDOW=64240 RES=0x00 SYN URGP=0
> Jul 16 13:23:37 in-debbld39-pd kernel: [90533.915163] [UFW BLOCK] IN=eno1 OUT= MAC=5c:b9:01:fe:f6:d0:ec:b1:d7:83:83:90:08:00 SRC=10.220.82.28 DST=10.220.82.39 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27835 DF PROTO=TCP SPT=60415 DPT=111 WINDOW=64240 RES=0x00 SYN URGP=0
> Jul 16 13:23:41 in-debbld39-pd kernel: [90537.913017] Failed to remove local publication {2,1359237139,1359237139}/0
> Jul 16 13:23:42 in-debbld39-pd kernel: [90539.299673] ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20180810/exfield-393)
> Jul 16 13:23:42 in-debbld39-pd kernel: [90539.299684] ACPI Error: Method parse/execution failed \_SB.PMI0._PMM, AE_AML_BUFFER_LIMIT (20180810/psparse-516)
> Jul 16 13:23:42 in-debbld39-pd kernel: [90539.299695] ACPI Error: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20180810/power_meter-339)
> Jul 16 13:23:48 in-debbld39-pd kernel: [90544.825004] Failed to remove local publication {2,1359237139,1359237139}/0
> Jul 16 13:23:51 in-debbld39-pd kernel: [90547.993176] Failed to remove local publication {2,1359237139,1359237139}/0
> Jul 16 13:23:53 in-debbld39-pd kernel: [90549.816973] Failed to remove local publication {2,1359237139,1359237139}/0
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396954] Illegal FSM event fa110ede in state d0000 on link 13500451:INTERCC_BEARER-135004a1:INTERCC_BEARER
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396971] Unable to remove publication from failed node
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396971] (type=0, lower=2701414419, node=0xa1045013, port=0, key=2701414419)
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396976] Unable to remove publication from failed node
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396976] (type=19398666, lower=1, node=0xa1045013, port=3268044884, key=3268044885)
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396978] Unable to remove publication from failed node
> /**********************************************************************/
>
> Thanks,
> Prakash
>
> On Mon, Jul 8, 2024 at 3:32 PM Tung Quang Nguyen <tung.q.ngu...@dektech.com.au> wrote:
>
>> >*Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196945] Unable to remove publication from failed node
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196945] (type=20185106, lower=1, node=0xd1045013, port=2177385505, key=2177385506)
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196948] Unable to remove publication from failed node
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196948] (type=20185106, lower=1, node=0xd1045013, port=2177385505, key=2177385506)
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196954] Unable to remove publication from failed node
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196954] (type=20185106, lower=1, node=0xd1045013, port=2177385505, key=2177385506)*
>> >
>> >######################################################################
>> >
>> >Any idea why this would be happening?
>> A memory error could cause this. I observed the same thing in my system:
>> "
>> 2024-05-27T07:27:22.747+02:00 kernel:[425219.492047] {...}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
>> ...
>> 2024-05-27T07:27:22.747+02:00 kernel:[425219.517622] {...}[Hardware Error]: section_type: memory error
>> ...
>> 2024-05-27T07:27:22.973+02:00 kernel:[425221.704107] tipc: Unable to remove publication from failed node
>> 2024-05-27T07:27:22.973+02:00 kernel:[425221.704107] (type=143322, lower=9, node=0x1001009, port=1295954722, key=1295954723)
>> 2024-05-27T07:27:22.973+02:00 kernel:[425221.717912] tipc: Unable to remove publication from failed node
>> 2024-05-27T07:27:22.973+02:00 kernel:[425221.717912] (type=143322, lower=9, node=0x1001009, port=1295954722, key=1295954723)
>> "
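
One way to check whether the "Illegal FSM event" message consistently precedes the "Unable to remove publication from failed node" loop is to summarize when each of the three TIPC messages first and last appears in kernel.log. The following is only a minimal sketch, assuming the raw (unwrapped) log file with one entry per line and a kernel uptime stamp such as "[90550.396954]" on every entry. The names summarize and summarize_file and the labels in PATTERNS are hypothetical; the message strings are copied from the logs quoted above.

```python
import re

# Message fragments copied from the kernel log excerpts in this thread.
# The short labels on the left are made up for this sketch.
PATTERNS = {
    "local_pub": "Failed to remove local publication",
    "illegal_fsm": "Illegal FSM event",
    "failed_node": "Unable to remove publication from failed node",
}

# Kernel uptime stamp, e.g. "[90550.396954]" (assumed present on each entry).
UPTIME = re.compile(r"\[\s*(\d+\.\d+)\]")


def summarize(lines):
    """Return {label: (count, first_uptime, last_uptime)} for each message seen."""
    stats = {}
    for line in lines:
        match = UPTIME.search(line)
        if not match:
            continue  # skip lines without an uptime stamp
        uptime = float(match.group(1))
        for label, text in PATTERNS.items():
            if text in line:
                count, first, _ = stats.get(label, (0, uptime, uptime))
                stats[label] = (count + 1, first, uptime)
    return stats


def summarize_file(path):
    """Convenience wrapper: summarize a log file on disk."""
    with open(path, errors="replace") as handle:
        return summarize(handle)
```

Calling summarize_file on the kernel.log from each affected node and comparing the first uptime of "illegal_fsm" with the first uptime of "failed_node" would show whether the FSM error always comes first. That only establishes ordering in the log, not that one causes the other.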
_______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion