>
> Thanks, Tung, for your response. I checked the kernel logs in our setup.
> There is no memory error like the one in your case.
>
> I do see the pattern below, though.
> 1. There are multiple "Failed to remove local publication" messages, but
> TIPC does not enter the infinite loop immediately.
>       Jul 16 13:23:31 in-debbld39-pd kernel: [90527.681211] Failed to
> remove local publication {2,1359237139,1359237139}/0
> 2. There is an "Illegal FSM event" error, after which the infinite loop
> is triggered.
>     Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396954] Illegal FSM
> event fa110ede in state d0000 on link
> 13500451:INTERCC_BEARER-135004a1:INTERCC_BEARER
> I am not sure whether the above error is related to the "Unable to
> remove publication from failed node" infinite loop issue.
>
>
> From the kernel.log file:
>
> /**************************************************************************************************************************************************************************************************************************************/
> Jul 16 13:23:31 in-debbld39-pd kernel: [90527.681211] Failed to remove
> local publication {2,1359237139,1359237139}/0
> Jul 16 13:23:37 in-debbld39-pd kernel: [90533.911632] [UFW BLOCK] IN=eno1
> OUT= MAC=5c:b9:01:fe:f6:d0:40:a8:f0:3b:ce:40:08:00 SRC=10.220.82.25
> DST=10.220.82.39 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=45831 DF PROTO=TCP
> SPT=46905 DPT=111 WINDOW=64240 RES=0x00 SYN URGP=0
> Jul 16 13:23:37 in-debbld39-pd kernel: [90533.913988] [UFW BLOCK] IN=eno1
> OUT= MAC=5c:b9:01:fe:f6:d0:d4:f5:ef:70:c9:10:08:00 SRC=10.220.82.110
> DST=10.220.82.39 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=64852 DF PROTO=TCP
> SPT=48891 DPT=111 WINDOW=64240 RES=0x00 SYN URGP=0
> Jul 16 13:23:37 in-debbld39-pd kernel: [90533.915163] [UFW BLOCK] IN=eno1
> OUT= MAC=5c:b9:01:fe:f6:d0:ec:b1:d7:83:83:90:08:00 SRC=10.220.82.28
> DST=10.220.82.39 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27835 DF PROTO=TCP
> SPT=60415 DPT=111 WINDOW=64240 RES=0x00 SYN URGP=0
> Jul 16 13:23:41 in-debbld39-pd kernel: [90537.913017] Failed to remove
> local publication {2,1359237139,1359237139}/0
> Jul 16 13:23:42 in-debbld39-pd kernel: [90539.299673] ACPI Error:
> SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found
> length 32 (20180810/exfield-393)
> Jul 16 13:23:42 in-debbld39-pd kernel: [90539.299684] ACPI Error: Method
> parse/execution failed \_SB.PMI0._PMM, AE_AML_BUFFER_LIMIT
> (20180810/psparse-516)
> Jul 16 13:23:42 in-debbld39-pd kernel: [90539.299695] ACPI Error:
> AE_AML_BUFFER_LIMIT, Evaluating _PMM (20180810/power_meter-339)
> Jul 16 13:23:48 in-debbld39-pd kernel: [90544.825004] Failed to remove
> local publication {2,1359237139,1359237139}/0
> Jul 16 13:23:51 in-debbld39-pd kernel: [90547.993176] Failed to remove
> local publication {2,1359237139,1359237139}/0
> Jul 16 13:23:53 in-debbld39-pd kernel: [90549.816973] Failed to remove
> local publication {2,1359237139,1359237139}/0
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396954] Illegal FSM event
> fa110ede in state d0000 on link
> 13500451:INTERCC_BEARER-135004a1:INTERCC_BEARER
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396971] Unable to remove
> publication from failed node
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396971]  (type=0,
> lower=2701414419, node=0xa1045013, port=0, key=2701414419)
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396976] Unable to remove
> publication from failed node
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396976]  (type=19398666,
> lower=1, node=0xa1045013, port=3268044884, key=3268044885)
> Jul 16 13:23:53 in-debbld39-pd kernel: [90550.396978] Unable to remove
> publication from failed node
>
> /**************************************************************************************************************************************************************************************************************************************/
>
> Thanks,
> Prakash
>
> On Mon, Jul 8, 2024 at 3:32 PM Tung Quang Nguyen <
> tung.q.ngu...@dektech.com.au> wrote:
>
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196945] Unable to remove publication from failed node
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196945]  (type=20185106, lower=1, node=0xd1045013, port=2177385505, key=2177385506)
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196948] Unable to remove publication from failed node
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196948]  (type=20185106, lower=1, node=0xd1045013, port=2177385505, key=2177385506)
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196954] Unable to remove publication from failed node
>> >Jun 29 14:45:36 in-debbld-33 kernel: [1724399.196954] (type=20185106, lower=1, node=0xd1045013, port=2177385505, key=2177385506)
>> >
>>
>> >###############################################################################################################
>> >######
>> >
>> >
>> >
>> >Any idea why this would be happening?
>> A memory error could cause this. I observed the same thing in my system:
>> "
>> 2024-05-27T07:27:22.747+02:00 kernel:[425219.492047] {...}[Hardware
>> Error]: Hardware error from APEI Generic Hardware Error Source: 0
>> ...
>> 2024-05-27T07:27:22.747+02:00 kernel:[425219.517622] {...}[Hardware
>> Error]:   section_type: memory error
>> ...
>> 2024-05-27T07:27:22.973+02:00 kernel:[425221.704107] tipc: Unable to
>> remove publication from failed node
>> 2024-05-27T07:27:22.973+02:00 kernel:[425221.704107]  (type=143322,
>> lower=9, node=0x1001009, port=1295954722, key=1295954723)
>> 2024-05-27T07:27:22.973+02:00 kernel:[425221.717912] tipc: Unable to
>> remove publication from failed node
>> 2024-05-27T07:27:22.973+02:00 kernel:[425221.717912]  (type=143322,
>> lower=9, node=0x1001009, port=1295954722, key=1295954723)
>> "
>>
>

_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
