Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-09-21 Thread A V Mahesh
Hi HansN,

 >> any  how GA is tagged.

Sorry, I meant RC2 is tagged.

-AVM

On 9/21/2016 12:41 PM, A V Mahesh wrote:
> Hi HansN,
>
> I just tested with uniform buffer sizes on all nodes, sending 
> messages in the normal phase, and the results look OK,
> even after hitting TIPC_ERR_OVERLOAD.
>
> So my conclusion is that, since in general all nodes will have the same 
> buffer sizes, we should go with the V2 patch. GA is tagged anyhow,
> so we have enough time for testing, and if we find issues we can 
> resolve them by the next release.
>
> ==
>  
>
>
> Sep 21 11:51:40 SC-1 osafamfd[15792]: NO Node 'PL-4' joined the cluster
> Sep 21 11:51:40 SC-1 osafimmnd[15741]: NO Implementer connected: 17 
> (MsgQueueService132111) <0, 2040f>
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
>
> ==
>  
>
>
>
> On 9/21/2016 11:37 AM, A 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-09-21 Thread A V Mahesh
Hi HansN,

On 9/20/2016 4:17 PM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> I think only logging is needed, as proposed in the patch, since some services 
> already handle dropped messages. This logging will help in
> troubleshooting. Keeping TIPC_DEST_DROPPABLE set to true will only make TIPC 
> drop messages silently; the original problem persists and needs investigation,
> i.e. why the socket receive buffer is overloaded. One reason may be the 
> MDS poll/receive loop together with the "big" mutex lock (ticket #520).
[AVM]   One valid reason could be that, in the TIPC_ERR_OVERLOAD case, 
recd_bytes is NOT zero, so the buffer overload can occur at either the TIPC 
or the MDS level.
   I will investigate more and update.

> Did you check why the MDS message loss mechanism doesn't detect messages 
> dropped by TIPC? AMF
> does detect this, e.g. via "out of sync", "msg id mismatch" and so on.
[AVM]  Do you mean the IMMD message loss mechanism?

-AVM
>
> /Regards HansN
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: den 20 september 2016 12:29
> To: Anders Widell ; Hans Nordebäck 
> 
> Cc: opensaf-devel@lists.sourceforge.net; mathi.naic...@oracle.com
> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>
> HI Anders Widell / HansN,
>
> On 9/16/2016 2:03 PM, Anders Widell wrote:
>> The idea was to just log reception of error info messages, for
>> trouble-shooting purposes.
> After multiple attempts, I managed to simulate the TIPC_ERR_OVERLOAD
> error. After the TIPC_ERR_OVERLOAD error is hit,
> the cluster goes into an unrecoverable state, because the send buffers are full.
>
> So we have two options:
>
> 1)  Set TIPC_DEST_DROPPABLE to false, log the TIPC_ERR_OVERLOAD error and then 
>     let the sender exit gracefully,
>     which allows the remaining nodes to survive.
>
> 2)  keep the current configuration as it is ( TIPC_DEST_DROPPABLE to true )
>
> =
> Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Received node_up from 2040f:
> msg_id 1
> Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Node 'PL-4' joined the cluster
> Sep 20 15:14:09 SC-1 osafimmnd[3695]: NO Implementer connected: 19
> (MsgQueueService132111) <0, 2040f>
> *Sep 20 15:16:59 SC-1 osafimmd[3684]: 77 MDTM: undelivered message
> condition ancillary data: TIPC_ERR_OVERLOAD*
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Director Service in NOACTIVE
> state - fevs replies pending:1 fevs highest processed:218744
> Sep 20 15:17:00 SC-1 osafamfnd[3773]: NO
> 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to
> 'avaDown' : Recovery is 'nodeFailfast'
> Sep 20 15:17:00 SC-1 osafamfnd[3773]: ER
> safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown
> Recovery is:nodeFailfast
> Sep 20 15:17:00 SC-1 osafamfnd[3773]: Rebooting OpenSAF NodeId = 131343
> EE Name = , Reason: Component faulted: recovery is node failfast,
> OwnNodeId = 131343, SupervisionTime = 60
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA DISCARD DUPLICATE FEVS
> message:218744
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Error code 2 returned for
> message type 82 - ignoring
> Sep 20 15:17:00 SC-1 opensaf_reboot: Rebooting local node; timeout=60
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA SC Absence IS allowed:900 IMMD
> service is DOWN
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO IMMD SERVICE IS DOWN, HYDRA IS
> CONFIGURED => UNREGISTERING IMMND form MDS
> Sep 20 15:17:00 SC-1 osafntfimcnd[3742]: NO saImmOiDispatch() Fail
> SA_AIS_ERR_BAD_HANDLE (9)
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:20002010f
> sv_id:27
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 1 <2,
> 2010f> (safLogService)
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:d0d0002010f
> sv_id:26
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:12010f
> sv_id:27
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 2 <16,
> 2010f> (@safLogService_appl)
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:130002010f
> sv_id:27
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 3 <19,
> 2010f> (@OpenSafImmReplicatorA)
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:140002010f
> sv_id:26
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:150002010f
> sv_id:27
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 4 <21,
> 2010f> (safClmService)
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1a0002010f
> sv_id:27
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 5 <26,
> 2010f> (safAmfService)
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1b0002010f
> sv_id:26
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bc0002010f
> sv_id:26
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bd0002010f
> 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-09-20 Thread Hans Nordebäck
Hi Mahesh,

I think only logging is needed, as proposed in the patch, since some services 
already handle dropped messages. This logging will help in
troubleshooting. Keeping TIPC_DEST_DROPPABLE set to true will only make TIPC 
drop messages silently; the original problem persists and needs investigation,
i.e. why the socket receive buffer is overloaded. One reason may be the 
MDS poll/receive loop together with the "big" mutex lock (ticket #520).
Did you check why the MDS message loss mechanism doesn't detect messages 
dropped by TIPC? AMF 
does detect this, e.g. via "out of sync", "msg id mismatch" and so on.

/Regards HansN

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: den 20 september 2016 12:29
To: Anders Widell ; Hans Nordebäck 

Cc: opensaf-devel@lists.sourceforge.net; mathi.naic...@oracle.com
Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

HI Anders Widell / HansN,

On 9/16/2016 2:03 PM, Anders Widell wrote:
> The idea was to just log reception of error info messages, for 
> trouble-shooting purposes.

After multiple attempts, I managed to simulate the TIPC_ERR_OVERLOAD 
error. After the TIPC_ERR_OVERLOAD error is hit,
the cluster goes into an unrecoverable state, because the send buffers are full.

So we have two options:

1)  Set TIPC_DEST_DROPPABLE to false, log the TIPC_ERR_OVERLOAD error and then 
    let the sender exit gracefully,
    which allows the remaining nodes to survive.

2)  keep the current configuration as it is ( TIPC_DEST_DROPPABLE to true )

=
Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Received node_up from 2040f: 
msg_id 1
Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Node 'PL-4' joined the cluster
Sep 20 15:14:09 SC-1 osafimmnd[3695]: NO Implementer connected: 19
(MsgQueueService132111) <0, 2040f>
*Sep 20 15:16:59 SC-1 osafimmd[3684]: 77 MDTM: undelivered message
condition ancillary data: TIPC_ERR_OVERLOAD*
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Director Service in NOACTIVE
state - fevs replies pending:1 fevs highest processed:218744
Sep 20 15:17:00 SC-1 osafamfnd[3773]: NO
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to
'avaDown' : Recovery is 'nodeFailfast'
Sep 20 15:17:00 SC-1 osafamfnd[3773]: ER
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown
Recovery is:nodeFailfast
Sep 20 15:17:00 SC-1 osafamfnd[3773]: Rebooting OpenSAF NodeId = 131343
EE Name = , Reason: Component faulted: recovery is node failfast,
OwnNodeId = 131343, SupervisionTime = 60
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA DISCARD DUPLICATE FEVS
message:218744
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Error code 2 returned for
message type 82 - ignoring
Sep 20 15:17:00 SC-1 opensaf_reboot: Rebooting local node; timeout=60
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA SC Absence IS allowed:900 IMMD
service is DOWN
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO IMMD SERVICE IS DOWN, HYDRA IS
CONFIGURED => UNREGISTERING IMMND form MDS
Sep 20 15:17:00 SC-1 osafntfimcnd[3742]: NO saImmOiDispatch() Fail
SA_AIS_ERR_BAD_HANDLE (9)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:20002010f
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 1 <2, 
2010f> (safLogService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:d0d0002010f
sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:12010f
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 2 <16, 
2010f> (@safLogService_appl)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:130002010f
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 3 <19, 
2010f> (@OpenSafImmReplicatorA)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:140002010f
sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:150002010f
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 4 <21, 
2010f> (safClmService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1a0002010f
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 5 <26, 
2010f> (safAmfService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1b0002010f
sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bc0002010f
sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bd0002010f
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 6 <1469, 
2010f> (MsgQueueService131343)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5c2010f
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 10 <1472, 
2010f> (safEvtService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5c40002010f
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 8 <1476, 
2010f> (safSmfService)
Sep 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-09-20 Thread A V Mahesh
HI Anders Widell / HansN,

On 9/16/2016 2:03 PM, Anders Widell wrote:
> The idea was to just log reception of error info messages, for 
> trouble-shooting purposes.

After multiple attempts, I managed to simulate the TIPC_ERR_OVERLOAD 
error. After the TIPC_ERR_OVERLOAD error is hit,
the cluster goes into an unrecoverable state, because the send buffers are 
full.

So we have two options:

1)  Set TIPC_DEST_DROPPABLE to false, log the TIPC_ERR_OVERLOAD error 
    and then let the sender exit gracefully,
    which allows the remaining nodes to survive.

2)  keep the current configuration as it is ( TIPC_DEST_DROPPABLE to true )

=
Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Received node_up from 2040f: 
msg_id 1
Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Node 'PL-4' joined the cluster
Sep 20 15:14:09 SC-1 osafimmnd[3695]: NO Implementer connected: 19 
(MsgQueueService132111) <0, 2040f>
*Sep 20 15:16:59 SC-1 osafimmd[3684]: 77 MDTM: undelivered message 
condition ancillary data: TIPC_ERR_OVERLOAD*
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Director Service in NOACTIVE 
state - fevs replies pending:1 fevs highest processed:218744
Sep 20 15:17:00 SC-1 osafamfnd[3773]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 
'avaDown' : Recovery is 'nodeFailfast'
Sep 20 15:17:00 SC-1 osafamfnd[3773]: ER 
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown 
Recovery is:nodeFailfast
Sep 20 15:17:00 SC-1 osafamfnd[3773]: Rebooting OpenSAF NodeId = 131343 
EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA DISCARD DUPLICATE FEVS 
message:218744
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 15:17:00 SC-1 opensaf_reboot: Rebooting local node; timeout=60
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA SC Absence IS allowed:900 IMMD 
service is DOWN
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO IMMD SERVICE IS DOWN, HYDRA IS 
CONFIGURED => UNREGISTERING IMMND form MDS
Sep 20 15:17:00 SC-1 osafntfimcnd[3742]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:20002010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 1 <2, 
2010f> (safLogService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:d0d0002010f 
sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:12010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 2 <16, 
2010f> (@safLogService_appl)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:130002010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 3 <19, 
2010f> (@OpenSafImmReplicatorA)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:140002010f 
sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:150002010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 4 <21, 
2010f> (safClmService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1a0002010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 5 <26, 
2010f> (safAmfService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1b0002010f 
sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bc0002010f 
sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bd0002010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 6 
<1469, 2010f> (MsgQueueService131343)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5c2010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 10 
<1472, 2010f> (safEvtService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5c40002010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 8 
<1476, 2010f> (safSmfService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5c60002010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 9 
<1478, 2010f> (safLckService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5c70002010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 7 
<1479, 2010f> (safMsgGrpService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5cc0002010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5ce0002010f 
sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 12 
<1486, 2010f> (safCheckPointService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 13 <0, 
2020f(down)> (MsgQueueService131599)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 14 <0, 
2020f(down)> (@OpenSafImmReplicatorB)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 15 <0, 
2020f(down)> (@safAmfService2020f)

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-09-16 Thread Anders Widell
I don't think we need (or even should) inform the sender when MDS 
receives an error information message from TIPC. Note that these error 
information messages are received asynchronously, when the sender has 
already received an OK return code from the MDS send call. The idea was 
to just log reception of error info messages, for trouble-shooting 
purposes. We already have a mechanism in MDS that informs the receiver 
about lost MDS messages. If we wish to inform the sender we would need 
to introduce a second mechanism in MDS, and at this point I don't think 
it is needed. Another approach we could consider is that MDS retransmits 
the message transparently without informing the sender. This would 
require MDS to internally store sent messages for a while, so that they 
can be retransmitted. It would also require the receiver to re-order 
received messages, since a retransmitted message will be received out of 
sequence.
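
[Editorial illustration, not part of the patch under review.] The logging of error information messages described above can be sketched roughly as follows. This is a minimal sketch, assuming the ancillary data has already been received with recvmsg() into a `struct msghdr`, and assuming the TIPC_ERRINFO object carries two 32-bit words (error code, then length of the returned data). The names `reject_info`, `scan_tipc_ancillary` and `tipc_err_name` are hypothetical and do not come from the MDS sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#ifndef SOL_TIPC
#define SOL_TIPC 271 /* fallback, only used if the system headers lack it */
#endif

/* Hypothetical summary of the TIPC ancillary objects found in one message. */
struct reject_info {
    int has_errinfo;   /* a TIPC_ERRINFO object was present */
    uint32_t err_code; /* e.g. TIPC_ERR_OVERLOAD */
    uint32_t ret_len;  /* length of the returned data, in bytes */
    int has_retdata;   /* a TIPC_RETDATA object was present */
};

/* Walk the control messages of an already received msghdr and record
 * the TIPC_ERRINFO and TIPC_RETDATA objects, if any. */
static void scan_tipc_ancillary(struct msghdr *msg, struct reject_info *out)
{
    struct cmsghdr *cmsg;

    assert(msg != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level != SOL_TIPC)
            continue;
        if (cmsg->cmsg_type == TIPC_ERRINFO &&
            cmsg->cmsg_len >= CMSG_LEN(2 * sizeof(uint32_t))) {
            uint32_t words[2];
            memcpy(words, CMSG_DATA(cmsg), sizeof(words));
            out->has_errinfo = 1;
            out->err_code = words[0];
            out->ret_len = words[1];
        } else if (cmsg->cmsg_type == TIPC_RETDATA) {
            out->has_retdata = 1;
        }
    }
}

/* Map a TIPC error code to a printable name for the log line. */
static const char *tipc_err_name(uint32_t err)
{
    switch (err) {
    case TIPC_ERR_NO_NAME:   return "TIPC_ERR_NO_NAME";
    case TIPC_ERR_NO_PORT:   return "TIPC_ERR_NO_PORT";
    case TIPC_ERR_NO_NODE:   return "TIPC_ERR_NO_NODE";
    case TIPC_ERR_OVERLOAD:  return "TIPC_ERR_OVERLOAD";
    case TIPC_CONN_SHUTDOWN: return "TIPC_CONN_SHUTDOWN";
    default:                 return "unknown TIPC error";
    }
}
```

A receive loop could call `scan_tipc_ancillary()` after each recvmsg() and write `tipc_err_name(info.err_code)` to syslog when `has_errinfo` is set, without reporting anything back to the sender.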

regards,

Anders Widell


On 09/16/2016 06:40 AM, A V Mahesh wrote:
> Hi HansN,
>
> I managed to create TIPC_ERRINFO/TIPC_RETDATA error cases (not the 
> TIPC_ERR_OVERLOAD error) with normal messages.
> I observed that with TIPC_DEST_DROPPABLE set to true, even the 
> TIPC_ERRINFO error is NOT notified (that means TIPC_ERR_OVERLOAD as well);
> with TIPC_DEST_DROPPABLE set to false, the TIPC_ERRINFO/TIPC_RETDATA errors 
> are notified.
>
> Next I will also check the implications of setting TIPC_DEST_DROPPABLE to 
> false for multicast and broadcast messages. Based on that,
> we can arrange to set TIPC_DEST_DROPPABLE to false only when the agent  
> has set `i_msg_loss_indication = true`, so that
> MDS can return the same TIPC_ERR_OVERLOAD error to the agent.
>
> TIPC_DEST_DROPPABLE to false:
>
> ==
>
> Sep 15 16:10:39 SC-1 osafimmnd[32051]: NO Implementer disconnected 13 
> <0, 2040f> (MsgQueueService132111)
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafimmd[32040]: NO MDS event from svc_id 25 
> (change:4, dest:567413369208836)
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafamfd[32114]: NO Node 'PL-4' left the cluster
>
> ==
>
> TIPC_DEST_DROPPABLE to true:
>
> ==
>
> Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO Implementer disconnected 13 
> <0, 2040f> (MsgQueueService132111)
> Sep 15 15:59:55 SC-1 osafimmd[26450]: NO MDS event from svc_id 25 
> (change:4, dest:567412923957252)
> Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO Global discard node received 
> for nodeId:2040f pid:410
> Sep 15 15:59:55 SC-1 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-09-15 Thread A V Mahesh
Hi HansN,

I managed to create TIPC_ERRINFO/TIPC_RETDATA error cases (not the 
TIPC_ERR_OVERLOAD error) with normal messages.
I observed that with TIPC_DEST_DROPPABLE set to true, even the 
TIPC_ERRINFO error is NOT notified (that means TIPC_ERR_OVERLOAD as well);
with TIPC_DEST_DROPPABLE set to false, the TIPC_ERRINFO/TIPC_RETDATA errors are 
notified.

Next I will also check the implications of setting TIPC_DEST_DROPPABLE to false 
for multicast and broadcast messages. Based on that,
we can arrange to set TIPC_DEST_DROPPABLE to false only when the agent  
has set `i_msg_loss_indication = true`, so that
MDS can return the same TIPC_ERR_OVERLOAD error to the agent.
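
[Editorial illustration.] Switching the option between the two cases compared in the logs below could look roughly like this. It is a minimal sketch, assuming an already opened TIPC socket descriptor; the helper name `set_dest_droppable` is hypothetical and is not taken from the MDS code.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#ifndef SOL_TIPC
#define SOL_TIPC 271 /* fallback, only used if the system headers lack it */
#endif

/* Set TIPC_DEST_DROPPABLE on a socket.
 * droppable = 1: undeliverable messages are discarded silently.
 * droppable = 0: undeliverable messages are returned to the sender.
 * Returns 0 on success and -1 on failure (errno is set by setsockopt). */
static int set_dest_droppable(int fd, int droppable)
{
    assert(droppable == 0 || droppable == 1);
    return setsockopt(fd, SOL_TIPC, TIPC_DEST_DROPPABLE,
                      &droppable, sizeof(droppable));
}
```

Calling `set_dest_droppable(fd, 0)` corresponds to the "TIPC_DEST_DROPPABLE to false" run, and `set_dest_droppable(fd, 1)` to the "TIPC_DEST_DROPPABLE to true" run.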

TIPC_DEST_DROPPABLE to false:

==

Sep 15 16:10:39 SC-1 osafimmnd[32051]: NO Implementer disconnected 13 
<0, 2040f> (MsgQueueService132111)
Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
condition ancillary data: TIPC_ERRINFO abort err : 2
Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
condition ancillary data: TIPC_RETDATA
Sep 15 16:10:39 SC-1 osafimmd[32040]: NO MDS event from svc_id 25 
(change:4, dest:567413369208836)
Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
condition ancillary data: TIPC_ERRINFO abort err : 2
Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
condition ancillary data: TIPC_RETDATA
Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
condition ancillary data: TIPC_ERRINFO abort err : 2
Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
condition ancillary data: TIPC_RETDATA
Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
condition ancillary data: TIPC_ERRINFO abort err : 2
Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
condition ancillary data: TIPC_RETDATA
Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
condition ancillary data: TIPC_ERRINFO abort err : 2
Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
condition ancillary data: TIPC_RETDATA
Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
condition ancillary data: TIPC_ERRINFO abort err : 2
Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
condition ancillary data: TIPC_RETDATA
Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
condition ancillary data: TIPC_ERRINFO abort err : 2
Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
condition ancillary data: TIPC_RETDATA
Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
condition ancillary data: TIPC_ERRINFO abort err : 2
Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
condition ancillary data: TIPC_RETDATA
Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
condition ancillary data: TIPC_ERRINFO abort err : 2
Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
condition ancillary data: TIPC_RETDATA
Sep 15 16:10:39 SC-1 osafimmd[32040]:  777 MDTM: undelivered message 
condition ancillary data: TIPC_ERRINFO abort err : 2
Sep 15 16:10:39 SC-1 osafimmd[32040]:  MDTM: undelivered message 
condition ancillary data: TIPC_RETDATA
Sep 15 16:10:39 SC-1 osafamfd[32114]: NO Node 'PL-4' left the cluster

==

TIPC_DEST_DROPPABLE to true:

==

Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO Implementer disconnected 13 
<0, 2040f> (MsgQueueService132111)
Sep 15 15:59:55 SC-1 osafimmd[26450]: NO MDS event from svc_id 25 
(change:4, dest:567412923957252)
Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO Global discard node received 
for nodeId:2040f pid:410
Sep 15 15:59:55 SC-1 osafamfd[28810]: NO Node 'PL-4' left the cluster
Sep 15 15:59:58 SC-1 kernel: [ 5147.648737] tipc: Resetting link 
<1.1.1:eth0-1.1.4:eth0>, peer not responding
Sep 15 15:59:58 SC-1 kernel: [ 5147.648756] tipc: Lost link 
<1.1.1:eth0-1.1.4:eth0> on network plane A
Sep 15 15:59:58 SC-1 kernel: [ 5147.648771] tipc: Lost contact with <1.1.4>

==

-AVM


On 9/1/2016 10:59 AM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> I have not tested this, but the following should work:
>
> - Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE
>
> - set socket receive buffer to a small value:
>
>   optval = "small socket receive buffer size" , 5000 ?
>
>   setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen)
>
> -  sysctl -w net.tipc.tipc_rmem="5000 4000 68240400" (or smaller 
> values)
>
> - add some delays when processing messages in 
> mdtm_process_recv_events(), to provoke overloading the socket receive 
> buffer.
>
> We experience dropped packets in a 75-node system, and as a 
> workaround, increasing the default socket receive buffer size seems 
> to work for that setup.
>
> /Thanks 
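
[Editorial illustration.] The first two tips quoted above (low importance, small receive buffer) can be sketched as two small helpers. This is only a sketch under assumptions: the helper names are hypothetical, the descriptor is passed in by the caller (the quoted tip uses `tipc_cb.BSRsock`), and 5000 is the example size from the tip rather than a recommended value. Note that Linux may round or double the SO_RCVBUF value it is given, so the helper `read_rcvbuf` reads back what is actually in effect.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#ifndef SOL_TIPC
#define SOL_TIPC 271 /* fallback, only used if the system headers lack it */
#endif

/* Lower the TIPC importance of a socket. Returns 0 or -1. */
static int set_low_importance(int fd)
{
    int importance = TIPC_LOW_IMPORTANCE;
    return setsockopt(fd, SOL_TIPC, TIPC_IMPORTANCE,
                      &importance, sizeof(importance));
}

/* Request a small socket receive buffer to provoke overload. Returns 0 or -1. */
static int shrink_rcvbuf(int fd, int bytes)
{
    assert(bytes > 0);
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}

/* Read back the receive buffer size the kernel actually applied, or -1. */
static int read_rcvbuf(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &value, &len) != 0)
        return -1;
    return value;
}
```

For the test setup described in the tips, one would call `set_low_importance(fd)` and `shrink_rcvbuf(fd, 5000)` on the receive socket and then slow down message processing until the buffer overflows.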

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-09-08 Thread A V Mahesh
Hi HansN,

So far I have not succeeded in creating the TIPC_ERR_OVERLOAD case,
so I am planning to rebuild `tipc.ko` with a lower OVERLOAD_LIMIT_BASE 
value.

Currently I am working on priority open tickets for the 5.1.RC1 milestone;
I will get back to you soon.

-AVM

On 9/8/2016 2:02 PM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> Any updates on this?
>
> /Thanks HansN
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: den 1 september 2016 07:55
> To: Hans Nordebäck 
> Cc: opensaf-devel@lists.sourceforge.net; Anders Widell 
> ; mathi.naic...@oracle.com
> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>
> Hi HansN,
>
>   >> I have not tested this
>
> Ok Thanks for the tips, I will check partially TIPC_DEST_DROPPABLE enabled & 
> disabled case and then we can conclude .
>
> -AVM
>
> On 9/1/2016 10:59 AM, Hans Nordebäck wrote:
>> Hi Mahesh,
>>
>> I have not tested this, but the following should work:
>>
>> - Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE
>>
>> - set socket receive buffer to a small value:
>>
>>   optval = "small socket receive buffer size" , 5000 ?
>>
>>   setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen)
>>
>> -  sysctl -w net.tipc.tipc_rmem="5000 4000 68240400" (or smaller
>> values)
>>
>> - add some delays when processing messages in
>> mdtm_process_recv_events(), to provoke overloading the socket receive
>> buffer.
>>
>> We experience dropped packets in a 75-node system, and as a
>> workaround, increasing the default socket receive buffer size seems
>> to work for that setup.
>>
>> /Thanks HansN
>>
>> On 09/01/2016 05:50 AM, A V Mahesh wrote:
>>> Hi HansN,
>>>
>>> Do you have any tips for creating an overload case?
>>>
>>> I would like to test and observe the TIPC_DEST_DROPPABLE enabled & disabled
>>> cases.
>>>
>>> -AVM
>>>
>>>
>>> On 9/1/2016 9:12 AM, A V Mahesh wrote:
>>>> Hi HansN,
>>>>
>>>> Sorry for the delay.
>>>>
>>>> I will test it and get back to you soon.
>>>>
>>>> -AVM
>>>>
>>>>
>>>> On 8/31/2016 4:29 PM, Hans Nordebäck wrote:
>>>>> Hi Mahesh,
>>>>> Any updates on this?
>>>>>
>>>>> /Regards HansN
>>>>>
>>>>> -Original Message-
>>>>> From: Anders Widell
>>>>> Sent: den 25 augusti 2016 13:11
>>>>> To: A V Mahesh ; Hans Nordebäck
>>>>> ; mathi.naic...@oracle.com
>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>>>
>>>>> Hi!
>>>>>
>>>>> This is what the TIPC user documentation says about
>>>>> TIPC_DEST_DROPPABLE:
>>>>> "This option governs the handling of messages sent by the socket if
>>>>> the message cannot be delivered to its destination, either because
>>>>> the receiver is congested or because the specified receiver does
>>>>> not exist.
>>>>> If enabled, the message is discarded; otherwise the message is
>>>>> returned to the sender."
>>>>>
>>>>> This is what the TIPC user documentation says about the return
>>>>> value from the recvmsg() system call: "When used with a
>>>>> connectionless socket, a return value of 0 indicates the arrival of
>>>>> a returned data message that was originally sent by this socket."
>>>>>
>>>>> I think the documentation is pretty clear. If you set
>>>>> TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g.
>>>>> when the receive buffer is full. The sender will not be notified in
>>>>> this case. If TIPC_DEST_DROPPABLE is set to false, the message will
>>>>> be returned to the sender in case of a full receive buffer. The
>>>>> sender knows that it has received such a returned message when the
>>>>> recvmsg() call returns zero.
>>>>>
>>>>> regards,
>>>>> Anders Widell
>>>>>
>>>>> On 08/25/2016 11:30 AM, A V Mahesh wrote:
>>>>>> Hi HansN,
>>>>>>
>>>>>>
>>>>>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>>>>>>
>>>>>>> Hi Mahesh,
>>>>>>>
>>>>>>> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc
>>>>>>> may drop messages silently,  at receive sock buffer full
>>>>>>> condition,  but do not return any ancillary message.
>>>>>>> If TIPC_DROPPABLE = false tipc may drop message but will send an
>>>>>>> ancillary message to inform about TIPC_ERR_OVERLOAD.
>>>>>> [AVM]
>>>>>>
>>>>>> My observation are understanding is different, based on TIPC code
>>>>>> and Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD
>>>>>> error returned when TIPC is unable to enqueue an incoming message
>>>>>> on the receiving socket's receive queue irrelevant of
>>>>>> TIPC_DEST_DROPPABLE enabled or disabled.
>>>>>>
>>>>>> The only difference between TIPC_DEST_DROPPABLE enabled or
>>>>>> disabled is , If  TIPC_DEST_DROPPABLE enabled, the message is
>>>>>> discarded and
>>>>>> recvmsg() returned size is ZERO and application will get errors,
>>>>>> if TIPC_DEST_DROPPABLE disabled  the message is returned to the
>>>>>> sender it means the recvmsg() returned size is user send data size

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-09-08 Thread Hans Nordebäck
Hi Mahesh,

Any updates on this?

/Thanks HansN

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: den 1 september 2016 07:55
To: Hans Nordebäck 
Cc: opensaf-devel@lists.sourceforge.net; Anders Widell 
; mathi.naic...@oracle.com
Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

Hi HansN,

 >> I have not tested this

OK, thanks for the tips. I will specifically check the TIPC_DEST_DROPPABLE
enabled and disabled cases, and then we can conclude.

-AVM

On 9/1/2016 10:59 AM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> I have not tested this, but the following should work:
>
> - Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE
>
> - set socket receive buffer to a small value:
>
>   optval = "small socket receive buffer size", e.g. 5000?
>
>   setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen)
>
> -  sysctl -w net.tipc.tipc_rmem="5000 4000 68240400" (or smaller
> values)
>
> - add some delays when processing messages in 
> mdtm_process_recv_events(), to provoke overloading the socket receive 
> buffer.
>
> We experience dropped packages in a 75 node system, and as a 
> workaround increasing the default so receive buffer size it seems 
> working for that setup.
>
> /Thanks HansN
>
> On 09/01/2016 05:50 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> Do you have any tips to created overload case,
>>
>> I would like test and observe TIPC_DEST_DROPPABLE enabled & disabled 
>> cases.
>>
>> -AVM
>>
>>
>> On 9/1/2016 9:12 AM, A V Mahesh wrote:
>>> Hi HansN,
>>>
>>> Sorry for the delay.
>>>
>>> I will test it and get back to you soon.
>>>
>>> -AVM
>>>
>>>
>>> On 8/31/2016 4:29 PM, Hans Nordebäck wrote:
>>>> Hi Mahesh,
>>>> Any updates on this?
>>>>
>>>> /Regards HansN
>>>>
>>>> -Original Message-
>>>> From: Anders Widell
>>>> Sent: den 25 augusti 2016 13:11
>>>> To: A V Mahesh ; Hans Nordebäck
>>>> ; mathi.naic...@oracle.com
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>>
>>>> Hi!
>>>>
>>>> This is what the TIPC user documentation says about
>>>> TIPC_DEST_DROPPABLE:
>>>> "This option governs the handling of messages sent by the socket if
>>>> the message cannot be delivered to its destination, either because
>>>> the receiver is congested or because the specified receiver does
>>>> not exist.
>>>> If enabled, the message is discarded; otherwise the message is
>>>> returned to the sender."
>>>>
>>>> This is what the TIPC user documentation says about the return
>>>> value from the recvmsg() system call: "When used with a
>>>> connectionless socket, a return value of 0 indicates the arrival of
>>>> a returned data message that was originally sent by this socket."
>>>>
>>>> I think the documentation is pretty clear. If you set
>>>> TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g.
>>>> when the receive buffer is full. The sender will not be notified in
>>>> this case. If TIPC_DEST_DROPPABLE is set to false, the message will
>>>> be returned to the sender in case of a full receive buffer. The
>>>> sender knows that it has received such a returned message when the
>>>> recvmsg() call returns zero.
>>>>
>>>> regards,
>>>> Anders Widell
>>>>
>>>> On 08/25/2016 11:30 AM, A V Mahesh wrote:
>>>>> Hi HansN,
>>>>>
>>>>>
>>>>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>>>>>
>>>>>> Hi Mahesh,
>>>>>>
>>>>>> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc
>>>>>> may drop messages silently,  at receive sock buffer full
>>>>>> condition,  but do not return any ancillary message.
>>>>>> If TIPC_DROPPABLE = false tipc may drop message but will send an
>>>>>> ancillary message to inform about TIPC_ERR_OVERLOAD.
>>>>> [AVM]
>>>>>
>>>>> My observation are understanding is different, based on TIPC code
>>>>> and Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD
>>>>> error returned when TIPC is unable to enqueue an incoming message
>>>>> on the receiving socket's receive queue irrelevant of
>>>>> TIPC_DEST_DROPPABLE enabled or disabled.
>>>>>
>>>>> The only difference between TIPC_DEST_DROPPABLE enabled or
>>>>> disabled is , If  TIPC_DEST_DROPPABLE enabled, the message is
>>>>> discarded and
>>>>> recvmsg() returned size is ZERO and application will get errors,
>>>>> if TIPC_DEST_DROPPABLE disabled  the message is returned to the
>>>>> sender it means the recvmsg() returned size is user send data size
>>>>> and application will get errors .
>>>>>
>>>>> I did check the TIPC code and documentations  and I haven't get
>>>>> any evidences that  TIPC_ERR_OVERLOAD error code will be send only
>>>>> If TIPC_DEST_DROPPABLE = false.
>>>>>
>>>>> Even while testing #1227
>>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my
>>>>> observations and understanding was, an individual TIPC socket is
>>>>> only allowed to queue up

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-31 Thread Hans Nordebäck
Hi Mahesh,

I have not tested this, but the following should work:

- Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE

- set socket receive buffer to a small value:

   optval = "small socket receive buffer size", e.g. 5000?

   setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen)

-  sysctl -w net.tipc.tipc_rmem="5000 4000 68240400" (or smaller values)

- add some delays when processing messages in 
mdtm_process_recv_events(), to provoke overloading the socket receive 
buffer.
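
The receive-buffer step in the list above could look roughly like this in C.
This is only a minimal sketch, not the MDS code: shrink_rcvbuf() is a
hypothetical helper, the socket passed in stands in for tipc_cb.BSRsock, and
5000 is just the value suggested above. The helper reads the size back with
getsockopt() because the kernel may adjust the requested value.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of MDS): request a small receive buffer
 * on sd and return the size the kernel reports afterwards, or -1 on error. */
int shrink_rcvbuf(int sd, int bytes)
{
	int optval = bytes;
	socklen_t optlen = sizeof(optval);

	if (setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &optval, optlen) != 0)
		return -1;

	/* Read the effective size back; the kernel may round it up. */
	optval = 0;
	optlen = sizeof(optval);
	if (getsockopt(sd, SOL_SOCKET, SO_RCVBUF, &optval, &optlen) != 0)
		return -1;

	return optval;
}
```

It would be called once, e.g. shrink_rcvbuf(tipc_cb.BSRsock, 5000), right
after the socket has been created, and the returned value logged so the test
run shows which buffer size was really in effect.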

We experience dropped packets in a 75-node system; as a workaround, 
increasing the default socket receive buffer size seems to work for that 
setup.

/Thanks HansN

On 09/01/2016 05:50 AM, A V Mahesh wrote:
> Hi HansN,
>
> Do you have any tips for creating an overload case?
>
> I would like to test and observe the TIPC_DEST_DROPPABLE enabled and
> disabled cases.
>
> -AVM
>
>
> On 9/1/2016 9:12 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> Sorry for the delay.
>>
>> I will test it and get back to you soon.
>>
>> -AVM
>>
>>
>> On 8/31/2016 4:29 PM, Hans Nordebäck wrote:
>>> Hi Mahesh,
>>> Any updates on this?
>>>
>>> /Regards HansN
>>>
>>> -Original Message-
>>> From: Anders Widell
>>> Sent: den 25 augusti 2016 13:11
>>> To: A V Mahesh ; Hans Nordebäck 
>>> ; mathi.naic...@oracle.com
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>
>>> Hi!
>>>
>>> This is what the TIPC user documentation says about 
>>> TIPC_DEST_DROPPABLE:
>>> "This option governs the handling of messages sent by the socket if 
>>> the message cannot be delivered to its destination, either because 
>>> the receiver is congested or because the specified receiver does not 
>>> exist.
>>> If enabled, the message is discarded; otherwise the message is 
>>> returned to the sender."
>>>
>>> This is what the TIPC user documentation says about the return value 
>>> from the recvmsg() system call: "When used with a connectionless 
>>> socket, a return value of 0 indicates the arrival of a returned data 
>>> message that was originally sent by this socket."
>>>
>>> I think the documentation is pretty clear. If you set 
>>> TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g. 
>>> when the receive buffer is full. The sender will not be notified in 
>>> this case. If TIPC_DEST_DROPPABLE is set to false, the message will 
>>> be returned to the sender in case of a full receive buffer. The 
>>> sender knows that it has received such a returned message when the 
>>> recvmsg() call returns zero.
>>>
>>> regards,
>>> Anders Widell
>>>
>>> On 08/25/2016 11:30 AM, A V Mahesh wrote:
>>>> Hi HansN,
>>>>
>>>>
>>>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>>>>
>>>>> Hi Mahesh,
>>>>>
>>>>> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may
>>>>> drop messages silently,  at receive sock buffer full condition,  but
>>>>> do not return any ancillary message.
>>>>> If TIPC_DROPPABLE = false tipc may drop message but will send an
>>>>> ancillary message to inform about TIPC_ERR_OVERLOAD.
>>>> [AVM]
>>>>
>>>> My observation are understanding is different, based on TIPC code and
>>>> Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD error
>>>> returned when TIPC is unable to enqueue an incoming message on the
>>>> receiving socket's receive queue irrelevant of TIPC_DEST_DROPPABLE
>>>> enabled or disabled.
>>>>
>>>> The only difference between TIPC_DEST_DROPPABLE enabled or disabled is
>>>> , If  TIPC_DEST_DROPPABLE enabled, the message is discarded and
>>>> recvmsg() returned size is ZERO and application will get errors, if
>>>> TIPC_DEST_DROPPABLE disabled  the message is returned to the sender it
>>>> means the recvmsg() returned size is user send data size and
>>>> application will get errors .
>>>>
>>>> I did check the TIPC code and documentations  and I haven't get any
>>>> evidences that  TIPC_ERR_OVERLOAD error code will be send only If
>>>> TIPC_DEST_DROPPABLE = false.
>>>>
>>>> Even while testing #1227
>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my
>>>> observations and understanding was, an individual TIPC socket is only
>>>> allowed to queue up
>>>> OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before
>>>> it starts rejecting them.
>>>> Once a socket receiving queue length exceeds the maximum limit value,
>>>> the receiving socket will send out a reject message  with
>>>> TIPC_ERR_OVERLOAD error code with cmsg_type as
>>>> TIPC_ERRINFO/TIPC_RETDATA, and the tipc code and Linux TIPC 2.0
>>>> Programmer's Guide  confirmed the same .
>>>>
>>>> tipc/socket.c
>>>> ===
>>>> /* Reject message if there isn't room to queue it */
>>>>
>>>> recv_q_len = (u32)atomic_read(&tipc_queue_size);
>>>> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
>>>>  if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
>>>>  return 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-31 Thread A V Mahesh
Hi HansN,

Sorry for the delay.

I will test it and get back to you soon.

-AVM


On 8/31/2016 4:29 PM, Hans Nordebäck wrote:
> Hi Mahesh,
> Any updates on this?
>
> /Regards HansN
>
> -Original Message-
> From: Anders Widell
> Sent: den 25 augusti 2016 13:11
> To: A V Mahesh ; Hans Nordebäck 
> ; mathi.naic...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>
> Hi!
>
> This is what the TIPC user documentation says about TIPC_DEST_DROPPABLE:
> "This option governs the handling of messages sent by the socket if the 
> message cannot be delivered to its destination, either because the receiver 
> is congested or because the specified receiver does not exist.
> If enabled, the message is discarded; otherwise the message is returned to 
> the sender."
>
> This is what the TIPC user documentation says about the return value from the 
> recvmsg() system call: "When used with a connectionless socket, a return 
> value of 0 indicates the arrival of a returned data message that was 
> originally sent by this socket."
>
> I think the documentation is pretty clear. If you set TIPC_DEST_DROPPABLE to 
> true, the receiver can discard messages e.g. when the receive buffer is full. 
> The sender will not be notified in this case. If TIPC_DEST_DROPPABLE is set 
> to false, the message will be returned to the sender in case of a full 
> receive buffer. The sender knows that it has received such a returned message 
> when the recvmsg() call returns zero.
>
> regards,
> Anders Widell
>
> On 08/25/2016 11:30 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>>
>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>>
>>> Hi Mahesh,
>>>
>>> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may
>>> drop messages silently,  at receive sock buffer full condition,  but
>>> do not return any ancillary message.
>>> If TIPC_DROPPABLE = false tipc may drop message but will send an
>>> ancillary message to inform about TIPC_ERR_OVERLOAD.
>> [AVM]
>>
>> My observation are understanding is different, based on TIPC code and
>> Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD error
>> returned when TIPC is unable to enqueue an incoming message on the
>> receiving socket's receive queue irrelevant of TIPC_DEST_DROPPABLE
>> enabled or disabled.
>>
>> The only difference between TIPC_DEST_DROPPABLE enabled or disabled is
>> , If  TIPC_DEST_DROPPABLE enabled, the message is discarded and
>> recvmsg() returned size is ZERO and application will get errors, if
>> TIPC_DEST_DROPPABLE disabled  the message is returned to the sender it
>> means the recvmsg() returned size is user send data size and
>> application will get errors .
>>
>> I did check the TIPC code and documentations  and I haven't  get any
>> evidences that  TIPC_ERR_OVERLOAD error code will be send only If
>> TIPC_DEST_DROPPABLE = false.
>>
>> Even while testing #1227
>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my
>> observations and understanding was, an individual TIPC socket is only
>> allowed to queue up
>> OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before
>> it starts rejecting them.
>> Once a socket receiving queue length exceeds the maximum limit value,
>> the receiving socket will send out a reject message  with
>> TIPC_ERR_OVERLOAD error code with cmsg_type as
>> TIPC_ERRINFO/TIPC_RETDATA, and the tipc code and Linux TIPC 2.0
>> Programmer's Guide  confirmed the same .
>>
>> tipc/socket.c
>> ===
>> /* Reject message if there isn't room to queue it */
>>
>> recv_q_len = (u32)atomic_read(&tipc_queue_size);
>> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
>>  if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
>>  return TIPC_ERR_OVERLOAD;
>> }
>> recv_q_len = skb_queue_len(&sk->sk_receive_queue);
>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
>>  if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
>>  return TIPC_ERR_OVERLOAD;
>> }
>> ===
>>
>>
>> 2.1.17. setsockopt() of  TIPC 2.0 Programmer's Guide
>> ===
>> TIPC_DEST_DROPPABLE
>> This option governs the handling of messages sent by the socket if the
>> message cannot be delivered to its destination, either because the
>> receiver is congested or because the specified receiver does not
>> exist. If enabled, the message is discarded; otherwise the message is
>> returned to the sender.
>>
>> By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM
>> socket types, and enabled for SOCK_RDM and SOCK_DGRAM, This
>> arrangement ensures proper teardown of failed connections when
>> connection-oriented data transfer is used, without increasing the
>> complexity of connectionless data transfer.
>>
>> TIPC_SRC_DROPPABLE
>> This option 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-25 Thread Anders Widell
Hi!

This is what the TIPC user documentation says about TIPC_DEST_DROPPABLE: 
"This option governs the handling of messages sent by the socket if the 
message cannot be delivered to its destination, either because the 
receiver is congested or because the specified receiver does not exist. 
If enabled, the message is discarded; otherwise the message is returned 
to the sender."

This is what the TIPC user documentation says about the return value 
from the recvmsg() system call: "When used with a connectionless socket, 
a return value of 0 indicates the arrival of a returned data message 
that was originally sent by this socket."

I think the documentation is pretty clear. If you set 
TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g. when 
the receive buffer is full. The sender will not be notified in this 
case. If TIPC_DEST_DROPPABLE is set to false, the message will be 
returned to the sender in case of a full receive buffer. The sender 
knows that it has received such a returned message when the recvmsg() 
call returns zero.

regards,
Anders Widell

On 08/25/2016 11:30 AM, A V Mahesh wrote:
> Hi HansN,
>
>
> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>
>> Hi Mahesh,
>>
>> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may 
>> drop messages silently,  at receive sock buffer full condition,  but 
>> do not return any ancillary message.
>> If TIPC_DROPPABLE = false tipc may drop message but will send an 
>> ancillary message to inform about TIPC_ERR_OVERLOAD.
> [AVM]
>
> My observation are understanding is different, based on TIPC code and  
> Linux TIPC 2.0 Programmer's Guide ,
> that the TIPC_ERR_OVERLOAD error returned when TIPC is unable to 
> enqueue an incoming message on the receiving socket's receive queue
> irrelevant of TIPC_DEST_DROPPABLE enabled or disabled.
>
> The only difference between TIPC_DEST_DROPPABLE enabled or disabled is 
> , If  TIPC_DEST_DROPPABLE enabled, the message is discarded and 
> recvmsg() returned size is ZERO and application will get errors,
> if TIPC_DEST_DROPPABLE disabled  the message is returned to the sender 
> it means the recvmsg() returned size is user send data size and 
> application will get errors .
>
> I did check the TIPC code and documentations  and I haven't  get any 
> evidences that  TIPC_ERR_OVERLOAD error code will be send only
> If TIPC_DEST_DROPPABLE = false.
>
> Even while testing #1227 
> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my 
> observations and understanding was,
> an individual TIPC socket is only allowed to queue up 
> OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before 
> it starts rejecting them.
> Once a socket receiving queue length exceeds the maximum limit value, 
> the receiving socket will send out a reject message  with 
> TIPC_ERR_OVERLOAD error code
> with cmsg_type as TIPC_ERRINFO/TIPC_RETDATA, and the tipc code and 
> Linux TIPC 2.0 Programmer's Guide  confirmed the same .
>
> tipc/socket.c
> ===
> /* Reject message if there isn't room to queue it */
>
> recv_q_len = (u32)atomic_read(&tipc_queue_size);
> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
> if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
> return TIPC_ERR_OVERLOAD;
> }
> recv_q_len = skb_queue_len(&sk->sk_receive_queue);
> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
> if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
> return TIPC_ERR_OVERLOAD;
> }
> ===
>
>
> 2.1.17. setsockopt() of  TIPC 2.0 Programmer's Guide
> ===
> TIPC_DEST_DROPPABLE
> This option governs the handling of messages sent by the socket if the 
> message cannot be delivered to its destination,
> either because the receiver is congested or because the specified 
> receiver does not exist. If enabled, the message is discarded; 
> otherwise the message is returned to the sender.
>
> By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM 
> socket types, and enabled for SOCK_RDM and SOCK_DGRAM,
> This arrangement ensures proper teardown of failed connections when 
> connection-oriented data transfer is used, without increasing the 
> complexity of connectionless data transfer.
>
> TIPC_SRC_DROPPABLE
> This option governs the handling of messages sent by the socket if 
> link congestion occurs. If enabled, the message is discarded; 
> otherwise the system queues the message for later transmission.
> By default, this option is disabled for SOCK_SEQPACKET, SOCK_STREAM, 
> and SOCK_RDM socket types (resulting in "reliable" data transfer), and 
> enabled for SOCK_DGRAM (resulting in "unreliable" data transfer).
> ===
>
> Now I will try to create OVERLOAD case and update you soon my latest 
> observations.
>
> -AVM
>
>> Correcting this and adding an abort is 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-25 Thread A V Mahesh
Hi HansN,
  

On 8/23/2016 5:22 PM, Hans Nordebäck wrote:

> Hi Mahesh,
>
> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may drop 
> messages silently,  at receive sock buffer full condition,  but do not return 
> any ancillary message.
> If TIPC_DROPPABLE = false tipc may drop message but will send an ancillary 
> message to inform about TIPC_ERR_OVERLOAD.
[AVM]

My observation and understanding are different. Based on the TIPC code and
the Linux TIPC 2.0 Programmer's Guide, the TIPC_ERR_OVERLOAD error is
returned when TIPC is unable to enqueue an incoming message on the
receiving socket's receive queue, irrespective of whether
TIPC_DEST_DROPPABLE is enabled or disabled.

The only difference between TIPC_DEST_DROPPABLE enabled and disabled is
this: if TIPC_DEST_DROPPABLE is enabled, the message is discarded, the
size returned by recvmsg() is zero, and the application will get errors;
if TIPC_DEST_DROPPABLE is disabled, the message is returned to the sender,
which means the size returned by recvmsg() is the size of the data the
user sent, and the application will get errors.

I checked the TIPC code and documentation, and I have not found any
evidence that the TIPC_ERR_OVERLOAD error code is sent only if
TIPC_DEST_DROPPABLE = false.

Even while testing #1227
(https://sourceforge.net/p/opensaf/mailman/message/33207717/), my
observation and understanding was that an individual TIPC socket is only
allowed to queue up OVERLOAD_LIMIT_BASE/2 messages of the lowest
importance level before it starts rejecting them.
Once a socket's receive queue length exceeds the maximum limit, the
receiving socket sends out a reject message with the TIPC_ERR_OVERLOAD
error code and cmsg_type TIPC_ERRINFO/TIPC_RETDATA; the TIPC code and the
Linux TIPC 2.0 Programmer's Guide confirm this.
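
To make the TIPC_ERRINFO/TIPC_RETDATA part concrete, here is a rough sketch of
how a receiver could walk the ancillary data that recvmsg() fills in. It is
not the MDS code: struct tipc_reject_info, tipc_scan_anc() and
tipc_err_name() are hypothetical names, and the sketch assumes that the
TIPC_ERRINFO object carries two 32-bit words (error code, then length of the
returned data).

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Result of scanning the ancillary data of one received message. */
struct tipc_reject_info {
	int has_errinfo;   /* a TIPC_ERRINFO object was present */
	int has_retdata;   /* a TIPC_RETDATA object was present */
	uint32_t err;      /* TIPC error code, valid if has_errinfo */
	uint32_t ret_len;  /* returned data length, valid if has_errinfo */
};

/* Walk the control messages of msg and pick out the TIPC objects that
 * mark a rejected (returned) message. */
void tipc_scan_anc(struct msghdr *msg, struct tipc_reject_info *out)
{
	struct cmsghdr *anc;

	memset(out, 0, sizeof(*out));
	for (anc = CMSG_FIRSTHDR(msg); anc != NULL;
	     anc = CMSG_NXTHDR(msg, anc)) {
		if (anc->cmsg_level != SOL_TIPC)
			continue;
		if (anc->cmsg_type == TIPC_ERRINFO &&
		    anc->cmsg_len >= CMSG_LEN(2 * sizeof(uint32_t))) {
			uint32_t data[2];

			/* Assumed layout: data[0] = error, data[1] = length. */
			memcpy(data, CMSG_DATA(anc), sizeof(data));
			out->has_errinfo = 1;
			out->err = data[0];
			out->ret_len = data[1];
		} else if (anc->cmsg_type == TIPC_RETDATA) {
			out->has_retdata = 1;
		}
	}
}

/* Printable name for the error codes defined in linux/tipc.h. */
const char *tipc_err_name(uint32_t err)
{
	switch (err) {
	case TIPC_OK:            return "TIPC_OK";
	case TIPC_ERR_NO_NAME:   return "TIPC_ERR_NO_NAME";
	case TIPC_ERR_NO_PORT:   return "TIPC_ERR_NO_PORT";
	case TIPC_ERR_NO_NODE:   return "TIPC_ERR_NO_NODE";
	case TIPC_ERR_OVERLOAD:  return "TIPC_ERR_OVERLOAD";
	case TIPC_CONN_SHUTDOWN: return "TIPC_CONN_SHUTDOWN";
	default:                 return "unknown";
	}
}
```

With something like this, the receive path can log tipc_err_name(info.err)
whenever info.has_errinfo is set, which is enough to tell an overload reject
apart from a missing destination.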

tipc/socket.c
===
/* Reject message if there isn't room to queue it */

recv_q_len = (u32)atomic_read(&tipc_queue_size);
if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
	if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
		return TIPC_ERR_OVERLOAD;
}
recv_q_len = skb_queue_len(&sk->sk_receive_queue);
if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
	if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
		return TIPC_ERR_OVERLOAD;
}
===
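
The two-stage check quoted above can be played with in user space using a
small model like the one below. It is a simplified sketch under stated
assumptions, not the kernel code: the per-importance multipliers (1, 2, 100)
are my assumption for illustration, the first counter is assumed to be a
system-wide queue size, and any extra allowance for connection-oriented
messages is left out. All names are hypothetical.

```c
#include <stdint.h>

/* Importance levels for the model, lowest to highest. */
enum model_importance {
	MODEL_LOW = 0,
	MODEL_MEDIUM = 1,
	MODEL_HIGH = 2,
	MODEL_CRITICAL = 3
};

/* Model of rx_queue_full(): returns 1 if a message of this importance
 * should be rejected at this queue length. The multipliers are an
 * assumption made for illustration only. */
int model_rx_queue_full(enum model_importance imp, uint32_t queue_len,
			uint32_t base)
{
	uint32_t threshold;

	switch (imp) {
	case MODEL_LOW:    threshold = base;       break;
	case MODEL_MEDIUM: threshold = base * 2;   break;
	case MODEL_HIGH:   threshold = base * 100; break;
	default:           return 0; /* critical: never rejected here */
	}
	return queue_len >= threshold;
}

/* Model of the two-stage test: the global queue is checked against base,
 * then the per-socket queue against base / 2. Returns 1 for "reject". */
int model_reject(enum model_importance imp, uint32_t global_len,
		 uint32_t sock_len, uint32_t base)
{
	if (global_len >= base &&
	    model_rx_queue_full(imp, global_len, base))
		return 1;
	if (sock_len >= base / 2 &&
	    model_rx_queue_full(imp, sock_len, base / 2))
		return 1;
	return 0;
}
```

With base standing in for OVERLOAD_LIMIT_BASE, a low-importance message is
rejected as soon as the socket queue reaches base / 2, which matches the
behaviour described above; higher importance levels get more headroom in
this model.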


2.1.17. setsockopt() of  TIPC 2.0 Programmer's Guide
===
TIPC_DEST_DROPPABLE
This option governs the handling of messages sent by the socket if the 
message cannot be delivered to its destination,
either because the receiver is congested or because the specified 
receiver does not exist. If enabled, the message is discarded; otherwise 
the message is returned to the sender.

By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM 
socket types, and enabled for SOCK_RDM and SOCK_DGRAM.
This arrangement ensures proper teardown of failed connections when 
connection-oriented data transfer is used, without increasing the 
complexity of connectionless data transfer.

TIPC_SRC_DROPPABLE
This option governs the handling of messages sent by the socket if link 
congestion occurs. If enabled, the message is discarded; otherwise the 
system queues the message for later transmission.
By default, this option is disabled for SOCK_SEQPACKET, SOCK_STREAM, and 
SOCK_RDM socket types (resulting in "reliable" data transfer), and 
enabled for SOCK_DGRAM (resulting in "unreliable" data transfer).
===
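
The defaults in the excerpt above, and how to override the destination
setting, could be written down roughly as follows. This is an illustrative
sketch only: the three function names are hypothetical, and the default
tables simply restate the guide text quoted above.

```c
#include <sys/socket.h>
#include <linux/tipc.h>

/* Default TIPC_DEST_DROPPABLE per socket type, as stated in the guide
 * excerpt. Returns 1 = enabled, 0 = disabled, -1 = type not covered. */
int dest_droppable_default(int sock_type)
{
	switch (sock_type) {
	case SOCK_SEQPACKET:
	case SOCK_STREAM:
		return 0;
	case SOCK_RDM:
	case SOCK_DGRAM:
		return 1;
	default:
		return -1;
	}
}

/* Default TIPC_SRC_DROPPABLE per socket type, from the same excerpt. */
int src_droppable_default(int sock_type)
{
	switch (sock_type) {
	case SOCK_SEQPACKET:
	case SOCK_STREAM:
	case SOCK_RDM:
		return 0;
	case SOCK_DGRAM:
		return 1;
	default:
		return -1;
	}
}

/* Override TIPC_DEST_DROPPABLE on a TIPC socket. Returns the result of
 * setsockopt(): 0 on success, -1 on error with errno set. */
int set_dest_droppable(int sd, int enabled)
{
	int optval = enabled ? 1 : 0;

	return setsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE,
			  &optval, sizeof(optval));
}
```

For a SOCK_RDM socket, calling set_dest_droppable(sd, 0) asks TIPC to return
undeliverable messages to the sender instead of discarding them, which is the
case under discussion in this thread.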

Now I will try to create an OVERLOAD case and will update you soon with my 
latest observations.

-AVM

> Correcting this and adding an abort is not backward compatible as some 
> service already handle flow control in some way, only log when packages are 
> dropped.
> Regarding ticket #1960 there are other solutions than introducing flow 
> control in MDS, e.g. expose an option to the service to choose connection 
> oriented
> or connection less.
> The problem with dropped messages seems in one case related to, (by MDS), 
> intensive MDS logging.
>
> /Thanks HansN
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: den 23 augusti 2016 11:27
> To: Hans Nordebäck ; Anders Widell 
> ; mathi.naic...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>
> Hi HansN,
>
> It seems I am missing some thing , please allow me to under stand
>
> If I currently understand you observation :
>
> With current Opensaf code ( this #1957 patch NOT applied ) , by default 
> TIPC_DROPPABLE=true ,while running Opensaf with that binary when 
> TIPC_ERR_OVERLOAD  occurring, TIPC is not  given errors TIPC_ERRINFO or  
> TIPC_RETDATA and following code is not being get hit of function 
> recvfrom_connectionless(), is my  understanding right ?
>
> 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-23 Thread Hans Nordebäck
Hi Mahesh,

Yes, this is my understanding too: if TIPC_DROPPABLE = true, TIPC may drop 
messages silently when the receive socket buffer is full, and does not return 
any ancillary message.
If TIPC_DROPPABLE = false, TIPC may drop the message but will send an ancillary 
message to inform about TIPC_ERR_OVERLOAD.
Correcting this and adding an abort is not backward compatible, as some services 
already handle flow control in some way; only log when packets are dropped.
Regarding ticket #1960, there are other solutions than introducing flow control 
in MDS, e.g. exposing an option that lets the service choose connection-oriented
or connectionless.
In one case, the problem with dropped messages seems to be related to intensive 
MDS logging (by MDS).

/Thanks HansN
-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: den 23 augusti 2016 11:27
To: Hans Nordebäck ; Anders Widell 
; mathi.naic...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

Hi HansN,

It seems I am missing something; please allow me to understand.

If I understand your observation correctly:

With the current OpenSAF code (this #1957 patch NOT applied), 
TIPC_DROPPABLE=true by default. While running OpenSAF with that binary, when 
TIPC_ERR_OVERLOAD occurs, TIPC does not give the TIPC_ERRINFO or 
TIPC_RETDATA errors, and the following code in the function 
recvfrom_connectionless() is not hit. Is my understanding right?

=

if (anc->cmsg_type == TIPC_ERRINFO) {
 /* TIPC_ERRINFO - TIPC error code associated with a returned data message
    or a connection termination message, so abort */
 m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno));
 abort();
} else if (anc->cmsg_type == TIPC_RETDATA) {
 /* If we set TIPC_DEST_DROPPABLE off (configure TIPC to return rejected
    messages to the sender), we will hit this. When we implement MDS
    retransmission of lost messages, abort can be replaced with flow
    control logic. */
 for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
  m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
  cptr++;
 }
 /* TIPC_RETDATA - the contents of a returned data message, so abort */
 m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno));
 abort();
}

=

-AVM


On 8/23/2016 1:08 PM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> Please see response below with [HansN] /Thanks HansN
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: den 23 augusti 2016 08:25
> To: Hans Nordebäck ; Anders Widell 
> ; mathi.naic...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>
> Hi HansN
>
> Please see response below with [AVM]
>
> -AVM
>
> On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
>> Hi Mahesh,
>>
>> please see comments below.
>>
>> /Thanks HansN
>>
>>
>> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>>> Hi HansN,
>>>
>>> Let us first discuss the error handling and abort; then we can come 
>>> back to the interpretation of whether TIPC currently does or does not 
>>> permit an application to send a multicast message with the 
>>> "destination droppable" setting disabled.
>>>
>>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return 
>>> an undelivered multicast message to its sender and we can determine 
>>> whether the issue is caused by TIPC_ERR_OVERLOAD. This helps in 
>>> debugging, so that the application may increase SO_SNDBUF/SO_RCVBUF 
>>> to reduce the problem.
>>>
>>> But we still need to abort(). The reason is that the current MDS 
>>> implementation doesn't have flow control logic (no retry on error), 
>>> so an application like AMF can go wrong and the cluster will go into 
>>> an unstable/recoverable state.
>>>
>> [HansN] In the current implementation messages are dropped silently 
>> and no abort is done.
> [AVM]  I can see abort(); in the current code. Do you mean abort(); is not 
> working and the application (amf) is not exiting?
> [HansN] In case of TIPC_DROPPABLE=true and messages are dropped 
> (TIPC_ERR_OVERLOAD), no abort is performed; e.g. amfd detects this in the 
> msg sanity chk and logs "invalid msg id ..."
> ==
> ==
> if (anc->cmsg_type == TIPC_ERRINFO) {
>   /* TIPC_ERRINFO - TIPC error code associated with a returned data 
> message or a connection termination message  so abort */
>   m_MDS_LOG_CRITICAL("MDTM: undelivered message condition 
> 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-23 Thread A V Mahesh
Hi HansN,

It seems I am missing something; please allow me to understand.

If I understand your observation correctly:

With the current OpenSAF code (this #1957 patch NOT applied), 
TIPC_DROPPABLE=true by default. While running OpenSAF with that binary, 
when TIPC_ERR_OVERLOAD occurs, TIPC does not give the TIPC_ERRINFO or 
TIPC_RETDATA errors, and the following code in the function 
recvfrom_connectionless() is not hit. Is my understanding right?

=

if (anc->cmsg_type == TIPC_ERRINFO) {
 /* TIPC_ERRINFO - TIPC error code associated with a returned data message
    or a connection termination message, so abort */
 m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno));
 abort();
} else if (anc->cmsg_type == TIPC_RETDATA) {
 /* If we set TIPC_DEST_DROPPABLE off (configure TIPC to return rejected
    messages to the sender), we will hit this. When we implement MDS
    retransmission of lost messages, abort can be replaced with flow
    control logic. */
 for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
  m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
  cptr++;
 }
 /* TIPC_RETDATA - the contents of a returned data message, so abort */
 m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno));
 abort();
}

=

-AVM


On 8/23/2016 1:08 PM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> Please see response below with [HansN]
> /Thanks HansN
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: den 23 augusti 2016 08:25
> To: Hans Nordebäck ; Anders Widell 
> ; mathi.naic...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>
> Hi HansN
>
> Please see response below with [AVM]
>
> -AVM
>
> On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
>> Hi Mahesh,
>>
>> please see comments below.
>>
>> /Thanks HansN
>>
>>
>> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>>> Hi HansN,
>>>
>>> Let us fist discuss the error handling and abort, then we can come
>>> back to interpretation of  TIPC currently  does permit  OR does not
>>> permit an application to send a multicast message with the
>>> "destination droppable" setting disabled.
>>>
>>> Let us disable TIPC_DEST_DROPPABLE, so that  TIPC will try to return
>>> an undelivered multicast message to its sender and we can  determine
>>> issue is  because of TIPC_ERR_OVERLOAD, this helps in debugging , so
>>> that application may increased SO_SNDBUF/SO_RCVBUF to reduce the
>>> problem.
>>>
>>> But still we need to abort(), the reason for that is current MDS
>>> implementations doesn't have flow control logic ( no retry because of
>>> error ) , so Application like AMF can go wrong and cluster will go
>>> into unstable/recoverble state.
>>>
>> [HansN] In the current implementation messages are dropped silently
>> and no abort is done.
> [AVM]  I can see  abort(); in current code , you mean abort(); is not working 
> and application(amf) is not existing ?
> [HansN] In case of TIPC_DROPPABLE=true and messages are dropped, 
> (TIPC_ERR_OVERLOAD)  no abort is be performed, e.g
> amfd detects this in the msg sanity chk and logs "invalid msg id ..."
> 
> if (anc->cmsg_type == TIPC_ERRINFO) {
>   /* TIPC_ERRINFO - TIPC error code associated with a returned data 
> message or a connection termination message  so abort */
>   m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary
> data: TIPC_ERRINFO abort err :%s", strerror(errno) );
> *abort();*
> } else if (anc->cmsg_type == TIPC_RETDATA) {
>   /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return 
> rejected messages to the sender )
>  we will hit this when we implement MDS retransmit lost messages 
> abort can be replaced with flow control logic*/
>   for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>   m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>   cptr++;
>   }
>   /* TIPC_RETDATA -The contents of a returned data message  so abort */
>   m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary
> data: TIPC_RETDATA abort err :%s", strerror(errno) );
> *abort();*
> }
> 
>> This patch enables logging
>> when packages are dropped to help in debugging. I don't agree that we
>> should also introduce abort, but instead:
>> 1) Implement a solution to handle dropped packages, ticket #1960
> [AVM]  This is nothing but flow control implementation in 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-23 Thread Hans Nordebäck
Hi Mahesh,

Please see response below with [HansN]
/Thanks HansN

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: den 23 augusti 2016 08:25
To: Hans Nordebäck ; Anders Widell 
; mathi.naic...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

Hi HansN

Please see response below with [AVM]

-AVM

On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> please see comments below.
>
> /Thanks HansN
>
>
> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> Let us fist discuss the error handling and abort, then we can come 
>> back to interpretation of  TIPC currently  does permit  OR does not 
>> permit an application to send a multicast message with the 
>> "destination droppable" setting disabled.
>>
>> Let us disable TIPC_DEST_DROPPABLE, so that  TIPC will try to return 
>> an undelivered multicast message to its sender and we can  determine 
>> issue is  because of TIPC_ERR_OVERLOAD, this helps in debugging , so 
>> that application may increased SO_SNDBUF/SO_RCVBUF to reduce the 
>> problem.
>>
>> But still we need to abort(), the reason for that is current MDS 
>> implementations doesn't have flow control logic ( no retry because of 
>> error ) , so Application like AMF can go wrong and cluster will go 
>> into unstable/recoverble state.
>>
> [HansN] In the current implementation messages are dropped silently 
> and no abort is done.
[AVM] I can see abort() in the current code. Do you mean abort() is not
working and the application (amf) is not exiting?
[HansN] In case of TIPC_DROPPABLE=true, when messages are dropped
(TIPC_ERR_OVERLOAD) no abort is performed; e.g. amfd detects this in the
msg sanity check and logs "invalid msg id ...".

if (anc->cmsg_type == TIPC_ERRINFO) {
 /* TIPC_ERRINFO - TIPC error code associated with a returned data message 
or a connection termination message  so abort */
 m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary
data: TIPC_ERRINFO abort err :%s", strerror(errno) );
*abort();*
} else if (anc->cmsg_type == TIPC_RETDATA) {
 /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return 
rejected messages to the sender )
we will hit this when we implement MDS retransmit lost messages abort 
can be replaced with flow control logic*/
 for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
 m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
 cptr++;
 }
 /* TIPC_RETDATA -The contents of a returned data message  so abort */
 m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary
data: TIPC_RETDATA abort err :%s", strerror(errno) );
*abort();*
}

> This patch enables logging
> when packages are dropped to help in debugging. I don't agree that we 
> should also introduce abort, but instead:
> 1) Implement a solution to handle dropped packages, ticket #1960
[AVM] This is nothing but a flow control implementation in MDS; this is a
future enhancement.

> 2) Investigate why packages may be dropped, the receiving MDS thread 
> is a real time thread and should be able to consume a large amount of 
> incoming messages.
> E.g. is the receiving MDS thread "live hanging" due to locks, file I/O 
> etc?
>> This was the reason we haven't gone for it while addressing Ticket
>> #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
>> So currently we don't have any advantage of disabling 
>> TIPC_DEST_DROPPABLE and not allowing multicast  messages.
>>
>> -AVM
>>
>>
>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>   osaf/libs/core/mds/mds_dt_tipc.c |  32
>>> +---
>>>   1 files changed, 25 insertions(+), 7 deletions(-)
>>>
>>>
>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c
>>> b/osaf/libs/core/mds/mds_dt_tipc.c
>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>   m_MDS_LOG_INFO("MDTM: Successfully set default 
>>> socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>   }
>>>   +int droppable = 0;
>>> +if (setsockopt(tipc_cb.BSRsock, SOL_TIPC,
>>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>> +LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero
>>> err :%s\n", strerror(errno));
>>> +m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE
>>> to zero err :%s\n", strerror(errno));
>>> +osafassert(0);
>>> +} else {
>>> +m_MDS_LOG_NOTIFY("MDTM: Successfully set
>>> TIPC_DEST_DROPPABLE to zero");
>>> +}
>>> +
>>>   return NCSCC_RC_SUCCESS;
>>>   }
>>>   @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>   unsigned 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-23 Thread Anders Widell
Hi!

I don't think the sender would need to unconditionally abort() down in 
the MDS layer when it gets back an undelivered message from TIPC. We 
have the message loss callback in MDS, which can be used by the receiver 
to detect lost messages. The receiver can take an appropriate action 
when it receives this callback. If the appropriate action is to restart 
the sender, then the receiver can inform the sender about the message 
loss so that the sender can restart itself.
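
As a rough illustration of receiver-side loss detection (a minimal sketch only, not the MDS message loss callback itself): the receiver can track the next expected message id per sender and react when it sees a gap. All names below (`rx_state`, `rx_note_msg`, `on_msg_loss`, `rx_deliver`) are hypothetical and do not come from MDS or AMF.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-sender receive state; not an MDS structure. */
struct rx_state {
    uint32_t next_expected;  /* id expected on the next message */
    uint64_t lost_total;     /* running count of missing messages */
};

/*
 * Record an incoming message id and return how many messages were
 * skipped since the previous one (0 when nothing was lost).
 * Unsigned subtraction keeps the result correct across wrap-around.
 * Duplicated or reordered messages are not handled in this sketch.
 */
static uint32_t rx_note_msg(struct rx_state *st, uint32_t msg_id)
{
    assert(st != NULL);
    uint32_t gap = msg_id - st->next_expected;
    st->next_expected = msg_id + 1;
    st->lost_total += gap;
    return gap;
}

/* Hypothetical receiver action: here it only reports the loss. */
static void on_msg_loss(uint32_t sender, uint32_t lost)
{
    fprintf(stderr, "sender 0x%x: %u message(s) lost\n",
            (unsigned)sender, (unsigned)lost);
}

/* Deliver one message: detect a gap first, then let the receiver react. */
static uint32_t rx_deliver(struct rx_state *st, uint32_t sender, uint32_t msg_id)
{
    uint32_t lost = rx_note_msg(st, msg_id);
    if (lost != 0)
        on_msg_loss(sender, lost);
    return lost;
}
```

In a real service, `on_msg_loss` would be the place where the receiver decides whether to resynchronize or to tell the sender to restart; that policy is left open here.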

regards,

Anders Widell

On 08/23/2016 07:21 AM, A V Mahesh wrote:
> Hi HansN,
>
> Let us fist discuss the error handling and abort, then we can come 
> back to
> interpretation of  TIPC currently  does permit  OR does not permit an 
> application to send
> a multicast message with the "destination droppable" setting disabled.
>
> Let us disable TIPC_DEST_DROPPABLE, so that  TIPC will try to return 
> an undelivered multicast message to its sender
> and we can  determine issue is  because of TIPC_ERR_OVERLOAD, this 
> helps in debugging ,
> so that application may increased SO_SNDBUF/SO_RCVBUF to reduce the 
> problem.
>
> But still we need to abort(), the reason for that is current MDS 
> implementations doesn't
> have flow control logic ( no retry because of error ) , so Application 
> like AMF can go wrong and cluster will go into unstable/recoverble state.
>
> This was the reason we haven't gone for it while addressing Ticket 
> #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
> So currently we don't have any advantage of disabling 
> TIPC_DEST_DROPPABLE and not allowing multicast  messages.
>
> -AVM
>
>
> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>   osaf/libs/core/mds/mds_dt_tipc.c |  32 
>> +---
>>   1 files changed, 25 insertions(+), 7 deletions(-)
>>
>>
>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c 
>> b/osaf/libs/core/mds/mds_dt_tipc.c
>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>   m_MDS_LOG_INFO("MDTM: Successfully set default 
>> socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>   }
>>   +int droppable = 0;
>> +if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, 
>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>> +LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero 
>> err :%s\n", strerror(errno));
>> +m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE 
>> to zero err :%s\n", strerror(errno));
>> +osafassert(0);
>> +} else {
>> +m_MDS_LOG_NOTIFY("MDTM: Successfully set 
>> TIPC_DEST_DROPPABLE to zero");
>> +}
>> +
>>   return NCSCC_RC_SUCCESS;
>>   }
>>   @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>   unsigned char *cptr;
>>   int i;
>>   int has_addr;
>> +int anc_data[2];
>> +
>>   ssize_t sz;
>> has_addr = (from != NULL) && (addrlen != NULL);
>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>  if the message was sent using a TIPC name or name 
>> sequence as the
>>  destination rather than a TIPC port ID So abort for 
>> TIPC_ERRINFO and TIPC_RETDATA*/
>>   if (anc->cmsg_type == TIPC_ERRINFO) {
>> -/* TIPC_ERRINFO - TIPC error code associated with a 
>> returned data message or a connection termination message  so abort */
>> -m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>> condition ancillary data: TIPC_ERRINFO abort err :%s", 
>> strerror(errno) );
>> -abort();
>> +anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
>> +if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>> +LOG_CR("MDTM: undelivered message condition 
>> ancillary data: TIPC_ERR_OVERLOAD");
>> +m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>> condition ancillary data: TIPC_ERR_OVERLOAD");
>> +} else {
>> +/* TIPC_ERRINFO - TIPC error code associated 
>> with a returned data message or a connection termination message  so 
>> abort */
>> +LOG_CR("MDTM: undelivered message condition 
>> ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>> +m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>> condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>> +}
>>   } else if (anc->cmsg_type == TIPC_RETDATA) {
>> -/* If we set TIPC_DEST_DROPPABLE off messge 
>> (configure TIPC to return rejected messages to the sender )
>> +/* If we set TIPC_DEST_DROPPABLE off message 
>> (configure TIPC to return rejected messages to the sender )
>>  we will hit this when we implement MDS 
>> retransmit lost messages  abort can be replaced with flow control 
>> logic*/
>>   for (i = anc->cmsg_len - 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-23 Thread A V Mahesh
Hi HansN

Please see response below with [AVM]

-AVM

On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> please see comments below.
>
> /Thanks HansN
>
>
> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> Let us fist discuss the error handling and abort, then we can come 
>> back to
>> interpretation of  TIPC currently  does permit  OR does not permit an 
>> application to send
>> a multicast message with the "destination droppable" setting disabled.
>>
>> Let us disable TIPC_DEST_DROPPABLE, so that  TIPC will try to return 
>> an undelivered multicast message to its sender
>> and we can  determine issue is  because of TIPC_ERR_OVERLOAD, this 
>> helps in debugging ,
>> so that application may increased SO_SNDBUF/SO_RCVBUF to reduce the 
>> problem.
>>
>> But still we need to abort(), the reason for that is current MDS 
>> implementations doesn't
>> have flow control logic ( no retry because of error ) , so 
>> Application like AMF can go wrong and cluster will go into 
>> unstable/recoverble state.
>>
> [HansN] In the current implementation messages are dropped silently 
> and no abort is done. 
[AVM] I can see abort() in the current code. Do you mean abort() is not
working and the application (amf) is not exiting?

if (anc->cmsg_type == TIPC_ERRINFO) {
 /* TIPC_ERRINFO - TIPC error code associated with a returned data 
message or a connection termination message  so abort */
 m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary 
data: TIPC_ERRINFO abort err :%s", strerror(errno) );
*abort();*
} else if (anc->cmsg_type == TIPC_RETDATA) {
 /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to 
return rejected messages to the sender )
we will hit this when we implement MDS retransmit lost messages  
abort can be replaced with flow control logic*/
 for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
 m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
 cptr++;
 }
 /* TIPC_RETDATA -The contents of a returned data message  so abort */
 m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary 
data: TIPC_RETDATA abort err :%s", strerror(errno) );
*abort();*
}

> This patch enables logging
> when packages are dropped to help in debugging. I don't agree that we 
> should also introduce abort, but instead:
> 1) Implement a solution to handle dropped packages, ticket #1960
[AVM] This is nothing but a flow control implementation in MDS; this is a
future enhancement.

> 2) Investigate why packages may be dropped, the receiving MDS thread 
> is a real time thread and should be able to consume a large amount of 
> incoming messages.
> E.g. is the receiving MDS thread "live hanging" due to locks, file I/O 
> etc?
>> This was the reason we haven't gone for it while addressing Ticket 
>> #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
>> So currently we don't have any advantage of disabling 
>> TIPC_DEST_DROPPABLE and not allowing multicast  messages.
>>
>> -AVM
>>
>>
>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>   osaf/libs/core/mds/mds_dt_tipc.c |  32 
>>> +---
>>>   1 files changed, 25 insertions(+), 7 deletions(-)
>>>
>>>
>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c 
>>> b/osaf/libs/core/mds/mds_dt_tipc.c
>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>   m_MDS_LOG_INFO("MDTM: Successfully set default 
>>> socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>   }
>>>   +int droppable = 0;
>>> +if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, 
>>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>> +LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero 
>>> err :%s\n", strerror(errno));
>>> +m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE 
>>> to zero err :%s\n", strerror(errno));
>>> +osafassert(0);
>>> +} else {
>>> +m_MDS_LOG_NOTIFY("MDTM: Successfully set 
>>> TIPC_DEST_DROPPABLE to zero");
>>> +}
>>> +
>>>   return NCSCC_RC_SUCCESS;
>>>   }
>>>   @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>   unsigned char *cptr;
>>>   int i;
>>>   int has_addr;
>>> +int anc_data[2];
>>> +
>>>   ssize_t sz;
>>> has_addr = (from != NULL) && (addrlen != NULL);
>>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>>  if the message was sent using a TIPC name or name 
>>> sequence as the
>>>  destination rather than a TIPC port ID So abort for 
>>> TIPC_ERRINFO and TIPC_RETDATA*/
>>>   if (anc->cmsg_type == TIPC_ERRINFO) {
>>> -/* TIPC_ERRINFO - TIPC error code associated with a 
>>> 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-23 Thread Hans Nordebäck
Hi Mahesh,

please see comments below.

/Thanks HansN


On 08/23/2016 07:21 AM, A V Mahesh wrote:
> Hi HansN,
>
> Let us fist discuss the error handling and abort, then we can come 
> back to
> interpretation of  TIPC currently  does permit  OR does not permit an 
> application to send
> a multicast message with the "destination droppable" setting disabled.
>
> Let us disable TIPC_DEST_DROPPABLE, so that  TIPC will try to return 
> an undelivered multicast message to its sender
> and we can  determine issue is  because of TIPC_ERR_OVERLOAD, this 
> helps in debugging ,
> so that application may increased SO_SNDBUF/SO_RCVBUF to reduce the 
> problem.
>
> But still we need to abort(), the reason for that is current MDS 
> implementations doesn't
> have flow control logic ( no retry because of error ) , so Application 
> like AMF can go wrong and cluster will go into unstable/recoverble state.
>
[HansN] In the current implementation messages are dropped silently and
no abort is done. This patch enables logging when packets are dropped, to
help in debugging. I don't agree that we should also introduce abort;
instead we should:
1) Implement a solution to handle dropped packets, ticket #1960
2) Investigate why packets may be dropped. The receiving MDS thread is
a real-time thread and should be able to consume a large number of
incoming messages.
E.g. is the receiving MDS thread "live hanging" due to locks, file I/O etc.?
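
The "log instead of abort" handling could be sketched roughly as below. This is an illustration under assumptions, not the patch itself: it assumes the TIPC_ERRINFO ancillary object carries two 32-bit words (error code, then length of the returned data) and is delivered at level SOL_TIPC. The helper names `tipc_err_name` and `find_tipc_errinfo` are hypothetical; the constants come from `<linux/tipc.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Map a TIPC error code (constants from <linux/tipc.h>) to a name. */
static const char *tipc_err_name(uint32_t code)
{
    switch (code) {
    case TIPC_OK:            return "TIPC_OK";
    case TIPC_ERR_NO_NAME:   return "TIPC_ERR_NO_NAME";
    case TIPC_ERR_NO_PORT:   return "TIPC_ERR_NO_PORT";
    case TIPC_ERR_NO_NODE:   return "TIPC_ERR_NO_NODE";
    case TIPC_ERR_OVERLOAD:  return "TIPC_ERR_OVERLOAD";
    case TIPC_CONN_SHUTDOWN: return "TIPC_CONN_SHUTDOWN";
    default:                 return "unknown";
    }
}

/*
 * Scan the ancillary data filled in by recvmsg() for a TIPC_ERRINFO
 * object. Returns 1 and stores the error code and the length of the
 * returned data when found, 0 otherwise.
 */
static int find_tipc_errinfo(struct msghdr *msg, uint32_t *err,
                             uint32_t *ret_len)
{
    assert(msg != NULL && err != NULL && ret_len != NULL);
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
         c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level != SOL_TIPC || c->cmsg_type != TIPC_ERRINFO)
            continue;
        if (c->cmsg_len < CMSG_LEN(2 * sizeof(uint32_t)))
            continue;  /* too short to hold both words */
        uint32_t words[2];
        memcpy(words, CMSG_DATA(c), sizeof(words)); /* no unaligned reads */
        *err = words[0];
        *ret_len = words[1];
        return 1;
    }
    return 0;
}
```

A caller would log `tipc_err_name(err)` when `find_tipc_errinfo()` returns 1 and then decide separately whether overload deserves anything stronger than a log entry.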
> This was the reason we haven't gone for it while addressing Ticket 
> #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
> So currently we don't have any advantage of disabling 
> TIPC_DEST_DROPPABLE and not allowing multicast  messages.
>
> -AVM
>
>
> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>   osaf/libs/core/mds/mds_dt_tipc.c |  32 
>> +---
>>   1 files changed, 25 insertions(+), 7 deletions(-)
>>
>>
>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c 
>> b/osaf/libs/core/mds/mds_dt_tipc.c
>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>   m_MDS_LOG_INFO("MDTM: Successfully set default 
>> socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>   }
>>   +int droppable = 0;
>> +if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, 
>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>> +LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero 
>> err :%s\n", strerror(errno));
>> +m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE 
>> to zero err :%s\n", strerror(errno));
>> +osafassert(0);
>> +} else {
>> +m_MDS_LOG_NOTIFY("MDTM: Successfully set 
>> TIPC_DEST_DROPPABLE to zero");
>> +}
>> +
>>   return NCSCC_RC_SUCCESS;
>>   }
>>   @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>   unsigned char *cptr;
>>   int i;
>>   int has_addr;
>> +int anc_data[2];
>> +
>>   ssize_t sz;
>> has_addr = (from != NULL) && (addrlen != NULL);
>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>  if the message was sent using a TIPC name or name 
>> sequence as the
>>  destination rather than a TIPC port ID So abort for 
>> TIPC_ERRINFO and TIPC_RETDATA*/
>>   if (anc->cmsg_type == TIPC_ERRINFO) {
>> -/* TIPC_ERRINFO - TIPC error code associated with a 
>> returned data message or a connection termination message  so abort */
>> -m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>> condition ancillary data: TIPC_ERRINFO abort err :%s", 
>> strerror(errno) );
>> -abort();
>> +anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
>> +if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>> +LOG_CR("MDTM: undelivered message condition 
>> ancillary data: TIPC_ERR_OVERLOAD");
>> +m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>> condition ancillary data: TIPC_ERR_OVERLOAD");
>> +} else {
>> +/* TIPC_ERRINFO - TIPC error code associated 
>> with a returned data message or a connection termination message  so 
>> abort */
>> +LOG_CR("MDTM: undelivered message condition 
>> ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>> +m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>> condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>> +}
>>   } else if (anc->cmsg_type == TIPC_RETDATA) {
>> -/* If we set TIPC_DEST_DROPPABLE off messge 
>> (configure TIPC to return rejected messages to the sender )
>> +/* If we set TIPC_DEST_DROPPABLE off message 
>> (configure TIPC to return rejected messages to the sender )
>>  we will hit this when we implement MDS 
>> retransmit lost messages  abort can be 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-22 Thread A V Mahesh
Hi HansN,

Let us first discuss the error handling and abort; then we can come back
to the interpretation of whether TIPC currently does or does not permit an
application to send a multicast message with the "destination droppable"
setting disabled.

Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return an
undelivered multicast message to its sender and we can determine that the
issue is caused by TIPC_ERR_OVERLOAD. This helps in debugging, so that the
application may increase SO_SNDBUF/SO_RCVBUF to reduce the problem.

But we still need to abort(). The reason is that the current MDS
implementation doesn't have flow control logic (no retry on error), so an
application like AMF can go wrong and the cluster will go into an
unstable/unrecoverable state.

This was the reason we didn't go for it while addressing Ticket #1227
(https://sourceforge.net/p/opensaf/mailman/message/33207717/)
So currently we don't have any advantage in disabling
TIPC_DEST_DROPPABLE and not allowing multicast messages.
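
For the SO_RCVBUF part, a minimal sketch of raising a socket's receive buffer is shown below. The helper name `grow_rcvbuf` is hypothetical, and it is an assumption here that the TIPC socket in use honors SO_RCVBUF in the usual way; that may depend on the kernel version and its TIPC settings.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Ask the kernel for a receive buffer of at least `bytes` and return the
 * size actually in effect, or -1 on error. On Linux the value set with
 * SO_RCVBUF is doubled for bookkeeping and capped by net.core.rmem_max,
 * so the result may differ from the request.
 */
static int grow_rcvbuf(int sd, int bytes)
{
    assert(bytes > 0);
    int cur = 0;
    socklen_t len = sizeof(cur);

    if (getsockopt(sd, SOL_SOCKET, SO_RCVBUF, &cur, &len) != 0)
        return -1;
    if (cur >= bytes)
        return cur;  /* already large enough */
    if (setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0)
        return -1;
    len = sizeof(cur);
    if (getsockopt(sd, SOL_SOCKET, SO_RCVBUF, &cur, &len) != 0)
        return -1;
    return cur;
}
```

Reading the value back after setting it matters, because the kernel may silently clamp the request; logging the returned size makes mismatched buffer sizes between nodes easier to spot.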

-AVM


On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>   osaf/libs/core/mds/mds_dt_tipc.c |  32 +---
>   1 files changed, 25 insertions(+), 7 deletions(-)
>
>
> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c 
> b/osaf/libs/core/mds/mds_dt_tipc.c
> --- a/osaf/libs/core/mds/mds_dt_tipc.c
> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>   m_MDS_LOG_INFO("MDTM: Successfully set default socket 
> option TIPC_IMP = %d", TIPCIMPORTANCE);
>   }
>   
> +int droppable = 0;
> +if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, TIPC_DEST_DROPPABLE, 
> &droppable, sizeof(droppable)) != 0) {
> +LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err 
> :%s\n", strerror(errno));
> +m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE to zero 
> err :%s\n", strerror(errno));
> +osafassert(0);
> +} else {
> +m_MDS_LOG_NOTIFY("MDTM: Successfully set TIPC_DEST_DROPPABLE 
> to zero");
> +}
> +
>   return NCSCC_RC_SUCCESS;
>   }
>   
> @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>   unsigned char *cptr;
>   int i;
>   int has_addr;
> + int anc_data[2];
> +
>   ssize_t sz;
>   
>   has_addr = (from != NULL) && (addrlen != NULL);
> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>  if the message was sent using a TIPC name or name 
> sequence as the
>  destination rather than a TIPC port ID So abort for 
> TIPC_ERRINFO and TIPC_RETDATA*/
>   if (anc->cmsg_type == TIPC_ERRINFO) {
> - /* TIPC_ERRINFO - TIPC error code associated 
> with a returned data message or a connection termination message  so abort */
> - m_MDS_LOG_CRITICAL("MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
> - abort();
> + anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) 
> + 0));
> + if (anc_data[0] == TIPC_ERR_OVERLOAD) {
> + LOG_CR("MDTM: undelivered message 
> condition ancillary data: TIPC_ERR_OVERLOAD");
> + m_MDS_LOG_CRITICAL("MDTM: undelivered 
> message condition ancillary data: TIPC_ERR_OVERLOAD");
> + } else {
> + /* TIPC_ERRINFO - TIPC error code 
> associated with a returned data message or a connection termination message  
> so abort */
> + LOG_CR("MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
> + m_MDS_LOG_CRITICAL("MDTM: undelivered 
> message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
> + }
>   } else if (anc->cmsg_type == TIPC_RETDATA) {
> - /* If we set TIPC_DEST_DROPPABLE off messge 
> (configure TIPC to return rejected messages to the sender )
> + /* If we set TIPC_DEST_DROPPABLE off message 
> (configure TIPC to return rejected messages to the sender )
>  we will hit this when we implement MDS 
> retransmit lost messages  abort can be replaced with flow control logic*/
>   for (i = anc->cmsg_len - sizeof(*anc); i > 0; 
> i--) {
> - m_MDS_LOG_DBG("MDTM: returned byte 
> 0x%02x\n", *cptr);
> + LOG_CR("MDTM: returned byte 0x%02x\n", 
> *cptr);
> + m_MDS_LOG_CRITICAL("MDTM: returned byte 
> 0x%02x\n", *cptr);
>   cptr++;
>   }
>   

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-19 Thread A V Mahesh
Ok

I need some time to re-check the TIPC code; I will get back to you soon.

-AVM


On 8/19/2016 1:46 PM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> there is a problem that TIPC may silently drop messages at overload 
> situations, as MDS uses the SOCK_RDM option.
>
> At least it has to be logged when messages are dropped. It is allowed 
> in TIPC to set TIPC_DROPPABLE=false and also
>
> use multicast. The concern may be that the send buffer size may also 
> be overloaded at receive buffer full,
>
> as the ancillary message has to be  sent, this case is though very 
> unlikely.
>
> I'll update the patch with the logging of the returned message 
> removed, and only log that a message has been dropped, which
>
> should be enough for debugging purposes.
>
> /Thanks HansN
>
>
>
> On 08/18/2016 11:27 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> It seem you missed to see below :
>>
>> On 8/12/2016 9:11 AM, A V Mahesh wrote:
>>> Hi HansN,
>>>
>>> We were having ticket for this  raised by  Hans Feldt 
>>> `https://sourceforge.net/p/opensaf/tickets/634/`
>>>
>>> at that time i have give my analysis base the MDS code at that time 
>>> as below please check.
>>>
>>> 
>>>  
>>>
>>>
>>> The Linux TIPC 2.0 Programmer's Guide in section 1.5.7. Multicast 
>>> Message Delivery mention that.
>>>
>>> The TIPC currently does not permit an application to send a 
>>> multicast message with the "destination droppable" setting disabled.
>>> Consequently, TIPC will never try to return an undeliverable 
>>> multicast message to its sender.
>>>
>>> so if we set destination droppable disabled , multicast is not 
>>> permitted
>>> I experimented setting TIPC_DEST_DROPPABLE=off in multicast_demo and 
>>> observed that multicast is working
>>>
>>> As if The Opensaf using multicast , it is not allowed to set 
>>> TIPC_DEST_DROPPABLE=off
>>>
>>> ==
>>>  
>>
>>
>>
>> So the TIPC_DEST_DROPPABLE should be enabled only if 
>> MDS_TIPC_MCAST_ENABLED is disabled,
>> currently  by default TIPC Multicast Messaging Setting  enabled 
>> (MDS_TIPC_MCAST_ENABLED =1 )
>> in /etc/opensaf/nid.conf , if TIPC Multicast Messagingis disabled we 
>> can set  TIPC_DEST_DROPPABLE
>> dynamically.
>>
>> == 
>>
>>
>> # This is valid when above MDS_TRANSPORT is set to TIPC.
>> # Setting MDS_TIPC_MCAST_ENABLED to 1 or 0, allows OpenSAF
>> # to enable or disable TIPC Multicast Messaging.
>> # By Default TIPC  Multicast Messaging is Enabled.
>> # Note: In case of TIPC Multicast Messaging disabled (0), the 
>> performance
>> # of OpenSAF will be considerably lower as compared to Enabled (1).
>> export MDS_TIPC_MCAST_ENABLED=1
>>
>> == 
>>
>>
>> -AVM
>>
>>
>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>   osaf/libs/core/mds/mds_dt_tipc.c |  32 
>>> +---
>>>   1 files changed, 25 insertions(+), 7 deletions(-)
>>>
>>>
>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c 
>>> b/osaf/libs/core/mds/mds_dt_tipc.c
>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>   m_MDS_LOG_INFO("MDTM: Successfully set default 
>>> socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>   }
>>>   +int droppable = 0;
>>> +if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, 
>>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>> +LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero 
>>> err :%s\n", strerror(errno));
>>> +m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE 
>>> to zero err :%s\n", strerror(errno));
>>> +osafassert(0);
>>> +} else {
>>> +m_MDS_LOG_NOTIFY("MDTM: Successfully set 
>>> TIPC_DEST_DROPPABLE to zero");
>>> +}
>>> +
>>>   return NCSCC_RC_SUCCESS;
>>>   }
>>>   @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>   unsigned char *cptr;
>>>   int i;
>>>   int has_addr;
>>> +int anc_data[2];
>>> +
>>>   ssize_t sz;
>>> has_addr = (from != NULL) && (addrlen != NULL);
>>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>>  if the message was sent using a TIPC name or name 
>>> sequence as the
>>>  destination rather than a TIPC port ID So abort for 
>>> TIPC_ERRINFO and TIPC_RETDATA*/
>>>   if (anc->cmsg_type == TIPC_ERRINFO) {
>>> -/* TIPC_ERRINFO - TIPC error code associated with a 
>>> returned data message or a connection termination message  so abort */
>>> -m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>>> condition ancillary data: TIPC_ERRINFO abort err :%s", 
>>> strerror(errno) );
>>> 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-19 Thread Hans Nordebäck
Hi Mahesh,

there is a problem that TIPC may silently drop messages in overload
situations, as MDS uses the SOCK_RDM socket type.

At the very least it has to be logged when messages are dropped. TIPC does
allow setting TIPC_DROPPABLE=false while also using multicast. The concern
may be that the send buffer may also overflow when the receive buffer is
full, as the ancillary message has to be sent; this case is very unlikely,
though.

I'll update the patch to remove the logging of the returned message and
only log that a message has been dropped, which should be enough for
debugging purposes.

/Thanks HansN



On 08/18/2016 11:27 AM, A V Mahesh wrote:
> Hi HansN,
>
> It seems you missed the message below:
>
> On 8/12/2016 9:11 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> We had a ticket for this, raised by Hans Feldt:
>> `https://sourceforge.net/p/opensaf/tickets/634/`
>>
>> At that time I gave my analysis, based on the MDS code of that time,
>> as below; please check.
>>
>> 
>>  
>>
>>
>> The Linux TIPC 2.0 Programmer's Guide, section 1.5.7 "Multicast
>> Message Delivery", states:
>>
>> TIPC currently does not permit an application to send a multicast
>> message with the "destination droppable" setting disabled.
>> Consequently, TIPC will never try to return an undeliverable
>> multicast message to its sender.
>>
>> So if we disable destination droppable, multicast is not permitted.
>> I experimented with setting TIPC_DEST_DROPPABLE=off in multicast_demo and
>> observed that multicast is working.
>>
>> Since OpenSAF uses multicast, it is not allowed to set
>> TIPC_DEST_DROPPABLE=off.
>>
>> ==
>>  
>
>
> So TIPC_DEST_DROPPABLE should be enabled only if
> MDS_TIPC_MCAST_ENABLED is disabled. Currently TIPC multicast messaging
> is enabled by default (MDS_TIPC_MCAST_ENABLED=1) in
> /etc/opensaf/nid.conf; if TIPC multicast messaging is disabled, we
> can set TIPC_DEST_DROPPABLE dynamically.
>
> == 
>
>
> # This is valid when above MDS_TRANSPORT is set to TIPC.
> # Setting MDS_TIPC_MCAST_ENABLED to 1 or 0, allows OpenSAF
> # to enable or disable TIPC Multicast Messaging.
> # By Default TIPC  Multicast Messaging is Enabled.
> # Note: In case of TIPC Multicast Messaging disabled (0), the performance
> # of OpenSAF will be considerably lower as compared to Enabled (1).
> export MDS_TIPC_MCAST_ENABLED=1
>
> == 
>
>
> -AVM
>
>
> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>   osaf/libs/core/mds/mds_dt_tipc.c |  32 
>> +---
>>   1 files changed, 25 insertions(+), 7 deletions(-)
>>
>>
>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c 
>> b/osaf/libs/core/mds/mds_dt_tipc.c
>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>   m_MDS_LOG_INFO("MDTM: Successfully set default 
>> socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>   }
>>   +int droppable = 0;
>> +if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, 
>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>> +LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero 
>> err :%s\n", strerror(errno));
>> +m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE 
>> to zero err :%s\n", strerror(errno));
>> +osafassert(0);
>> +} else {
>> +m_MDS_LOG_NOTIFY("MDTM: Successfully set 
>> TIPC_DEST_DROPPABLE to zero");
>> +}
>> +
>>   return NCSCC_RC_SUCCESS;
>>   }
>>   @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>   unsigned char *cptr;
>>   int i;
>>   int has_addr;
>> +int anc_data[2];
>> +
>>   ssize_t sz;
>> has_addr = (from != NULL) && (addrlen != NULL);
>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>  if the message was sent using a TIPC name or name 
>> sequence as the
>>  destination rather than a TIPC port ID So abort for 
>> TIPC_ERRINFO and TIPC_RETDATA*/
>>   if (anc->cmsg_type == TIPC_ERRINFO) {
>> -/* TIPC_ERRINFO - TIPC error code associated with a 
>> returned data message or a connection termination message  so abort */
>> -m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>> condition ancillary data: TIPC_ERRINFO abort err :%s", 
>> strerror(errno) );
>> -abort();
>> +anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
>> +if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>> +LOG_CR("MDTM: undelivered message condition 
>> ancillary data: TIPC_ERR_OVERLOAD");
>> + 

Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

2016-08-18 Thread A V Mahesh
Hi HansN,

It seems you missed the following:

On 8/12/2016 9:11 AM, A V Mahesh wrote:
> Hi HansN,
>
> We had a ticket for this, raised by Hans Feldt:
> `https://sourceforge.net/p/opensaf/tickets/634/`
>
> Back then I gave my analysis, based on the MDS code at that time;
> please check it below.
>
> 
>  
>
>
> Section 1.5.7, "Multicast Message Delivery", of the Linux TIPC 2.0
> Programmer's Guide says:
>
> The TIPC currently does not permit an application to send a multicast 
> message with the "destination droppable" setting disabled.
> Consequently, TIPC will never try to return an undeliverable multicast 
> message to its sender.
>
> So if we set destination droppable to disabled, multicast is not
> permitted. I experimented with setting TIPC_DEST_DROPPABLE=off in
> multicast_demo and observed that multicast is working.
>
> Since OpenSAF uses multicast, it is not allowed to set
> TIPC_DEST_DROPPABLE=off.
>
> ==
>  

So TIPC_DEST_DROPPABLE should be set only if MDS_TIPC_MCAST_ENABLED is
disabled. Currently, TIPC multicast messaging is enabled by default
(MDS_TIPC_MCAST_ENABLED=1) in /etc/opensaf/nid.conf; if TIPC multicast
messaging is disabled, we can set TIPC_DEST_DROPPABLE dynamically.

==

# This is valid when above MDS_TRANSPORT is set to TIPC.
# Setting MDS_TIPC_MCAST_ENABLED to 1 or 0, allows OpenSAF
# to enable or disable TIPC Multicast Messaging.
# By Default TIPC  Multicast Messaging is Enabled.
# Note: In case of TIPC Multicast Messaging disabled (0), the performance
# of OpenSAF will be considerably lower as compared to Enabled (1).
export MDS_TIPC_MCAST_ENABLED=1

==

-AVM
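
A minimal sketch of the rule proposed above, i.e. touching
TIPC_DEST_DROPPABLE only when MDS_TIPC_MCAST_ENABLED is disabled. Both
helper names are hypothetical, and the parsing (unset means enabled,
any non-zero integer means enabled) is an assumption based on the
nid.conf comment, not necessarily how MDS reads the variable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: read an integer flag from the environment.
 * Returns dflt when the variable is unset, otherwise 1 for any non-zero
 * value and 0 for zero (or for text that atoi() cannot parse). */
int env_flag_enabled(const char *name, int dflt)
{
    const char *val;

    assert(name != NULL);

    val = getenv(name);
    if (val == NULL)
        return dflt;
    return atoi(val) != 0;
}

/* Hypothetical policy check: clearing TIPC_DEST_DROPPABLE is considered
 * only when TIPC multicast messaging is switched off. Multicast is assumed
 * to be enabled by default, as the nid.conf comment states. */
int may_clear_dest_droppable(void)
{
    return !env_flag_enabled("MDS_TIPC_MCAST_ENABLED", 1);
}
```

With the default nid.conf setting (MDS_TIPC_MCAST_ENABLED=1) the check
returns 0, so the socket option would be left alone.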


On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>   osaf/libs/core/mds/mds_dt_tipc.c |  32 +---
>   1 files changed, 25 insertions(+), 7 deletions(-)
>
>
> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c 
> b/osaf/libs/core/mds/mds_dt_tipc.c
> --- a/osaf/libs/core/mds/mds_dt_tipc.c
> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>   m_MDS_LOG_INFO("MDTM: Successfully set default socket 
> option TIPC_IMP = %d", TIPCIMPORTANCE);
>   }
>   
> +int droppable = 0;
> +if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, TIPC_DEST_DROPPABLE, 
> &droppable, sizeof(droppable)) != 0) {
> +LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err 
> :%s\n", strerror(errno));
> +m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE to zero 
> err :%s\n", strerror(errno));
> +osafassert(0);
> +} else {
> +m_MDS_LOG_NOTIFY("MDTM: Successfully set TIPC_DEST_DROPPABLE 
> to zero");
> +}
> +
>   return NCSCC_RC_SUCCESS;
>   }
>   
> @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>   unsigned char *cptr;
>   int i;
>   int has_addr;
> + int anc_data[2];
> +
>   ssize_t sz;
>   
>   has_addr = (from != NULL) && (addrlen != NULL);
> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>  if the message was sent using a TIPC name or name 
> sequence as the
>  destination rather than a TIPC port ID So abort for 
> TIPC_ERRINFO and TIPC_RETDATA*/
>   if (anc->cmsg_type == TIPC_ERRINFO) {
> - /* TIPC_ERRINFO - TIPC error code associated 
> with a returned data message or a connection termination message  so abort */
> - m_MDS_LOG_CRITICAL("MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
> - abort();
> + anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) 
> + 0));
> + if (anc_data[0] == TIPC_ERR_OVERLOAD) {
> + LOG_CR("MDTM: undelivered message 
> condition ancillary data: TIPC_ERR_OVERLOAD");
> + m_MDS_LOG_CRITICAL("MDTM: undelivered 
> message condition ancillary data: TIPC_ERR_OVERLOAD");
> + } else {
> + /* TIPC_ERRINFO - TIPC error code 
> associated with a returned data message or a connection termination message  
> so abort */
> + LOG_CR("MDTM: undelivered message 
> condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
> + m_MDS_LOG_CRITICAL("MDTM: undelivered 
> message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
> + }
>