Hi Peter,
I think the most sensible thing you could do here is to raise the
priority of the sensitive sender.
This unfortunately requires a change to the application code (a new 
setsockopt), but it might be worth a try.
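Something along these lines (a minimal sketch; 'sd' is assumed to be
your existing TIPC socket descriptor):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <linux/tipc.h>

    /* Raise the importance of the sending socket so its messages are
     * the last to be rejected under receive-buffer pressure. Levels
     * range from TIPC_LOW_IMPORTANCE to TIPC_CRITICAL_IMPORTANCE. */
    int imp = TIPC_HIGH_IMPORTANCE;
    if (setsockopt(sd, SOL_TIPC, TIPC_IMPORTANCE, &imp, sizeof(imp)) < 0)
            perror("setsockopt(TIPC_IMPORTANCE)");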
Which level is your sending socket using now?

///jon

> -----Original Message-----
> From: Peter Koss <[email protected]>
> Sent: September 20, 2018 12:46 PM
> To: Jon Maloy <[email protected]>; [email protected]
> Subject: RE: What affects congestion beyond window size, and what might
> have reduced congestion thresholds in TIPC 2.0.x?
> 
> Hi Jon,
> 
> To describe the problem: for us this is a performance-matching issue
> between (TIPC 2.0.5 on Wind River Linux 6) and (TIPC 1.7.7 on Wind River
> Linux 3).  Our performance is much lower under the former, something
> like half of what we see under the latter.
> 
> We see the sender getting EAGAIN with a window size of 150 (and also at
> 50 and 300); that is probably the core issue for us, and it is happening
> at a much lower processing rate than in the older environment.  No link
> reset was noted at 150, but the EAGAIN occurs at an unexpectedly low
> rate.  We use non-blocking socket flags for this data, on both the old
> and new versions.  Our program is sensitive to small delays, hence the
> non-blocking choice.
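>
> (For reference, our send path is roughly the following -- a simplified
> sketch, with names hypothetical:)
>
>     /* Non-blocking send: on EAGAIN we wait briefly for POLLOUT
>      * instead of blocking inside send(). */
>     #include <errno.h>
>     #include <poll.h>
>     #include <sys/types.h>
>     #include <sys/socket.h>
>
>     ssize_t send_msg(int sd, const void *buf, size_t len)
>     {
>             ssize_t n = send(sd, buf, len, MSG_DONTWAIT);
>             if (n < 0 && errno == EAGAIN) {
>                     struct pollfd pfd = { .fd = sd, .events = POLLOUT };
>                     poll(&pfd, 1, 10);  /* small delay budget, in ms */
>                     n = send(sd, buf, len, MSG_DONTWAIT);
>             }
>             return n;
>     }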
> 
> I think we are most interested in any tuning available specific to
> 2.0.5, or in anything that changed in TIPC that might be slowing these
> things down.
> 
> Regards
> 
> Peter
> 
> 
> 
> 
> -----Original Message-----
> From: Jon Maloy <[email protected]>
> Sent: Thursday, September 20, 2018 11:11 AM
> To: Peter Koss <[email protected]>; [email protected]
> Subject: RE: What affects congestion beyond window size, and what might
> have reduced congestion thresholds in TIPC 2.0.x?
> 
> Hi Peter,
> See below.
> 
> > -----Original Message-----
> > From: Peter Koss <[email protected]>
> > Sent: September 20, 2018 11:31 AM
> > To: Jon Maloy <[email protected]>; [email protected]
> > Subject: RE: What affects congestion beyond window size, and what
> > might have reduced congestion thresholds in TIPC 2.0.x?
> >
> >
> > Hi Jon,
> >
> > Again, thanks for thinking about this.  Kernel version:
> >      Linux a33ems1 3.10.62-ltsi-WR6.0.0.36_standard #1 SMP PREEMPT Mon
> >      Aug 20 17:25:51 CDT 2018 x86_64 x86_64 x86_64 GNU/Linux
> 
> Ok. A pretty old kernel then.
> >
> > The retransmission error (i.e., a kind of catastrophic interruption)
> > was only noted in experiments pushing the window size to 300-400; it
> > was not seen at lower window sizes.
> >
> > Our thoughts are around how to see evidence of retransmission going on
> > before that point, and ways to see such evidence in TIPC 2.0.5.
> 
> You can use tipc-config to list the link statistics; there you will see
> the number of retransmissions, just as you can see the number of
> congestion events.
> 
> > We compare to TIPC 1.7.7 under Wind River Linux 3 as a data point, but
> > care mostly about addressing it under 2.0.5 and Wind River Linux 6.
> >
> > This is not running in a VM.  Not sure about your question on Remote
> > Procedure Call being
> 
> Sorry, I of course meant RPS (Receive Packet Steering). But TIPC does
> not support that in such an old kernel, so that cannot be the problem.
> 
> > activated; if there's a command to run or a code construct to check, I
> > could get that.
> 
> I still don't understand what it is you consider the problem. You say
> that with a window size of 150 you don't get the link reset.
> So what is it that is not working?
> 
> BR
> ///jon
> 
> >
> > Regards
> >
> > PK
> > -----Original Message-----
> > From: Jon Maloy <[email protected]>
> > Sent: Thursday, September 20, 2018 7:28 AM
> > To: Peter Koss <[email protected]>; [email protected]
> > Subject: RE: What affects congestion beyond window size, and what
> > might have reduced congestion thresholds in TIPC 2.0.x?
> >
> > Hi Peter,
> > See my comments below.
> >
> > > -----Original Message-----
> > > From: Peter Koss <[email protected]>
> > > Sent: September 19, 2018 6:11 PM
> > > To: Jon Maloy <[email protected]>; [email protected]
> > > Subject: RE: What affects congestion beyond window size, and what
> > > might have reduced congestion thresholds in TIPC 2.0.x?
> > >
> > > Thanks for responding.
> > >
> > > There was code in TIPC 1.7.x that gave some node receive queue
> > > information, but that is now obsolete in 2.0.x.  We are using socket
> > > receive calls to get data instead, which seems to suggest one of two
> > > problems: either the receive-side queue is filling up and exceeding
> > > limits, or the ack back to the sender is having trouble.  We do see
> > > the sender getting errno=EAGAIN.
> > > Overall, the performance levels we see with TIPC 2.0.x under Wind
> > > River Linux 6 are much lower than with TIPC 1.7.x under Wind River
> > > Linux 3.
> >
> > Which Linux kernel version are you running?
> >
> > >
> > > case TIPC_NODE_RECVQ_DEPTH:
> > >         /* Obsolete in 2.0.x: the call occurs, but we get 0. */
> > >         value = (u32)atomic_read(&tipc_queue_size);
> > >         break;
> > > case TIPC_SOCK_RECVQ_DEPTH:
> > >         value = skb_queue_len(&sk->sk_receive_queue);
> > >         break;
> > >
> > >
> > > Questions we have currently:
> > > - What is the socket receive queue limit (default)?
> >
> > That depends on the Linux version you are using. Prior to 4.6 it was
> > 64 MB; in later versions it is 2 MB, but with a much better flow
> > control algorithm.
> >
> > > - Is it wise to try a window size > 150?
> >
> > I have never done it myself except for experimental purposes, but I
> > see no problem with it.
> > Do you have any particular reason to do so? Does it give significantly
> > better throughput than at 150?
> >
> > > - Is there a good way to control or influence the flow control
> > > sender/receiver coordination,
> > You can increase the window size to potentially improve link-level
> > throughput, and you can raise the sending socket's importance priority
> > to reduce the risk of receive socket buffer overflow.
> >
> > > or a best way to adjust receive buffer limit?
> > If you want to change this, follow the instructions under section 5.2
> > at the following link:
> > http://tipc.sourceforge.net/programming.html#incr_rcvbuf
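> >
> > In practice it boils down to something like this (a minimal sketch;
> > the 32 MB value is only an example, and the kernel caps the result at
> > net.core.rmem_max unless you raise that sysctl or use SO_RCVBUFFORCE
> > with CAP_NET_ADMIN):
> >
> >     #include <stdio.h>
> >     #include <sys/socket.h>
> >
> >     /* Extend the receive buffer limit on the receiving socket 'sd'. */
> >     int rcvbuf = 32 * 1024 * 1024;  /* example value only */
> >     if (setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
> >             perror("setsockopt(SO_RCVBUF)");
> >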
> > But I see no sign that buffer overflow is your problem.
> >
> > >
> > > For context, the first sign of errors shows up as congestion, where
> > > the max value increases to slightly above whatever window size we
> > > set (50, 150, 300, 400).
> > >
> > > pl0_1:~$ /usr/sbin/tipc-config -ls | grep "Send queue max"
> > >   Congestion link:0  Send queue max:2 avg:1
> > >   Congestion link:93121  Send queue max:162 avg:3
> > >   Congestion link:206724  Send queue max:164 avg:3
> > >   Congestion link:67839  Send queue max:167 avg:3
> > >   Congestion link:214788  Send queue max:166 avg:3
> > >   Congestion link:205240  Send queue max:165 avg:3
> > >   Congestion link:240955  Send queue max:166 avg:3
> > >   Congestion link:0  Send queue max:0 avg:0
> > >   Congestion link:0  Send queue max:1 avg:0
> > >   Congestion link:0  Send queue max:0 avg:0
> >
> > This is all normal and unproblematic. We allow an oversubscription of
> > one message (at most 46 1500-byte packets) on each link to make the
> > algorithm simpler, so you will often find the max value higher than
> > the nominal upper limit.
> >
> > >
> > > The following error occurs only when the window size is high
> > > (300-400); it is not seen at 50 or 150, so we think it may be
> > > extraneous to our issue.  It also makes us wonder whether going
> > > above 150 is wise, hence the question above.
> > >
> > > Sep 17 05:42:00 pl0_4 kernel: tipc: Retransmission failure on link <1.1.5:bond1-1.1.2:bond1>
> > > Sep 17 05:42:00 pl0_4 kernel: tipc: Resetting link
> >
> > This is your real problem. For some reason a packet has been
> > retransmitted more than 100 times on a link without going through.
> > Then the link is reset, and all associated connections as well.
> > We have seen this happen for various reasons over the years, and
> > fixed them all.
> > Is RPC possibly activated on your receiving node?
> > Are you running a VM with a virtio interface? That one tends to be
> > overwhelmed sometimes and just stops sending for 30 seconds, which
> > leads to broken links.
> >
> > But again, it all depends on which kernel and environment you are running.
> > Please update me on this.
> >
> > BR
> > ///jon
> >
> > > Sep 17 05:42:00 pl0_4 kernel: Link 1001002<eth:bond1>::WW
> > > Sep 17 05:42:00 pl0_4 kernel: tipc: Lost link <1.1.5:bond1-1.1.2:bond1> on network plane A
> > > Sep 17 05:42:00 pl0_4 kernel: tipc: Lost contact with <1.1.2>
> > > Sep 17 05:42:00 pl0_10 kernel: tipc: Resetting link <1.1.2:bond1-1.1.5:bond1>, requested by peer
> > > Sep 17 05:42:00 pl0_10 kernel: tipc: Lost link <1.1.2:bond1-1.1.5:bond1> on network plane A
> > >
> > > Thanks in advance, advice is appreciated.
> > >
> > > PK
> > >
> > > -----Original Message-----
> > > From: Jon Maloy <[email protected]>
> > > Sent: Tuesday, September 18, 2018 12:15 PM
> > > To: Peter Koss <[email protected]>; [email protected]
> > > Subject: RE: What affects congestion beyond window size, and what
> > > might have reduced congestion thresholds in TIPC 2.0.x?
> > >
> > > Hi Peter,
> > > The only parameter of those mentioned below that would have any
> > > effect on congestion is TIPC_MAX_LINK_WIN, which should reduce
> > > occurrences of link-level congestion.
> > > However, you don't describe which symptoms you see caused by this
> > > congestion.
> > > - Is it only a higher 'congested' counter when you look at the link
> > > statistics? If so, you don't have a problem at all; this is a
> > > totally normal and frequent occurrence. (Maybe we should have given
> > > this field a different name to avert confusion.)
> > > - If this causes a severely reduced throughput you may have a
> > > problem, but I don't find that very likely.
> > > - If you are losing messages at the socket level (dropped because of
> > > receive buffer overflow) you *do* have a problem, but this can most
> > > often be remedied by extending the socket receive buffer limit.
> > >
> > > BR
> > > ///Jon Maloy
> > >
> > > -----Original Message-----
> > > From: Peter Koss <[email protected]>
> > > Sent: September 18, 2018 12:33 PM
> > > To: [email protected]
> > > Subject: [tipc-discussion] What affects congestion beyond window
> > > size, and what might have reduced congestion thresholds in TIPC 2.0.x?
> > >
> > >
> > > In TIPC 1.7.6, we battled with congestion quite a bit.  We ultimately
> > > settled on adjusting these parameters in TIPC, which we also used in
> > > TIPC 1.7.7.  This was running on Wind River Linux 3, where TIPC was an
> > > independent module from the kernel.
> > >
> > > SOL_TIPC               changed from 271 to 50 (probably not affecting congestion)
> > > TIPC_MAX_LINK_WIN      changed from 50 to 150
> > > TIPC_NODE_RECVQ_DEPTH  set to 131
> > >
> > > Using Wind River Linux 6, we get TIPC 2.0.5 as part of the kernel,
> > > and we see congestion occurring at much lower overall load levels
> > > (less traffic overall) compared to TIPC 1.7.7 & WR3.  We've made the
> > > same changes as above via a loadable module for TIPC 2.0.5, and also
> > > noted that TIPC_NODE_RECVQ_DEPTH is now obsolete.  Upon observing
> > > congestion, we have raised the default and max window sizes up to
> > > 300 and even 400.  This helps congestion a little, but not
> > > sufficiently.
> > >
> > >
> > > Does anyone know:
> > > -What has changed in TIPC 2.0.x that affects this?
> > > -Are there other parameters to change, to assist this?
> > > -Is there a replacement set of parameters that affect what
> > > TIPC_NODE_RECVQ_DEPTH influences?
> > >
> > >
> > >


_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
