Hi Felix:

From the information you provided, I can't offer any immediate
explanation for the results you obtained in the first setup you
described.  However, any performance comparison of this sort must be
done very carefully, because it can be significantly influenced by many
factors.  For example, are all nodes involved identical in nature, and
are all links carried by identical media?  (That is, are some CPUs
faster than others, or more lightly loaded?  Are all of a node's links
carried by the same Ethernet interface or, failing that, over
interfaces with the same bandwidth capacity?)  If there are no obvious
differences of this sort, it may be helpful to dump the link statistics
for the various nodes involved (e.g., with tipc-config -ls, as you did
earlier in this thread) to see if there are any indicators that would
explain the different throughput values you are obtaining; if one of
the links is encountering a higher level of congestion or message loss
than the other, that might explain the lower throughput.

The second scenario you describe may be due to the fact that you're
running into CPU limitations on the node that is running both a client
and a server.  If your application runs the server process at a higher
priority than the client (which is typically the thing to do), then
there may not be enough cycles available to do all of the work the node
needs to do; as a result, Server 1 will be given preference to do its
job at the expense of Client 2.  It would be helpful to measure the CPU
load on all of the nodes to see if this is the case.
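
If it helps, here's a minimal sketch (untested; the function name is
just a placeholder) of how each process could report its own CPU
consumption via getrusage(2), so you can compare what the client and
the server on the shared node are actually getting; "top" or "mpstat"
on each node would show the same thing interactively:

    #include <stdio.h>
    #include <sys/resource.h>

    /* Print this process's accumulated user/system CPU time */
    static void report_cpu_usage(const char *tag)
    {
            struct rusage ru;

            if (getrusage(RUSAGE_SELF, &ru) == 0)
                    printf("%s: user %ld.%06ld s, sys %ld.%06ld s\n", tag,
                           (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
                           (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    }

Calling this periodically from both Client 2 and Server 1 would show
whether Server 1 is consuming the bulk of the available cycles.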

Regards,
Al

> -----Original Message-----
> From: Nayman Felix-QA5535 [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, September 20, 2007 6:13 PM
> To: Stephens, Allan; Randy Macleod; 
> [email protected]
> Subject: RE: [tipc-discussion] Link Congestion problem
> 
> 
> We changed our sockets to be blocking (both on the server and the
> client) and are no longer setting destination droppable to false.
> When running one client (on one node) and a server (on a different
> node) we saw a throughput of about 56,000 msgs/sec with our hardware.
> When we ran two clients (on one node) and two servers (with the same
> TIPC name, both on a different node), the total throughput to the
> node that the servers are on was still around 56,000 msgs/sec, and
> the throughput was split evenly between the servers.
> 
> In all cases I sent a total of 25 million messages with a throttle of
> 100,000 msgs/sec.
> The output you see is printed every 10 seconds.
> 
> A couple more observations:
> 
> Next Test:
> 
> Setup:
> Client 1 (Node 1) -> Server 1 (Node 2)
> Client 2 (Node 3) -> Server 2 (Node 2)
> 
> Server 1 and Server 2 have different TIPC Names.  
> 
> Any ideas on why we see such a difference in throughput on 
> the two different links?  The sum is still around 56,000 msgs/sec.
> 
> Server 1:
> Msgs per second = 30826 Messages in the time diff = 308260 Total Msgs= 308260
> Msgs per second = 46178 Messages in the time diff = 461780 Total Msgs= 770040
> Msgs per second = 40708.5 Messages in the time diff = 407085 Total Msgs= 1177125
> Msgs per second = 44332.2 Messages in the time diff = 443322 Total Msgs= 1620447
> Msgs per second = 40013.1 Messages in the time diff = 400131 Total Msgs= 2020578
> Msgs per second = 44903.7 Messages in the time diff = 449037 Total Msgs= 2469615
> Msgs per second = 53762.3 Messages in the time diff = 537623 Total Msgs= 3007238
> Msgs per second = 42374.3 Messages in the time diff = 423743 Total Msgs= 3430981
> Msgs per second = 52915 Messages in the time diff = 529150 Total Msgs= 3960131
> Msgs per second = 50055.5 Messages in the time diff = 500555 Total Msgs= 4460686
> Msgs per second = 35875.1 Messages in the time diff = 358751 Total Msgs= 4819437
> Msgs per second = 47139.7 Messages in the time diff = 471397 Total Msgs= 5290834
> Msgs per second = 51357 Messages in the time diff = 513570 Total Msgs= 5804404
> 
> Server 2:
> Msgs per second = 2721.4 Messages in the time diff = 27214 Total Msgs= 27214
> Msgs per second = 10614.5 Messages in the time diff = 106145 Total Msgs= 133359
> Msgs per second = 15566.2 Messages in the time diff = 155662 Total Msgs= 289021
> Msgs per second = 11808 Messages in the time diff = 118080 Total Msgs= 407101
> Msgs per second = 16563.3 Messages in the time diff = 165633 Total Msgs= 572734
> Msgs per second = 8827.1 Messages in the time diff = 88271 Total Msgs= 661005
> Msgs per second = 5155.5 Messages in the time diff = 51555 Total Msgs= 712560
> Msgs per second = 11713.7 Messages in the time diff = 117137 Total Msgs= 829697
> Msgs per second = 5957 Messages in the time diff = 59570 Total Msgs= 889267
> Msgs per second = 6401.6 Messages in the time diff = 64016 Total Msgs= 953283
> Msgs per second = 17586.5 Messages in the time diff = 175865 Total Msgs= 1129148
> Msgs per second = 9707.6 Messages in the time diff = 97076 Total Msgs= 1226224
> Msgs per second = 7532.1 Messages in the time diff = 75321 Total Msgs= 1301545
> ----------------------------------------------------------------------------
> 
> Next Test:
> 
> I'm not sure I understand the results below.  Server 1 receiving
> messages is not affected by the fact that Client 2 is on the same
> node, but the throughput to Server 2 is cut in half from the 56,000
> msgs/sec, apparently because Client 2 is sending at the same time.
> Any thoughts?
> 
> Client 1 (Node 3) -> Server 1 (Node 2)
> Client 2 (Node 2) -> Server 2 (Node 1)
>  
> results:
> Node2 (Server 1 receiving):
> Msgs per second = 51563.5 Messages in the time diff = 515635 Total Msgs= 515635
> Msgs per second = 55464.2 Messages in the time diff = 554642 Total Msgs= 1070277
> Msgs per second = 55152.8 Messages in the time diff = 551528 Total Msgs= 1621805
> Msgs per second = 55367.1 Messages in the time diff = 553671 Total Msgs= 2175476
> Msgs per second = 55160.2 Messages in the time diff = 551602 Total Msgs= 2727078
> Msgs per second = 55514.2 Messages in the time diff = 555142 Total Msgs= 3282220
> Msgs per second = 55330.6 Messages in the time diff = 553306 Total Msgs= 3835526
> Msgs per second = 55629.5 Messages in the time diff = 556295 Total Msgs= 4391821
> Msgs per second = 55452.2 Messages in the time diff = 554522 Total Msgs= 4946343
> Msgs per second = 55362.2 Messages in the time diff = 553622 Total Msgs= 5499965
> Msgs per second = 55208.9 Messages in the time diff = 552089 Total Msgs= 6052054
> Msgs per second = 55422.5 Messages in the time diff = 554225 Total Msgs= 6606279
> Msgs per second = 55507.2 Messages in the time diff = 555072 Total Msgs= 7161351
> Msgs per second = 55380 Messages in the time diff = 553800 Total Msgs= 7715151
> Msgs per second = 55617.1 Messages in the time diff = 556171 Total Msgs= 8271322
> Msgs per second = 55447 Messages in the time diff = 554470 Total Msgs= 8825792
> Msgs per second = 55585.6 Messages in the time diff = 555856 Total Msgs= 9381648
> Msgs per second = 55378.9 Messages in the time diff = 553789 Total Msgs= 9935437
>  
> Node 1 (Server 2 receiving):
> Msgs per second = 27768.8 Messages in the time diff = 277688 Total Msgs= 334396
> Msgs per second = 25784.2 Messages in the time diff = 257842 Total Msgs= 592238
> Msgs per second = 25504.7 Messages in the time diff = 255047 Total Msgs= 847285
> Msgs per second = 25539 Messages in the time diff = 255390 Total Msgs= 1102675
> Msgs per second = 25264.1 Messages in the time diff = 252641 Total Msgs= 1355316
> Msgs per second = 25267.9 Messages in the time diff = 252679 Total Msgs= 1607995
> Msgs per second = 25420.1 Messages in the time diff = 254201 Total Msgs= 1862196
> Msgs per second = 24963.8 Messages in the time diff = 249638 Total Msgs= 2111834
> Msgs per second = 25287.4 Messages in the time diff = 252874 Total Msgs= 2364708
> Msgs per second = 25242.8 Messages in the time diff = 252428 Total Msgs= 2617136
> Msgs per second = 25691.3 Messages in the time diff = 256913 Total Msgs= 2874049
> Msgs per second = 25387.8 Messages in the time diff = 253878 Total Msgs= 3127927
> Msgs per second = 25679.9 Messages in the time diff = 256799 Total Msgs= 3384726
> Msgs per second = 25464.7 Messages in the time diff = 254647 Total Msgs= 3639373
> Msgs per second = 25251.5 Messages in the time diff = 252515 Total Msgs= 3891888
> Msgs per second = 25608.7 Messages in the time diff = 256087 Total Msgs= 4147975
> Msgs per second = 25280 Messages in the time diff = 252800 Total Msgs= 4400775
> Msgs per second = 25394.5 Messages in the time diff = 253945 Total Msgs= 4654720
> Msgs per second = 25430.7 Messages in the time diff = 254307 Total Msgs= 4909027
> Msgs per second = 25349.2 Messages in the time diff = 253492 Total Msgs= 5162519
> Msgs per second = 25126.1 Messages in the time diff = 251261 Total Msgs= 5413780
> Msgs per second = 25475.7 Messages in the time diff = 254757 Total Msgs= 5668537
> 
> 
> 
> 
> -----Original Message-----
> From: Stephens, Allan [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 20, 2007 9:14 AM
> To: Randy Macleod; Nayman Felix-QA5535;
> [email protected]
> Subject: RE: [tipc-discussion] Link Congestion problem
> 
> Hi Felix:
> 
> I think Randy's analysis has pretty much said it all, so I 
> don't really have anything to add.  I'm a bit puzzled about 
> why you're trying to use non-blocking sockets with 
> non-discardable message traffic.  The non-discardable message 
> type suggests that you don't want to lose any messages, in 
> which case you have no alternative but to wait when you are 
> generating messages faster than they are being consumed.  If 
> there is some critical processing that needs to occur on a 
> regular basis, and which would be hampered by a blocking send 
> operation, maybe this processing should be handled by a 
> different thread of control ...
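> 
> For example, a sketch along these lines (untested; the queue size and
> names are placeholders) lets the time-critical thread enqueue messages
> without ever blocking, while a dedicated sender thread absorbs the
> blocking sends:
> 
>     #include <pthread.h>
>     #include <string.h>
>     #include <sys/socket.h>
> 
>     #define MSG_SIZE 2000
>     #define QLEN     64
> 
>     struct sendq {
>             char            buf[QLEN][MSG_SIZE];
>             int             head, tail, count;  /* producer writes at tail */
>             pthread_mutex_t lock;
>             pthread_cond_t  nonempty;
>     };
> 
>     extern int tipc_sock;              /* blocking TIPC socket */
>     extern struct sockaddr *srv_addr;  /* server's TIPC name */
>     extern socklen_t srv_addrlen;
> 
>     static void *sender_thread(void *arg)
>     {
>             struct sendq *q = arg;
>             char msg[MSG_SIZE];
> 
>             for (;;) {
>                     pthread_mutex_lock(&q->lock);
>                     while (q->count == 0)
>                             pthread_cond_wait(&q->nonempty, &q->lock);
>                     memcpy(msg, q->buf[q->head], MSG_SIZE);
>                     q->head = (q->head + 1) % QLEN;
>                     q->count--;
>                     pthread_mutex_unlock(&q->lock);
> 
>                     /* Only this thread waits on link congestion */
>                     sendto(tipc_sock, msg, MSG_SIZE, 0,
>                            srv_addr, srv_addrlen);
>             }
>             return NULL;
>     }
> 
> (A real version would also bound the producer side with a "nonfull"
> condition; otherwise the blocking just moves back one level when the
> queue fills.)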
> 
> Regards,
> Al
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of 
> > Randy Macleod
> > Sent: Wednesday, September 19, 2007 9:14 PM
> > To: Nayman Felix-QA5535; [email protected]
> > Subject: Re: [tipc-discussion] Link Congestion problem
> > 
> >   Hi!
> > 
> > On 9/19/07, Nayman Felix-QA5535 <[EMAIL PROTECTED]> wrote:
> > >
> > > I'm running TIPC 1.5.12 on a Linux 2.6.9 kernel on two different
> > > nodes and I'm seeing a problem with link congestion after about 26
> > > messages being sent.  I'm running connectionless traffic (with the
> > > socket also set to non-blocking via fcntl), with the domain set to
> > > closest first, the destination droppable flag set to FALSE, the
> > > message importance set to HIGH, and the message size set to 2000
> > > bytes.
> > 
> > Your MTU is probably 1500 B, so TIPC will have to split each
> > message, thereby sending 2 packets.  After the 25th user-space send,
> > you have 50 items in the link queue (if you have a fast CPU and an
> > ack message from the remote TIPC hasn't arrived yet).  Since your
> > link window is 50, the 26th send fails, and doesn't block.
> > You retry, getting EAGAIN, until the far TIPC node sends an ack,
> > opening up room in the link queue.  Yes, it really takes that long!
> > If you measure it, it will likely only be ~100 us, so "long" is a
> > relative term.
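> > 
> > A retry wrapper along these lines (just a sketch; the 100 us sleep
> > is based on the round-trip estimate above) avoids burning thousands
> > of iterations busy-waiting on EAGAIN:
> > 
> >     #include <errno.h>
> >     #include <unistd.h>
> >     #include <sys/socket.h>
> > 
> >     /* Retry a non-blocking send, backing off briefly on EAGAIN */
> >     ssize_t send_with_retry(int sd, const void *msg, size_t len,
> >                             const struct sockaddr *dest, socklen_t dlen)
> >     {
> >             ssize_t rc;
> > 
> >             while ((rc = sendto(sd, msg, len, 0, dest, dlen)) < 0 &&
> >                    errno == EAGAIN)
> >                     usleep(100);   /* ~ the link round-trip time */
> >             return rc;
> >     }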
> > 
> > The only hitch is that you say you have set the message importance
> > to HIGH, so the congestion threshold should be 50/3*5 (80 packets,
> > given the kernel's integer arithmetic) rather than the window of 50,
> > as can be seen here:
> > <http://lxr.linux.no/source/net/tipc/link.c?v=2.6.17.13#L1037>
> > and here:
> > <http://lxr.linux.no/source/net/tipc/link.c?v=2.6.17.13#L2710>
> > 
> > (tipc in 2.6.17.x is very close to tipc-1.5.12 and I don't have the
> > 1.5.12 code handy)
> > 
> > 
> > >
> > > When I run the program, which is a modified version of the hello
> > > world program, with the server running on one node and the client
> > > running on a different node, I'm getting back the error "Resource
> > > temporarily unavailable" and an errno of 11 (EAGAIN) after the
> > > 26th message.  So, I updated my code to retry if an errno of
> > > EAGAIN is returned, but I needed to retry more than 2500 times (I
> > > believe it was around 2650 or something like that) before it could
> > > successfully send 27 messages over the link.
> > 
> > Once the acks start coming back you should get fewer 
> EAGAINs in a row 
> > because of the way the protocol works...
> > 
> > 
> > >
> > >
> > > tipc-config -ls shows the following, indicating that link
> > > congestion is happening:
> > >
> > > Link <1.1.169:eth0-1.1.168:eth0>
> > >   ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
> > >   RX packets:29 fragments:0/0 bundles:0/0
> > >   TX packets:3627 fragments:3620/1809 bundles:0/0
> > >   TX profile sample:105 packets  average:1356 octets
> > >   0-64:7% -256:0% -1024:40% -4096:53% -16354:0% -32768:0% -66000:0%
> > >   RX states:2175142 probes:1087411 naks:0 defs:0 dups:0
> > >   TX states:2174945 probes:1087534 naks:0 acks:0 dups:0
> > >   Congestion bearer:0 link:36  Send queue max:36 avg:0
> > >
> > >
> > > Any ideas as to why I'm seeing link congestion?
> > 
> > Ummm,
> > Your code is sending too fast and you have asked not to block...
> > the network round-trip time is longer than the time it takes to
> > enqueue 25 packets... ;-)
> > 
> > > If you'd like I can attach some sample code.
> > >
> > >
> > > Just before sending this note, I tried commenting out the code
> > > that makes the socket non-blocking via fcntl, and then I don't
> > > see link congestion anymore.  So why does making the socket
> > > non-blocking lead to link congestion?
> > 
> > If blocking is allowed, the calling process is put to sleep briefly
> > until the link congestion has abated:
> > http://lxr.linux.no/source/net/tipc/socket.c?v=2.6.17.13#L524
> > 
> > 
> > So you have to keep trying to send, hopefully also doing other
> > useful work while you wait for the ack.
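> > 
> > (For reference, a sketch of the fcntl toggle being discussed;
> > clearing O_NONBLOCK restores the blocking behaviour that avoids the
> > EAGAINs:)
> > 
> >     #include <fcntl.h>
> > 
> >     /* Set (on != 0) or clear O_NONBLOCK on a socket */
> >     int set_nonblocking(int sd, int on)
> >     {
> >             int flags = fcntl(sd, F_GETFL, 0);
> > 
> >             if (flags < 0)
> >                     return -1;
> >             flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
> >             return fcntl(sd, F_SETFL, flags);
> >     }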
> > 
> > Assuming that this is correct, then I'd like to suggest that tipc 
> > could notify userspace that the link is no longer congested... I've 
> > done things like that before but all in userspace. I'll explain at 
> > length if anyone is interested in the details...
> > 
> > Given that congestion abatement notification doesn't exist yet, you
> > have to:
> > 0. live with it,
> > 1. put in some flow control on the send side (see the sketch below),
> > 2. increase the link send window, or
> > 3. both 1 & 2.
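> > 
> > For option 1, even something as crude as this sketch can work
> > (values are illustrative; the idea is just to pace the sender below
> > the rate the link can sustain):
> > 
> >     #include <time.h>
> > 
> >     #define BATCH    100
> >     #define MAX_RATE 50000   /* msgs/sec, below the measured ceiling */
> > 
> >     /* Sleep after every BATCH sends to cap the average send rate */
> >     void throttle(unsigned long sent)
> >     {
> >             if (sent % BATCH == 0) {
> >                     struct timespec ts =
> >                             { 0, BATCH * (1000000000L / MAX_RATE) };
> >                     nanosleep(&ts, NULL);
> >             }
> >     }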
> > 
> > Another question that comes to mind is:
> > What is the maximum time that a process can be suspended?
> > I'd guess it would be the (default) link timeout time of ~1.5 s.
> > That would only occur if the far end failed.
> > --
> > // Randy
> > 
