Re: Some interrupt coalescing tests

2001-10-22 Thread Mike Silbersack


On Thu, 18 Oct 2001, Terry Lambert wrote:

 In the non-LRP case, the percentage drop in interrupt overhead
 is ~10% (as has been observed by others).  This makes sense,
 too, if you consider that NETISR driving of receives means
 less time in interrupt processing.  If we multiply the 15%
 (100% - 85% = 15% in transmit) by 3 (12000/(12000-8000) =
 100% / 33% = 3), then we get 45% in transmit in the non-LRP
 case.

Hrm, so the reduction I saw is repeatable; good.  If there are no
objections, I'll fix up the whitespace later this week, fix the warning,
and commit it.  (The whitespace applies to the unified diff I created;
your context one may have correct whitespace.)

 It would be nice if someone could confirm that slightly less
 than 1/2 of the looping is on the transmit side for a non-LRP
 kernel, but that's about what we should expect...

If it is, I wonder if we could put a larger number of packets in the queue
and disable transmit interrupts for a while.  MMmm, dynamic queues.
Sounds like something that would take a lot of work to get right,
unfortunately.  (Presumably, this tactic could be applied to most network
cards, although all the better ones probably have really good transmit
interrupt mitigation.)

 I'm really surprised abuse of the HTTP protocol itself in
 denial of service attacks isn't more common.

Well, the attack you proposed would require symmetric bandwidth (if I
understood it correctly), and is of course traceable.  My guess would be
that even script kiddies are smart enough to avoid attacks which could
easily be traced back to their drones.

 Even ignoring this, there's a pretty clear off the shelf
 hardware path to a full 10 gigabits, with PCI-X (8 gigabits
 times 2 busses gets you there, which is 25 times the largest
 UUNet hosting center pipe size today).

Are you sure about that?  I recently heard that Internet2 will be moving
to 2.4Gbps backbones in the near future, and I assume that Qwest wouldn't
be willing to donate that bandwidth unless they had similar capabilities
already.  ("They" here being generalized to all backbone providers.)

 Fair share is more a problem for slower interfaces without
 hardware coalescing, and software is an OK band-aid for
 them (IMO).

 I suspect that you will want to spend most of your CPU time
 doing processing, rather than interrupt handling, in any case.

 -- Terry

Yep, probably.  Are you implementing fair sharing soon?

Mike Silby Silbersack





Re: Some interrupt coalescing tests

2001-10-18 Thread Terry Lambert

Mike Silbersack wrote:
 What probably should be done, if you have time, is to add a bit of
 profiling to your patch to find out how it helps most.  I'm curious how
 many times it ends up looping, and also why it is looping (whether this is
 due to receive or transmit.)  I think knowing this information would help
 optimize the drivers further, and perhaps suggest a tack we haven't
 thought of.

On 960 megabits per second on a Tigon III (full wire speed,
non-jumbogram), the looping is almost entirely (~85%) on
the receive side.

It loops for 75% of the hardware interrupts in the LRP case
(reduction of interrupts from 12,000 to 8,000 -- 33%).

This is really expected, since in the LRP case, the receive
processing is significantly higher, and even in that case,
we are not driving the CPU to the wall in interrupt processing.

In the non-LRP case, the percentage drop in interrupt overhead
is ~10% (as has been observed by others).  This makes sense,
too, if you consider that NETISR driving of receives means
less time in interrupt processing.  If we multiply the 15%
(100% - 85% = 15% in transmit) by 3 (12000/(12000-8000) =
100% / 33% = 3), then we get 45% in transmit in the non-LRP
case.

It would be nice if someone could confirm that slightly less
than 1/2 of the looping is on the transmit side for a non-LRP
kernel, but that's about what we should expect...


  I don't know if anyone has tested what happens to apache in
  a denial of service attack consisting of a huge number of
  partial GET requests that are incomplete, and so leave state
  hanging around in the HTTP server...
 
 I'm sure it would keel over and die, since it needs a process
 per socket.  If you're talking about sockets in TIME_WAIT or
 such, see netkill.pl.

I was thinking in terms of connections not getting dropped.

The most correct way to handle this is probably an accept
filter for CRLFCRLF, indicating a complete GET
request (still leaves POST, though, which has a body), with
dropping of long duration incomplete requests.  Unfortunately,
without going into the Content-Length: parsing, we are
pretty much screwed on POST, and very big POSTs still screw
you badly (imagine a Content-Length: 10).  You
can mitigate that by limiting request size, but you are
still talking about putting HTTP parsing in the kernel,
above and beyond simple accept filters.
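For the plain GET case, FreeBSD's accept filter mechanism already covers this: with the accf_http module loaded, a server can ask the kernel to hold a connection until a complete request header has arrived, and only then have accept() return it.  A minimal usage sketch (error handling omitted); as noted above, this still does nothing for requests that carry a body.

#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>

/* Call after listen(): connections are not returned by accept() until
 * the kernel-side filter has seen a complete HTTP request header. */
static int
enable_httpready(int s)
{
        struct accept_filter_arg afa;

        memset(&afa, 0, sizeof(afa));
        strcpy(afa.af_name, "httpready");       /* accf_http */
        return (setsockopt(s, SOL_SOCKET, SO_ACCEPTFILTER,
            &afa, sizeof(afa)));
}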

I'm really surprised abuse of the HTTP protocol itself in
denial of service attacks isn't more common.


  Yes.  Floyd and Druschel recommend using high and low
  watermarks on the amount of data pending processing in
  user space.  The most common approach is to use a fair
  share scheduling algorithm, which reserves a certain
  amount of CPU for user space processing, but this is
  somewhat wasteful, if there is no work, since it denies
  quantum to the interrupt processing, potentially wrongly.
 
 I'm not sure such an algorithm would be wasteful - there must be data
 coming in to trigger such a huge amount of interrupts.  I guess this would
 depend on how efficient your application is, how you set the limits, etc.

Yes.  The waste comment is aimed at the idea that you
will most likely have a heterogeneous loading, so you can
not accurately predict ahead of time that you will spend
80% of your time in the kernel, and 20% processing in user
space, or whatever ratio you come up with.  This becomes
much more of an issue when you have an attack, which will,
by definition, end up being asymmetric.

In practice, however, no one out there has a pipe size in
excess of 400 Mbits outside of a lab, so most people never
really need 1Gbit of throughput, anyway.  If you can make
your system handle full wire speed for 1Gbit, you are pretty
much safe from any attack that someone might want to throw
at you, at least until the pipes get larger.

Even ignoring this, there's a pretty clear off the shelf
hardware path to a full 10 gigabits, with PCI-X (8 gigabits
times 2 busses gets you there, which is 25 times the largest
UUNet hosting center pipe size today).

Fair share is more a problem for slower interfaces without
hardware coalescing, and software is an OK band-aid for
them (IMO).

I suspect that you will want to spend most of your CPU time
doing processing, rather than interrupt handling, in any case.

-- Terry




Re: Some interrupt coalescing tests

2001-10-17 Thread Mike Silbersack


On Sun, 14 Oct 2001, Terry Lambert wrote:

 The one thing I _would_ add -- though I'm waiting for it to
 be a problem before doing it -- is to limit the total number
 of packets processed per interrupt by keeping a running count.

 You would have to be _AMAZINGLY_ loaded to hit this, though;
 since it would mean absolutely continuous DMAs.  I think it
 is self-limiting, should that happen, since once you are out
 of mbufs, you're out.  The correct thing to do is probably to
 let it run out, but keep a separate transmit reserve, so that
 you can process requests to completion.

What probably should be done, if you have time, is to add a bit of
profiling to your patch to find out how it helps most.  I'm curious how
many times it ends up looping, and also why it is looping (whether this is
due to receive or transmit.)  I think knowing this information would help
optimize the drivers further, and perhaps suggest a tack we haven't
thought of.

 I don't know if anyone has tested what happens to apache in
 a denial of service attack consisting of a huge number of
 partial GET requests that are incomplete, and so leave state
 hanging around in the HTTP server...

I'm sure it would keel over and die, since it needs a process per socket.
If you're talking about sockets in TIME_WAIT or such, see netkill.pl.

 Yes.  Floyd and Druschel recommend using high and low
 watermarks on the amount of data pending processing in
 user space.  The most common approach is to use a fair
 share scheduling algorithm, which reserves a certain
 amount of CPU for user space processing, but this is
 somewhat wasteful, if there is no work, since it denies
 quantum to the interrupt processing, potentially wrongly.

I'm not sure such an algorithm would be wasteful - there must be data
coming in to trigger such a huge amount of interrupts.  I guess this would
depend on how efficient your application is, how you set the limits, etc.

Mike Silby Silbersack






Re: Some interrupt coalescing tests

2001-10-14 Thread Mike Silbersack


On Sat, 13 Oct 2001, Terry Lambert wrote:

 Mike Silbersack wrote:

 One issue to be careful of here is that the removal of the
 tcptmpl actually causes a performance hit that wasn't there
 in the 4.3 code.  My original complaint about tcptmpl taking
 up 256 instead of 60 bytes stands, but I'm more than half
 convinced that making it take up 60 bytes is OK... or at
 least is more OK than allocating and deallocating each time,
 and I don't yet have a better answer to the problem.  4.3
 doesn't have this change, but 4.4 does.

I need benchmarks to prove the slowdown, Terry.  The testing I performed
(which is limited, of course) showed no measurable speed difference.
Remember that the only time the tcptempl mbuf ever gets allocated now is
when a keepalive is sent, which is a rare event.  The rest of the time,
it's just copying the data from the preexisting structures over to the new
packet.  If you can show me that this method is slower, I will move it
over to a zone allocated setup like you proposed.

  I'm not sure if the number was lower because the celeron couldn't run the
  flooder as quickly, or if the -current box was dropping packets.  I
  suspect the latter, as the -current box was NOTICEABLY slowed down; I
  could watch systat refresh the screen.

 This is unfortunate; it's an effect that I expected with
 the -current code, because of the change to the interrupt
 processing path.

 To clarify here, the slowdown occurred both with and without
 the patch, right?

 The problem here is that when you hit livelock (full CPU
 utilization), then you are pretty much unable to do anything
 at all, unless the code path goes all the way to the top of
 the stack.

Yep, the -current box livelocked with and without the patch.  I'm not sure
if -current is solely to blame, though.  My -current box is using a PNIC,
which incurs additional overhead relative to other tulip clones, according
to the driver's comments.  And the 3com in that box hasn't worked in a
while... maybe I should try debugging that so I have an additional test
point.

  The conclusion?  I think that the dc driver does a good enough job of
  grabbing multiple packets at once, and won't be helped by Terry's patch
 except in a very few cases.

 10% is a good improvement; my gut feeling is that it would
 have been less than that.  This is actually good news for
 me, since it means that my 30% number is bounded by the
 user space program not being run (in other words, I should
 be able to get considerably better performance, using a
 weighted fair share scheduler).  As long as it doesn't
 damage performance, I think that it's proven itself.

Hm, true, I guess the improvement is respectable.  My thought is mostly
that I'm not sure how much it's extending the performance range of a
system; testing with more varied packet loads as suggested by Alfred would
help tell us the answer to this.

  In fact, I have a sneaky suspicion that Terry's patch may
  increase bus traffic slightly.  I'm not sure how much of
  an issue this is, perhaps Bill or Luigi could comment.

 This would be interesting to me, as well.  I gave Luigi an
 early copy of the patch to play with a while ago, and also
 copied Bill.

 I'm interested in how you think it could increase traffic;
 the only credible reason I've been able to come up with is
 the ability to push more packets through, when they would
 otherwise end up being dropped because of the queue full
 condition -- if this is the case, the bus traffic is real
 work, and not additional overhead.

The extra polling of the bus in cases where there are no additional
packets to grab is what I was wondering about.  I guess in comparison to
the quantity of packet data going by, it's not a real issue.

  In short, if we're going to try to tackle high interrupt load,
  it should be done by disabling interrupts and going to polling
  under high load;

 I would agree with this, except that it's only really a
 useful observation if FreeBSD is being used as purely a
 network processor.  Without interrupts, the polling will
 take a significant portion of the available CPU to do, and
 you can't burn that CPU if, for example, you have an SSL
 card that does your handshakes, but you need to run the SSL
 sessions themselves up in user space.

Straight polling isn't necessarily the solution I was thinking of, but
rather some form of interrupt disabling at high rates.  For example, if
the driver were to keep track of how many interrupts/second it was taking,
perhaps it could up the number of receive buffers from 64 to something
higher, then disable the card's interrupt and set a callback to run in a
short bit of time at which point interrupts would be reenabled and the
interrupt handler would be run.  Ideally, this could reduce the number of
interrupts greatly, increasing efficiency under load.  Paired with this
could be receive polling during transmit, something which does not seem to
be done at current, if I'm reading correctly.

I'm not sure of the feasibility of the above, unfortunately - it would
seem highly dependent on how short of a timeout we can realistically get
along with how many mbufs we can spare for receive buffers.
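A rough sketch of that shape (nic_disable_intr(), nic_enable_intr(), nic_service_rings() and the softc fields below are placeholders, not real driver code; on 4.x the re-enable would be scheduled with something like timeout(9)):

struct rl_softc {
        int     intr_count;     /* interrupts seen this second */
        int     intr_limit;     /* e.g. 8000; above this, throttle */
        int     throttled;
};

/* Placeholder chip/ring operations; a real driver pokes CSRs here. */
static void     nic_disable_intr(struct rl_softc *);
static void     nic_enable_intr(struct rl_softc *);
static void     nic_service_rings(struct rl_softc *);
static void     rl_unthrottle(void *);

static void
rl_intr(struct rl_softc *sc)
{
        /* intr_count would be reset once a second, e.g. from the
           driver's periodic tick routine. */
        if (!sc->throttled && ++sc->intr_count > sc->intr_limit) {
                sc->throttled = 1;
                nic_disable_intr(sc);   /* mask the chip's interrupt */
                /* re-arm shortly, e.g. timeout(rl_unthrottle, sc, hz / 100); */
                return;
        }
        nic_service_rings(sc);          /* normal rx/tx processing */
}

static void
rl_unthrottle(void *arg)
{
        struct rl_softc *sc = arg;

        sc->throttled = 0;
        nic_enable_intr(sc);
        nic_service_rings(sc);          /* drain what arrived while masked */
}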

Re: Some interrupt coalescing tests

2001-10-14 Thread Terry Lambert

Mike Silbersack wrote:
 Hm, true, I guess the improvement is respectable.  My thought is mostly
 that I'm not sure how much it's extending the performance range of a
 system; testing with more varied packet loads as suggested by Alfred would
 help tell us the answer to this.

I didn't respond to Alfred's post, and I probably should have;
he had some very good comments, including varying the load.

My main interest has been in increasing throughput as much as
possible; as such, my packet load has been geared towards moving
the most data possible.

The tests we did were with just connections per second, 1k HTTP
transfers, and 10k HTTP transfers.  Unfortunately, I can't give
you separate numbers without LRP; we didn't bother after the
connection rate went from ~7000/second to 23,500/second with LRP,
since it wasn't worth it.


 The extra polling of the bus in cases where there are no additional
 packets to grab is what I was wondering about.  I guess in comparison to
 the quantity of packet data going by, it's not a real issue.

It could be, if you were doing something that was network
triggered, relatively low cost, but CPU intensive; on the
whole, though, there's very little that isn't going to be
network related, these days, and what there is, will end up
not taking the overhead, unless you are also doing networking.

Maybe it should be a tunable?  But these days, everything is
pretty much I/O bound, not CPU bound.

The one thing I _would_ add -- though I'm waiting for it to
be a problem before doing it -- is to limit the total number
of packets processed per interrupt by keeping a running count.

You would have to be _AMAZINGLY_ loaded to hit this, though;
since it would mean absolutely continuous DMAs.  I think it
is self-limiting, should that happen, since once you are out
of mbufs, you're out.  The correct thing to do is probably to
let it run out, but keep a separate transmit reserve, so that
you can process requests to completion.
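A minimal sketch of that running count, assuming a generic descriptor ring (names invented): cap the packets handled per interrupt and leave anything beyond the budget in the ring for the next pass, which also leaves mbufs in reserve for the transmit side.

struct fake_ring {
        int     pending;        /* packets the NIC has DMA'd that we
                                   have not yet processed */
};

/* Process at most 'budget' packets; leftovers stay in the ring and are
 * picked up on the next interrupt (or a later poll). */
static int
rx_drain(struct fake_ring *r, int budget)
{
        int handled = 0;

        while (handled < budget && r->pending > 0) {
                r->pending--;           /* stand-in for real rx work */
                handled++;
        }
        return (handled);
}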

I don't know if anyone has tested what happens to apache in
a denial of service attack consisting of a huge number of
partial GET requests that are incomplete, and so leave state
hanging around in the HTTP server...


[ ... polling vs. interrupt load ... ]

 Straight polling isn't necessarily the solution I was thinking of, but
 rather some form of interrupt disabling at high rates.  For example, if
 the driver were to keep track of how many interrupts/second it was taking,
 perhaps it could up the number of receive buffers from 64 to something
 higher, then disable the card's interrupt and set a callback to run in a
 short bit of time at which point interrupts would be reenabled and the
 interrupt handler would be run.  Ideally, this could reduce the number of
 interrupts greatly, increasing efficiency under load.  Paired with this
 could be receive polling during transmit, something which does not seem to
 be done at current, if I'm reading correctly.
 
 I'm not sure of the feasibility of the above, unfortunately - it would
 seem highly dependent on how short of a timeout we can realistically get
 along with how many mbufs we can spare for receive buffers.

Yes.  Floyd and Druschel recommend using high and low
watermarks on the amount of data pending processing in
user space.  The most common approach is to use a fair
share scheduling algorithm, which reserves a certain
amount of CPU for user space processing, but this is
somewhat wasteful, if there is no work, since it denies
quantum to the interrupt processing, potentially wrongly.
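As a toy illustration of the watermark part (numbers and names invented, not from Floyd and Druschel's code): stop taking new receive work once the user-space backlog passes a high watermark, and resume only after it drains below the low one, so quantum is only taken away from interrupt processing when there is actually a backlog to work off.

#define BACKLOG_HIWAT   512     /* assumed units: packets queued for user space */
#define BACKLOG_LOWAT   128

struct sched_state {
        int     backlog;        /* work pending in user space */
        int     rx_enabled;     /* still taking receive work? */
};

static void
update_rx_policy(struct sched_state *s)
{
        if (s->rx_enabled && s->backlog >= BACKLOG_HIWAT)
                s->rx_enabled = 0;      /* let user space catch up */
        else if (!s->rx_enabled && s->backlog <= BACKLOG_LOWAT)
                s->rx_enabled = 1;      /* backlog drained; resume receive */
}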


-- Terry




Re: Some interrupt coalescing tests

2001-10-13 Thread Terry Lambert

Mike Silbersack wrote:
 Well, I've been watching everyone argue about the value of interrupt
 coalescing in the net drivers, so I decided to port Terry's patch to 4.4 and
 -current to see what the results are.

Thanks!


 The network is 100mbps, switched.  To simulate load, I used a syn flooder
 aimed at an unused port.  icmp/rst response limiting was enabled.
 
 With the -current box attacking the -stable box, I was able to notice a
 slight drop in interrupts/second with the patch applied.  The number of
 packets was ~57000/second.
 
 Before: ~46000 ints/sec, 57-63% processor usage due to interrupts.
 After: ~38000 ints/sec, 50-60% processor usage due to interrupts.
 
 In both cases, the box felt responsive.

One issue to be careful of here is that the removal of the
tcptmpl actually causes a performance hit that wasn't there
in the 4.3 code.  My original complaint about tcptmpl taking
up 256 instead of 60 bytes stands, but I'm more than half
convinced that making it take up 60 bytes is OK... or at
least is more OK than allocating and deallocating each time,
and I don't yet have a better answer to the problem.  4.3
doesn't have this change, but 4.4 does.


 With the -stable box attacking the -current box, the patch made no
 difference.  The box bogged down at only ~25000 ints/sec, and response
 limiting reported the number of packets to be ~44000/second.
 
 I'm not sure if the number was lower because the celeron couldn't run the
 flooder as quickly, or if the -current box was dropping packets.  I
 suspect the latter, as the -current box was NOTICEABLY slowed down; I
 could watch systat refresh the screen.

This is unfortunate; it's an effect that I expected with
the -current code, because of the change to the interrupt
processing path.

To clarify here, the slowdown occurred both with and without
the patch, right?

The problem here is that when you hit livelock (full CPU
utilization), then you are pretty much unable to do anything
at all, unless the code path goes all the way to the top of
the stack.


 The conclusion?  I think that the dc driver does a good enough job of
 grabbing multiple packets at once, and won't be helped by Terry's patch
 except in a very few cases.

10% is a good improvement; my gut feeling is that it would
have been less than that.  This is actually good news for
me, since it means that my 30% number is bounded by the
user space program not being run (in other words, I should
be able to get considerably better performance, using a
weighted fair share scheduler).  As long as it doesn't
damage performance, I think that it's proven itself.


 In fact, I have a sneaky suspicion that Terry's patch may
 increase bus traffic slightly.  I'm not sure how much of
 an issue this is, perhaps Bill or Luigi could comment.

This would be interesting to me, as well.  I gave Luigi an
early copy of the patch to play with a while ago, and also
copied Bill.

I'm interested in how you think it could increase traffic;
the only credible reason I've been able to come up with is
the ability to push more packets through, when they would
otherwise end up being dropped because of the queue full
condition -- if this is the case, the bus traffic is real
work, and not additional overhead.

If you weren't getting any packets, or had a very slow
packet rate, it might increase bus traffic, in that doing
an extra check might always return a negative response (in
the test case in question, that's not true, since it's not
doing more work than it would with the same load, using
interrupts to trigger the same bus traffic).  Note that it
is only a consideration in the case that there is bus
traffic involved when polling an empty ring to see if DMA
has been done to a particular mbuf or cluster, so it takes
an odd card for it to be a problem.
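For the common (tulip-style) case, that check is just a read of a status word the card has already DMA'd into host memory, so polling an empty ring costs no bus cycles; a sketch (the descriptor layout is illustrative, not quoted from if_dc.c):

struct fake_rxdesc {
        volatile unsigned int   status;         /* written by the NIC via DMA */
        unsigned int            buf_addr;
};
#define FAKE_RXSTAT_OWN 0x80000000U             /* set while the NIC owns the slot */

/* Nonzero once the NIC has filled this descriptor and handed it back.
 * This touches only host memory; a chip that required a register read
 * here would be the "odd card" case mentioned above. */
static int
rx_ready(struct fake_rxdesc *d)
{
        return ((d->status & FAKE_RXSTAT_OWN) == 0);
}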


 In short, if we're going to try to tackle high interrupt load,
 it should be done by disabling interrupts and going to polling
 under high load;

I would agree with this, except that it's only really a
useful observation if FreeBSD is being used as purely a
network processor.  Without interrupts, the polling will
take a significant portion of the available CPU to do, and
you can't burn that CPU if, for example, you have an SSL
card that does your handshakes, but you need to run the SSL
sessions themselves up in user space.

For example, the current ClickArray Array 1000 product
does around 700 1024-bit SSL connection setups a second, and,
since it uses a Broadcom card, the card is only doing the
handshaking, and not the rest of the crypto processing.  The
crypto stream processing has to be done in user space, in
the SSL proxy code living there, and as such, would suffer
from doing polling.


 the patch proposed here isn't worth the extra complexity.

I'd argue that the complexity is coming, no matter what.  If
you separate out the tx_eof and rx_eof entry points, and
externalize them into the ethernet driver interface, in order
to enable polling, you are going to need to have a return

Some interrupt coalescing tests

2001-10-12 Thread Mike Silbersack


Well, I've been watching everyone argue about the value of interrupt
coalescing in the net drivers, so I decided to port Terry's patch to 4.4 and
-current to see what the results are.  The patch included applies cleanly
to 4.4's if_dc, and will apply to -current with a one line change.
Whitespace is horrible, I copied and pasted the original patch, used patch
-l, etc.
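For reference, the interesting part of the patch is that dc_rxeof()/dc_txeof() now return how many descriptors they handled, so the interrupt handler can keep looping while either side reports progress.  A rough sketch of that loop shape (generic names, not the literal dc_intr code):

struct fake_softc;
/* Placeholders for the real dc_rxeof()/dc_txeof(), which with this
 * patch return how much work they did. */
static int      fake_rxeof(struct fake_softc *);
static int      fake_txeof(struct fake_softc *);

static void
coalesced_intr(void *arg)
{
        struct fake_softc *sc = arg;
        int work;

        do {
                /* a real handler also acks the chip's interrupt
                   status register on each pass */
                work  = fake_rxeof(sc);         /* packets received */
                work += fake_txeof(sc);         /* tx descriptors reclaimed */
        } while (work != 0);                    /* loop while progress is made */
}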

The test setup I used was as follows:
Duron 600, PNIC, running -current
Celeron 450, ADMtek tulip-clone, running -stable

The network is 100mbps, switched.  To simulate load, I used a syn flooder
aimed at an unused port.  icmp/rst response limiting was enabled.

With the -current box attacking the -stable box, I was able to notice a
slight drop in interrupts/second with the patch applied.  The number of
packets was ~57000/second.

Before: ~46000 ints/sec, 57-63% processor usage due to interrupts.
After: ~38000 ints/sec, 50-60% processor usage due to interrupts.

In both cases, the box felt responsive.

With the -stable box attacking the -current box, the patch made no
difference.  The box bogged down at only ~25000 ints/sec, and response
limiting reported the number of packets to be ~44000/second.

I'm not sure if the number was lower because the celeron couldn't run the
flooder as quickly, or if the -current box was dropping packets.  I
suspect the latter, as the -current box was NOTICEABLY slowed down; I
could watch systat refresh the screen.

The conclusion?  I think that the dc driver does a good enough job of
grabbing multiple packets at once, and won't be helped by Terry's patch
except in a very few cases.  In fact, I have a sneaky suspicion that
Terry's patch may increase bus traffic slightly.  I'm not sure how much of
an issue this is, perhaps Bill or Luigi could comment.

In short, if we're going to try to tackle high interrupt load, it should
be done by disabling interrupts and going to polling under high load;
the patch proposed here isn't worth the extra complexity.

I suppose this would all change if we were using LRP and doing lots of
processing in the interrupt handler... but we aren't.

Mike Silby Silbersack


--- if_dc.c.origThu Oct 11 01:39:05 2001
+++ if_dc.c Thu Oct 11 01:39:30 2001
@@ -193,8 +193,8 @@
 static int dc_coal __P((struct dc_softc *, struct mbuf **));
 static void dc_pnic_rx_bug_war __P((struct dc_softc *, int));
 static int dc_rx_resync__P((struct dc_softc *));
-static void dc_rxeof   __P((struct dc_softc *));
-static void dc_txeof   __P((struct dc_softc *));
+static int dc_rxeof   __P((struct dc_softc *));
+static int dc_txeof   __P((struct dc_softc *));
 static void dc_tick__P((void *));
 static void dc_tx_underrun __P((struct dc_softc *));
 static void dc_intr__P((void *));
@@ -2302,7 +2302,7 @@
  * A frame has been uploaded: pass the resulting mbuf chain up to
  * the higher level protocols.
  */
-static void dc_rxeof(sc)
+static int dc_rxeof(sc)
struct dc_softc *sc;
 {
 struct ether_header*eh;
@@ -2311,6 +2311,7 @@
struct dc_desc  *cur_rx;
int i, total_len = 0;
u_int32_t   rxstat;
+  int cnt = 0;
 
ifp = &sc->arpcom.ac_if;
i = sc->dc_cdata.dc_rx_prod;
@@ -2355,7 +2356,7 @@
continue;
} else {
dc_init(sc);
-   return;
+  return(cnt);
}
}
 
@@ -2379,6 +2380,7 @@
/* Remove header from mbuf and pass it on. */
m_adj(m, sizeof(struct ether_header));
ether_input(ifp, eh, m);
+  cnt++;
}
 
sc->dc_cdata.dc_rx_prod = i;
@@ -2389,12 +2391,13 @@
  * the list buffers.
  */
 
-static void dc_txeof(sc)
+static int dc_txeof(sc)
struct dc_softc *sc;
 {
struct dc_desc  *cur_tx = NULL;
struct ifnet*ifp;
int idx;
+  int cnt = 0;
 
ifp = &sc->arpcom.ac_if;
 
@@ -2452,7 +2455,7 @@
ifp->if_collisions++;
if (!(txstat & DC_TXSTAT_UNDERRUN)) {
dc_init(sc);
-   return;
+  return(cnt);
}
}
 
@@ -2466,13 +2469,14 @@
 
sc->dc_cdata.dc_tx_cnt--;
DC_INC(idx, DC_TX_LIST_CNT);
+  cnt++;
}
 
sc->dc_cdata.dc_tx_cons = idx;
if (cur_tx != NULL)
ifp->if_flags &= ~IFF_OACTIVE;
 
-   return;
+  return(cnt);
 }
 
 static void dc_tick(xsc)
@@ -2612,6 +2616,7 @@
struct dc_softc *sc;

Re: Some interrupt coalescing tests

2001-10-12 Thread Alfred Perlstein

* Mike Silbersack [EMAIL PROTECTED] [011012 01:30] wrote:
 
 Well, I've been watching everyone argue about the value of interrupt
 coalescing in the net drivers, so I decided to port Terry's patch to 4.4 and
 -current to see what the results are.  The patch included applies cleanly
 to 4.4's if_dc, and will apply to -current with a one line change.
 Whitespace is horrible, I copied and pasted the original patch, used patch
 -l, etc.
 
 The test setup I used was as follows:
 Duron 600, PNIC, running -current
 Celeron 450, ADMtek tulip-clone, running -stable
 
 The network is 100mbps, switched.  To simulate load, I used a syn flooder
 aimed at an unused port.  icmp/rst response limiting was enabled.

Actually, you might want to leave that on; it will generate more load.

 
 With the -current box attacking the -stable box, I was able to notice a
 slight drop in interrupts/second with the patch applied.  The number of
 packets was ~57000/second.
 
 Before: ~46000 ints/sec, 57-63% processor usage due to interrupts.
 After: ~38000 ints/sec, 50-60% processor usage due to interrupts.
 
 In both cases, the box felt responsive.

You need to get real hardware to run these tests, obviously you aren't
saturating your line.  I would suspect a better test would be to see
how many pps you can get at the point where CPU utilization reaches
100%.  Basically start at a base of 60,000pps, and see how many more
it takes to drive them both to 100%.

Even your limited tests show a mean improvement of something like
10%.

10% isn't earth-shattering, but it is a significant improvement.

-Alfred




Re: Some interrupt coalescing tests

2001-10-12 Thread Mike Silbersack


On Fri, 12 Oct 2001, Alfred Perlstein wrote:

  The network is 100mbps, switched.  To simulate load, I used a syn flooder
  aimed at an unused port.  icmp/rst response limiting was enabled.

 Actually, you might want to leave that on; it will generate more load.

I considered leaving it on, but I'm not sure if that would be constructive
or not.  The primary problem with doing that is related to my test setup -
as we see from the stable -> current attack, my -current box couldn't take
the interrupt load of that many incoming packets, which would slow down
the outgoing packets.  If I had a better test setup, I'd like to try that.

  Before: ~46000 ints/sec, 57-63% processor usage due to interrupts.
  After: ~38000 ints/sec, 50-60% processor usage due to interrupts.
 
  In both cases, the box felt responsive.

 You need to get real hardware to run these tests, obviously you aren't
 saturating your line.  I would suspect a better test would be to see
 how many pps you can get at the point where CPU utilization reaches
 100%.  Basically start at a base of 60,000pps, and see how many more
 it takes to drive them both to 100%.

 Even your limited tests show a mean improvement of something like
 10%.

 10% isn't earth-shattering, but it is a significant improvement.

Yes, there is some improvement, but I'm not sure that the actual effect is
worthwhile.  Even with the 10% decrease, you're still going to kill the
box if the interrupt count goes much higher.

If you can setup a 4.4+this patch test of some sort with varying loads to
see the effect, maybe we could characterize the effect of the patch more.
With my setup, I don't think I can really take this testing any further.

Mike Silby Silbersack

