Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Michal Mertl
On Fri, 1 Nov 2002, Bill Fenner wrote:

 sonewconn() hands sofree() a self-inconsistent socket -- so->so_head is
 set, so "so" must be on a queue, but sonewconn() hasn't put it on a queue yet.
 Please try this patch.

   Bill

 Index: uipc_socket2.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/uipc_socket2.c,v
 retrieving revision 1.104
 diff -u -r1.104 uipc_socket2.c
 --- uipc_socket2.c  18 Sep 2002 19:44:11 -  1.104
 +++ uipc_socket2.c  1 Nov 2002 22:40:52 -
 @@ -192,7 +192,7 @@
   return ((struct socket *)0);
  	if ((head->so_options & SO_ACCEPTFILTER) != 0)
  	connstatus = 0;
  -	so->so_head = head;
  +	so->so_head = NULL;
  	so->so_type = head->so_type;
  	so->so_options = head->so_options & ~SO_ACCEPTCONN;
  	so->so_linger = head->so_linger;
 @@ -209,6 +209,7 @@
   return ((struct socket *)0);
   }

  +	so->so_head = head;
   if (connstatus) {
  	TAILQ_INSERT_TAIL(&head->so_comp, so, so_list);
  	so->so_state |= SS_COMP;


This patch fixes the panics for me. Thanks a lot. I believe it should be
committed.

BTW: I get about 850 fetches per second on UP and 600 on SMP (the same
machine and settings). I don't know if that's expected in this usage pattern.

-- 
Michal Mertl
[EMAIL PROTECTED]





To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Michal Mertl
On Fri, 1 Nov 2002, Terry Lambert wrote:

 Bill Fenner wrote:
  I think this can still crash (just like my patch); the problem is in
  what happens when it fails to allocate memory.  Unless you set one of
  the flags, it's still going to panic in the same place, I think, when
  you run out of memory.
 
  No.  The flags are only checked when so_head is not NULL.  sonewconn()
  was handing sofree() an inconsistent struct so (so_head was set without
  being on either queue), i.e. sonewconn() was creating an invalid data
  structure.

 You're right... I missed that; I was thinking too hard about the other
 situations (e.g. soabort()) that could trigger that code, and not
 enough about the code itself.

  The call in sonewconn() used to be to sodealloc(), which didn't care
  about whether or not the data structure was self-consistent.  The code
  was refactored to do reference counting, but the fact that the socket
  was inconsistent at that point wasn't noticed until now.

 Yeah; I looked at doing a ref() of the thing as a partial fix,
 but the unref() did the sotryfree() anyway.


  The problem is not at all based on what happens in the allocation or
  protocol attach failure cases.  The SYN cache is not involved, this is
  a bug in sonewconn(), plain and simple.

 I still think there is a potential failure case, but the amount of
 code you'd have to read through to follow it is immense.  It has to
 do with the connection completing at NETISR, instead of in a process
 context, in the allocation failure case.  I ran into the same issue
 when trying to run connections to completion up to the accept() at
 interrupt, in the LRP case.  The SYN cache case is very similar, in
 the case of a cookie that hits when there are no resources remaining.
 He might be able to trigger it with his setup, by setting the cache
 size way, way down, and thus relying on cookies, and then flooding it
 with connection requests until he runs it out of resources.

Do I read you correctly that Bill's patch is probably better than yours
(I tested both, both fix the problem)?

If you still believe there's a problem (bug) I may be able to trigger with
some setting, please tell me. I don't know how to make syncookies kick in - I
set net.inet.tcp.cachelimit to 100 but it doesn't seem to make a difference,
though I don't know what I'm doing :-). I imagine the syncache doesn't grow
much when I'm connecting from a single IP and connections are quickly
established. I'll be able to do some tests on Monday - this is a computer
at work.

FWIW, netstat -m during the benchmark run shows the following (I read it
as meaning there's no problem - even just before the crash):

mbuf usage:
GEN list:   0/0 (in use/in pool)
CPU #0 list:71/160 (in use/in pool)
CPU #1 list:79/160 (in use/in pool)
Total:  150/320 (in use/in pool)
Maximum number allowed on each CPU list: 512
Maximum possible: 34560
Allocated mbuf types:
  80 mbufs allocated to data
  70 mbufs allocated to packet headers
0% of mbuf map consumed
mbuf cluster usage:
GEN list:   0/0 (in use/in pool)
CPU #0 list:38/114 (in use/in pool)
CPU #1 list:41/104 (in use/in pool)
Total:  79/218 (in use/in pool)
Maximum number allowed on each CPU list: 128
Maximum possible: 17280
1% of cluster map consumed
516 KBytes of wired memory reserved (37% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines


-- 
Michal Mertl
[EMAIL PROTECTED]












Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Terry Lambert
Michal Mertl wrote:
 This patch fixes the panics for me. Thanks a lot. I believe it should be
 committed.

I agree (Mark Murray -- this was the patch I was talking about).

 BTW: I get about 850 fetches per second on UP and 600 on SMP (the same
 machine and settings). I don't know if that's expected in this usage pattern.

It's expected.

Fetches per second isn't a very good benchmark, FWIW.  It doesn't
tell us how to repeat it.

A better measure is connections per second (at least for a server
box).

With proper tuning, and some minor patches, 7000/second isn't hard
to get.

If you add the Duke University version of the Rice University patches
for LRP, modify the mbuf allocator for static freelisting and then
pre-populate it, and tune the kernel properly, you should be able to
get over 20,000 connections per second.  The best I've managed with a
modified FreeBSD 4.2, before the SYN-cache code, was 32,000/second.

Use MAST or http_load on a number of simultaneous clients to get
in the neighborhood of those numbers.

-- Terry




Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Terry Lambert
Michal Mertl wrote:
 Do I read you correctly that Bill's patch is probably better than yours
 (I tested both, both fix the problem)?

That's a hard question, and it takes a longer answer.  8-(.

They fix the problem different ways.  The problem is actually
a secondary effect.  There are several ways to trigger it.  Mine
fixes it by initializing the socket to a valid value on the list,
and Bill's fixes it by initializing it to a valid value off the
list.

Mine will fail under load when the protocol attach fails; the way
it works is that the protocol attach succeeds before the soreserve()
fails, so it's possible to undo the attach, which happens in the
sotryfree().  It's a good fix because it ups the reference count,
and destroys the socket normally (in the caller) on failure.

Bill's won't fail when the protocol attach fails, but it will fail
under other conditions.  For example, if you were to up the amount
of physical RAM in your box, Bill's might start failing, or if you
up'ed the mbuf allocations by manually tuning them larger, Bill's
would definitely fail when you ran out of mbuf clusters, but not
mbufs.  Both of these failures require you to hit the cookie code
(the SYN-cache load getting too high).

Both of them are poor workarounds for a problem, which is really
that some of the code that's being called by the SYN-cache code
to do delayed allocation of resources until a matching ACK, was
never written to be callable at NETISR, and the allocation occurs
in the wrong order.

Bill's fix is marginally better, because it will handle one more
case than mine (but I believe it will actually leak sockets on the
failure case, when you are at resource starvation).

Both of them are voodoo: they rely on causing a different side
effect of a side effect.  As voodoo goes, Bill's is marginally
less invisible than mine, so I've suggested that Mark Murray
commit Bill's, instead of mine, but without reading the code,
just seeing either patch, no one would know what the heck the
patch was intended to do, or why it was needed at all... both
of them look like you are gratuitously moving code around for no
good reason.  8-).


 If you still believe there's a problem (bug) I may be able to trigger with
 some setting, please tell me. I don't know how to make syncookies kick in - I
 set net.inet.tcp.cachelimit to 100 but it doesn't seem to make a difference,
 though I don't know what I'm doing :-). I imagine the syncache doesn't grow
 much when I'm connecting from a single IP and connections are quickly
 established. I'll be able to do some tests on Monday - this is a computer
 at work.

The problem is that you've tuned your kernel for more committed
memory than you actually have available... you are overcommitting
socket receive buffers (actually, 16K sockets at the current default
would need a full 4G of physical RAM, if there weren't overcommit).

The real fix would be to make the code insensitive to allocation
failures at all points in the process.  Like I said before, it would
require passing the address of the 'so' pointer to one of the underlying
functions, so that all the initialization could be done in one place
(the attach routine would be best).  This would change the protocol
interface for all the protocols, so it's a hard change to sell.


If you want to cause your kernel to freak, even with Bill's patch,
in your kernel config file, increase the number of mbuf's, but not
the number of mbuf clusters (manually tune up the number of mbufs).
This is a boundary failure, and it's possible to cause it to happen
anyway, just by adding RAM, now that Matt Dillon's auto-tuning code
has gone in (the ratio of increase for more RAM is not 1:1 for these
resources).

If you want to see it die slowly, run it at high load; you should
see from vmstat -m that, for every allocation failure on an
incoming connection, you leak a SYN cache entry and an associated
socket structure.  Eventually, your box will lock up, but you may
have to run a week or more to get it to do that, unless you have a
gigabit NIC, and can keep it saturated with connect requests (then
it should lock up in about 36 hours).  With my patch, instead of
locking up, it panic's again (I guess that's a plus, if you prefer
it to reboot and start working again, and don't have a status
monitor daemon on another box that can force the reboot).

If you want it to panic with my patch, tune the number of maxfiles
way, way up.  When the in_pcballoc() fails in tcp_attach, then it
will blow up (say around 40,000 connections or so).  If you try this,
remember that the sysctl for maxfiles is useless for networking
connections: you have to tune in the boot loader, or in the kernel
config for the tuning to have the correct effect on the number of
network connections.

Actually, if you look at the tcp_attach() code in tcp_usrreq.c,
you'll see that it also calls soreserve(); probably, the soreserve()
in the sonewconn() code is plain bogus, but I didn't want to remove
it in the 0/0 case for high/low 

Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Bill Fenner

Terry,

  I think most of your 9k of reasoning is based on the thought that
soreserve() allocates memory.  It doesn't, and never has.

  Bill




Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Bill Fenner

Michal,

  Alan Cox pointed out to me that backing out to using sodealloc()
instead of sotryfree() is probably a better fix anyway - it solves
the panic in more or less the same way as mine, but it backs the code
out to be the same as it's been for years -- a much more well-tested
fix =)  He committed it this morning, so could you please test an
up to date -CURRENT (rev 1.105 of uipc_socket2.c) without my patch?

Thanks,
  Bill




Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Galen Sampson
Hi

--- Terry Lambert [EMAIL PROTECTED] wrote:
 With proper tuning, and some minor patches, 7000/second isn't hard
 to get.
 
 If you add the Duke University version of the Rice University patches
 for LRP, modify the mbuf allocator for static freelisting and then
 pre-populate it, and tune the kernel properly, you should be able to
 get over 20,000 connections per second.  The best I've managed with a
 modified FreeBSD 4.2, before the SYN-cache code, was 32,000/second.

Out of pure curiosity, what is the reason that the Duke and Rice patches were
never incorporated into the base system?  If it really enables the same machine
to provide 4 times the number of connections, it seems like it would be a
useful thing to include.

regards,
Galen Sampson





RE: crash with network load (in tcp syncache ?)

2002-11-02 Thread Don Bowman
 From: Galen Sampson [mailto:galen_sampson;yahoo.com]
 
 Out of pure curiosity, what is the reason that the Duke and Rice patches
 were never incorporated into the base system?  If it really enables the
 same machine to provide 4 times the number of connections, it seems like
 it would be a useful thing to include.

I suspect because of the copyright:
http://www.cs.rice.edu/CS/Systems/ScalaServer/code/rescon-lrp/README.html

This code is copyrighted software and can NOT be redistributed

--don ([EMAIL PROTECTED] www.sandvine.com)




Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Terry Lambert
Bill Fenner wrote:
   I think most of your 9k of reasoning is based on the thought that
 soreserve() allocates memory.  It doesn't, and never has.

The real problem is the in_pcballoc() in tcp_attach(), which calls
uma_zalloc().

But soreserve() reserves space for potential mbufs, even though it does
not itself allocate them.  I didn't want to go through the whole state
machine to explain the additional failure modes, so I simplified.

Consider the case where you receive network interrupts for packets
containing data on partially complete, or in-the-process-of sockets,
while you are in the middle of running the NETISR.

This code was not written to be reentrant in the failure case, and
the SYN cache adds this requirement.

The only safe way to do this, with the code unmodified, is to hold
splbio() around the calls, and split the if test I modified into
an if .. if..else .. else if..else.

Even then, it doesn't make one call that gives a binary success/fail.

As evidence, I offer that my fix works, too, by changing the order of
operation.  8-).



Note that I'm not complaining about your fix any more than I'm
complaining about mine -- in fact, I have stated repeatedly for
the record that your fix has one less failure mode than my fix,
and should be committed.

Potentially, both of them should be committed (for the SYN cache
disable case, mine suppresses another panic, for the other, yours
suppresses a different panic, and enabled is the common case).

It's just that neither addresses the real problem; they just suppress
the side effect of a side effect, in different ways.

-- Terry




Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Terry Lambert
Galen Sampson wrote:
  With proper tuning, and some minor patches, 7000/second isn't hard
  to get.
 
  If you add the Duke University version of the Rice University patches
  for LRP, modify the mbuf allocator for static freelisting and then
  pre-populate it, and tune the kernel properly, you should be able to
  get over 20,000 connections per second.  The best I've managed with a
  modified FreeBSD 4.2, before the SYN-cache code, was 32,000/second.
 
 Out of pure curiosity, what is the reason that the Duke and Rice patches were
 never incorporated into the base system?  If it really enables the same machine
 to provide 4 times the number of connections, it seems like it would be a
 useful thing to include.

To be accurate, it's 3X.  The 4X number requires a different
kernel memory allocator for mbufs, which my employer at the
time did not permit me to publish the code for, though the
idea has plenty of prior art (back to 1992 and DEC WRL).

The current Rice (FreeBSD 4.0) and Duke patches (FreeBSD 4.4)
require executing a technology transfer license in order to be
able to use them commercially; technically, they have license
restrictions incompatible with FreeBSD.

When the code was first offered to the project (FreeBSD 2.2),
the project never integrated it.  I don't know why it didn't
make it in, then.

FWIW, I personally dislike the rescon -- Resource Container
-- code in the newer implementation; for embedded devices, it's
not important to account overhead to a particular process, I
think.

-- Terry




Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Terry Lambert
Don Bowman wrote:
 I suspect because of the copyright:
 http://www.cs.rice.edu/CS/Systems/ScalaServer/code/rescon-lrp/README.html
 
 This code is copyrighted software and can NOT be redistributed

That explains why the new code wasn't integrated, not why the old code
wasn't, when it was initially offered by Rice.

-- Terry




Re: crash with network load (in tcp syncache ?)

2002-11-02 Thread Bill Fenner

I really don't understand why you keep claiming that the SYN cache
changes anything.  Without the SYN cache, tcp_input() calls
sonewconn(so, 0) on receipt of a SYN; with the SYN cache, tcp_input()
calls some syncache function which calls sonewconn(so, SS_ISCONNECTED)
on receipt of a SYN/ACK.  In either case, it's with the same interrupt
level, etc -- you are in the middle of processing a packet that was
received by tcp_input().

So, you're saying that what we're hitting is a design flaw in 4BSD
and that this problem has been there since day one?

  Bill




crash with network load (in tcp syncache ?)

2002-11-01 Thread Michal Mertl
I'm getting panics on SMP -CURRENT while running apachebench (binary ab
from apache distribution, not the Perl one) against httpd on the machine.

The panics don't occur when I have WITNESS and INVARIANTS turned on.

I'm running apache server from ports with no special configuration. I'm
running (on different machine) ./ab -c 10 http://host/index.html.

The panics occur after a bit more than the number of connections I have set
as kern.ipc.maxsockets (17000 by default, 3 increased) - probably
because the first ones expire (when I set net.inet.tcp.msl to 5000 I
can't make it panic - the number of tcpcb in vm.zone doesn't grow past
about 6500 - it doesn't go down much even long after the benchmark run
finishes, but that's OK I suppose).

I had

while (1)
	sysctl -a | grep tcpcb
	sleep 1
end

running and the output was like this


tcpcb:   604,3,  28551, 57,28354
tcpcb:   604,3,  29301, 57,29104
tcpcb:   604,3,  29926, 56,29729
- and then a panic into ddb with the backtrace below (the one posted here is
actually from kern.ipc.maxsockets being 17000)

My kernel config is basically GENERIC, stripped of the HW the machine
doesn't contain.

- verbose booting -

/boot/kernel/kernel text=0x209c10 data=0x2ae18+0x3d54c syms=[0x4+0x2da40+0x4+0x37495]

Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]...
/boot/kernel/acpi.ko text=0x380ec data=0x1a38+0xae8 syms=[0x4+0x56e0+0x4+0x733b]
SMAP type=01 base=  len= 0009f000
SMAP type=02 base= 0009f000 len= 1000
SMAP type=02 base= 000f len= 0001
SMAP type=01 base= 0010 len= 1fefd000
SMAP type=03 base= 1fffd000 len= 2000
SMAP type=04 base= 1000 len= 1000
SMAP type=02 base= fec0 len= 1000
SMAP type=02 base= fee0 len= 1000
SMAP type=02 base=  len= 0001
Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Thu Oct 31 15:34:37 CET 2002
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/TESTIK
Preloaded elf kernel /boot/kernel/kernel at 0xc0422000.
Preloaded elf module /boot/kernel/acpi.ko at 0xc04220a8.
Calibrating clock(s) ... TSC clock: 751634930 Hz, i8254 clock: 1193071 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254  frequency 1193182 Hz
CLK_USE_TSC_CALIBRATION not specified - using old calibration method
CPU: Pentium III/Pentium III Xeon/Celeron (751.71-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x683  Stepping = 3
  
Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 536858624 (524276K bytes)
Physical memory chunk(s):
0x1000 - 0x0009dfff, 643072 bytes (157 pages)
0x0044c000 - 0x1ffbcfff, 532090880 bytes (129905 pages)
avail memory = 515866624 (503776K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 - irq 0
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id:  1, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id:  0, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec0
bios32: Found BIOS32 Service Directory header at 0xc00f9e20
bios32: Entry = 0xf0530 (c00f0530)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xf+0x730
pnpbios: Found PnP BIOS data at 0xc00fd270
pnpbios: Entry = f:d2a0  Rev = 1.0
pnpbios: OEM ID cd041
Other BIOS signatures found:
Initializing GEOMetry subsystem
null: null device, zero device
random: entropy source
mem: memory  I/O
Pentium Pro MTRR support enabled
SMP: CPU0 bsp_apic_configure():
 lint0: 0x00010700 lint1: 0x0400 TPR: 0x0010 SVR: 0x01ff
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: ASUS   P2B-Don motherboard
pci_open(1):mode 1 addr port (0x0cf8) is 0x80002358
pci_open(1a):   mode1res=0x8000 (0x8000)
pci_cfgcheck:   device 0 [class=06] [hdr=00] is there (id=71908086)
Using $PIR table, 6 entries at 0xc00f0d20
PCI-Only Interrupts: none
Location  Bus Device Pin  Link  IRQs
slot 1  0   12A   0x60  3 4 5 7 9 10 11 12
slot 1  0   12B   0x61  3 4 5 7 9 10 11 12
slot 1  0   12C   0x62  3 4 5 7 9 10 11 12
slot 1  0   12D   0x63  3 4 5 7 9 10 11 12
slot 2  0   11A   0x61  3 4 5 7 9 10 11 12
slot 2  0   11B   0x62  3 4 5 7 9 10 11 12
slot 2  0   11C   0x63  3 4 5 7 9 10 11 12
slot 2  0   11D   0x60  3 4 5 7 9 10 11 12
slot 3  0   10A   0x62  3 4 5 7 9 10 11 12
slot 3  0   10B   0x63  3 4 5 7 9 10 11 12
slot 3  0   10C   0x60  3 4 5 7 9 10 11 12
slot 3  0   10D   0x61  3 4 5 7 9 10 11 12
slot 4  09A   0x63  3 4 5 7 9 10 11 

[PATCH]Re: crash with network load (in tcp syncache ?)

2002-11-01 Thread Terry Lambert
Michal Mertl wrote:
 I'm getting panics on SMP -CURRENT while running apachebench (binary ab
 from apache distribution, not the Perl one) against httpd on the machine.
 
 The panics don't occur when I have WITNESS and INVARIANTS turned on.

[ ... ]

 #10 0xc01bd46f in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:503
 #11 0xc01f7e1e in sofree (so=0xc58f05d0) at
 /usr/src/sys/kern/uipc_socket.c:312
 #12 0xc01fa508 in sonewconn (head=0xc43874d8, connstatus=2)
 at /usr/src/sys/kern/uipc_socket2.c:208
 #13 0xc023f42f in syncache_socket (sc=0x2, lso=0xc43874d8, m=0xc1662200)
 at /usr/src/sys/netinet/tcp_syncache.c:564
 #14 0xc023f748 in syncache_expand (inc=0xd6a62b3c, th=0xc1f6c834,
 sop=0xd6a62b10, m=0xc1662200)
 /usr/src/sys/netinet/tcp_syncache.c:783
 #15 0xc0239978 in tcp_input (m=0xc1f6c834, off0=20)
 at /usr/src/sys/netinet/tcp_input.c:713


soreserve is called to get mbufs reserved to the socket, and
sbreserve is called, and this fails, because you have too few
mbufs in your system for the number of connections you have
configured.

This is a problem because the sotryfree() in sonewconn() (see
the definition in sys/socketvar.h) sees a so_count of zero, and
calls sofree() directly.

The sofree() fails because the socket is not enqueued as being
an incomplete connection, and not enqueued as being a complete
connection (not on a queue, and so_state does not have SS_INCOMP
or SS_COMP flags set).

Basically, this code does not expect to be called in this case,
and the call occurs because the SYN cache code runs at NETISR.

Personally, I do not understand why a prereservation for mbufs
is necessary in this particular case: if you are out of mbufs,
the packets should end up dropped, in any case, so it should not
matter.  I guess it's an attempt to protect you from massive
connection attempts acting as a denial of service attack.

One fix would be to reference the socket before making the
call, in syncache_socket().  The basically correct way to do
this would be to invert the order of the if test in sonewconn()
(see attached patch).

This can also fail, though: if the protocol attach fails, then
it will still panic.  Also, if the protocol attach doesn't fail,
and there's an soabort(), if the protocol detach fails, it will
still call sotryfree() in the abort... and, once again, panic.

My suggestion:

1)  Try the attached patch; it will probably cover up the
problem for you.

2)  Make sure you don't put the number of connections you
allow to be larger than the number of mbufs, divided
by 2, divided by the number of mbufs you have set in
the net.inet.tcp.recvspace (i.e.: Do Not Overcommit
Mbufs).

3)  Disable the use of SYN cookies, e.g.:

sysctl net.inet.tcp.syncookies=0

SYN cookies are incredibly evil, and will put pressure
on your resources by drastically increasing pool retention
time, if they end up being invoked.

-- Terry
Index: uipc_socket2.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.104
diff -c -r1.104 uipc_socket2.c
*** uipc_socket2.c  18 Sep 2002 19:44:11 -  1.104
--- uipc_socket2.c  1 Nov 2002 17:16:39 -
***
*** 203,210 
  #ifdef MAC
mac_create_socket_from_socket(head, so);
  #endif
!   if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat) ||
!   (*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL)) {
sotryfree(so);
return ((struct socket *)0);
}
--- 203,210 
  #ifdef MAC
mac_create_socket_from_socket(head, so);
  #endif
!   if ((*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL) ||
!   soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat)) {
sotryfree(so);
return ((struct socket *)0);
}



Re: crash with network load (in tcp syncache ?)

2002-11-01 Thread Bill Fenner
sonewconn() hands sofree() a self-inconsistent socket -- so->so_head is
set, so "so" must be on a queue, but sonewconn() hasn't put it on a queue yet.
Please try this patch.

  Bill

Index: uipc_socket2.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.104
diff -u -r1.104 uipc_socket2.c
--- uipc_socket2.c  18 Sep 2002 19:44:11 -  1.104
+++ uipc_socket2.c  1 Nov 2002 22:40:52 -
@@ -192,7 +192,7 @@
return ((struct socket *)0);
 	if ((head->so_options & SO_ACCEPTFILTER) != 0)
 	connstatus = 0;
 -	so->so_head = head;
 +	so->so_head = NULL;
 	so->so_type = head->so_type;
 	so->so_options = head->so_options & ~SO_ACCEPTCONN;
 	so->so_linger = head->so_linger;
@@ -209,6 +209,7 @@
return ((struct socket *)0);
}
 
 +	so->so_head = head;
 	if (connstatus) {
 	TAILQ_INSERT_TAIL(&head->so_comp, so, so_list);
 	so->so_state |= SS_COMP;




Re: crash with network load (in tcp syncache ?)

2002-11-01 Thread Terry Lambert
Bill Fenner wrote:
 sonewconn() hands sofree() a self-inconsistent socket -- so->so_head is
 set, so "so" must be on a queue, but sonewconn() hasn't put it on a queue yet.
 Please try this patch.

I think this can still crash (just like my patch); the problem is in
what happens when it fails to allocate memory.  Unless you set one of
the flags, it's still going to panic in the same place, I think, when
you run out of memory.

The code that the SYN-cache uses really needs to be refactored, and
separated out.

The problem is that the delay in allocation is intentional, to permit
the cache to deal with lighter weight instances, until it decides to
actually create a connection.

There's not a clean way to do this, really, without passing the address
of the socket pointer down, and having lower level code fill it out, I
think.  8-(.

The problem is definitely based on what happens in the allocation or
protocol attach failure cases.

-- Terry




Re: crash with network load (in tcp syncache ?)

2002-11-01 Thread Bill Fenner

I think this can still crash (just like my patch); the problem is in
what happens when it fails to allocate memory.  Unless you set one of
the flags, it's still going to panic in the same place, I think, when
you run out of memory.

No.  The flags are only checked when so_head is not NULL.  sonewconn()
was handing sofree() an inconsistent struct so (so_head was set without
being on either queue), i.e. sonewconn() was creating an invalid data
structure.

The call in sonewconn() used to be to sodealloc(), which didn't care
about whether or not the data structure was self-consistent.  The code
was refactored to do reference counting, but the fact that the socket
was inconsistent at that point wasn't noticed until now.

The problem is not at all based on what happens in the allocation or
protocol attach failure cases.  The SYN cache is not involved, this is
a bug in sonewconn(), plain and simple.

  Bill




Re: crash with network load (in tcp syncache ?)

2002-11-01 Thread Terry Lambert
Bill Fenner wrote:
 I think this can still crash (just like my patch); the problem is in
 what happens when it fails to allocate memory.  Unless you set one of
 the flags, it's still going to panic in the same place, I think, when
 you run out of memory.
 
 No.  The flags are only checked when so_head is not NULL.  sonewconn()
 was handing sofree() an inconsistent struct so (so_head was set without
 being on either queue), i.e. sonewconn() was creating an invalid data
 structure.

You're right... I missed that; I was thinking too hard about the other
situations (e.g. soabort()) that could trigger that code, and not
enough about the code itself.

 The call in sonewconn() used to be to sodealloc(), which didn't care
 about whether or not the data structure was self-consistent.  The code
 was refactored to do reference counting, but the fact that the socket
 was inconsistent at that point wasn't noticed until now.

Yeah; I looked at doing a ref() of the thing as a partial fix,
but the unref() did the sotryfree() anyway.


 The problem is not at all based on what happens in the allocation or
 protocol attach failure cases.  The SYN cache is not involved, this is
 a bug in sonewconn(), plain and simple.

I still think there is a potential failure case, but the amount of
code you'd have to read through to follow it is immense.  It has to
do with the connection completing at NETISR, instead of in a process
context, in the allocation failure case.  I ran into the same issue
when trying to run connections to completion up to the accept() at
interrupt, in the LRP case.  The SYN cache case is very similar, in
the case of a cookie that hits when there are no resources remaining.
He might be able to trigger it with his setup, by setting the cache
size way, way down, and thus relying on cookies, and then flooding it
with connection requests until he runs it out of resources.

-- Terry
