Re: crash with network load (in tcp syncache ?)
On Fri, 1 Nov 2002, Bill Fenner wrote:

> sonewconn() hands sofree() a self-inconsistent socket -- so->so_head is
> set, so "so" must be on a queue, but sonewconn() hasn't put it on a
> queue yet.  Please try this patch.
>
>   Bill
>
> Index: uipc_socket2.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/kern/uipc_socket2.c,v
> retrieving revision 1.104
> diff -u -r1.104 uipc_socket2.c
> --- uipc_socket2.c    18 Sep 2002 19:44:11 -0000    1.104
> +++ uipc_socket2.c    1 Nov 2002 22:40:52 -0000
> @@ -192,7 +192,7 @@
>          return ((struct socket *)0);
>      if ((head->so_options & SO_ACCEPTFILTER) != 0)
>          connstatus = 0;
> -    so->so_head = head;
> +    so->so_head = NULL;
>      so->so_type = head->so_type;
>      so->so_options = head->so_options & ~SO_ACCEPTCONN;
>      so->so_linger = head->so_linger;
> @@ -209,6 +209,7 @@
>          return ((struct socket *)0);
>      }
>
> +    so->so_head = head;
>      if (connstatus) {
>          TAILQ_INSERT_TAIL(&head->so_comp, so, so_list);
>          so->so_state |= SS_COMP;

This patch fixes the panics for me.  Thanks a lot.  I believe it should
be committed.

BTW: I get about 850 fetches per second on UP and 600 on SMP (the same
machine and settings).  I don't know whether that's expected in this
usage pattern.

--
Michal Mertl
[EMAIL PROTECTED]
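For context, the consistency check in sofree() that this patch avoids
tripping looks roughly like the following.  This is a paraphrase
reconstructed from the discussion in this thread and the backtrace in
the original report (uipc_socket.c:312), not a verbatim quote; the
exact panic text is an assumption.

	/*
	 * Sketch of the sofree() invariant under discussion (paraphrased,
	 * not verbatim 5.0-CURRENT source): a socket with so_head set is
	 * expected to be on its listener's incomplete or complete queue.
	 * sonewconn()'s half-built socket had so_head set with neither
	 * SS_INCOMP nor SS_COMP, so the check fired and the box panicked.
	 */
	void
	sofree(struct socket *so)
	{
		struct socket *head = so->so_head;

		if (head != NULL) {
			KASSERT((so->so_state & (SS_INCOMP | SS_COMP)) != 0,
			    ("sofree: so_head != NULL, but socket is on "
			    "neither the incomplete nor complete queue"));
			if (so->so_state & SS_INCOMP) {
				TAILQ_REMOVE(&head->so_incomp, so, so_list);
				head->so_incqlen--;
			}
			/* ... SS_COMP handling elided ... */
			so->so_head = NULL;
		}
		/* ... release the socket buffers and the socket itself ... */
	}

Bill's patch keeps so_head NULL until the socket really is queued, so a
failure before that point no longer reaches this check with an
inconsistent socket.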
Re: crash with network load (in tcp syncache ?)
On Fri, 1 Nov 2002, Terry Lambert wrote:

> Bill Fenner wrote:
> > > I think this can still crash (just like my patch); the problem is
> > > in what happens when it fails to allocate memory.  Unless you set
> > > one of the flags, it's still going to panic in the same place, I
> > > think, when you run out of memory.
> >
> > No.  The flags are only checked when so_head is not NULL.
> > sonewconn() was handing sofree() an inconsistent struct so (so_head
> > was set without being on either queue), i.e. sonewconn() was
> > creating an invalid data structure.
>
> You're right... I missed that; I was thinking too hard about the
> other situations (e.g. soabort()) that could trigger that code, and
> not enough about the code itself.
>
> > The call in sonewconn() used to be to sodealloc(), which didn't
> > care about whether or not the data structure was self-consistent.
> > The code was refactored to do reference counting, but the fact that
> > the socket was inconsistent at that point wasn't noticed until now.
>
> Yeah; I looked at doing a ref() of the thing as a partial fix, but
> the unref() did the sotryfree() anyway.
>
> > The problem is not at all based on what happens in the allocation
> > or protocol attach failure cases.  The SYN cache is not involved;
> > this is a bug in sonewconn(), plain and simple.
>
> I still think there is a potential failure case, but the amount of
> code you'd have to read through to follow it is immense.  It has to
> do with the connection completing at NETISR, instead of in a process
> context, in the allocation failure case.
>
> I ran into the same issue when trying to run connections to
> completion up to the accept() at interrupt, in the LRP case.  The SYN
> cache case is very similar, in the case of a cookie that hits when
> there are no resources remaining.
>
> He might be able to trigger it with his setup, by setting the cache
> size way, way down, thus relying on cookies, and then flooding it
> with connection requests until he runs it out of resources.

Do I read you correctly that Bill's patch is probably better than yours
(I tested both; both fix the problem)?

If you still believe there's a problem (bug) I may be able to trigger
with some setting, please tell me.  I don't know how to make syncookies
kick in -- I set net.inet.tcp.cachelimit to 100 but it doesn't seem to
make a difference, though I don't know what I'm doing :-).  I imagine
the syncache doesn't grow much when I'm connecting from a single IP and
connections are quickly established.  I'll be able to do some tests on
Monday -- this is a computer at work.

FWIW, netstat -m during the benchmark run shows the following (I read
it as saying mbufs aren't the problem -- even just before the crash):

mbuf usage:
	GEN list:	0/0 (in use/in pool)
	CPU #0 list:	71/160 (in use/in pool)
	CPU #1 list:	79/160 (in use/in pool)
	Total:		150/320 (in use/in pool)
	Maximum number allowed on each CPU list: 512
	Maximum possible: 34560
	Allocated mbuf types:
	  80 mbufs allocated to data
	  70 mbufs allocated to packet headers
	0% of mbuf map consumed
mbuf cluster usage:
	GEN list:	0/0 (in use/in pool)
	CPU #0 list:	38/114 (in use/in pool)
	CPU #1 list:	41/104 (in use/in pool)
	Total:		79/218 (in use/in pool)
	Maximum number allowed on each CPU list: 128
	Maximum possible: 17280
	1% of cluster map consumed
516 KBytes of wired memory reserved (37% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

--
Michal Mertl
[EMAIL PROTECTED]
Re: crash with network load (in tcp syncache ?)
Michal Mertl wrote:
> This patch fixes the panics for me.  Thanks a lot.  I believe it
> should be committed.

I agree (Mark Murray -- this was the patch I was talking about).

> BTW: I get about 850 fetches per second on UP and 600 on SMP (the
> same machine and settings).  I don't know whether that's expected in
> this usage pattern.

It's expected.

Fetches per second isn't a very good benchmark, FWIW: it doesn't tell
us how to repeat it.  A better measure is connections per second (at
least for a server box).

With proper tuning, and some minor patches, 7000/second isn't hard to
get.  If you add the Duke University version of the Rice University
patches for LRP, modify the mbuf allocator for static freelisting and
then pre-populate it, and tune the kernel properly, you should be able
to get over 20,000 connections per second.  The best I've managed with
a modified FreeBSD 4.2, before the SYN-cache code, was 32,000/second.

Use MAST or http_load on a number of simultaneous clients to get into
the neighborhood of those numbers.

-- Terry
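Since connections per second is the metric being recommended, here is a
minimal sketch of what measuring it amounts to, for readers without
MAST or http_load handy.  This is an illustrative standalone client,
not part of any tool mentioned above; the server address is a
placeholder, and a single-threaded loop like this will saturate long
before the quoted numbers (the real tests ran many simultaneous
clients):

	/* Minimal connections-per-second probe (illustrative sketch). */
	#include <sys/types.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <arpa/inet.h>
	#include <stdio.h>
	#include <string.h>
	#include <time.h>
	#include <unistd.h>

	int
	main(void)
	{
		struct sockaddr_in sin;
		time_t start = time(NULL);
		unsigned long n = 0;

		memset(&sin, 0, sizeof(sin));
		sin.sin_family = AF_INET;
		sin.sin_port = htons(80);
		/* Hypothetical address of the server under test. */
		sin.sin_addr.s_addr = inet_addr("10.0.0.1");

		while (time(NULL) - start < 10) {	/* 10-second sample */
			int s = socket(AF_INET, SOCK_STREAM, 0);

			if (s < 0)
				break;		/* out of descriptors */
			if (connect(s, (struct sockaddr *)&sin,
			    sizeof(sin)) == 0)
				n++;
			close(s);
		}
		printf("%lu connections in 10s (~%lu/sec)\n", n, n / 10);
		return (0);
	}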
Re: crash with network load (in tcp syncache ?)
Michal Mertl wrote:
> Do I read you correctly that Bill's patch is probably better than
> yours (I tested both; both fix the problem)?

That's a hard question, and it takes a longer answer.  8-(.

They fix the problem in different ways.  The problem is actually a
secondary effect, and there are several ways to trigger it.  Mine fixes
it by initializing the socket to a valid value on the list, and Bill's
fixes it by initializing it to a valid value off the list.

Mine will fail under load when the protocol attach fails; the way it
works is that the protocol attach succeeds before the soreserve()
fails, so it's possible to undo the attach, which happens in the
sotryfree().  It's a good fix because it ups the reference count, and
destroys the socket normally (in the caller) on failure.

Bill's won't fail when the protocol attach fails, but it will fail
under other conditions.  For example, if you were to up the amount of
physical RAM in your box, Bill's might start failing; or if you upped
the mbuf allocations by manually tuning them larger, Bill's would
definitely fail when you ran out of mbuf clusters, but not mbufs.  Both
of these failures require you to hit the cookie code (the SYN-cache
load getting too high).

Both of them are poor workarounds for a problem, which is really that
some of the code being called by the SYN-cache code -- to delay
allocation of resources until a matching ACK -- was never written to be
callable at NETISR, and the allocation occurs in the wrong order.

Bill's fix is marginally better, because it will handle one more case
than mine (but I believe it will actually leak sockets in the failure
case, when you are at resource starvation).  Both of them are voodoo:
they rely on causing a different side effect of a side effect.  As
voodoo goes, Bill's is marginally less invisible than mine, so I've
suggested that Mark Murray commit Bill's instead of mine.  But without
reading the code, just seeing either patch, no one would know what the
heck the patch was intended to do, or why it was needed at all... both
of them look like you are gratuitously moving code around for no good
reason.  8-).

> If you still believe there's a problem (bug) I may be able to trigger
> with some setting, please tell me.  I don't know how to make
> syncookies kick in -- I set net.inet.tcp.cachelimit to 100 but it
> doesn't seem to make a difference, though I don't know what I'm doing
> :-).  I imagine the syncache doesn't grow much when I'm connecting
> from a single IP and connections are quickly established.  I'll be
> able to do some tests on Monday -- this is a computer at work.

The problem is that you've tuned your kernel for more committed memory
than you actually have available... you are overcommitting socket
receive buffers (actually, 16K sockets at the current default would
need a full 4G of physical RAM, if there weren't overcommit).

The real fix would be to make the code insensitive to allocation
failures at all points in the process.  Like I said before, it would
require passing the address of the 'so' pointer to one of the
underlying functions, so that all the initialization could be done in
one place (the attach routine would be best).  This would change the
protocol interface for all the protocols, so it's a hard change to
sell.

If you want to cause your kernel to freak, even with Bill's patch: in
your kernel config file, increase the number of mbufs, but not the
number of mbuf clusters (manually tune up the number of mbufs).

This is a boundary failure, and it's possible to cause it to happen
anyway, just by adding RAM, now that Matt Dillon's auto-tuning code has
gone in (the ratio of increase for more RAM is not 1:1 for these
resources).

If you want to see it die slowly, run it at high load; you should see
from vmstat -m that, for every allocation failure on an incoming
connection, you leak a SYN cache entry and an associated socket
structure.  Eventually, your box will lock up, but you may have to run
a week or more to get it to do that, unless you have a gigabit NIC and
can keep it saturated with connect requests (then it should lock up in
about 36 hours).

With my patch, instead of locking up, it panics again (I guess that's a
plus, if you prefer it to reboot and start working again, and don't
have a status monitor daemon on another box that can force the reboot).

If you want it to panic with my patch, tune the number of maxfiles way,
way up.  When the in_pcballoc() fails in tcp_attach(), then it will
blow up (say around 40,000 connections or so).  If you try this,
remember that the sysctl for maxfiles is useless for networking
connections: you have to tune it in the boot loader, or in the kernel
config, for the tuning to have the correct effect on the number of
network connections.

Actually, if you look at the tcp_attach() code in tcp_usrreq.c, you'll
see that it also calls soreserve(); probably, the soreserve() in the
sonewconn() code is plain bogus, but I didn't want to remove it in the
0/0 case for high/low
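For reference, the tcp_attach() ordering being described looks roughly
like this.  It is a paraphrase of the 5.0-era sys/netinet/tcp_usrreq.c
with details elided, not a verbatim quote; the hiwat == 0 guard is the
"0/0 case" mentioned above:

	/*
	 * Paraphrase of tcp_attach() (tcp_usrreq.c), not verbatim
	 * source.  soreserve() is only called here when the socket
	 * buffer high-water marks are still 0/0 -- which is why the
	 * arguably redundant soreserve() in sonewconn() was left alone.
	 */
	static int
	tcp_attach(struct socket *so, struct thread *td)
	{
		int error;

		if (so->so_snd.sb_hiwat == 0 || so->so_rcv.sb_hiwat == 0) {
			error = soreserve(so, tcp_sendspace, tcp_recvspace);
			if (error)
				return (error);
		}
		/* The allocation that fails under maxfiles pressure. */
		error = in_pcballoc(so, &tcbinfo, td);
		if (error)
			return (error);
		/* ... allocate the tcpcb and set the initial state ... */
		return (0);
	}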
Re: crash with network load (in tcp syncache ?)
Terry,

I think most of your 9k of reasoning is based on the thought that
soreserve() allocates memory.  It doesn't, and never has.

  Bill
Re: crash with network load (in tcp syncache ?)
Michal,

Alan Cox pointed out to me that backing out to using sodealloc()
instead of sotryfree() is probably a better fix anyway -- it solves the
panic in more or less the same way as mine, but it backs the code out
to be the same as it's been for years -- a much more well-tested fix =)

He committed it this morning, so could you please test an up-to-date
-CURRENT (rev 1.105 of uipc_socket2.c) without my patch?

Thanks,
  Bill
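To make the committed change concrete: going by the description above
and the failure branch shown in Terry's earlier patch, rev 1.105 should
amount to something like the following sketch (not the literal commit):

	/*
	 * Sketch of the rev 1.105 fix as described above (not the
	 * literal diff): in sonewconn()'s failure branch, free the
	 * half-built socket with sodealloc(), which doesn't care about
	 * queue consistency, instead of sotryfree()/sofree(), which
	 * panics on the inconsistent socket.
	 */
	if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat) ||
	    (*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL)) {
		sodealloc(so);		/* was: sotryfree(so) */
		return ((struct socket *)0);
	}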
Re: crash with network load (in tcp syncache ?)
Hi,

--- Terry Lambert [EMAIL PROTECTED] wrote:
> With proper tuning, and some minor patches, 7000/second isn't hard to
> get.  If you add the Duke University version of the Rice University
> patches for LRP, modify the mbuf allocator for static freelisting and
> then pre-populate it, and tune the kernel properly, you should be
> able to get over 20,000 connections per second.  The best I've
> managed with a modified FreeBSD 4.2, before the SYN-cache code, was
> 32,000/second.

Out of pure curiosity, what is the reason that the Duke and Rice
patches were never incorporated into the base system?  If it really
enables the same machine to provide 4 times the number of connections,
this seems like it would be a useful thing to include.

regards,
Galen Sampson
RE: crash with network load (in tcp syncache ?)
From: Galen Sampson [mailto:galen_sampson;yahoo.com]
> Out of pure curiosity, what is the reason that the Duke and Rice
> patches were never incorporated into the base system?  If it really
> enables the same machine to provide 4 times the number of
> connections, this seems like it would be a useful thing to include.

I suspect because of the copyright:

http://www.cs.rice.edu/CS/Systems/ScalaServer/code/rescon-lrp/README.html

  "This code is copyrighted software and can NOT be redistributed"

--don ([EMAIL PROTECTED] www.sandvine.com)
Re: crash with network load (in tcp syncache ?)
Bill Fenner wrote:
> I think most of your 9k of reasoning is based on the thought that
> soreserve() allocates memory.  It doesn't, and never has.

The real problem is the in_pcballoc() in tcp_attach(), which calls
uma_zalloc().

But for mbufs, soreserve() reserves space for potential mbufs, even
though it does not itself allocate mbufs.  I didn't want to go through
the whole state machine to explain the additional failure modes, so I
simplified.

Consider the case where you receive network interrupts for packets
containing data on partially complete, or in-the-process-of-completing,
sockets, while you are in the middle of running the NETISR.  This code
was not written to be reentrant in the failure case, and the SYN cache
adds this requirement.

The only safe way to do this, with the code unmodified, is to hold
splbio() around the calls, and split the if test I modified into an
if..if..else..else if..else.  Even then, it doesn't make one call that
gives a binary success/fail.  As evidence, I offer that my fix works,
too, by changing the order of operation.  8-).

Note that I'm not complaining about your fix any more than I'm
complaining about mine -- in fact, I have stated repeatedly for the
record that your fix has one less failure mode than my fix, and should
be committed.  Potentially, both of them should be committed (for the
SYN cache disabled case, mine suppresses another panic; for the other,
yours suppresses a different panic, and enabled is the common case).

It's just that neither addresses the real problem; they just suppress
the side effect of a side effect, in different ways.

-- Terry
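One reading of the "split the if test" suggestion, sketched under the
assumption that the point is to tell the attach failure apart from the
reserve failure so each can be unwound appropriately.  This is
illustrative only; nothing like it was committed:

	/*
	 * Illustrative only: splitting sonewconn()'s combined failure
	 * test, as the discussion above suggests.  With the attach done
	 * first, a later failure can be unwound through the normal
	 * teardown path; a failed attach leaves nothing to undo.
	 */
	if ((*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL) != 0) {
		/* Nothing attached yet; plain deallocation is safe. */
		sodealloc(so);
		return ((struct socket *)0);
	} else if (soreserve(so, head->so_snd.sb_hiwat,
	    head->so_rcv.sb_hiwat) != 0) {
		/* Attach succeeded; tear down through the normal path. */
		sotryfree(so);
		return ((struct socket *)0);
	}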
Re: crash with network load (in tcp syncache ?)
Galen Sampson wrote:
> > With proper tuning, and some minor patches, 7000/second isn't hard
> > to get.  If you add the Duke University version of the Rice
> > University patches for LRP, modify the mbuf allocator for static
> > freelisting and then pre-populate it, and tune the kernel properly,
> > you should be able to get over 20,000 connections per second.  The
> > best I've managed with a modified FreeBSD 4.2, before the SYN-cache
> > code, was 32,000/second.
>
> Out of pure curiosity, what is the reason that the Duke and Rice
> patches were never incorporated into the base system?  If it really
> enables the same machine to provide 4 times the number of
> connections, this seems like it would be a useful thing to include.

To be accurate, it's 3X.  The 4X number requires a different kernel
memory allocator for mbufs, which my employer at the time did not
permit me to publish the code for, though the idea has plenty of prior
art (back to 1992 and DEC WRL).

The current Rice (FreeBSD 4.0) and Duke (FreeBSD 4.4) patches require
executing a technology transfer license in order to be used
commercially; technically, they have license restrictions incompatible
with FreeBSD.

When the code was first offered to the project (FreeBSD 2.2), the
project never integrated it.  I don't know why it didn't make it in,
then.

FWIW, I personally dislike the rescon -- Resource Container -- code in
the newer implementation; for embedded devices, it's not important to
account overhead to a particular process, I think.

-- Terry
Re: crash with network load (in tcp syncache ?)
Don Bowman wrote:
> I suspect because of the copyright:
>
> http://www.cs.rice.edu/CS/Systems/ScalaServer/code/rescon-lrp/README.html
>
>   "This code is copyrighted software and can NOT be redistributed"

That explains why the new code was never integrated, not why the old
code wasn't, when it was initially offered by Rice.

-- Terry
Re: crash with network load (in tcp syncache ?)
I really don't understand why you keep claiming that the SYN cache
changes anything.  Without the SYN cache, tcp_input() calls
sonewconn(so, 0) on receipt of a SYN; with the SYN cache, tcp_input()
calls some syncache function which calls sonewconn(so, SS_ISCONNECTED)
on receipt of a SYN/ACK.  In either case, it's at the same interrupt
level, etc. -- you are in the middle of processing a packet that was
received by tcp_input().

So, you're saying that what we're hitting is a design flaw in 4BSD and
that this problem has been there since day one?

  Bill
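In sketch form, the two entry points Bill is contrasting, paraphrased
from his description and the backtrace in the original report (not
verbatim source):

	/*
	 * The two paths into sonewconn(), paraphrased -- both run from
	 * tcp_input(), i.e. at the same interrupt level, with or
	 * without the SYN cache.
	 */

	/* Without the SYN cache: socket created on the initial SYN. */
	so = sonewconn(so, 0);

	/*
	 * With the SYN cache: socket created only when the handshake
	 * completes, via syncache_expand() -> syncache_socket().
	 */
	so = sonewconn(lso, SS_ISCONNECTED);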
crash with network load (in tcp syncache ?)
I'm getting panics on SMP -CURRENT while running apachebench (the
binary ab from the apache distribution, not the Perl one) against httpd
on the machine.  The panics don't occur when I have WITNESS and
INVARIANTS turned on.

I'm running the apache server from ports with no special configuration.
I'm running (on a different machine) ./ab -c 10 http://host/index.html.

The panics occur after a bit more than the number of connections I have
set as kern.ipc.maxsockets (17000 by default, 3 increased) -- probably
because the first ones expire (when I set net.inet.tcp.msl to 5000 I
can't make it panic -- the number of tcpcb in vm.zone doesn't grow past
about 6500 -- it doesn't go down much even long after the benchmark run
finishes, but that's OK I suppose).

I had

	while 1
		sysctl -a | grep tcpcb
		sleep 1
	end

running, and the output was like this:

	tcpcb: 604, 3, 28551, 57, 28354
	tcpcb: 604, 3, 29301, 57, 29104
	tcpcb: 604, 3, 29926, 56, 29729

- and then a panic into ddb with the backtrace below (the one posted
here is actually from kern.ipc.maxsockets being 17000).

My kernel config is basically GENERIC, stripped of the hardware the
machine doesn't contain.

- verbose booting -

/boot/kernel/kernel text=0x209c10 data=0x2ae18+0x3d54c syms=[0x4+0x2da40+0x4+0x37495]
Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]...
/boot/kernel/acpi.ko text=0x380ec data=0x1a38+0xae8 syms=[0x4+0x56e0+0x4+0x733b]
SMAP type=01 base= len= 0009f000
SMAP type=02 base= 0009f000 len= 1000
SMAP type=02 base= 000f len= 0001
SMAP type=01 base= 0010 len= 1fefd000
SMAP type=03 base= 1fffd000 len= 2000
SMAP type=04 base= 1000 len= 1000
SMAP type=02 base= fec0 len= 1000
SMAP type=02 base= fee0 len= 1000
SMAP type=02 base= len= 0001
Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Thu Oct 31 15:34:37 CET 2002
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/TESTIK
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0422000.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc04220a8.
Calibrating clock(s) ...
TSC clock: 751634930 Hz, i8254 clock: 1193071 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter "i8254"  frequency 1193182 Hz
CLK_USE_TSC_CALIBRATION not specified - using old calibration method
CPU: Pentium III/Pentium III Xeon/Celeron (751.71-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 536858624 (524276K bytes)
Physical memory chunk(s):
0x1000 - 0x0009dfff, 643072 bytes (157 pages)
0x0044c000 - 0x1ffbcfff, 532090880 bytes (129905 pages)
avail memory = 515866624 (503776K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id: 0, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec0
bios32: Found BIOS32 Service Directory header at 0xc00f9e20
bios32: Entry = 0xf0530 (c00f0530)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xf+0x730
pnpbios: Found PnP BIOS data at 0xc00fd270
pnpbios: Entry = f:d2a0  Rev = 1.0
pnpbios: OEM ID cd041
Other BIOS signatures found:
Initializing GEOMetry subsystem
null: <null device, zero device>
random: <entropy source>
mem: <memory & I/O>
Pentium Pro MTRR support enabled
SMP: CPU0 bsp_apic_configure():
 lint0: 0x00010700 lint1: 0x0400 TPR: 0x0010 SVR: 0x01ff
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <ASUS P2B-D> on motherboard
pci_open(1):	mode 1 addr port (0x0cf8) is 0x80002358
pci_open(1a):	mode1res=0x8000 (0x8000)
pci_cfgcheck:	device 0 [class=06] [hdr=00] is there (id=71908086)
Using $PIR table, 6 entries at 0xc00f0d20
PCI-Only Interrupts: none
Location  Bus  Device  Pin  Link  IRQs
slot 1    0    12      A    0x60  3 4 5 7 9 10 11 12
slot 1    0    12      B    0x61  3 4 5 7 9 10 11 12
slot 1    0    12      C    0x62  3 4 5 7 9 10 11 12
slot 1    0    12      D    0x63  3 4 5 7 9 10 11 12
slot 2    0    11      A    0x61  3 4 5 7 9 10 11 12
slot 2    0    11      B    0x62  3 4 5 7 9 10 11 12
slot 2    0    11      C    0x63  3 4 5 7 9 10 11 12
slot 2    0    11      D    0x60  3 4 5 7 9 10 11 12
slot 3    0    10      A    0x62  3 4 5 7 9 10 11 12
slot 3    0    10      B    0x63  3 4 5 7 9 10 11 12
slot 3    0    10      C    0x60  3 4 5 7 9 10 11 12
slot 3    0    10      D    0x61  3 4 5 7 9 10 11 12
slot 4    0    9       A    0x63  3 4 5 7 9 10 11
[PATCH] Re: crash with network load (in tcp syncache ?)
Michal Mertl wrote:
> I'm getting panics on SMP -CURRENT while running apachebench (the
> binary ab from the apache distribution, not the Perl one) against
> httpd on the machine.  The panics don't occur when I have WITNESS and
> INVARIANTS turned on.

[ ... ]

> #10 0xc01bd46f in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:503
> #11 0xc01f7e1e in sofree (so=0xc58f05d0) at /usr/src/sys/kern/uipc_socket.c:312
> #12 0xc01fa508 in sonewconn (head=0xc43874d8, connstatus=2)
>     at /usr/src/sys/kern/uipc_socket2.c:208
> #13 0xc023f42f in syncache_socket (sc=0x2, lso=0xc43874d8, m=0xc1662200)
>     at /usr/src/sys/netinet/tcp_syncache.c:564
> #14 0xc023f748 in syncache_expand (inc=0xd6a62b3c, th=0xc1f6c834,
>     sop=0xd6a62b10, m=0xc1662200) at /usr/src/sys/netinet/tcp_syncache.c:783
> #15 0xc0239978 in tcp_input (m=0xc1f6c834, off0=20)
>     at /usr/src/sys/netinet/tcp_input.c:713

soreserve() is called to reserve mbuf space for the socket, which calls
sbreserve(), and this fails because you have too few mbufs in your
system for the number of connections you have configured.

This is a problem because the sotryfree() in sonewconn() (see the
definition in sys/socketvar.h) sees a so_count of zero, and calls
sofree() directly.  The sofree() fails because the socket is not
enqueued as an incomplete connection, and not enqueued as a complete
connection (not on a queue, and so_state does not have the SS_INCOMP or
SS_COMP flags set).

Basically, this code does not expect to be called in this case, and the
call occurs because the SYN cache code runs at NETISR.

Personally, I do not understand why a prereservation for mbufs is
necessary in this particular case: if you are out of mbufs, the packets
should end up dropped in any case, so it should not matter.  I guess
it's an attempt to protect you from massive connection attempts acting
as a denial of service attack.

One fix would be to reference the socket before making the call, in
syncache_socket().  The basically correct way to do this would be to
invert the order of the if test in sonewconn() (see attached patch).

This can also fail, though: if the protocol attach fails, then it will
still panic.  Also, if the protocol attach doesn't fail, and there's an
soabort(), if the protocol detach fails, it will still call sotryfree()
in the abort... and, once again, panic.

My suggestions:

1)	Try the attached patch; it will probably cover up the problem
	for you.

2)	Make sure you don't set the number of connections you allow
	larger than the number of mbufs, divided by 2, divided by the
	number of mbufs you have set in net.inet.tcp.recvspace (i.e.:
	Do Not Overcommit Mbufs).

3)	Disable the use of SYN cookies, e.g.:

		sysctl net.inet.tcp.syncookies=0

	SYN cookies are incredibly evil, and will put pressure on your
	resources by drastically increasing pool retention time, if
	they end up being invoked.

-- Terry

Index: uipc_socket2.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.104
diff -c -r1.104 uipc_socket2.c
*** uipc_socket2.c	18 Sep 2002 19:44:11 -0000	1.104
--- uipc_socket2.c	1 Nov 2002 17:16:39 -0000
***************
*** 203,210 ****
  #ifdef MAC
  	mac_create_socket_from_socket(head, so);
  #endif
! 	if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat) ||
! 	    (*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL)) {
  		sotryfree(so);
  		return ((struct socket *)0);
  	}
--- 203,210 ----
  #ifdef MAC
  	mac_create_socket_from_socket(head, so);
  #endif
! 	if ((*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL) ||
! 	    soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat)) {
  		sotryfree(so);
  		return ((struct socket *)0);
  	}
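For readers without the tree handy, the sotryfree() macro referred to
here behaves roughly as follows -- a paraphrase built from the
description in this thread, not the verbatim sys/socketvar.h
definition:

	/*
	 * Paraphrase of sotryfree() (sys/socketvar.h) as described in
	 * this thread, not the verbatim macro: with no references held
	 * it goes straight to sofree(), which is where the consistency
	 * panic fires for sonewconn()'s half-constructed socket.
	 */
	#define	sotryfree(so) do {					\
		if ((so)->so_count == 0)				\
			sofree(so);	/* no refs: free (here, panic) */ \
		/* else: a reference is still held; leave it alone */	\
	} while (0)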
Re: crash with network load (in tcp syncache ?)
sonewconn() hands sofree() a self-inconsistent socket -- so->so_head is
set, so "so" must be on a queue, but sonewconn() hasn't put it on a
queue yet.  Please try this patch.

  Bill

Index: uipc_socket2.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.104
diff -u -r1.104 uipc_socket2.c
--- uipc_socket2.c	18 Sep 2002 19:44:11 -0000	1.104
+++ uipc_socket2.c	1 Nov 2002 22:40:52 -0000
@@ -192,7 +192,7 @@
 		return ((struct socket *)0);
 	if ((head->so_options & SO_ACCEPTFILTER) != 0)
 		connstatus = 0;
-	so->so_head = head;
+	so->so_head = NULL;
 	so->so_type = head->so_type;
 	so->so_options = head->so_options & ~SO_ACCEPTCONN;
 	so->so_linger = head->so_linger;
@@ -209,6 +209,7 @@
 		return ((struct socket *)0);
 	}
 
+	so->so_head = head;
 	if (connstatus) {
 		TAILQ_INSERT_TAIL(&head->so_comp, so, so_list);
 		so->so_state |= SS_COMP;
Re: crash with network load (in tcp syncache ?)
Bill Fenner wrote:
> sonewconn() hands sofree() a self-inconsistent socket -- so->so_head
> is set, so "so" must be on a queue, but sonewconn() hasn't put it on
> a queue yet.  Please try this patch.

I think this can still crash (just like my patch); the problem is in
what happens when it fails to allocate memory.  Unless you set one of
the flags, it's still going to panic in the same place, I think, when
you run out of memory.

The code that the SYN-cache uses really needs to be refactored and
separated out.  The problem is that the delay in allocation is
intentional, to permit the cache to deal with lighter-weight instances,
until it decides to actually create a connection.

There's not a clean way to do this, really, without passing the address
of the socket pointer down, and having lower-level code fill it out, I
think.  8-(.

The problem is definitely based on what happens in the allocation or
protocol attach failure cases.

-- Terry
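The "pass the socket pointer down" idea Terry sketches could look
something like the following.  This is purely hypothetical -- no such
function exists in the tree; the name and signature are invented for
illustration:

	/*
	 * Hypothetical interface change, for illustration only: have
	 * the lower-level attach code allocate and fill in the socket,
	 * so a failure at any point leaves *sop NULL instead of a
	 * half-built socket that sofree() can trip over.
	 */
	int
	pru_attach_alloc(struct socket **sop, struct socket *head)
	{
		struct socket *so;

		*sop = NULL;
		if ((so = soalloc(0)) == NULL)
			return (ENOBUFS);
		/*
		 * ... protocol attach and buffer reservation here,
		 * undoing both on failure before the socket ever
		 * becomes visible to anyone else ...
		 */
		*sop = so;
		return (0);
	}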
Re: crash with network load (in tcp syncache ?)
> I think this can still crash (just like my patch); the problem is in
> what happens when it fails to allocate memory.  Unless you set one of
> the flags, it's still going to panic in the same place, I think, when
> you run out of memory.

No.  The flags are only checked when so_head is not NULL.  sonewconn()
was handing sofree() an inconsistent struct so (so_head was set without
being on either queue), i.e. sonewconn() was creating an invalid data
structure.

The call in sonewconn() used to be to sodealloc(), which didn't care
about whether or not the data structure was self-consistent.  The code
was refactored to do reference counting, but the fact that the socket
was inconsistent at that point wasn't noticed until now.

The problem is not at all based on what happens in the allocation or
protocol attach failure cases.  The SYN cache is not involved; this is
a bug in sonewconn(), plain and simple.

  Bill
Re: crash with network load (in tcp syncache ?)
Bill Fenner wrote:
> > I think this can still crash (just like my patch); the problem is
> > in what happens when it fails to allocate memory.  Unless you set
> > one of the flags, it's still going to panic in the same place, I
> > think, when you run out of memory.
>
> No.  The flags are only checked when so_head is not NULL.
> sonewconn() was handing sofree() an inconsistent struct so (so_head
> was set without being on either queue), i.e. sonewconn() was creating
> an invalid data structure.

You're right... I missed that; I was thinking too hard about the other
situations (e.g. soabort()) that could trigger that code, and not
enough about the code itself.

> The call in sonewconn() used to be to sodealloc(), which didn't care
> about whether or not the data structure was self-consistent.  The
> code was refactored to do reference counting, but the fact that the
> socket was inconsistent at that point wasn't noticed until now.

Yeah; I looked at doing a ref() of the thing as a partial fix, but the
unref() did the sotryfree() anyway.

> The problem is not at all based on what happens in the allocation or
> protocol attach failure cases.  The SYN cache is not involved; this
> is a bug in sonewconn(), plain and simple.

I still think there is a potential failure case, but the amount of code
you'd have to read through to follow it is immense.  It has to do with
the connection completing at NETISR, instead of in a process context,
in the allocation failure case.

I ran into the same issue when trying to run connections to completion
up to the accept() at interrupt, in the LRP case.  The SYN cache case
is very similar, in the case of a cookie that hits when there are no
resources remaining.

He might be able to trigger it with his setup, by setting the cache
size way, way down, thus relying on cookies, and then flooding it with
connection requests until he runs it out of resources.

-- Terry