RE: Problems in VM structure ?
I've adjusted MAXUSERS to 128 on my heavily loaded PIIs and the crashes have not re-occurred for 24 hours now. (Had to adjust NMBCLUSTERS up, though.) The panics were happening every 5-8 hours like clockwork prior to this.

I believe that these crashes are caused by heavy network traffic, not heavy load values, so a make world may not trigger this. Actually, I couldn't force it to happen when I hit the box hard during testing with web traffic, so it must be a combination thing.

Another clue is the fact that I can't seem to get a Pentium (P5) to crash at all, ever, even when running exactly the same kernel config. The Pentium IIs fell over like crazy.

-Troy Cobb
Circle Net, Inc.
http://www.circle.net

To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-current in the body of the message
RE: RE: Problems in VM structure ?
-----Original Message-----
From: Matthew Dillon [mailto:dil...@apollo.backplane.com]

:What's the chance that our kernel adaptations for PIIs
:are partly at fault?
:
:-Troy Cobb
: Circle Net, Inc.
: http://www.circle.net

With what config? Have you tried reducing maxusers to 128?

-Matt

I've had it at MAXUSERS=256 on both the P5 and the P6. The P5 stays stable, the P6 doesn't. If I reduce MAXUSERS to 128 then these heavily loaded boxen will fall over due to out of MBUFs errors, or so I believe.

I'd love to find some real kernel-tuning documentation out there. One of my panics is

pipeinit: cannot allocate pipe -- out of kvm

and I can't pull a crashdump due to a DSCHECK error because my SWAP is 2GB.

-Troy Cobb
Circle Net, Inc.
http://www.circle.net
RE: Problems in VM structure ?
I'm seeing different responses depending on hardware.

On regular Pentium 166 machines, I almost NEVER get a panic. On brand-new Pentium II 350s, I get a panic every 6-9 hours.

This happens when both kernels are configured the same for maxusers. It happens when both machines are under the same load level -- the P5 stays rock solid, the P6 flakes out.

What's the chance that our kernel adaptations for PIIs are partly at fault?

-Troy Cobb
Circle Net, Inc.
http://www.circle.net

-----Original Message-----
From: Brian Feldman [mailto:gr...@unixhelp.org]
Sent: Tuesday, February 16, 1999 7:48 AM
To: Matthew Dillon
Cc: Khetan Gajjar; curr...@freebsd.org
Subject: Re: Problems in VM structure ?

On Tue, 16 Feb 1999, Matthew Dillon wrote:

:maxusers 256

Try reducing maxusers to 128. Another person reported similar behavior to me and after a bunch of work he tried going back to a basic distribution -- and everything started working again. It turned out that maxusers values of 256 and 512 were causing his machine to go poof, but a maxusers value of 128 worked fine. I haven't tracked the problem down yet.

Please try reducing your maxusers to 128 and email the results to current.

For what it's worth, my maxusers is 250 and my system is quite stable, even during a make -j25 buildworld.

-Matt

Brian Feldman
gr...@unixhelp.org
http://www.freebsd.org/
FreeBSD: The Power to Serve!
panics deciphering VMSTAT output
I've been trying to track down a regular, but not manually reproducible, crash in 3.0-BETA (19990205). I can't get a crashdump due to a DSCHECK negative number bug; I think my swap space of 3+GB is too large for it.

So, I've been having it send me the output of vmstat -m every 15 minutes to try to track it down. The couple of times I've seen the panic message on the console, it was typically, but not always:

pipeinit: cannot allocate pipe - out of kvm

It has been happening approximately every 6 hours on a heavily loaded server with 100+ chrooted daemons and NFS.

So, in short: can anyone tell me what the vmstat entries mean, so that I can compare my vmstat outputs to the one I captured 3 minutes before the last crash? :) A quick legend or tutorial would be helpful, and I'll turn it into a FAQ for the documentation project, too.

-Troy Cobb
Circle Net, Inc.
http://www.circle.net
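For the capture-and-compare routine described above, a minimal sh sketch follows. It is not from the thread: take_snapshot, compare_snapshots and the SNAPDIR location are made-up names for illustration. It treats the vmstat -m output as opaque text, so it only shows which lines changed between two captures and says nothing about what the individual entries mean.

```shell
#!/bin/sh
# Sketch only: helper names and SNAPDIR are assumptions, not from the thread.
SNAPDIR="${SNAPDIR:-/var/tmp/vmstat-snaps}"

# Run a command (default: vmstat -m), save its output to a timestamped
# file under SNAPDIR, and print the name of that file.
take_snapshot() {
    mkdir -p "$SNAPDIR" || return 1
    if [ $# -eq 0 ]; then
        set -- vmstat -m
    fi
    snap="$SNAPDIR/snap.$(date +%Y%m%d-%H%M%S)"
    "$@" > "$snap" 2>&1
    echo "$snap"
}

# Show only the lines that changed between two saved snapshots.
# diff exits 1 when the files differ; that is expected here.
compare_snapshots() {
    diff -u "$1" "$2"
}
```

A cron job that calls take_snapshot every 15 minutes, followed after a crash by compare_snapshots on the last two files, would show which lines moved just before the panic. Passing a different command, for example take_snapshot netstat -m, should capture mbuf statistics the same way.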
RE: 3c905B stops responding during ifconfig alias
Bill,

Your patch worked perfectly. THANK YOU!

By the way, I'd be happy to fedex a 3c905B to you for your use in testing these sorts of things if that would be helpful. We have a fairly large commitment to this card now (40+) and I'd do this happily to facilitate continuing performance enhancements or other improvements to it.

Sincerely,

-Troy Cobb
Circle Net, Inc.
http://www.circle.net

-----Original Message-----
From: wp...@ctr.columbia.edu [mailto:wp...@ctr.columbia.edu]

My apologies for not replying to you on this sooner; it took me a while to locate a machine with which I could do some testing (all the 3c905B hardware I have is in the form of embedded chipsets in Dell desktop machines, and they've been moving around on me a lot).

> This does NOT happen on the:
> xl0: <3Com 3c905 Fast Etherlink XL 10/100BaseTX> rev 0x00 int a irq 10 on

I think I found the problem. Currently, xl_stop() and xl_init() both issue RX and TX resets. Seems logical doesn't it? I mean, the purpose of xl_init() is to put the NIC into a known good state, and the purpose of xl_stop() is to slap it in the face and make it shut up ASAP.

The difference between the 3c905 and the 3c905B (well, the important difference in this case) is that the 3c905B's chipset has a built-in PHY, while the 3c905 requires an external one (3Com uses a DP83840A for the 3c905 boards, judging by the one sample 3c905 card I have). Apparently, issuing the RX and TX reset commands on the 3c905B causes it to also reset the PHY, which causes the PHY to restart its autonegotiation session with its link partner. It takes a few seconds for the autoneg session to finish, and during this time the 3c905B stops receiving packets.

This doesn't happen on the 3c905 because issuing the RX and TX reset commands does not have any effect on the external PHY: the only way to reset the PHY is by writing to the PHY's basic mode control register via the MII management interface.

I'm including a patch which should fix this problem. It just disables the code that does the reset in both xl_stop() and xl_init(). Please try this and let me know if it helps.

To apply the patch, do the following:

- Make sure you have the kernel source code installed under /usr/src.
- Save this message to a file, i.e. /tmp/xl.patch
- Become root.
- Run the following commands:

    # cd /sys/pci
    # patch < /tmp/xl.patch

- Compile a new kernel and boot it.

This patch was generated using a version of if_xl.c from FreeBSD-current, but it should work on any version of the driver with only a couple of mild warnings.

-Bill

--
=============================================================================
-Bill Paul (212) 854-6020            | System Manager, Master of Unix-Fu
Work: wp...@ctr.columbia.edu         | Center for Telecommunications Research
Home: wp...@skynet.ctr.columbia.edu  | Columbia University, New York City
=============================================================================
Mulder, toads just fell from the sky! I guess their parachutes didn't open.
=============================================================================

*** ../CVSWORK/sys_pci/if_xl.c Mon Feb 1 16:25:52 1999
--- if_xl.c Thu Feb 11 18:34:39 1999
***************
*** 2363,2373 ****
--- 2363,2375 ----
      for (i = 0; i < 3; i++)
          CSR_WRITE_2(sc, XL_W2_STATION_MASK_LO + (i * 2), 0);
  
+ #ifdef notdef
      /* Reset TX and RX. */
      CSR_WRITE_2(sc, XL_COMMAND, XL_CMD_RX_RESET);
      xl_wait(sc);
      CSR_WRITE_2(sc, XL_COMMAND, XL_CMD_TX_RESET);
      xl_wait(sc);
+ #endif
  
      /* Init circular RX list. */
      if (xl_list_rx_init(sc) == ENOBUFS) {
***************
*** 2715,2724 ****
--- 2717,2728 ----
      CSR_WRITE_2(sc, XL_COMMAND, XL_CMD_TX_DISABLE);
      CSR_WRITE_2(sc, XL_COMMAND, XL_CMD_COAX_STOP);
      DELAY(800);
+ #ifdef notdef
      CSR_WRITE_2(sc, XL_COMMAND, XL_CMD_RX_RESET);
      xl_wait(sc);
      CSR_WRITE_2(sc, XL_COMMAND, XL_CMD_TX_RESET);
      xl_wait(sc);
+ #endif
      CSR_WRITE_2(sc, XL_COMMAND, XL_CMD_INTR_ACK|XL_STAT_INTLATCH);
  
      /* Stop the stats updater. */
Tracking a Fatal Double Fault
Can someone please give me a short guide on how to track down a fatal double fault? System is 3.0-19990205-STABLE, and I've written down the fault info.

Thanks,

-Troy Cobb
Circle Net, Inc.
http://www.circle.net
RE: Tracking a Fatal Double Fault
The machine is running a custom kernel, but nothing very unusual. My instinct is that it may be related to something with the 3c905B 3COM cards that I reported earlier; I'm trying with Intel EtherExpresses right now and getting no fault problems.

The double-fault does not occur consistently, unfortunately, and typically only occurs during my rc.local stuff (loading a bunch (100+) of chrooted daemons) on boot-up.

Would the eip/esp/ebp values be worth sending?

-Troy Cobb
Circle Net, Inc.
http://www.circle.net

-----Original Message-----
From: Mike Smith [mailto:m...@smith.net.au]
Sent: Monday, February 08, 1999 6:55 PM
To: tc...@staff.circle.net
Cc: curr...@freebsd.org
Subject: Re: Tracking a Fatal Double Fault

> Can someone please give me a short guide on how to track down a fatal
> double fault? System is 3.0-19990205-STABLE, and I've written down the
> fault info.

Ack. It's actually pretty difficult. You can start by trying to locate the PC for the fault in the kernel image, but the typical cause of a double fault is running out of kernel stack.

Are you running any custom kernel code?

--
\\ Sometimes you're ahead,       \\ Mike Smith
\\ sometimes you're behind.      \\ m...@smith.net.au
\\ The race is long, and in the  \\ msm...@freebsd.org
\\ end it's only with yourself.  \\ msm...@cdrom.com
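The first step Mike mentions, locating the fault PC in the kernel image, could be sketched in sh roughly as follows. This is an illustration under assumptions, not something posted in the thread: it assumes the symbol table is listed with nm -n (sorted by address, in three columns: address, type, name) and that addresses are plain hex. find_symbol is a made-up helper name. Since a double fault typically means the kernel stack ran out, the symbol found this way may only mark where the stack was finally exhausted, not the code that used it up.

```shell
#!/bin/sh
# Sketch only: find_symbol is a hypothetical helper, not from the thread.
# Usage: nm -n KERNEL_IMAGE | find_symbol EIP_VALUE
# Reads address-sorted "address type name" lines on stdin and prints the
# last symbol whose address is less than or equal to the given address.
find_symbol() {
    # Drop an optional 0x prefix and lower-case the hex digits.
    target=$(printf '%s\n' "$1" | sed 's/^0[xX]//' | tr 'A-F' 'a-f')
    LC_ALL=C awk -v t="$target" '
        # Left-pad with zeros so equal-length hex strings sort like numbers.
        function pad(h) { while (length(h) < 16) h = "0" h; return h }
        BEGIN { t = pad(t) "" }
        NF >= 3 {
            a = pad(tolower($1)) ""
            if (a <= t) { best = $3 } else { exit }
        }
        END { if (best != "") print best }
    '
}
```

On the 3.0 systems discussed here the image would presumably be /kernel, so the call would look like nm -n /kernel | find_symbol followed by the eip value written down from the console.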
RE: Tracking a Fatal Double Fault
So a double-fault is always a kernel stack problem? I find it suspicious that this same machine also had trouble with the 3c905B flaking out -- dropping packets during an ifconfig alias, and sometimes never reactivating the interface according to what tcpdump shows.

The 3c905B problem repeats itself on EVERY machine that I've installed them into (7 or so); the double-faults are infrequent on some of the busier machines, and almost always occur during the initial boot process.

-Troy Cobb
Circle Net, Inc.
http://www.circle.net

-----Original Message-----
From: Mike Smith [mailto:m...@smith.net.au]

There's nothing immediately obvious in the xl driver that would suggest that it uses excessive kernel stack either. 8(

Maybe someone has some clues on measuring stack usage (or simply on how to increase the kernel stack allocation...).

--
\\ Sometimes you're ahead,       \\ Mike Smith
\\ sometimes you're behind.      \\ m...@smith.net.au
\\ The race is long, and in the  \\ msm...@freebsd.org
\\ end it's only with yourself.  \\ msm...@cdrom.com
3c905B stops responding during ifconfig alias
This happens in -current and -stable.

Machine:

CPU: Pentium II (quarter-micron) (350.80-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x652  Stepping=2
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,b24>
real memory  = 402653184 (393216K bytes)
config> quit
avail memory = 388808704 (379696K bytes)
...
xl0: <3Com 3c905B Fast Etherlink XL 10/100BaseTX> rev 0x30 int a irq 9 on pci0.18.0
...

During an ifconfig xl0 alias, the xl0 interface drops packets. It does NOT generate errors (netstat -in). In fact, on several occasions I've seen it go completely unresponsive (not responding to arp requests) until kicked back to life by outbound packets.

This does NOT happen on the:

xl0: <3Com 3c905 Fast Etherlink XL 10/100BaseTX> rev 0x00 int a irq 10 on pci0.18.0

Here's a quick test program that I use:

#!/usr/bin/perl
# call this as just_alias.pl class_c num_ips
$ip_base=$ARGV[0] || "209.95.67.";
$num=$ARGV[1] || 250;
for (1..$num) {
    print "aliasing for $ip_base".$_."\n";
    system ("ifconfig xl0 alias $ip_base".$_." netmask 255.255.255.255");
    # the sleep command allows us to see the problem more
    # clearly, though it does happen w/o the sleep here...
    sleep 1;
}

-Troy Cobb
Circle Net, Inc.
http://www.circle.net