Re: ZFS deadlock
Johan Ström wrote:

Hello,

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:

load: 0.50 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k

Worked for a while, then that stopped working too (was over ssh). When trying a local login I only got:

load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed to have replied... In my current conf, I think my kmem/kmem_max is at 512 MB (not sure though, since I edited my file yesterday for the next reboot), with 2G of system RAM. Normally I'd run kmem(max) at 1G (with an arcsize of 512M; currently it is at the default), but since I just got back to 2G total memory after some hardware problems I've been running at those lows (1G total is kind of tight with ZFS).

Well, just wanted to report... The box is not totally dead yet, i.e. I can still do Ctrl-T on the console, but that's it. I don't really know what more I can do. I don't have KDB/DDB. I'll wait another hour or so before I hard reboot it, unless it unlocks or anyone has any suggestions.

The key is to increase your kmem and prevent it from being exhausted. I think more recent OpenSolaris ZFS code has some improvements, but I do not have spare devices at hand to test and debug :( Maybe pjd@ would do a new import at some point? I have cc'ed him.

Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!
Re: ZFS deadlock
For your question: just rebooting would be fine. You may want to tune your ARC size (to be smaller) and kmem space (to be larger), which would reduce the chance of this happening, or eliminate it, depending on your workload. This situation is not recoverable, but you can trust ZFS not to lose data that has already been synced. -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!
Reliably trigger-able ZFS panic
Hi,

The following iozone test case on ZFS reliably triggers a panic:

/usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C

Unfortunately kgdb cannot reveal a useful backtrace. I have tried KDB_TRACE, but have not yet been able to investigate further.

fs12# kgdb /boot/kernel/kernel.symbols vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd.

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 05
fault virtual address = 0x18
fault code = supervisor read data, page not present
instruction pointer = 0x8:0x80763d16
stack pointer = 0x10:0xd94798f0
frame pointer = 0x10:0xd9479920
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 340 (txg_thread_enter)
trap number = 12
panic: page fault
cpuid = 5
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
trap_fatal() at trap_fatal+0x29f
trap_pfault() at trap_pfault+0x294
trap() at trap+0x2ea
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0x80763d16, rsp = 0xd94798f0, rbp = 0xd9479920 ---
dmu_objset_sync_dnodes() at dmu_objset_sync_dnodes+0x26
dmu_objset_sync() at dmu_objset_sync+0x12d
dsl_pool_sync() at dsl_pool_sync+0x72
spa_sync() at spa_sync+0x390
txg_sync_thread() at txg_sync_thread+0x12f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xd9479d30, rbp = 0 ---
Uptime: 25m7s
Physical memory: 4081 MB
Dumping 1139 MB: 1124 1108 1092 1076 1060 1044 1028 1012 996 980 964 948 932 916 900 884 868 852 836 820 804 788 772 756 740 724 708 692 676 660 644 628 612 596 580 564 548 532 516 500 484 468 452 436 420 404 388 372 356 340 324 308 292 276 260 244 228 212 196 180 164 148 132 116 100 84 68 52 36 20 4

#0 doadump () at pcpu.h:194
194 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) add-symbol-file /boot/kernel/zfs.ko.symbols
add symbol table from file /boot/kernel/zfs.ko.symbols at
(y or n) y
Reading symbols from /boot/kernel/zfs.ko.symbols...done.
(kgdb) where
#0 doadump () at pcpu.h:194
#1 0x80277aa8 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0x80277f07 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3 0x80465a1f in trap_fatal (frame=0xc, eva=Variable eva is not available.
) at /usr/src/sys/amd64/amd64/trap.c:724
#4 0x80465e04 in trap_pfault (frame=0xd9479840, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
#5 0x8046677a in trap (frame=0xd9479840) at /usr/src/sys/amd64/amd64/trap.c:410
#6 0x8044babe in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#7 0x80763d16 in ?? ()
#8 0x0004 in adjust_ace_pair ()
#9 0x0004 in adjust_ace_pair ()
#10 0xd94799e0 in ?? ()
#11 0x80763e7d in ?? ()
#12 0xff0004275a80 in ?? ()
#13 0xff00045a1190 in ?? ()
#14 0x807639b0 in ?? ()
#15 0x80763f20 in ?? ()
#16 0xff00042dc800 in ?? ()
#17 0x0004 in adjust_ace_pair ()
#18 0xd9479990 in ?? ()
#19 0xb55d in z_deflateInit2_ (strm=0xff00042dc8e0, level=70109184, method=68351768, windowBits=68351600, memLevel=76231808, strategy=76231808, version=Cannot access memory at address 0x00040010
) at /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/zmod/deflate.c:318
Previous frame inner to this frame (corrupt stack?)

-- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!
Re: pseudo terminals in 7.0 - pts implementation
Dan Epure wrote: Hi, For the moment this feature is only available in HEAD. I think the limit is only 512 master/slave pairs. Should be enough for this year. Is it going to be merged in 6.3 ? Thanks again. I don't think so. Currently the FreeBSD pts implementation has some serious issues that must be resolved before we can officially expose it to users, and merging it to 6.3 does not seem to be reasonable at this moment. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: date/time trouble - PST came too early
Chris H. wrote: Quoting LI Xin [EMAIL PROTECTED]: Chris H. wrote: Any advice would be *greatly* appreciated. Install /usr/ports/misc/zoneinfo or upgrade your system to a recent release (preferred), e.g. RELENG_6_2 aka FreeBSD 6.2-RELEASE. Re-run tzsetup and choose your time zone accordingly. You will need to restart the time-sensitive services afterward, or reboot the whole system :-) Cheers, -- Hello Xin LI, and thank you for your quick response. FWIW The system already knows what timezone it lives in. It simply chose to change to PST according to the /normal/ standards. What happened here in the USA, is that president Bush decided that we'd be better served here if we waited an additional week to set our clocks back one hour. So. It seems this particular server decided to ignore our president (not that I blame it) and set the clock back one hour on the /usual/ date. :) As an experiment, what I have done was bounce the server and set the clock in the BIOS ahead 1 day save settings reboot. My /initial/ findings seemed hopeful. But, given that I run ntpdate as a cron job, the first time the job ran, all went back to the /wrong/ dime/date. So as I must wait for 6.3, I'm just going to end the ntpdate cron job until PST /really/ occurs; unless of course someone has a better solution. :) Thanks again for taking the time and effort to respond. I think you have misunderstood me. I knew what you wanted, which is a corrected day of PDT-PST transition. The reason why you want to install misc/zoneinfo or a more recent release of FreeBSD is exactly because that we have updated it for the modified standards. For me, America - United States - Pacific Time works just fine. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: date/time trouble - PST came too early
Chris H. wrote: Any advice would be *greatly* appreciated. Install /usr/ports/misc/zoneinfo or upgrade your system to a recent release (preferred), e.g. RELENG_6_2 aka FreeBSD 6.2-RELEASE. Re-run tzsetup and choose your time zone accordingly. You will need to restart the time-sensitive services afterward, or reboot the whole system :-) Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Source upgrade from 5.5 to 6.X not safe?
Clint Olsen wrote: I just attempted a source upgrade from 5.5-STABLE to 6.3-PRERELEASE, and it was a disaster, more than likely because I forgot to do something. Normally I'm saved by the fact that the operations are not so scary as to cause problems. [... restore old kernel worked; new kernel does not work after fsck ...] So we get: - 5.5-STABLE works well on your box - 6.2-RELEASE stock GENERIC works fine - 6.3-PRERELEASE failed for some reason. So far as I am aware I have no clue why this could happen. Could you check if you have any special configuration in your /etc/make.conf, especially special CFLAGS? I usually simply remove my /usr/src /usr/obj and build a new world without make.conf to make sure. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Debugging off in 7.0-BETA1 kernel?
Doug Poland wrote:

Hello,

A couple of days ago I csup'd RELENG_7 and I noticed the GENERIC kernel was missing the following options...

options KDB
options DDB
options GDB
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN

The GENERIC kernel from the BETA1.5 ISO did, however, include these options. Question: is this a feature? What are the preferred settings for testing in the BETA/RC stages of the release cycle?

I think for the BETA stage we would prefer to have these options (disclaimer: this is only my own $0.02), but for the RC stages it is preferable to use a kernel without these options.

Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!
Re: ICH9 support for Motherboard: Foxconn G33M?
Abdullah Ibn Hamad Al-Marri wrote:

Hello,

This is just a fresh csup, buildworld, buildkernel with GENERIC.
Motherboard: Foxconn G33M
ata controllers not found, so it runs generic @ udma33, the slowest possible.

I think the chipset is not yet supported by -HEAD. Would you please try the attached patch to see if it works?

Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!

Index: ata-chipset.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/ata/ata-chipset.c,v
retrieving revision 1.201
diff -u -p -r1.201 ata-chipset.c
--- ata-chipset.c	4 Oct 2007 19:17:15 -0000	1.201
+++ ata-chipset.c	9 Oct 2007 18:40:28 -0000
@@ -1713,6 +1713,7 @@ ata_intel_ident(device_t dev)
      { ATA_I82801HBM_S2, 0, AHCI, 0x00, ATA_SA300, "ICH8M" },
      { ATA_I82801IB_S1,  0, AHCI, 0x00, ATA_SA300, "ICH9" },
      { ATA_I82801IB_S2,  0, AHCI, 0x00, ATA_SA300, "ICH9" },
+     { ATA_I82801IB_AH2, 0, AHCI, 0x00, ATA_SA300, "ICH9" },
      { ATA_I82801IB_AH4, 0, AHCI, 0x00, ATA_SA300, "ICH9" },
      { ATA_I82801IB_AH6, 0, AHCI, 0x00, ATA_SA300, "ICH9" },
      { ATA_I31244,       0,    0, 0x00, ATA_SA150, "31244" },
Index: ata-pci.h
===================================================================
RCS file: /home/ncvs/src/sys/dev/ata/ata-pci.h,v
retrieving revision 1.80
diff -u -p -r1.80 ata-pci.h
--- ata-pci.h	4 Oct 2007 19:17:16 -0000	1.80
+++ ata-pci.h	9 Oct 2007 18:39:51 -0000
@@ -169,6 +169,7 @@ struct ata_connect_task {
 #define ATA_I82801HBM_S1        0x28298086
 #define ATA_I82801HBM_S2        0x282a8086
 #define ATA_I82801IB_S1         0x29208086
+#define ATA_I82801IB_AH2        0x29218086
 #define ATA_I82801IB_AH6        0x29228086
 #define ATA_I82801IB_AH4        0x29238086
 #define ATA_I82801IB_S2         0x29268086
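For readers wondering how the constant added to ata-pci.h relates to the hardware: the value packs two PCI identifiers into one 32-bit word. The following is a minimal sketch, assuming the usual PCI convention that the device ID sits in the high 16 bits and the vendor ID (0x8086 for Intel) in the low 16 bits. The helper names chip_vendor() and chip_device() are made up for illustration and are not part of the ata(4) driver.

```c
#include <assert.h>
#include <stdint.h>

/* Value added by the patch above. */
#define ATA_I82801IB_AH2 0x29218086u

/* Hypothetical helper, for illustration only: low 16 bits = PCI vendor ID. */
static uint16_t
chip_vendor(uint32_t chipid)
{
	return (uint16_t)(chipid & 0xffffu);
}

/* Hypothetical helper, for illustration only: high 16 bits = PCI device ID. */
static uint16_t
chip_device(uint32_t chipid)
{
	return (uint16_t)(chipid >> 16);
}
```

If the controller on a board reports a vendor/device pair that is not in the table, the driver has no specific entry to match, which is consistent with the fallback to generic mode described above.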
Re: ICH9 support for Motherboard: Foxconn G33M?
Abdullah Ibn Hamad Al-Marri wrote: Thank you Li, It works, and thank you for the RELENG_6 patch too. Thanks. I have just committed the -CURRENT patch against -HEAD and it should appear in RELENG_7 (7.0-BETA). Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: PAE Slowdown
Jeff Kramer wrote: Hey all, I know that AMD64's the preferred way to run 4 gig systems, but I'm having a weird situation with 6.2-RELEASE-p8 and 6-STABLE as of last night. When I compile the PAE kernel, my system performance drops like a rock. It still boots and everything still runs, but for instance, running the Flops port my megaflops drop from the 950 MFLOPS range to 4 MFLOPS. It feels about as fast as a 486. I'm not sure what I should try disabling. I tried nodevice usb, but that didn't seem to change anything. SMP and GENERIC kernels work fine. CPU: Intel Core Duo 2 Quad 2.4ghz Memory: 8 gig (4 2 gig dimms) Swap: 16 gig partition If I try to boot without ACPI disabled the kernel doesn't finish booting, it stops after ata7. Perhaps unrelated, but why don't you run amd64 version? I think PAE is a hack, for instance it does not allow processes to use more than 2GB memory, while AMD64 (called EM64T by Intel implementation) provides much more... Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: STABLE is not compiling
Carlos Fernando Assis Paniago wrote:

plutao# make
cc -O2 -fno-strict-aliasing -pipe -DIPSEC -DINET6 -DFAST_IPSEC -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -c /usr/src/usr.bin/netstat/inet.c
/usr/src/usr.bin/netstat/inet.c: In function `pim_stats':
/usr/src/usr.bin/netstat/inet.c:1032: warning: long long unsigned int format, u_quad_t arg (arg 2)
/usr/src/usr.bin/netstat/inet.c:1033: warning: long long unsigned int format, u_quad_t arg (arg 2)
/usr/src/usr.bin/netstat/inet.c:1034: warning: long long unsigned int format, u_quad_t arg (arg 2)
/usr/src/usr.bin/netstat/inet.c:1035: warning: long long unsigned int format, u_quad_t arg (arg 2)
/usr/src/usr.bin/netstat/inet.c:1036: warning: long long unsigned int format, u_quad_t arg (arg 2)
/usr/src/usr.bin/netstat/inet.c:1037: warning: long long unsigned int format, u_quad_t arg (arg 2)
/usr/src/usr.bin/netstat/inet.c:1038: warning: long long unsigned int format, u_quad_t arg (arg 2)
/usr/src/usr.bin/netstat/inet.c:1039: warning: long long unsigned int format, u_quad_t arg (arg 2)
/usr/src/usr.bin/netstat/inet.c:1040: warning: long long unsigned int format, u_quad_t arg (arg 2)
/usr/src/usr.bin/netstat/inet.c:1041: warning: long long unsigned int format, u_quad_t arg (arg 2)
/usr/src/usr.bin/netstat/inet.c:1042: warning: long long unsigned int format, u_quad_t arg (arg 2)
*** Error code 1
Stop in /usr/src/usr.bin/netstat.
plutao#

I cvsup'ed RELENG_6 and tried on 2 different machines. Stable is broken (the tinderbox is showing the same). This is the stable version, people, please solve this ASAP.

I think David committed a fix 2 hours ago. Could you please take a look at the file revisions and make sure that you have these:

Revision    Changes     Path
1.24.8.2    +1 -0       src/usr.bin/netstat/atalk.c
1.5.2.4     +1 -0       src/usr.bin/netstat/bpf.c
1.70.2.4    +29 -28     src/usr.bin/netstat/inet.c
1.25.8.3    +161 -160   src/usr.bin/netstat/inet6.c
1.12.8.3    +27 -26     src/usr.bin/netstat/ipsec.c
1.23.2.3    +1 -0       src/usr.bin/netstat/ipx.c
1.72.2.10   +4 -4       src/usr.bin/netstat/main.c
1.42.8.8    +24 -23     src/usr.bin/netstat/mbuf.c
1.22.8.3    +15 -10     src/usr.bin/netstat/mroute.c
1.15.8.1    +19 -18     src/usr.bin/netstat/mroute6.c
1.10.8.3    +1 -0       src/usr.bin/netstat/netgraph.c
1.41.2.7    +3 -3       src/usr.bin/netstat/netstat.h
1.1.2.3     +23 -22     src/usr.bin/netstat/pfkey.c
1.76.2.4    +1 -0       src/usr.bin/netstat/route.c
1.18.8.2    +1 -0       src/usr.bin/netstat/unix.c

Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!
Re: Cannot ssh from jail
Tom Evans wrote: Hi stable@, jail@ [jail@ plz cc me as I'm not subscribed] I'm having some problems setting up some jails for semi-isolated development (ie, so we can isolate the developers into a jail, give them all the root access they want, and not worry about them blowing up more than their own jail) on 6.2-RELEASE-p5. I have set up a jail, using ezjail, which appeared to work fine. I can start the jail, and use jexec to spawn a shell inside the jail. However, if I then try to ssh from the jail to another box, ssh fails with the error message (with -v): I think the problem is that if you jexec into a jail then you don't have a TTY at hand, so bad things would happen. If you login into the jail by some ways (e.g. by ssh or telnet or whatever that spawns a TTY for you) then it would work I bet. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: rm(1) bug, possibly serious
I think this is a bug; here is a fix obtained from NetBSD. The reasoning (from NetBSD's rm.c,v 1.16):

Strip trailing slashes of operands in checkdot(). POSIX.2 requires that if "." or ".." are specified as the basename portion of an operand, a diagnostic message be written to standard error, etc. We strip the slashes because POSIX.2 defines basename as the final portion of a pathname after trailing slashes have been removed. This also makes rm perform actions equivalent to the POSIX.1 rmdir() and unlink() functions when removing directories and files, even when they do not follow POSIX.1's pathname resolution semantics (which require trailing slashes be ignored).

If nobody complains about this I will request commit approval from [EMAIL PROTECTED]

Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!

Index: rm.c
===================================================================
RCS file: /home/ncvs/src/bin/rm/rm.c,v
retrieving revision 1.58
diff -u -p -r1.58 rm.c
--- rm.c	31 Oct 2006 02:22:36 -0000	1.58
+++ rm.c	25 Sep 2007 18:26:52 -0000
@@ -558,6 +558,14 @@ check2(char **argv)
 	return (first == 'y' || first == 'Y');
 }
 
+/*
+ * POSIX.2 requires that if "." or ".." are specified as the basename
+ * portion of an operand, a diagnostic message be written to standard
+ * error and nothing more be done with such operands.
+ *
+ * Since POSIX.2 defines basename as the final portion of a path after
+ * trailing slashes have been removed, we'll remove them here.
+ */
 #define ISDOT(a)	((a)[0] == '.' && (!(a)[1] || ((a)[1] == '.' && !(a)[2])))
 void
 checkdot(char **argv)
@@ -567,10 +575,17 @@ checkdot(char **argv)
 
 	complained = 0;
 	for (t = argv; *t;) {
+		/* strip trailing slashes */
+		p = strrchr(*t, '\0');
+		while (--p > *t && *p == '/')
+			*p = '\0';
+
+		/* extract basename */
 		if ((p = strrchr(*t, '/')) != NULL)
 			++p;
 		else
 			p = *t;
+
 		if (ISDOT(p)) {
 			if (!complained++)
 				warnx("\".\" and \"..\" may not be removed");
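The check that the patch adds can be tried outside of rm(1). The following is a minimal standalone sketch, not the committed code: is_dot_operand() is a made-up name, and it uses an index-based loop instead of the pointer loop in the patch so that an empty operand is handled safely. It strips trailing slashes first and only then looks at the basename, which is the ordering the NetBSD commit message describes.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only. Returns 1 if the basename of
 * the operand is "." or ".." once trailing slashes have been removed, and 0
 * otherwise. The operand is modified in place, as in the patch.
 */
static int
is_dot_operand(char *path)
{
	size_t len = strlen(path);
	char *p;

	/* Strip trailing slashes, but always keep the first character. */
	while (len > 1 && path[len - 1] == '/')
		path[--len] = '\0';

	/* Extract the basename: the part after the last remaining slash. */
	p = strrchr(path, '/');
	p = (p != NULL) ? p + 1 : path;

	return (strcmp(p, ".") == 0 || strcmp(p, "..") == 0);
}
```

With this ordering, "../" and ".." are classified the same way, so an operand with a trailing slash is refused with the same diagnostic as one without.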
Re: rm(1) bug, possibly serious
Oliver Fromme wrote: Nicolas Rachinsky wrote: Oliver Fromme wrote: By the way, an additional confusion is that .. and ../ are handled differently. Specifying .. always leads to this message: rm: . and .. may not be removed and nothing is actually removed. It is confusing that adding a slash leads to a different error message _and_ removal of the contents of the parent directory. Clearly a POLA violation. Adding a slash often leads to different behaviour. Yes, I'm aware of that. I often make use of the feature that find /sys/ expands the symlink, while find /sys does not. The same holds true for ls(1). However, I would still argue that there is no sane reason for rm -rf ../ behaving differently from rm -rf .., especially because it behaves differently in a destructive way. That's why I call it a POLA violation. Also a POSIX violation IMHO :-) Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: [OT] Which one is best MTA for me?
Byung-Hee HWANG wrote:

Hi, Recently I am considering to move to another MTA. At one time I was wondering what mail server big ISP are running. I can't decide postfix or qmail. Which one is best MTA for me?

We (one of the largest ICP companies in China, providing some billions of free e-mail accounts) replaced our locally hacked qmail with postfix in 2005. One of our sys-ops and I accomplished all of the necessary migration work (coding to re-implement our features, testing, and the migration itself) within three weeks, so once you have confirmed what your requirements are, it's not very hard to do.

I do not want to simply say that you will want postfix, but I would like to explain the reasons why we did the migration:

- Performance. qmail uses about 3 times as many file I/O operations for each mail delivery, compared with postfix. For a busy MX server this means that you have to prepare more servers to do the same job.

- Features. Most extensions to qmail have to be done by patching. It's true that there are a lot of qmail patches floating around, but you have to carefully maintain a local qmail tree. Postfix has a lot of built-in anti-spam and other features, and it is relatively easy to implement a new feature with Python/Twisted through Postfix's interface; no patch is needed for postfix itself.

- Manageability. Our system administration team used to be very familiar with qmail. However, for large ISP/ICPs, qmail fails to solve the following essential problems that postfix does not have:

  - Hard to reconstruct the mail queue after damage. In postfix this can be done with 'postfix check'.

  - Have to watch its queue situation. Once you get the queue stuck with a lot (200,000+) of e-mails that temporarily cannot be delivered, e.g. due to a backend hardware issue, all e-mails are delivered very slowly, and incoming e-mail makes the situation worse. We used to have a monitoring script that added an ipfw rule to block all subsequent incoming mail to work around this issue. With postfix, stuck e-mails are stored in a separate queue.

  - and a lot more...

- Maintainability. Extending postfix for your own needs is easy: our new anti-spam system is primarily written as a policy/filtering/table lookup daemon with Python/Twisted, making the code much easier to understand and maintain. It's not easy for someone to pick up the qmail patches due to the coding style and lack of comments. More importantly, postfix is being actively maintained, but qmail is not.

I'm not sure about the problem you are having. If you are going to set up a new mail server (cluster) then don't use qmail; it would be a nightmare if your system grows. Postfix is a good choice, and there are some other choices, e.g. sendmail, exim, etc.

Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!
Re: Merged em driver
Jack Vogel wrote: The next driver I that I release via Intel channels is going to merge the code for 6 and 7. I was thinking that I could check that into the tip and it would make the most current version buildable on either RELEASE, was wondering if that is looked upon favorably or not? I have code ready to do that if getting it into 7.0 would be desireable. I think that if the change is not quite intrusive (e.g. add some ifdef blocks rather than restructuring the code heavily) then it would be desirable to have it hit the tree before we branch RELENG_7. However, if the change would be very late then it would be better to have it delayed and only backport important fixes in a case-by-case manner. My $0.02 :-) Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Getting this Fatal Trap with heavy network activity
Alexey Sopov wrote:

#16 0xc0539c1c in ithread_execute_handlers ()
#17 0xc0539d66 in ithread_loop ()
#18 0xc053878f in fork_exit ()
#19 0xc06ec18c in fork_trampoline ()

XL> I think this was a fatal trap 12 and you may want to try if updating to
XL> 6.2-STABLE helps. There was some important related fixes in RELENG_6
XL> but not yet merged to RELENG_6_2 (by jhb@).

I've just updated my system to 6.2-STABLE #4. Hope this helps.

OK, be sure to report any problems so we can investigate further.

Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!
Re: release cycle
Hi, Erik, Erik Trulsson wrote: On Tue, May 29, 2007 at 11:45:15AM +0200, Volker wrote: [...] I know there's a release cycle for 7-CURRENT planned next June but IMHO it can be delayed for some weeks. What does the core and releng team think? It might do good. A quick release cycle for a release from -STABLE sounds like a bad idea. There have been enough new things that have gone into the 6-STABLE branch that a full release cycle seems warranted. I would say that just tagging -STABLE tree as RELEASE is a very bad idea, however, it would not be too bad if we take RELENG_6_2 as a codebase and add the following bugfixes: - zonelim fixes (very helpful for heavy loaded systems). - some socket locking fixes (settled for a long time). - driver updates, like RAID driver bugfixes. Therefore, from src/'s view, I think it would not be too bad to make a point release that merges some (well tested) bugfixes in, or at least, provide a patchset as an errata for 6.2-R. I also think that it would be a bad idea to create *any* new release until after the X.org upgrade has settled down - it has not done that quite yet as far as I can tell. I agree. What's more, freezing the ports/ tree often does not sound quite appealing :-) Just my $0.02. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: vnode_pager_putpages errors on 6.2
Hi, Steve,

steve wrote:

[...] http://atm.tut.fi/list-archive/freebsd-stable/msg19288.html that I applied to 4.x and it solved the problem. Now that I have upgraded to 6.2, the problem has recurred, but the previous patch is no longer valid. Is there something wrong with the patch/solution given, and is there a solution for 6.2?

In RELENG_6_2, the rate limit part of the patch was implemented in a different way. Could you please try this patch to see if it solves your problem?

Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!

Index: vnode_pager.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/vnode_pager.c,v
retrieving revision 1.221.2.7
diff -u -p -u -r1.221.2.7 vnode_pager.c
--- vnode_pager.c	14 Oct 2006 06:04:32 -0000	1.221.2.7
+++ vnode_pager.c	30 May 2007 01:43:39 -0000
@@ -1083,6 +1083,7 @@ vnode_pager_generic_putpages(vp, m, byte
 	struct iovec aiov;
 	int error;
 	int ioflags;
+	int status;
 	int ppscheck = 0;
 	static struct timeval lastfail;
 	static int curfail;
@@ -1177,8 +1178,9 @@ vnode_pager_generic_putpages(vp, m, byte
 		printf("vnode_pager_putpages: residual I/O %d at %lu\n",
 		    auio.uio_resid, (u_long)m[0]->pindex);
 	}
+	status = error ? VM_PAGER_BAD : VM_PAGER_OK;
 	for (i = 0; i < ncount; i++) {
-		rtvals[i] = VM_PAGER_OK;
+		rtvals[i] = status;
 	}
 	return rtvals[0];
 }
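The effect of the second hunk is easier to see in isolation. Before the change, every page was reported as VM_PAGER_OK regardless of the value of error; after it, a failed write marks all pages as bad. The following is a minimal userland sketch of that logic only. The SKETCH_PAGER_* constants and fill_rtvals() are stand-ins invented for illustration (the kernel defines the real VM_PAGER_* values), and the caller is assumed to pass ncount >= 1.

```c
#include <assert.h>

/* Stand-in values for illustration; not the kernel's VM_PAGER_* constants. */
enum { SKETCH_PAGER_OK = 0, SKETCH_PAGER_BAD = 1 };

/*
 * Hypothetical helper mirroring the patched loop: one status is derived
 * from the write error and stored for every page. Assumes ncount >= 1.
 */
static int
fill_rtvals(int error, int *rtvals, int ncount)
{
	int i, status;

	status = error ? SKETCH_PAGER_BAD : SKETCH_PAGER_OK;
	for (i = 0; i < ncount; i++)
		rtvals[i] = status;
	return (rtvals[0]);
}
```

A caller that inspects the per-page results can then tell that the pages were not written, instead of treating them as clean.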
Re: fast rate of major FreeBSD releases to STABLE
Ivan Voras wrote: Chris wrote: and its for a desktop element of the os, does it matter if servers running FreeBSD have to remain on vulnerable versions of ports as a result of this? This looks like another call to have RELENG_x branches on ports, with which I agree. Hmm... Branching is not about to do it, or not to do it, but about who will invest their time to do it. By making it as an official offer we have to make sure that: - STABLE branch is well maintained. What's the rule of MFC in these branches? For src/ the answer is clear, but for ports/ I do not think it's obvious. What's the standard choosing particular ports' version? Who will be responsible for that? - packages are continuously built and mirrored. This could cause confusion about should I use -HEAD ports/, or RELENG_X ports/? Not to mention that it needs a doubled computation resource for package cluster. So, while I agree that having branches is a very nice idea I feel that it is not quite exercisable at the moment. It's easy for committers to do make universe to verify that their work does not break build, but it's not that easy for porters to make sure that a commit does not break the -STABLE branch... Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
Oleg Derevenetz wrote: [snip] Not all people can do deadlock debugging, though. In my case turning on INVARIANTS and WITNESS leads to unacceptable performance penalty due to heavily loaded server. So I can only describe my case, actions and result without providing any debug information. I'd say that I completely agree with Kris because that it's very hard for developers to investigate problems if there is no detailed information available, especially for those problems that can not easily reproduced. Of course, deadlock debugging could be tricky, but having a backtrace can usually save a lot of time (and fortunately that is not that hard even for average users :) What I wanted to suggest is that, we hope that the submitter can provide detailed steps to reliably reproduce the problem whenever possible, if they are not able to diagnose the problem themselves, so we will be able to extract more information at lab, and possibly reach a fix. The problem I have is that the reporter of the issue is not quite cooperative as they did before, and what I wanted to say is that it's possible to trigger the livelock without nullfs/unionfs, and I did not figured out why (yet) because I can not reproduce it in my environment :-( Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: 6.2-STABLE deadlock?
Kostik Belousov wrote: On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote: On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote: At work, amoungst my stable of old computers running FreeBSD, I have a Fujitsu M800 - a 4 Zeon SMP processor with 4 GB of memory. This primarily runs Nagios and a small and lightly used MySQL database, along with a few inbound FTP transfers per minute. It has a Mylex card based disc subsystem, ruling out crash dumps. At some point during 5.5-STABLE this machine started to occasionally hang ... Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken rather sooner after the hang. Processes with wmesg=ufs feature often in the ps output. http://www.stade.co.uk/crash1/ I would suspect the mlx controller. There is several processes (for instance, 988, 50918) waiting for completion of block read, and processes in the ufs states are the result of the lock cascade, IMHO. I'm not very sure if this is specific to one disk controller. Actually I got some occasional reports about similar hangs on amd64 6.2-RELEASE (slightly patched version) that most of processes stuck in the 'ufs' state, under very light load, the box was equipped with amr(4) RAID. I was not able to reproduce the problem at my lab, though, it's still unknown that how to trigger the livelock :-( Still need some investigate on their production system. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: 6.2-STABLE deadlock?
Hi, Oleg, Oleg Derevenetz wrote: Quoting LI Xin [EMAIL PROTECTED]: [...] I'm not sure this is specific to one disk controller. Actually I got some occasional reports about similar hangs on amd64 6.2-RELEASE (slightly patched version) where most processes were stuck in the 'ufs' state under very light load; the box was equipped with amr(4) RAID. I was not able to reproduce the problem in my lab, though, and it's still unknown how to trigger the livelock :-( Still need to do some investigation on their production system. I reported a similar issue for FreeBSD 6.2 in the audit-trail for kern/104406: http://www.freebsd.org/cgi/query-pr.cgi?pr=104406cat= and there should be a thread related to this. Briefly, I suspect that this is related to the nullfs filesystems on my server: when I cvsuped to FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced the nullfs-mounted fs with a unionfs-mounted one (that was done on 10.03.07), the problem went away (or seems to have, at least). Hmm... These seem to be different issues. The problem reported to me was on a pgsql server (no nullfs/unionfs involved), and the hang always happens when it is not heavily loaded (usually in the morning, for instance). There is no special configuration, such as scheduled tasks that could generate disk load; only the entropy harvesting. So this is quite confusing. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Sendmail problems
Gustavo Feijó wrote: Found the problem origin: Amavis Apr 10 13:34:02 serv01 amavis[26962]: (26962-01) (!!)TROUBLE in process_request: Can't create directory /var/spool/amavisd/tmp/amavis-20070410T133402-26962: Too many links at /usr/sbin/amavisd line 4215, GEN3 line 2. Searching for a solution. Any aid would be welcome. It seems that you have too many subdirectories under /var/spool/amavisd/tmp... Try letting it flush all e-mails and then rm -fr /var/spool/amavisd/tmp/amavis* Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
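The "Too many links" error quoted above is what mkdir reports once a directory's link count hits the filesystem limit; on UFS every subdirectory adds one link to its parent. As a minimal sketch (not part of the original reply), the function below counts the immediate subdirectories so the number can be compared with that limit, which to the best of my knowledge is LINK_MAX = 32767 on FreeBSD of this era. The name count_subdirs is made up for illustration, and the default path is simply the one from the log line.

```shell
#!/bin/sh
# Count the immediate subdirectories of a directory. On UFS each
# subdirectory holds a ".." link to its parent, so this is the number
# that runs into the per-inode link limit ("Too many links").
count_subdirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Report only; nothing is deleted here. The default path is the one
# from the amavis log message and may not exist on other systems.
dir=${1:-/var/spool/amavisd/tmp}
if [ -d "$dir" ]; then
    echo "$dir: $(count_subdirs "$dir") subdirectories"
fi
```

Running this before and after the cleanup suggested above shows whether the stale amavis-* work directories were really the cause.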
Re: em0 watchdog timeout with nfs
Hi, Jack, Jack Vogel wrote: [EMAIL PROTECTED]:14:0: class=0x02 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00 [...] The driver in 6.2 RELEASE fixed all known problems with watchdogs, other than REAL issues with the network/hardware. Have you tried installing that? A friend of mine has reported a similar problem, with different em(4) hardware. The server runs lighttpd on ufs, and the watchdog timeout occurs whether or not there is heavy traffic. Here is some pciconf -l output which may be of interest. [EMAIL PROTECTED] /usr/local/etc]# pciconf -l|grep em [EMAIL PROTECTED]:0:0: class=0x02 card=0x30a38086 chip=0x108b8086 rev=0x03 hdr=0x00 [EMAIL PROTECTED]:5:0: class=0x02 card=0x30a18086 chip=0x10768086 rev=0x05 hdr=0x00 Should more debugging aid / information be needed to narrow down the issue, please let us know, thanks! Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
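For readers comparing the em(4) hardware in this thread: in this pciconf -l format the chip= field packs the PCI device ID in the upper 16 bits and the vendor ID in the lower 16 bits. The sketch below, which is an illustration and not something posted in the thread, splits such a value; decode_chip is a hypothetical helper name.

```shell
#!/bin/sh
# Split a pciconf "chip=0xDDDDVVVV" value into its device (DDDD) and
# vendor (VVVV) halves. Expects exactly eight hex digits after "0x".
decode_chip() {
    hex=${1#0x}                                  # drop the 0x prefix
    device=$(printf '%s' "$hex" | cut -c1-4)     # upper 16 bits
    vendor=$(printf '%s' "$hex" | cut -c5-8)     # lower 16 bits
    echo "vendor=0x$vendor device=0x$device"
}

# The two em(4) chips quoted in the message above.
decode_chip 0x108b8086
decode_chip 0x10768086
```

Both values share vendor 0x8086 (Intel) and differ only in the device ID, which is the part that identifies the specific controller when reporting a driver problem.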
Re: em0 watchdog timeout with nfs
LI Xin wrote: Hi, Jack, Jack Vogel wrote: [EMAIL PROTECTED]:14:0: class=0x02 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00 [...] The driver in 6.2 RELEASE fixed all known problems with watchdogs, other than REAL issues with the network/hardware. Have you tried installing that? A friend of mine has reported a similar problem, with different em(4) hardware. The server runs lighttpd on ufs, and the watchdog timeout occurs whether or not there is heavy traffic. Here is some pciconf -l output which may be of interest. [EMAIL PROTECTED] /usr/local/etc]# pciconf -l|grep em [EMAIL PROTECTED]:0:0: class=0x02 card=0x30a38086 chip=0x108b8086 rev=0x03 hdr=0x00 [EMAIL PROTECTED]:5:0: class=0x02 card=0x30a18086 chip=0x10768086 rev=0x05 hdr=0x00 Should more debugging aid / information be needed to narrow down the issue, please let us know, thanks! Forgot to mention, the em0 (which has the watchdog issue) has device polling turned on. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Reverting to 6.2-RELEASE
Oliver Fromme wrote: LI Xin wrote: I always use options INCLUDE_CONFIG_FILE for my kernel :-) Maybe we should add it to DEFAULTS some day... Yes, that would be very useful. But it should also take any includes into account. It was very annoying to discover that INCLUDE_CONFIG_FILE gave me only two lines for one of my kernels (those were options SMP and include MYKERNEL). :-( Fortunately I was able to find a copy of that included configuration file elsewhere. But I guess it could be a very bad surprise for somebody else. Take a look at Wojciech's perforce branch, I find it even more useful than this :-) Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Reverting to 6.2-RELEASE
Pete French wrote: I appear to have a machine which will not run RELENG_6_2, though it runs the released code quite happily. Is there a CVS tag I can use to revert the sources back to the way they were on RELEASE? I want to be able to verify that this is and track down what changed! I don't think it should ever be the case that something which runs X.Y-RELEASE will not run RELENG_X_Y should it ? I think you will want RELENG_6_2_0_RELEASE. What happens with RELENG_6_2? IIRC there were only very limited changes to the kernel, which should only affect IPv6... Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
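To make the tag suggestion above concrete, here is a minimal sketch of generating a cvsup supfile pinned to a given tag. It is not from the thread: write_supfile is a hypothetical helper, the host line is a placeholder that must be replaced with a real mirror, and the base/prefix values are only the conventional ones from the example supfiles.

```shell
#!/bin/sh
# Write a minimal supfile that fetches src-all at the given CVS tag.
# Usage: write_supfile TAG OUTPUT_FILE
write_supfile() {
    tag=$1
    out=$2
    cat > "$out" <<EOF
*default host=CHANGE_THIS.FreeBSD.org
*default base=/var/db
*default prefix=/usr
*default release=cvs tag=$tag
*default delete use-rel-suffix
*default compress
src-all
EOF
}

# Example (assumed paths; run cvsup as root after editing the host):
#   write_supfile RELENG_6_2_0_RELEASE /root/release-supfile
#   cvsup -g -L 2 /root/release-supfile
```

Switching the tag between RELENG_6_2_0_RELEASE and RELENG_6_2 in two separate source trees would also make it easy to diff them and see exactly what changed after the release.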
Re: Reverting to 6.2-RELEASE
Pete French wrote: I think you will want RELENG_6_2_0_RELEASE. thanks (and to the others who responded) What happens with RELENG_6_2? IIRC there were only very limited changes to the kernel, which should only affect IPv6... Indeed! Part of the reason I want to do the revert is to make absolutely sure that it runs the GENERIC kernel from RELEASE properly. The kernel I have running on it is a binary from elsewhere which was built from the 6.2 RELEASE code, but I no longer have the options it was built with (though I always use options INCLUDE_CONFIG_FILE for my kernel :-) Maybe we should add it to DEFAULTS some day... Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Reverting to 6.2-RELEASE
Wojciech A. Koszek wrote: On Mon, Mar 19, 2007 at 01:33:44PM -, Steven Hartland wrote: I think this is a very good idea, I've been caught at least once not being able to recreate a working kernel due to the loss of the original config file. Steve - Original Message - From: LI Xin [EMAIL PROTECTED] I always use options INCLUDE_CONFIG_FILE for my kernel :-) Maybe we should add it to DEFAULTS some day... I did some work in this area, as several system administrators I've met also seem to have problem with kernel configuration recovery. In my case I came with a method of obtaining a configuration of a running kernel via sysctl (kern.conftxt for now) and via config(8) form the kernel file. Hopefully this work will get more review soon. Not sure how useful could it be to expose it via sysctl(8) interface but sounds interesting to me. Have you posted the patch somewhere? Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: OpenBSD's spamd.
Hi, Charles Sprickman wrote: On Dec 19, 2006, at 6:56 PM, Christopher Hilton wrote: Charles Sprickman wrote: On Tue, 19 Dec 2006, Christopher Hilton wrote: Oliver Fromme wrote: Dimitry Andric wrote: Oliver Fromme wrote: What does stuttering mean? Is it similar to sendmail's greet_pause feature? See here: http://www.ualberta.ca/~beck/nycbug06/spamd/mgp00014.html OK, so the answer to my question seems to be yes. :-) Actually I'd say it's similar. If you telnet to port 25 on a server that's using sendmail's greet_pause option I'm assuming that you get nothing for 5 seconds. OpenSD's Spamd sends the initial greeting at a rate of one character per second and only accepts data from you at the same rate. It also sets the window size to something like 1 byte. :) Yes, it does. This results in the remote smtp daemon getting bound up by it's own kernel. Someone had mentioned that this would consume many threads/processes, but that is not the case. Bob explained that spamd runs in a select() loop. I don't totally understand that, but to me it sounds like the same methodology that thttpd used, and that sure scaled up nice. It keeps an array of file descriptors, one for each connection to the remote smtp daemon. It periodically uses the select(2) system call to see which of the descriptors is ready and services them accordingly. Here's what I think is the latest version of Bob's talk. It's quite good. http://www.ualberta.ca/~beck/nycbug06/spamd/ There's audio available here: http://www.nycbsdcon.org/slides I heard the talk in the beginning of November, right about the middle of the big October/November spamming event of '06. To me the most interesting part of the talk was when he spoke about the results of tarpitting his greylisted connections and how he eliminated 1,300,000 Mail messages from a total of 3,000,000 before they ever hit his MTA. 
That's the feature that's missing from FreeBSD since the port pulls spamd from OpenBSD 3.7 and the tarpitting feature was added in the revision right after the release 3.7 tag. Was the original question when will the FreeBSD port be updated?? :) Yes. There's lots of ways to do it. One could pull diff from the openbsd cvs servers and drop it into the patch directory. That should go cleanly but it would be nice to get this revved up to the latest release. I've got a copy of the latest code to compile. The call symantics of openbsd's openlog_r(3) and syslog_r(3) differ from FreeBSD openlog(3) and syslog(3). But It should work. I need to throw some polish on it but after I do I'll post the patches here and send them to the port maintainer. I know this is kind of old, but I'm needing to work with spamd on FreeBSD and I noticed the port is still stuck at the 3.7 version. Do you have anything that you'd like people to test? I think we need a new maintainer for the port, I'm busy working on other stuff and it would be nice if someone else who has time and interest to maintain it to take it instead of me. If anyone needs a checked out copy of the current state of OpenBSD's spamd to ease their port work (note that the tarball I have provided is a patched version), please let me know. For those who wants to became the new maintainer for the port, please note that, there was some discussion about other improvements to the port, please merge them at http://portsmon.freebsd.org/portoverview.py?category=mailportname=spamd Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: ping's seem to hang ... 'zoneli' state?
Marc G. Fournier wrote: [...] Can someone comment on whether I just missed the commit on my last cvsup, or if I'm hitting the same problem but in a different way? I think so. Try patching your system with: src/sys/kern/kern_mbuf.c,v 1.9.2.9 src/sys/sys/mbuf.h,v 1.170.2.7 src/sys/vm/uma.h,v 1.22.2.8 src/sys/vm/uma_core.c,v 1.119.2.19 src/sys/vm/uma_core.c,v 1.119.2.18 and perhaps also: src/sys/kern/uipc_socket.c,v 1.293. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: ping's seem to hang ... 'zoneli' state?
Marc G. Fournier wrote: [...] Here's what I have right now: __FBSDID($FreeBSD: src/sys/kern/kern_mbuf.c,v 1.9.2.9 2007/02/11 03:31:18 mohans Exp $); * $FreeBSD: src/sys/sys/mbuf.h,v 1.170.2.7 2007/02/11 03:31:19 mohans Exp $ * $FreeBSD: src/sys/vm/uma.h,v 1.22.2.8 2007/02/11 03:31:19 mohans Exp $ __FBSDID($FreeBSD: src/sys/vm/uma_core.c,v 1.119.2.19 2007/02/11 03:31:19 mohans Exp $); __FBSDID($FreeBSD: src/sys/vm/uma_core.c,v 1.119.2.19 2007/02/11 03:31:19 mohans Exp $); __FBSDID($FreeBSD: src/sys/kern/uipc_socket.c,v 1.242.2.8 2007/02/03 04:01:22 bms Exp $); The only one that looks off is uipc_socket.c ... do I need to copy that from HEAD? Are there any compatibility issues with doing that? No, you can not simply copy the file. Try patching your system with: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/uipc_socket.c.diff?r1=1.292r2=1.293 You can just ignore the first hunk. Please note that you may also need to modify uipc_socket2.c just in the way that this file is changed. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: src/sys/netinet/in_pcb.c(rev. 1.165.2.6)
viper wrote: Hi all. I just want to ask a question out of curiosity. Are there any reasons why src/sys/netinet/in_pcb.c (rev. 1.165.2.6) is not included in RELEASE_6_2? What kind of race condition does that revision close? Thanks for helping me :-) The changeset was MFC'ed, but we need re@'s approval for doing the actual MFC; before that, it needs to settle in the RELENG_6 branch to make sure that it does not have unwanted side effects. As far as I am aware there are still some other changesets under investigation. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: BTX halted with MegaRaid SCSI 320-2 on 6.2R help
Bruce M. Simpson wrote: Hi, This isn't the answer, but I'm attempting to provide triage for jhb who will probably look at it. This is a GPF, but it's not being caused by an attempt to enter protected mode, so it isn't the most-often reported BTX issue. [EMAIL PROTECTED] wrote: 6.2R cd boot failed with follow error,and the MegaRAID fw version is FW_1L33 thanks with any info BTX loader 1.00 BTX version 1.01 Console: internal video/keyboard BIOS CD is cd0 BIOS drive A: is disk0 BIOS drive C: is disk1 BIOS 639kB/3668928kB available memory FreeBSD/i386 bootstrap loader, Revision 1.1 (root at [EMAIL PROTECTED], Fri Jan 12 06:40:38 UTC 2007) int=000d err= efl=00030086 eip=c3d4 eax=8058 ebx=2000 ecx=0007 edx=fffa esi=f69b edi=00040170 epb=03d8 esp=0358 cs=f000 ds=0040 es=5d18fs=9fc0 gs=f000 ss=9e17 cs:eip=ec 50 e4 61 58 50 e4 61-58 ee 5a c3 01 00 e4 c3 12 00 00 41 d0 0c 02 08-80 00 03 00 79 00 79 00 00 ss:esp=77 01 03 2c a1 00 08 2c-fa 02 00 e0 00 00 c0 9f 00 00 4e 80 f3 ee 00 f0-03 24 00 e0 06 02 00 80 BTX halted It looks like BIOS code at f000:c3d4 is trying to read a word from I/O port 0xfffa, and this is causing a GPF when it tries to write to what looks like the BIOS data area at 0040:0058; cursor position for video page 4. 0: ec in (%dx),%al 1: 50 push %eax 2: e4 61 in $0x61,%al 4: 58 pop%eax 5: 50 push %eax 6: e4 61 in $0x61,%al 8: 58 pop%eax ^^^ The stack operations sound mad to me :-) I think these is probably not what we expect... 9: ee out%al,(%dx) a: 5a pop%edx b: c3 ret c: 01 00 add%eax,(%eax) e: e4 c3 in $0xc3,%al 10: 12 00 adc(%eax),%al 12: 00 41 d0add%al,0xffd0(%ecx) 15: 0c 02 or $0x2,%al 17: 08 80 00 03 00 79 or %al,0x79000300(%eax) 1d: 00 79 00add%bh,0x0(%ecx) Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Can't build threaded perl 5.8 on 6.2-RELEASE and 7-CURRENT
LI Xin wrote: LI Xin wrote: Hi, It seems that threaded perl is broken on 6.2-RELEASE and 7-CURRENT. I have tried some option combinations with no luck, if WITH_THREADED=yes is specified then the build would fail with a coredump. Another observation is that this happens with 6.2-RELEASE kernel with 6.1-RELEASE userland too. But 6.1+6.1 configuration is not affected. Sorry, I mean that 6.1+6.1 configuration was not tested. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Dell PowerEdge 840 in FreeBSD
Felippe de Meirelles Motta wrote: Hi people, Has anybody already had problems with the PERC 5/i controller stopping during boot? Reading the last message, I don't know what the problem could be; look: mfi0: 587 (223981812s/0x0001/-1) - VD 00/0 progress 2% in 48s: Background Initialization progress on VD 00/0 is 2.03%(48s) mfi0:588 (223981820s/0x/0) - Battery temperature is normal mfi0: 589 (223981820s/0x/0) - Current capacity of the battery is above hold mfid0: MFI Logical Disk on mfi0 mfid0: 278784MB (578949632 sectors) RAID volume '0' is optimal My hardware information is: Dell Poweredge 840 BIOS A01 Raid Controller LSI LOGIC Corporation PERC 5/i version 5.0.2-0003 3 HDs in RAID 5 I tested with FreeBSD 6.1 and 6.2, but no success! I read about a possible bug in FreeBSD 6.1 with multiple volumes, but I can't find any notice that confirms this, nor any correction/fix. Could you please try a verbose boot? This could provide more information and can sometimes be very helpful for us. Thanks in advance! Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Linux Binary Compatibility
Thomas Roberts wrote: How important is it to include Linux binary compatibility when installing 6.2-RELEASE? I am guessing it is only needed if I want to use an application that was compiled by/for Linux, but I am new to FreeBSD so I will be using it mostly for: 1 - learning the system (i.e. what goes where, when and why) and how to configure whatever can be configured. 2 - running GNUstep and related development applications to complement my Obj-C/Cocoa development. 3 - running GNOME and/or KDE since GNUstep has a limited number of applications available, most notably, a web browser. 4 - learning to create and use shell scripts to customize my environment. The only applications I will be using are the ones in the ports collection that I can use portmanager to create and update. Your assumption is correct. Linux binary compatibility is only necessary when you need to install pre-compiled binaries from a Linux system, and this is usually a last resort if you can't get the source code. Native versions, if available, are usually better. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: 6.2 bge regression
Hi, Daniel, Daniel O'Connor wrote: I have some Supermicro P8SCT based systems I would like to run 6.2 on, unfortunately something between 6.1 and 6.2 has broken bge on this board. This board has 2 bge's and the first one is on before the BIOS hands off to the OS (I can see the activity light flash) but when the kernel attaches the light stops flashing. An ifconfig says 'no carrier'. bge0: Broadcom BCM5721 Gigabit Ethernet, ASIC rev. 0x4101 mem 0xd100-0xd100 irq 16 at device 0.0 on pci3 miibus0: MII bus on bge0 brgphy0: BCM5750 10/100/1000baseTX PHY on miibus0 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto bge0: Ethernet address: 00:30:48:89:be:5c pcib4: ACPI PCI-PCI bridge irq 17 at device 28.1 on pci0 pci4: ACPI PCI bus on pcib4 bge1: Broadcom BCM5721 Gigabit Ethernet, ASIC rev. 0x4101 mem 0xd110-0xd110 irq 17 at device 0.0 on pci4 miibus1: MII bus on bge1 brgphy1: BCM5750 10/100/1000baseTX PHY on miibus1 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto bge1: Ethernet address: 00:30:48:89:be:5d From pciconf.. [EMAIL PROTECTED]:0:0: class=0x02 card=0x02c615d9 chip=0x165914e4 rev=0x11 hdr=0x00 [EMAIL PROTECTED]:0:0: class=0x02 card=0x02c615d9 chip=0x165914e4 rev=0x11 hdr=0x00 This system was updated using CVS so it's really 6.2-PRERELEASE, I am about to try updating again to RELENG_6 and seeing if I was perhaps out of sync. There is no difference between RELENG_6_2 and RELENG_6, so I think you should expect no change. I have backported some interesting changes against bge(4) here: http://people.freebsd.org/~delphij/misc/patch-bge-releng62 You may try it out to see if things gets improved, but I am not very confident about that, because the changes are mostly unrelated to BCM5752. 
Also, we would appreciate if you could help to test if the latest -CURRENT snapshot can make your NIC to work correctly, because it looks like that we need a more complete MFC of the bge(4) changes, not just the ones I have chosen (for my personal needs). Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: second cpu not used on smp platform
Oles Hnatkevych wrote: Hello! Just cvsup-ed and upgraded to 6.2-STABLE. The box has a hyperthreading processor: # more /var/run/dmesg.boot |grep -i cpu CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (3000.37-MHz 686-class CPU) Logical CPUs per core: 2 FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu0: ACPI CPU on acpi0 cpu1: ACPI CPU on acpi0 SMP: AP CPU #1 Launched! [...] What can it be? This is done intentionally on all -STABLE branches; see http://security.freebsd.org/advisories/FreeBSD-SA-05:09.htt.asc for details. Note that it is possible to re-enable HTT, if you know what you are doing, by setting machdep.hyperthreading_allowed. Please note that enabling hyperthreading does hurt performance in many cases, so you will want to evaluate whether you really want it, or else disable it in the BIOS. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
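As a hedged illustration of the setting named above, the sketch below only reads machdep.hyperthreading_allowed and explains the value; it changes nothing. The helper htt_state and its wording are invented here, and the closing comments about setting the value reflect my understanding of the advisory rather than anything stated in the thread, so check the advisory before relying on them.

```shell
#!/bin/sh
# Translate the value of machdep.hyperthreading_allowed into words.
htt_state() {
    case "$1" in
        0) echo "logical CPUs are kept idle" ;;
        1) echo "logical CPUs may be scheduled" ;;
        *) echo "unknown" ;;
    esac
}

# Read-only check; on systems without this sysctl the else branch runs.
if value=$(sysctl -n machdep.hyperthreading_allowed 2>/dev/null); then
    echo "machdep.hyperthreading_allowed=$value: $(htt_state "$value")"
else
    echo "machdep.hyperthreading_allowed is not available on this system"
fi

# Assumption: to re-enable at runtime (as root) one would run
#   sysctl machdep.hyperthreading_allowed=1
# and, to keep it across reboots, add the same name=value pair to
# /boot/loader.conf.
```

Measuring the real workload with the setting at 0 and at 1 is the evaluation the reply recommends before leaving it enabled.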
Re: 6.2 Release - Adaptec 2130SLP driver?? issue - aac driver
Jeff Royle wrote: Jeff Royle wrote: I could use some advice on this issue I have had with my raid controller. I am not really running much on the system yet, postfix, Pf + pflogd, rlogind, ssh, bsnmp and ntpd. While I was just reading a file with less the system stopped responding. I thought it was the network interfaces but I was able to ping the interface. Once I plugged a monitor into the system I saw this (roughly): AAC0: COMMAND SOME HEX TIMEOUT AFTER X number of seconds Not good :) Reset of the system resolved the issue and it booted fine.Since the controller stopped responding nothing was recorded to my logs. Now I have to figure out how to prevent that from happening again. Basic run down on the system and some history... P4 3.2Ghz Asus P5MT-S MB 2 x 1GB DDR2 667 memory Adaptec 2130SLP Raid Controller + battery backup module 2 Segate Ultra320 73GB 15k RPM (mirrored) I have run this same system hardware testing 6.2-BETA3, RC-1 and RC-2 without this issue.I was using the driver released by Adaptec while testing the pre-release installs (http://www.adaptec.com/en-US/speed/raid/aac/unix/aacraid_freebsd6_drv_b11518_tgz.htm). You could say I am fairly confidient in the hardware itself. I have put this system through a lot of testing since BETA3. The 6.2 release kernel has not been customized all that much, I just pulled out all the drivers I would never use.To be safe I kept just about all scsi devices/card models still in as I continued my testing of 6.2 release. Right now I am going to try taking out aac and aacp then try the driver I used in my previous tests.However, since I have run a week without this issue it will be hard/impossible tell if this did anything to resolve it...I almost want a crash on the old driver :) So I need some advice... How best do I debug this issue? Thanks in advance for any direction you guys can offer me. Cheers, Jeff It appears the driver I was using in my pre-release testing is newer then the release driver. 
Stock driver in 6.2r dmesg: aac0: Adaptec SCSI RAID 2130S mem 0xfc60-0xfc7f,0xfc5ff000-0xfc5f irq 24 at device 1.0 on pci2 aac0: New comm. interface enabled aac0: Adaptec Raid Controller 2.0.0-1 aacp0: SCSI Passthrough Bus on aac0 Currently using: aacu0: Adaptec SCSI RAID 2130S mem 0xfc60-0xfc7f,0xfc5ff000-0xfc5f irq 24 at device 1.0 on pci2 aacu0: New comm. interface enabled aacu0: Adaptec Raid Controller 2.0.7-1 aacpu0: SCSI Passthrough Bus on aacu0 Going to continue testing with the newer driver. I have some preliminary work on merging the Adaptec driver: http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518 But one of the reviewers has advised me to request broader testing, especially against old cards and CLI tools, so I have held the commit for now. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: 6.2 Release - Adaptec 2130SLP driver?? issue - aac driver
Vivek Khera wrote: On Jan 19, 2007, at 11:52 AM, LI Xin wrote: I have some preliminary work on merging the Adaptec driver: http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518 But one of the reviewers has advised me to request boarder testing, especially against old cards and CLI tools, so I have hold the commit for now. My newer 2230SLP cards do not work with any extant command line tools for freebsd under amd64. The older cards did. I've tested FreeBSD 6.0 and 6.1. 6.2 is on the agenda to test soon. Do you mean Linux CLI tools on FreeBSD? I think I have missed my src/sys/dev/aac/aac_linux.c,v 1.4 change with re@ so I think there might be no change. Just MFC'ed that to RELENG_6. I shall have a look at your merged driver. It won't be a regression for me if the CLI tools stop working :-( Thanks! Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: ncurses
Stephen Montgomery-Smith wrote: In the cvs repository, there has appeared src/lib/ncurses, which seems to be a copy of lib/libncurses. Is this meant to be? Yes, this is a repo-copy in preparation of ncurses update. BTW. I think the files appeared on -HEAD, no? Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Any way to solve watchdog timeout on network-IF?
KAWAGUTI Ginga wrote: Hi. I'm using FreeBSD/i386 6-stable on an HP DL360G5 server, and getting the Watchdog timeout and link state changed to DOWN/UP messages shown below. bce0: /usr/src/sys/dev/bce/if_bce.c(5000): Watchdog timeout occurred, resetting! bce0: link state changed to DOWN bce0: link state changed to UP [System I'm using:] * CPU Xeon 51xx, * Intel 5000X chipset * NIC: bce (1.2.2.6 2006/10/24) bce0: Broadcom NetXtreme II BCM5708 1000Base-T (B1), v0.9.6 mem 0xf800-0xf9ff irq 16 at device 0.0 on pci3 bce0: ASIC ID 0x57081010; Revision (B1); PCI-X 64-bit 133MHz Try this patch; it solved some problems we have observed in the Chinese community. For -CURRENT: http://people.freebsd.org/~delphij/for_review/patch-bce-watchdog-rewrite-HEAD.20070109 For -STABLE: http://people.freebsd.org/~delphij/misc/patch-bce-watchdog-rewrite Note that the -STABLE version is an old but tested one. The -CURRENT version is optimized, but I have not stress tested it yet. Either should apply cleanly to the other branch, though. I would appreciate it if you would test the -HEAD patch, even on RELENG_6, and report whether it solves your problem. But if the server is important to you, then just use the tested -STABLE patch. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Marvell 8053 support?
Mars G. Miro wrote: Greetz! I happen to play w/ a newish Gigabyte mobo that has an on-board Marvell 8053 GigE NIC. FreeBSD 6.2-BETA2 i386, amd64 does not seem to support it. Does anyone know if this is supported? I'm currently taking a look at http://people.freebsd.org/~yongari/msk/ , tho this NIC isnt mentioned there at all. A quick glance at msk(4) on -CURRENT shows that the 88E8053 is supposed to work with it. Unfortunately it seems that it needs some KPIs that are only present in -HEAD, so backporting is not trivial. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Fatal Trap 12 in 6.2-PRERELEASE
J. W. Ballantine wrote: Yes, I saw the thread and I did a cvsup prior to rebuilding everything. What I'm asking about is the best way to get the system working again. Is it a reinstall, or is there a simple way to put a working kernel in? Simplest way would be to boot from /boot/kernel.old/kernel I guess. Then you will be able to do cvsup and rebuild. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: Marvell 8053 support?
Mars G. Miro wrote: On 1/9/07, LI Xin [EMAIL PROTECTED] wrote: Mars G. Miro wrote: Greetz! I happen to play w/ a newish Gigabyte mobo that has an on-board Marvell 8053 GigE NIC. FreeBSD 6.2-BETA2 i386, amd64 does not seem to support it. Does anyone know if this is supported? I'm currently taking a look at http://people.freebsd.org/~yongari/msk/ , tho this NIC isnt mentioned there at all. A quick glance at msk(4) on -CURRENT shows that 88E8053 is supposed to work with it. Unfortunately it seems that it needs some KPIs that is only present in -HEAD so backporting is not trivial. hmmm... downloading CURRENT snapshots now.. lesse if this works... The mobo is one of those 'solid-capacitor' mobos (just mentioned on /.) from Gigabyte http://www.gigabyte.com.tw/Products/Motherboard/Products_Spec.aspx?ClassValue=MotherboardProductID=2295ProductName=GA-965P-DQ6 Not sure if the snapshot contained the changes, though... IIRC the January snapshot is not released yet? Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: FreeBSD 6.2-RC2 Available - networking zoneli freeze problem still exist.
Bruce A. Mah wrote: If memory serves me right, LI Xin wrote: Ken Smith wrote: On Thu, 2006-12-28 at 16:01 +0100, Thomas Herrlin wrote: It still runs networking daemons into a frozen zoneli state on heavy/(D)DOS network loads. Such processes cant be kill-9ed so there is no way to recover from it. (think frozen sshd and a very remote/headless server). See the stress test panic called 'Ran out of 128 Bucket http://people.FreeBSD.org/%7Epho/stress/log/cons210.html' on the 6.2 todo list and my own latest test here: http://www.maniacs.se/~junics/temp/vmstat-z.txt This test was on a new 6.2-RC2 install with no zone limit tweaks nor any sbsize limits in /etc/login.conf. I just made a vm disk image with replication instructions, however Peter Holm have replicated it with his own tools so i have not bothered with it until now. That problem is being worked on but won't be fixed for 6.2-REL. Depending on how complex the fix winds up being it may be an Errata candidate when the time comes. Perhaps we should mention some known workarounds in the errata documentation. E.g. raising nmbclusters limit, etc.? That's a good idea. Do you have more specifics (e.g. any particular nmbclusters value, other workarounds, etc.)? The current workaround is to set the following in /boot/loader.conf: kern.ipc.nmbclusters=0 and then reboot. Note that this is not perfect, as it can lead to the need to increase KVA space under certain loads. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
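The workaround above is a one-line loader.conf change. As a minimal sketch under stated assumptions (set_tunable is a hypothetical helper, not an existing tool), the function below writes such a name=value line while replacing any earlier line for the same name, so that applying it twice does not leave conflicting entries behind.

```shell
#!/bin/sh
# Set NAME=VALUE in a loader.conf-style file, dropping any existing
# assignment of NAME first. Usage: set_tunable FILE NAME VALUE
set_tunable() {
    file=$1
    name=$2
    value=$3
    scratch=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line that does not start with "NAME=" (literal match).
        awk -v n="$name" 'index($0, n "=") != 1' "$file" > "$scratch"
    fi
    echo "${name}=${value}" >> "$scratch"
    # Overwrite in place so the original file's permissions are kept.
    cat "$scratch" > "$file" && rm -f "$scratch"
}

# Example (as root; a reboot is still required afterwards):
#   set_tunable /boot/loader.conf kern.ipc.nmbclusters 0
```

After the reboot, sysctl kern.ipc.nmbclusters should report the new value; the caveat about KVA space from the reply still applies.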
Re: running mksnap_ffs
Willem Jan Withagen wrote: Hi, I got the following Filesystem: Filesystem Size Used Avail Capacity iused ifree %iused /dev/da0a 1.3T 422G 823G 34% 565952 182833470 0% Running off a 3ware 9550, on a dual core Opteron 242 with 1Gb. The system is used as SMB/NFS server for my other systems here. I would like to make weekly snapshots, but manually running mksnap_ffs freezes access to the disk (I sort of expected that) but the process never terminates. So I let it sit overnight, but looking at gstat did not reveal any activity whatsoever... The disk was not released, mksnap_ffs could not be terminated. And things resulted in me rebooting the system. So: - How long should I expect making a snapshot to take: 5, 15, 30min, 1, 2 hour or even more??? This depends on how many cylinder groups you have. If you have a lot of large files, using newfs -b 32768 instead of the default settings would speed up the process drastically. Note that this might be infeasible because you already have data on the disk. Another suggestion is to separate the volume into smaller slices; this would reduce the impact. BTW, our experience with a semi-full 1.3T volume is that the snapshot would take about 1 hour on FreeBSD 5.x, but I doubt that it is really comparable to your situation, as the hardware is very different. - How do I diagnose the reason why it is not terminating? This might be somewhat complicated. Check out the developers' handbook. Cheers, -- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve! signature.asc Description: OpenPGP digital signature
Re: mpt0 timed out
wsk wrote:

Hi, I have a SCSI card connected to a disk array and get the following error after less than 10 days. Thanks in advance.

You did not mention which release you are using... BTW, have you tried lowering the connection speed (i.e. using 160M/s instead of 320M/s) to see whether that makes things better? If that helps, check the cable.

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: FreeBSD 6.2-RC2 Available - networking zoneli freeze problem still exist.
Ken Smith wrote:

On Thu, 2006-12-28 at 16:01 +0100, Thomas Herrlin wrote:

It still runs networking daemons into a frozen zoneli state on heavy/(D)DOS network loads. Such processes can't be kill -9ed, so there is no way to recover from it (think frozen sshd and a very remote/headless server). See the stress test panic called 'Ran out of 128 Bucket' (http://people.FreeBSD.org/%7Epho/stress/log/cons210.html) on the 6.2 todo list and my own latest test here: http://www.maniacs.se/~junics/temp/vmstat-z.txt This test was on a new 6.2-RC2 install with no zone limit tweaks nor any sbsize limits in /etc/login.conf. I just made a vm disk image with replication instructions; however, Peter Holm has replicated it with his own tools, so I have not bothered with it until now.

That problem is being worked on but won't be fixed for 6.2-REL. Depending on how complex the fix winds up being it may be an Errata candidate when the time comes.

Perhaps we should mention some known workarounds in the errata documentation. E.g. raising nmbclusters limit, etc.?

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: deadlock in zoneli state on 6.2-PRERELEASE
Hi, Nikolay,

Our local customer has applied the following and it seems to have 'solved' their problem on squid:

echo kern.ipc.nmbclusters=0 >> /boot/loader.conf

and then reboot. They have been running with the 20061212 patch, but I suspect that it's no longer necessary. The feedback I have received is that they have run with this for two weeks without problems, serving video streams. Please let us know if this works.

Wish you a happy new year :-)

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
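Appending to /boot/loader.conf with echo works, but running it twice leaves duplicate lines, and an older nmbclusters line elsewhere in the file may still be present. A small sketch of an idempotent variant follows; the function name set_loader_tunable is hypothetical, and it assumes tunables are written as plain NAME=VALUE lines, as in the message.

```shell
# set_loader_tunable FILE NAME VALUE
# Make sure FILE ends up with exactly one "NAME=VALUE" line for NAME.
# Lines that already assign NAME are dropped first, so repeated runs
# do not pile up duplicates. (Illustrative helper, not a FreeBSD tool.)
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line that does not start with "NAME=".
        awk -v n="${name}=" 'index($0, n) != 1' "$file" > "$tmp"
    fi
    echo "${name}=${value}" >> "$tmp"
    # Write back through cat so the target keeps its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Possible use (as root), followed by a reboot:
#   set_loader_tunable /boot/loader.conf kern.ipc.nmbclusters 0
```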
Re: Practically missing cpio(1L) man page since Oct 22 2006
The point of MFC'ing the cpio(1) changes is that they fix some old bugs that can potentially damage user data. Personally I'd rather replace it with the BSD pax(1) found in the base system, if we had a proper GNU cpio(1) test suite to make sure that we did not break something. It might be interesting to check the recent NetBSD/OpenBSD changes and apply the appropriate ones.

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: truss missing some files or directories
Ilya Vishnyakov wrote:

Problem with truss, won't start. Please advise me what to do.

truss /bin/echo hello
truss: truss: cannot open /proc/42520/mem cannot open /proc/curproc/mem: : No such file or directory No such file or directory

uname -a output:
FreeBSD mars.mysite.com 6.1-RELEASE-p10 FreeBSD 6.1-RELEASE-p10 #2: Thu Nov 2 13:58:09 UTC 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/MARS amd64

Thank you.

You should do

mount_procfs procfs /proc

before doing truss.

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
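To avoid mounting procfs twice, one can first check whether it is already mounted. This is a rough sketch that assumes mount(8) prints lines of the form "procfs on /proc (procfs, local)"; the function name procfs_mounted is invented here.

```shell
# procfs_mounted: read mount(8) output on stdin and succeed (exit 0)
# if a procfs file system is mounted on /proc, fail (exit 1) otherwise.
# Assumes lines shaped like "<device> on <mountpoint> (<type>, ...)".
procfs_mounted() {
    awk '$2 == "on" && $3 == "/proc" && $0 ~ /\(procfs/ { found = 1 }
         END { exit !found }'
}

# Possible use (as root), mounting only when needed:
#   mount | procfs_mounted || mount_procfs procfs /proc
#
# To have it mounted at every boot, an /etc/fstab entry such as
#   proc  /proc  procfs  rw  0  0
# is the usual approach.
```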
Re: When will new changes in BCE driver for vlans be included in stable?
Olivier Mueller wrote:

On 29 Nov 06 at 01:34, Scott Long wrote:

I just merged the fixes into RELENG_6 and RELENG_6_2. Thanks for the reminder.

Thanks, but what about RELENG_6_1? Under FreeBSD 6.1 it is still not coming. I have this in my cvsup file:

*default release=cvs tag=RELENG_6_1

Should I rather use RELENG_6 to stay up to date? After a cvsup now, I get this version from 14.04.2006:

-rw-r--r-- 1 root wheel 217151 Apr 14 2006 if_bce.c
__FBSDID("$FreeBSD: src/sys/dev/bce/if_bce.c,v 1.2.4.2 2006/04/13 22:42:07 ps Exp $");

RELENG_6_1 is frozen, and only security and errata changes are allowed, with security-officer@'s permission (IIRC re@'s approval is required for errata changes). Because we are working on 6.2-RELEASE, it's likely that these changes will never get into RELENG_6_1. So, there are three options:

- Manually check out RELENG_6_2's sys/dev/bce and replace yours, then compile and install a new kernel. This brings your bce(4) driver up to date while leaving other parts untouched. For Dell [12]950 boxes, I think the mfi(4) changes are also useful.
- Upgrade to RELENG_6_2. This brings a lot of stability enhancements. However, if this is a production server you may want to wait for the final RELEASE.
- Use RELENG_6. This is potentially somewhat risky, but newly discovered bugs are fixed in RELENG_6 and the risk is less than running a raw -CURRENT.

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: SMP Kernel Panic while heavy load.
hshh wrote:

Hi, my FreeBSD server panics while running an SMP kernel. All works fine if I disable the SMP feature.

uname log: http://upload.hshh.org/homes/hshh/temp/panic/uname.log
My kernel conf: http://upload.hshh.org/homes/hshh/temp/panic/kernel.log
Backtrace log: http://upload.hshh.org/homes/hshh/temp/panic/kgdb.log

Would you please disable ZERO_COPY_SOCKETS and see whether things improve?

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: deadlock in zoneli state on 6.2-PRERELEASE
Nikolay Pavlov wrote:

No luck at all. patch-zonelim-drain-20061212 works for me the same as the previous one: no panics, but still zoneli. All this is very odd, because the other two squid servers work perfectly in the same load balancer without any patches or kernel panics. I think that the case with this server is really rare.

Would you please give a vmstat -z output when the server is stuck in the zonelim livelock? Thanks!

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
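When reading a long vmstat -z listing it can be handy to pick out only the zones that have hit their limit. The filter below is a sketch under the assumption that each zone line looks like "name: SIZE, LIMIT, USED, FREE, REQUESTS" (the column order of the header on 6.x); zones_at_limit is a made-up name.

```shell
# zones_at_limit: read `vmstat -z` output on stdin and print the name
# of every zone with a non-zero LIMIT whose USED count has reached it.
# Assumed line format: "<zone name>: SIZE, LIMIT, USED, FREE, REQUESTS".
zones_at_limit() {
    awk -F: 'NF >= 2 {
        name = $1
        n = split($2, col, ",")   # col[1]=SIZE col[2]=LIMIT col[3]=USED
        if (n < 3) next
        limit = col[2] + 0
        used = col[3] + 0
        if (limit > 0 && used >= limit) print name
    }'
}

# Possible use while the server is stuck:
#   vmstat -z | zones_at_limit
```

A LIMIT of 0 is treated as "no limit", so unlimited zones are never reported.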
Re: deadlock in zoneli state on 6.2-PRERELEASE
Hi,

Would you please give the following patch a try?

http://people.freebsd.org/~delphij/misc/patch-zonelim-drain

Note: Please revert my previous patch against sys/kern/kern_mbuf.c. This patch should be applied against sys/vm/ [RELENG_6 or RELENG_6_2]. It schedules a drain of uma zones when they are low on memory.

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: Zoneli State / Nttcp client.
[EMAIL PROTECTED] wrote:

Hi, I have a script where we start nttcp for some 500 nttcp clients in the background. After some time I could see the nttcp clients listed by the top command in Zoneli state. Can anyone please let me know what is meant by Zoneli state?

Test Script:

count=1
while [ $count -le 2000 ]
do
    ifconfig xge1 17.1.1.25 promisc up
    ./nttcp -t -l65536 -w227 -P120 17.1.1.152
    echo count is $count
    count=`expr $count + 1`
done

Would you please help to test the patch located at:

http://people.freebsd.org/~delphij/misc/patch-zonelim-drain

to see whether it changes anything?

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: deadlock in zoneli state on 6.2-PRERELEASE
Hi, Nikolay,

Nikolay Pavlov wrote:

On Monday, 11 December 2006 at 15:59:03 +0800, LI Xin wrote:

Hi, Would you please give the following patch a try? http://people.freebsd.org/~delphij/misc/patch-zonelim-drain Note: Please revert my previous patch against sys/kern/kern_mbuf.c. This patch should be applied against sys/vm/ [RELENG_6 or RELENG_6_2]. It schedules a drain of uma zones when they are low on memory.

This time things worked out a bit better: there was no kernel panic and my server managed to overcome the magic number of 65550 mbufs. But very soon the server reached another limit, 131072 mbuf clusters

Do you still get squid stuck in zoneli state, with the server becoming unresponsive?

(This is my limit for kern.ipc.nmbclusters.) And the server started to drop packets. After I removed the overload I found my server responding, but when I actually accessed it I found that although the number of connections had dropped considerably, the memory allocated for the network did not become free. So I believe that there is still an mbuf leak somewhere.

This looks weird to me... Would you please try adding some load to the server and removing it afterwards, to see whether the 'current' mbuf clusters count keeps increasing or not?

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
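One way to watch the 'current' mbuf cluster count over time is to extract it from netstat -m. The sketch below assumes the 6.x output line "N/N/N/N mbuf clusters in use (current/cache/total/max)"; the function name mbuf_clusters_current is an invention for this example.

```shell
# mbuf_clusters_current: read `netstat -m` output on stdin and print
# the "current" field of the "mbuf clusters in use" line.
# Assumed format: "cur/cache/total/max mbuf clusters in use (...)".
mbuf_clusters_current() {
    awk '/mbuf clusters in use/ { split($1, f, "/"); print f[1]; exit }'
}

# Possible use: sample every 10 seconds while adding and then removing
# load, and check whether the number falls back afterwards.
#   while :; do netstat -m | mbuf_clusters_current; sleep 10; done
```

If the value keeps climbing across several load/idle cycles, that would support the leak suspicion; if it plateaus, the memory is more likely just cached.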
Re: deadlock in zoneli state on 6.2-PRERELEASE
Hi, Nikolay,

Nikolay Pavlov wrote:

On Tuesday, 12 December 2006 at 2:59:37 +0800, LI Xin wrote:

Hi, Nikolay, Nikolay Pavlov wrote:

On Monday, 11 December 2006 at 15:59:03 +0800, LI Xin wrote:

Hi, Would you please give the following patch a try? http://people.freebsd.org/~delphij/misc/patch-zonelim-drain Note: Please revert my previous patch against sys/kern/kern_mbuf.c. This patch should be applied against sys/vm/ [RELENG_6 or RELENG_6_2]. It schedules a drain of uma zones when they are low on memory.

This time things worked out a bit better: there was no kernel panic and my server managed to overcome the magic number of 65550 mbufs. But very soon the server reached another limit, 131072 mbuf clusters

Do you still get squid stuck in zoneli state, with the server becoming unresponsive?

Yes. No panic, but squid is still idle in zoneli and the server becomes unresponsive via the network.

I am not quite sure, but please let me know if the patch:

http://people.freebsd.org/~delphij/misc/patch-zonelim-drain-20061212

helps the situation. You can also change wakeup_one(keg); to wakeup(keg); manually, at sys/vm/uma_core.c:2507. Note that this is (potentially) a pessimization for the zonelim case, where all zonelim'ed threads would be woken rather than only one. However, confirming whether this helps your situation would help to narrow down the issue, so we can think about a better solution for it. Thanks!

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: deadlock in zoneli state on 6.2-PRERELEASE
Nikolay Pavlov wrote:

On Friday, 24 November 2006 at 11:11:48 +0800, LI Xin wrote:

Nikolay Pavlov wrote:

On Thursday, 23 November 2006 at 20:24:15 +0800, [EMAIL PROTECTED] wrote:

Hi, On Wed, 22 Nov 2006 21:55:49 +0200, Nikolay Pavlov [EMAIL PROTECTED] wrote:

Hi. It seems I have a deadlock on 6.2-PRERELEASE. This is a squid server in accelerator mode. I can easily trigger it with a high rate of requests. Squid is locked in some zoneli state; I am not sure what it is. Also I can't kill the process even with SIGKILL. In addition, one of the sshd processes is locked too.

Would you please update to the latest RELENG_6 and apply this patch: http://people.freebsd.org/~delphij/misc/patch-zonelimit-workaround to see if things improve? Thanks in advance! Cheers,

Well. This patch works quite ambiguously for me. Under heavy load this box becomes unresponsive via the network. The system is mostly idle. Squid is locked in zoneli.

Another panic. Guys, do I need some additional debug options, or is this info enough? I am asking because this panic is easily reproducible for me.

I think this is enough. By the way, which scheduler do you use?

[EMAIL PROTECTED]:/usr/obj/usr/src/sys/ACCEL# kgdb kernel.debug /var/crash/vmcore.4
kgdb: kvm_nlist(_stopped_cpus):
kgdb: kvm_nlist(_stoppcbs):
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
lock order reversal: (sleepable after non-sleepable)
 1st 0xca21567c so_snd (so_snd) @ /usr/src/sys/netinet/tcp_output.c:253
 2nd 0xc070bd84 user map (user map) @ /usr/src/sys/vm/vm_map.c:3074
KDB: stack backtrace:
kdb_backtrace(,c071ccb0,c071c210,c06e5c4c,c0758f18,...) at kdb_backtrace+0x29
witness_checkorder(c070bd84,9,c06be56c,c02,c070d2c4,0,c06aab25,9f) at witness_checkorder+0x4cd
_sx_xlock(c070bd84,c06be56c,c02) at _sx_xlock+0x2c
_vm_map_lock_read(c070bd40,c06be56c,c02,184637d,c92796b0,...) at _vm_map_lock_read+0x37
vm_map_lookup(f48a29d0,0,1,f48a29d4,f48a29c4,f48a29c8,f48a29ab,f48a29ac) at vm_map_lookup+0x28
vm_fault(c070bd40,0,1,0,c927aa80,...) at vm_fault+0x65
trap_pfault(f48a2a98,0,c) at trap_pfault+0xee
trap(8,c06b0028,f48a0028,1,0,...) at trap+0x325
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0xc053ea34, esp = 0xf48a2ad8, ebp = 0xf48a2ae4 ---
m_copydata(0,,1,d0020d74,c1040468,...) at m_copydata+0x28
tcp_output(d21c5570) at tcp_output+0x9af
tcp_input(d0020d00,14,e9,93935ce,0,...) at tcp_input+0x24a2
ip_input(d0020d00) at ip_input+0x561
netisr_processqueue(c075a6d8) at netisr_processqueue+0x6e
swi_net(0) at swi_net+0xc2
ithread_execute_handlers(c9279648,c92c3400) at ithread_execute_handlers+0xce
ithread_loop(c92436a0,f48a2d38,c070db20,0,c06a818a,...) at ithread_loop+0x4e
fork_exit(c04f76d4,c92436a0,f48a2d38) at fork_exit+0x61
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xf48a2d6c, ebp = 0 ---

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0xc
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc053ea34
stack pointer = 0x28:0xf48a2ad8
frame pointer = 0x28:0xf48a2ae4
code segment = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 13 (swi1: net)
trap number = 12
panic: page fault
KDB: stack backtrace:
kdb_backtrace(100,c927aa80,28,f48a2a98,c,...) at kdb_backtrace+0x29
panic(c069b8a1,c06c5f2c,0,f,c927d69b,...) at panic+0xa8
trap_fatal(f48a2a98,c,c927aa80,c070bd40,0,...) at trap_fatal+0x2a6
trap_pfault(f48a2a98,0,c) at trap_pfault+0x187
trap(8,c06b0028,f48a0028,1,0,...) at trap+0x325
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0xc053ea34, esp = 0xf48a2ad8, ebp = 0xf48a2ae4 ---
m_copydata(0,,1,d0020d74,c1040468,...) at m_copydata+0x28
tcp_output(d21c5570) at tcp_output+0x9af
tcp_input(d0020d00,14,e9,93935ce,0,...) at tcp_input+0x24a2
ip_input(d0020d00) at ip_input+0x561
netisr_processqueue(c075a6d8) at netisr_processqueue+0x6e
swi_net(0) at swi_net+0xc2
ithread_execute_handlers(c9279648,c92c3400) at ithread_execute_handlers+0xce
ithread_loop(c92436a0,f48a2d38,c070db20,0,c06a818a,...) at ithread_loop+0x4e
fork_exit(c04f76d4,c92436a0,f48a2d38) at fork_exit+0x61
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xf48a2d6c, ebp = 0 ---
Uptime: 25m13s
Dumping 3967 MB (3 chunks
Re: deadlock in zoneli state on 6.2-PRERELEASE
Nikolay Pavlov wrote:

On Thursday, 23 November 2006 at 20:24:15 +0800, [EMAIL PROTECTED] wrote:

Hi, On Wed, 22 Nov 2006 21:55:49 +0200, Nikolay Pavlov [EMAIL PROTECTED] wrote:

Hi. It seems I have a deadlock on 6.2-PRERELEASE. This is a squid server in accelerator mode. I can easily trigger it with a high rate of requests. Squid is locked in some zoneli state; I am not sure what it is. Also I can't kill the process even with SIGKILL. In addition, one of the sshd processes is locked too.

Would you please update to the latest RELENG_6 and apply this patch: http://people.freebsd.org/~delphij/misc/patch-zonelimit-workaround to see if things improve? Thanks in advance! Cheers,

Well. This patch works quite ambiguously for me. Under heavy load this box becomes unresponsive via the network. The system is mostly idle. Squid is locked in zoneli.

Would you please give me the output of sysctl vm.zone on a patched system? It's not important whether it is loaded.

last pid: 840; load averages: 0.26, 0.24, 0.17 up 0+00:11:50 10:19:46
34 processes: 1 running, 33 sleeping
CPU states: 0.4% user, 0.0% nice, 0.4% system, 1.5% interrupt, 97.8% idle
Mem: 225M Active, 144M Inact, 261M Wired, 12K Cache, 112M Buf, 3259M Free
Swap: 4070M Total, 4070M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
  682 squid      1 -16    0   207M   207M zoneli   2:18  6.59% squid
  709 root       1  -8    0  7768K  7240K piperd   0:00  0.00% perl5.8.8
  691 root       1  96    0  6632K  4796K select   0:00  0.00% snmpd
  829 root       1  76  -20  2400K  1648K RUN      0:00  0.00% top
  790 quetzal    1  96    0  6220K  3252K select   0:00  0.00% sshd
  788 root       1   4    0  6232K  3232K sbwait   0:00  0.00% sshd
  837 root       1  20    0  5048K  3024K pause    0:00  0.00% tcsh
  832 root       1   4    0  6232K  3236K sbwait   0:00  0.00% sshd
  820 root       1  20    0  4700K  2856K pause    0:00  0.00% tcsh
  645 root       1  96    0  2984K  1808K select   0:00  0.00% ntpd
  791 quetzal    1  20    0  4708K  2872K pause    0:00  0.00% tcsh
  560 root       1  96    0  1352K   996K select   0:00  0.00% syslogd
  362 _pflogd    1 -58    0  1600K  1144K bpf      0:00  0.00% pflogd
  835 quetzal    1  20    0  4728K  2960K pause    0:00  0.00% tcsh
  688 squid      1  -8    0  1224K   632K piperd   0:00  0.00% unlinkd
  834 quetzal    1  96    0  6220K  3252K select   0:00  0.00% sshd
  840 root       1  20    0  1540K   960K pause    0:00  0.00% netstat
  719 root       1  96    0  3464K  2796K select   0:00  0.00% sendmail
  729 root       1   8    0  1364K  1060K nanslp   0:00  0.00% cron

[EMAIL PROTECTED]:~# netstat -h 1
            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
      1.6K     0       1.3M       1.5K     0       1.6M     0
      1.8K     0       1.6M       1.7K     0       1.6M     0
      1.3K     0       1.0M       1.4K     0       1.4M     0
      1.5K     0       1.3M       1.5K     0       1.4M     0
      1.6K     0       1.4M       1.6K     0       1.5M     0
      1.7K     0       1.5M       1.6K     0       1.5M     0
      1.3K     0       830K       1.4K     0       1.5M     0
      1.1K     0       679K       1.3K     0       1.4M     0
       812     0       501K        912     0       971K     0
      1.2K     0       1.1M       1.2K     0       1.1M     0
       617     0       325K        742     0       806K     0
       634     0       312K        769     0       818K     0
      1.8K     0       1.7M       1.5K     0       1.1M     0
       11K     0        13M       7.5K     0       3.8M     0
       10K     0        12M       8.0K     0       5.2M     0
      9.7K     0       9.9M       8.2K     0       6.3M     0
       513  1.7K       666K        328     0       151K     0
^^ Here goes load...
      1.0K   543       782K        434     0       247K     0
         0  2.3K          0          0     0          0     0
         2   605       1.5K          2     0        132     0
            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
         0   334          0          0     0          0     0
         0   286          0          0     0          0     0
         0   288          0          0     0          0     0
       819   204       689K        328     0       122K     0
         0  1.7K          0          0     0          0     0
       866  1.2K       719K        375     0       141K     0
       144  1.5K       175K        111     0        55K     0
         0  1.3K          0          0     0          0     0
       687   182       426K        304     0        73K     0
         0  3.2K          0          0     0          0     0
      1.0K     0       723K        405     0
Re: FW: Zoneli State / Nttcp client.
To quote the original post:

Hi, I have a script where we start nttcp for some 500 nttcp clients in the background. After some time I could see the nttcp clients listed by the top command in Zoneli state. Can anyone please let me know what is meant by Zoneli state?

Test Script:

count=1
while [ $count -le 2000 ]
do
    ifconfig xge1 17.1.1.25 promisc up
    ./nttcp -t -l65536 -w227 -P120 17.1.1.152
    echo count is $count
    count=`expr $count + 1`
done

Thanks, ~Siva

last pid: 6790; load averages: 0.00, 0.00, 0.28 up 0+00:56:58 17:57:26
229 processes: 1 starting, 1 running, 227 sleeping
CPU states: 0.4% user, 0.0% nice, 0.4% system, 1.1% interrupt, 98.1% idle
Mem: 48M Active, 6368K Inact, 138M Wired, 9184K Buf, 804M Free
Swap: 487M Total, 487M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 6664 root       1   6    0     0K     0K START    0:04  0.00% login
 6756 root       1  96    0  2700K  1976K select   0:02  0.00% top
  642 root       1   6    0  1272K   888K ttywri   0:00  0.00% vmstat
  304 root       1  96    0  1292K   852K select   0:00  0.00% syslogd
  540 root       1  96    0  3152K  2260K select   0:00  0.00% telnetd
  525 root       1 -16    0  3152K  2232K zoneli   0:00  0.00% telnetd
 6790 root       1  96    0  2636K  1912K RUN      0:00  0.00% top
 6663 root       1 -16    0  3152K  2260K zoneli   0:00  0.00% telnetd
 5870 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 6329 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 6221 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
  662 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
  668 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
  512 root       1   8    0  1620K  1324K wait     0:00  0.00% login
 1382 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
  511 root       1 -16    0  3152K  2232K zoneli   0:00  0.00% telnetd
  545 root       1  20    0  3872K  2632K pause    0:00  0.00% csh
 1433 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 1418 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 1415 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 5747 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 6760 root       1  96    0  3152K  2260K select   0:00  0.00% telnetd
 3653 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 1211 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 6765 root       1  20    0  3872K  2636K pause    0:00  0.00% csh
  516 root       1  20    0  3868K  2600K pause    0:00  0.00% csh
  530 root       1   5    0  3868K  2600K ttyin    0:00  0.00% csh
  500 root       1   8    0  1592K  1280K wait     0:00  0.00% login
 6632 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 6452 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 1667 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 3578 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 1814 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 6413 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp
 1562 21474836   1 -16    0  1332K  1008K zoneli   0:00  0.00% nttcp

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
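To get a quick count of how many processes sit in a given wait state, the top listing can be filtered by its STATE column. This is only a sketch: it assumes the column layout shown in the header above (STATE is the eighth whitespace-separated field), and count_state is a made-up helper name.

```shell
# count_state STATE: read top(1) process lines on stdin and print how
# many of them have STATE in the eighth column.
# Assumed columns: PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
count_state() {
    awk -v s="$1" '$8 == s { n++ } END { print n + 0 }'
}

# Possible use, assuming top's batch mode (-b) is available:
#   top -b | count_state zoneli
```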
Re: Install onto 6Tb array
Lawrence Farr wrote:

I'd like to install stable directly onto a 6TB Areca array, and have run out of clues how to do it. In the past I've always used a separate boot volume that was small enough for sysinstall to work with, but I have no space to fit more drives in this chassis. I've tried putting a small partition at the start, which boots fine but screws up the disk sizing, as the geometry is wrong, I guess. I tried diskprep, but that gets the geometry wrong as well. Is the only option to use gpt, restore to it from another boot device, and hope that the BIOS can boot it?

I am afraid so... 6TB is way too large for MBR to manage (it can support 4TB at maximum).

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
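The MBR limit comes from its partition entries, which store both the start and the length of a partition as 32-bit sector counts. A small worked calculation, assuming the common 512-byte sector size (the function name mbr_limit_bytes is made up for this sketch, and 64-bit shell arithmetic is assumed):

```shell
# mbr_limit_bytes [SECTOR_SIZE]: print the number of bytes that a
# 32-bit sector count can describe (2^32 sectors), for the given
# sector size in bytes (default 512).
mbr_limit_bytes() {
    sector_size=${1:-512}
    echo $(( 4294967296 * sector_size ))
}

# With 512-byte sectors:
#   mbr_limit_bytes 512     (2^41 bytes, i.e. 2 TiB per partition)
# A partition starting just below the 2 TiB mark and running for another
# 2 TiB would end just under 4 TiB, which appears to be where the 4TB
# figure comes from. A 6TB array exceeds either bound.
```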
Re: 5 to 6
Randy Bush wrote:

do folk actually successfully upgrade

# uname -a
FreeBSD psg.com 5.5-STABLE FreeBSD 5.5-STABLE #15: Sun Oct 1 18:41:24 GMT 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PSG i386

to RELENG_6 *safely* on a many-user production system using the instructions in UPDATING?

I have done several (50+) updates from 5.4-RELEASE to a custom build of 6.1-RELEASE in a way similar to what Colin has posted on his website[1]. Basically it's fairly safe to upgrade 5.4-R to 6.1-R by either a src/ update (src/UPDATING) or a binary update. Note that if you want a clean system you will want to run make delete-old and make delete-old-libs; you are advised to install compat5x before removing the libraries.

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
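The cleanup order from the reply can be written down as a short plan. The sketch below only prints the commands rather than running them; it assumes the compat5x port lives under /usr/ports/misc/compat5x and that the source tree is in /usr/src, and cleanup_plan is an illustrative name. Check src/UPDATING in your own tree before acting on it.

```shell
# cleanup_plan: print (do not execute) the post-upgrade cleanup steps
# in the order suggested in the message: install compat5x first, then
# remove obsolete files, and only then remove the old libraries.
cleanup_plan() {
    cat <<'EOF'
cd /usr/ports/misc/compat5x && make install clean
cd /usr/src && make delete-old
cd /usr/src && make delete-old-libs
EOF
}

# Possible use: review the plan, then run the lines by hand.
#   cleanup_plan
```

Installing compat5x first means that binaries still linked against 5.x libraries keep working once delete-old-libs has removed the originals.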
Re: 5 to 6
LI Xin wrote:

Randy Bush wrote:

do folk actually successfully upgrade

# uname -a
FreeBSD psg.com 5.5-STABLE FreeBSD 5.5-STABLE #15: Sun Oct 1 18:41:24 GMT 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PSG i386

to RELENG_6 *safely* on a many-user production system using the instructions in UPDATING?

I have done several (50+) updates from 5.4-RELEASE to a custom build of 6.1-RELEASE in a way similar to what Colin has posted on his website[1]. Basically it's fairly safe to upgrade 5.4-R to 6.1-R by

Oops...

[1] http://www.daemonology.net/freebsd-upgrade-5.3-to-5.4/

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!