Re: [releng_5 tinderbox] failure on sparc64/sparc64
Sorry- my bad. I'll fix shortly. On 5/10/07, FreeBSD Tinderbox [EMAIL PROTECTED] wrote: TB --- 2007-05-10 14:42:44 - tinderbox 2.3 running on freebsd-stable.sentex.ca TB --- 2007-05-10 14:42:44 - starting RELENG_5 tinderbox run for sparc64/sparc64 TB --- 2007-05-10 14:42:44 - cleaning the object tree TB --- 2007-05-10 14:43:10 - checking out the source tree TB --- 2007-05-10 14:43:10 - cd /tinderbox/RELENG_5/sparc64/sparc64 TB --- 2007-05-10 14:43:10 - /usr/bin/cvs -f -R -q -d/home/ncvs update -Pd -rRELENG_5 src TB --- 2007-05-10 14:56:20 - building world (CFLAGS=-O -pipe) TB --- 2007-05-10 14:56:20 - cd /src TB --- 2007-05-10 14:56:20 - /usr/bin/make -B buildworld Rebuilding the temporary build tree stage 1.1: legacy release compatibility shims stage 1.2: bootstrap tools stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3: cross tools stage 4.1: building includes stage 4.2: building libraries stage 4.3: make dependencies stage 4.4: building everything TB --- 2007-05-10 15:40:18 - generating LINT kernel config TB --- 2007-05-10 15:40:18 - cd /src/sys/sparc64/conf TB --- 2007-05-10 15:40:18 - /usr/bin/make -B LINT TB --- 2007-05-10 15:40:18 - building LINT kernel (COPTFLAGS=-O -pipe) TB --- 2007-05-10 15:40:18 - cd /src TB --- 2007-05-10 15:40:18 - /usr/bin/make buildkernel KERNCONF=LINT Kernel build for LINT started on Thu May 10 15:40:18 UTC 2007 stage 1: configuring the kernel stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3.1: making dependencies stage 3.2: building everything [...] cc -c -O -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -nostdinc -I- -I. 
-I/src/sys -I/src/sys/contrib/dev/acpica -I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf -I/src/sys/contrib/dev/ath -I/src/sys/contrib/dev/ath/freebsd -I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -include opt_global.h -fno-common -finline-limit=15000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float -ffreestanding -Werror /src/sys/dev/isp/isp.c cc -c -O -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -nostdinc -I- -I. -I/src/sys -I/src/sys/contrib/dev/acpica -I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf -I/src/sys/contrib/dev/ath -I/src/sys/contrib/dev/ath/freebsd -I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -include opt_global.h -fno-common -finline-limit=15000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float -ffreestanding -Werror /src/sys/dev/isp/isp_freebsd.c cc -c -O -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -nostdinc -I- -I. -I/src/sys -I/src/sys/contrib/dev/acpica -I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf -I/src/sys/contrib/dev/ath -I/src/sys/contrib/dev/ath/freebsd -I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -include opt_global.h -fno-common -finline-limit=15000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float -ffreestanding -Werror /src/sys/dev/isp/isp_library.c cc -c -O -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -nostdinc -I- -I. 
-I/src/sys -I/src/sys/contrib/dev/acpica -I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf -I/src/sys/contrib/dev/ath -I/src/sys/contrib/dev/ath/freebsd -I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -include opt_global.h -fno-common -finline-limit=15000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float -ffreestanding -Werror /src/sys/dev/isp/isp_target.c cc -c -O -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -nostdinc -I- -I. -I/src/sys -I/src/sys/contrib/dev/acpica -I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf -I/src/sys/contrib/dev/ath -I/src/sys/contrib/dev/ath/freebsd -I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -include opt_global.h -fno-common -finline-limit=15000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float -ffreestanding -Werror /src/sys/dev/isp/isp_pci.c cc -c -O -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual
Re: clock problem
M. Warner Losh wrote:
> Peter Jeremy wrote:
> : There seems to be a bug in ntpd where the PLL can saturate at
> : +/-500ppm and will not recover. This problem seems to occur mostly
> : where the reference servers have lots of jitter (i.e. a fairly congested
> : link to them).
>
> Yes. This is a rather interesting misfeature of ntpd. Its rails are
> at +/- 500ppm, and when it hits the rail it assumes that things are
> too bad to continue and it stops.

I think it is related to the maximum slew rate of 1/2000, which is equivalent to 500 ppm. The ntpd(8) manpage says:

    Since the slew rate of typical Unix kernels is limited to 0.5 ms/s, each second of adjustment requires an amortization interval of 2000 s.

And a bit further down:

    The maximum slew rate possible is limited to 500 parts-per-million (PPM) as a consequence of the correctness principles on which the NTP protocol and algorithm design are based. As a result, the local clock can take a long time to converge to an acceptable offset, about 2,000 s for each second the clock is outside the acceptable range.

> Most PC clocks have a frequency error on the order of 10-150ppm, so it
> doesn't take a whole lot of jitter from a congested remote network
> to exceed the limits...

I think the burst and iburst options for the server lines in ntp.conf might help in such cases.

Of course, the best solution is to buy a GPS or DCF radio receiver and set up a stratum-1 server yourself. But the last time I tried to do that with a cheap DCF plug, it wasn't very well supported on FreeBSD. Even an expensive Meinberg receiver ( http://www.meinberg.de/english/ ) with an RS232 output worked much more accurately with a Solaris machine than with FreeBSD. (Unfortunately, the Meinberg model available to us did not have NTP support via ethernet itself, only serial output.) I have to admit that that was in FreeBSD 4.x days. The situation might have improved in the meantime (I don't know).

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht Muenchen, HRA 74606; management: secnetix Verwaltungsgesellsch. mbH, commercial register: Registergericht München, HRB 125758; managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart
FreeBSD services, products and more: http://www.secnetix.de/bsd

C++ is to C as Lung Cancer is to Lung. -- Thomas Funke

___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
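For readers who want to check the manpage arithmetic quoted in this message, here is a minimal sketch in Python. It only restates the 500 ppm figure from ntpd(8); the function name and the sample offsets are made up for illustration and do not come from ntpd itself.

```python
# Illustrative arithmetic only. The 500 ppm limit is the figure quoted
# from ntpd(8) in the message above; everything else is hypothetical.
MAX_SLEW_PPM = 500.0


def amortization_seconds(offset_seconds, slew_ppm=MAX_SLEW_PPM):
    """Return how long it takes to slew away an offset at a given rate.

    A slew rate of N ppm corrects N microseconds of offset per second
    of elapsed time, so the time needed is offset / (N * 1e-6).
    """
    if slew_ppm <= 0:
        raise ValueError("slew rate must be positive")
    return abs(offset_seconds) / (slew_ppm * 1e-6)


if __name__ == "__main__":
    # Sample offsets chosen arbitrarily for the demonstration.
    for offset in (0.3, 1.0, 3.0):
        needed = amortization_seconds(offset)
        print(f"{offset:4.1f} s offset -> about {needed:.0f} s of slewing")
```

At 500 ppm, one second of offset works out to roughly 2000 s of slewing, which matches the "about 2,000 s for each second" wording in the manpage excerpt.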
freebsd and securelevel question
Hi,

So. The simple question is: why does FreeBSD have securelevel 0 if init sets it to 1 when it sees at boot that the level is 0? :) It's OK that it's in the manual, but there are also two default ways to set the securelevel at boot time. I don't really get the point of this forced change from 0 to 1.

We'd like to use our machines with securelevel 0 by default, so I had to comment out the relevant two lines in init.c.

Regards,
Andras
Re: freebsd and securelevel question
* Gót András ([EMAIL PROTECTED]) wrote:
> So. The simple question is: why does FreeBSD have securelevel 0 if init sets it to 1 when it sees at boot that the level is 0? :)

So when you boot to single user mode you can turn off immutable/append-only flags etc., without letting those capabilities propagate into multiuser mode?

> We'd like to use our machines with securelevel 0 by default, so I had to comment out the relevant two lines in init.c.

init(8):

    -1    Permanently insecure mode - always run the system in level 0 mode. This is the default initial value.

--
Thomas 'Freaky' Hurst http://hur.st/
Re: freebsd and securelevel question
Gót András [EMAIL PROTECTED] wrote:
> So. The simple question is: why does FreeBSD have securelevel 0 if init sets it to 1 when it sees at boot that the level is 0? :) It's OK that it's in the manual, but there are also two default ways to set the securelevel at boot time. I don't really get the point of this forced change from 0 to 1.

The reason is so that /etc/rc and all of the related startup scripts can run at level 0, which might be necessary for various reasons, and afterwards the level is automatically increased to 1. If you don't want that, you should leave the level at the default of -1.

> We'd like to use our machines with securelevel 0 by default, so I had to comment out the relevant two lines in init.c.

Uhm, could you please explain why you want to do that? It doesn't make sense.

Note that level -1 behaves exactly the same as level 0 (i.e. no restrictions at all); the only difference is that -1 prevents the automatic increase to level 1 when the system goes multi-user. So, if you want to run permanently without restrictions, you should leave the secure level at the default value of -1.

It's all explained in the init(8) manual page.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG
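The rule described in this reply can be written down as a tiny model. This is only a sketch of the behaviour as explained in the thread, not the code in init.c; the function name is hypothetical.

```python
def securelevel_after_multiuser(boot_level):
    """Model the securelevel transition described in this thread.

    Assumption (taken from the explanation above, not from init.c):
    a level of 0 is raised to 1 when the system goes multi-user,
    while -1 and any level above 0 are left unchanged.
    """
    return 1 if boot_level == 0 else boot_level


if __name__ == "__main__":
    for level in (-1, 0, 1, 2):
        print(f"boot level {level:2d} -> multi-user level "
              f"{securelevel_after_multiuser(level):2d}")
```

Under this model the only way to stay unrestricted in multi-user mode is to boot with -1, which is the point both replies make.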
Re: UNIX domain sockets MFC's
On Tue, 8 May 2007, Robert Watson wrote:
> Right now I am tracking two known issues with UNIX domain sockets in RELENG_6:
>
> - Reported NULL pointer dereference in unp_connect(), which occurs due to the dropping of locks around sonewconn(). This is fixed in HEAD, and I am preparing an MFC of this patch.

The fix for this has now been merged as 1.155.2.22. As there have been no new reports of UNIX domain socket problems in the last couple of days, it sounds like the MFC of the last batch of fixes and cleanups has not led to problems.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: clock problem
On Thu, 10 May 2007, M. Warner Losh wrote:
> In message: [EMAIL PROTECTED] Martin Dieringer [EMAIL PROTECTED] writes:
> : well now it works without restrict:
> : # ntpq -p
> :      remote           refid      st t when poll reach   delay   offset  jitter
> : ==============================================================================
> : *time            192.53.103.108   2 u   19   64   77   91.454  301.926 860.104
> :
> : and the clock is only 3 seconds late now...
>
> only 300ms late, or .3s you mean.

Well it says so, but I meant 3 seconds, compared by eye to a radio clock.

This is NOT a hardware problem.
1. I have this on 2 machines,
2. the problem is solved by switching to ACPI instead of APM

m.
[releng_6 tinderbox] failure on sparc64/sparc64
TB --- 2007-05-11 13:24:38 - tinderbox 2.3 running on freebsd-stable.sentex.ca TB --- 2007-05-11 13:24:38 - starting RELENG_6 tinderbox run for sparc64/sparc64 TB --- 2007-05-11 13:24:38 - cleaning the object tree TB --- 2007-05-11 13:25:06 - checking out the source tree TB --- 2007-05-11 13:25:06 - cd /tinderbox/RELENG_6/sparc64/sparc64 TB --- 2007-05-11 13:25:06 - /usr/bin/cvs -f -R -q -d/home/ncvs update -Pd -rRELENG_6 src TB --- 2007-05-11 13:35:38 - building world (CFLAGS=-O2 -pipe) TB --- 2007-05-11 13:35:38 - cd /src TB --- 2007-05-11 13:35:38 - /usr/bin/make -B buildworld Rebuilding the temporary build tree stage 1.1: legacy release compatibility shims stage 1.2: bootstrap tools stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3: cross tools stage 4.1: building includes stage 4.2: building libraries stage 4.3: make dependencies stage 4.4: building everything TB --- 2007-05-11 14:40:11 - generating LINT kernel config TB --- 2007-05-11 14:40:11 - cd /src/sys/sparc64/conf TB --- 2007-05-11 14:40:11 - /usr/bin/make -B LINT TB --- 2007-05-11 14:40:12 - building LINT kernel (COPTFLAGS=-O2 -pipe) TB --- 2007-05-11 14:40:12 - cd /src TB --- 2007-05-11 14:40:12 - /usr/bin/make buildkernel KERNCONF=LINT Kernel build for LINT started on Fri May 11 14:40:12 UTC 2007 stage 1: configuring the kernel stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3.1: making dependencies [...] 
awk -f /src/sys/tools/makeobjops.awk /src/sys/kern/linker_if.m -h
awk -f /src/sys/tools/makeobjops.awk /src/sys/libkern/iconv_converter_if.m -h
awk -f /src/sys/tools/makeobjops.awk /src/sys/dev/ofw/ofw_bus_if.m -h
awk -f /src/sys/tools/makeobjops.awk /src/sys/sparc64/pci/ofw_pci_if.m -h
rm -f .newdep
/usr/bin/make -V CFILES -V SYSTEM_CFILES -V GEN_CFILES | MKDEP_CPP=cc -E CC=cc xargs mkdep -a -f .newdep -O2 -pipe -fno-strict-aliasing -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -nostdinc -I- -I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf -I/src/sys/dev/ath -I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -finline-limit=15000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float -ffreestanding
/src/sys/dev/isp/isp_sbus.c:487:67: macro isp_dma_tag_create requires 12 arguments, but only 1 given
mkdep: compile failed
*** Error code 1
Stop in /obj/sparc64/src/sys/LINT.
*** Error code 1
Stop in /src.
*** Error code 1
Stop in /src.
TB --- 2007-05-11 14:41:34 - WARNING: /usr/bin/make returned exit code 1
TB --- 2007-05-11 14:41:34 - ERROR: failed to build lint kernel
TB --- 2007-05-11 14:41:34 - tinderbox aborted
TB --- 0.95 user 2.65 system 4616.05 real

http://tinderbox.des.no/tinderbox-releng_6-RELENG_6-sparc64-sparc64.full
Re: clock too slow - big time offset with ntpdate
On Wed, 9 May 2007, Ian Smith wrote:
> Bottom line might be: if it hurts when you run powerd with APM, don't. If you want powerd to work, I'd suggest trying ACPI again

OK, using ACPI solved the clock problem; the suspend problem has to be solved later.

m.
Re: UNIX domain sockets MFC's
--On Friday, May 11, 2007 12:49:32 +0100 Robert Watson [EMAIL PROTECTED] wrote:
> On Tue, 8 May 2007, Robert Watson wrote:
> > Right now I am tracking two known issues with UNIX domain sockets in RELENG_6:
> >
> > - Reported NULL pointer dereference in unp_connect(), which occurs due to the dropping of locks around sonewconn(). This is fixed in HEAD, and I am preparing an MFC of this patch.
>
> The fix for this has now been merged as 1.155.2.22. As there have been no new reports of UNIX domain socket problems in the last couple of days, it sounds like the MFC of the last batch of fixes and cleanups has not led to problems.

I will work on upgrading that system right now to the latest -STABLE and let you know ... did you happen to receive my email concerning that java process in a soclose state?

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]  Yahoo . yscrappy  Skype: hub.org  ICQ . 7615664
Re: clock problem
In message: [EMAIL PROTECTED] Martin Dieringer [EMAIL PROTECTED] writes:
: This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
: problem is solved by switching to ACPI instead of APM

It is a hardware problem. APM + powerd changes the frequency of the TSC. If the TSC is used as the time source, then you'll get bad timekeeping. ACPI uses its own frequency source that is much more stable and independent of the TSC, so switching to it fixes the problem because you are switching the hardware from using a really bad frequency source with ugly steps to using a good frequency source w/o steps.

Warner
Re: clock problem
In message: [EMAIL PROTECTED] Oliver Fromme [EMAIL PROTECTED] writes:
: M. Warner Losh wrote:
: > Peter Jeremy wrote:
: > : There seems to be a bug in ntpd where the PLL can saturate at
: > : +/-500ppm and will not recover. This problem seems to occur mostly
: > : where the reference servers have lots of jitter (i.e. a fairly congested
: > : link to them).
: >
: > Yes. This is a rather interesting misfeature of ntpd. Its rails are
: > at +/- 500ppm, and when it hits the rail it assumes that things are
: > too bad to continue and it stops.
:
: I think it is related to the maximum slew rate of 1/2000,
: which is equivalent to 500 ppm. The ntpd(8) manpage says:
:
:     Since the slew rate of typical Unix kernels is limited to
:     0.5 ms/s, each second of adjustment requires an amortization
:     interval of 2000 s.
:
: And a bit further down:
:
:     The maximum slew rate possible is limited to 500 parts-per-million
:     (PPM) as a consequence of the correctness principles
:     on which the NTP protocol and algorithm design are based.
:     As a result, the local clock can take a long time to converge
:     to an acceptable offset, about 2,000 s for each second the
:     clock is outside the acceptable range.

I think you are confusing two things here. One is the maximum frequency error of the system clock that ntpd can tolerate. The other is the maximum slew rate of the system clock. The actual error in nominal frequency of the system clock is what is recorded. When ntpd slams the system clock to do its 2ms/s adjustment, it still records the actual error. Since the original drift file was 500.000, this indicates a very bad clock.

: > Most PC clocks have a frequency error on the order of 10-150ppm, so it
: > doesn't take a whole lot of jitter from a congested remote network
: > to exceed the limits...
:
: I think the burst and iburst options for the server lines
: in ntp.conf might help in such cases.

It might.

: Of course, the best solution is to buy a GPS or DCF radio
: receiver and set up a stratum-1 server yourself. But the last time
: I tried to do that with a cheap DCF plug, it wasn't very
: well supported on FreeBSD. Even an expensive Meinberg
: receiver ( http://www.meinberg.de/english/ ) with an RS232
: output worked much more accurately with a Solaris machine
: than with FreeBSD. (Unfortunately, the Meinberg model
: available to us did not have NTP support via ethernet
: itself, only serial output.) I have to admit that that
: was in FreeBSD 4.x days. The situation might have
: improved in the meantime (I don't know).

My company has used FreeBSD's ntpd since 3.x with a small, custom driver that I wrote. It turns out to work very well in practice. I'd suggest that it is well supported, even in FreeBSD 4.x. It isn't well documented.

As for working better on Solaris, I've not done measurements there. I do know that our custom clock drivers typically stay within a microsecond of the reference clock when the unit has good temperature stability, and within three or five microseconds over a diurnal swing when the units aren't well thermally regulated. Of course, we are using what is effectively a GPS-disciplined Rubidium (Rb) oscillator's PPS as our time base. We haven't gone the extra step of using the Rb's 10MHz to synthesize frequencies for the motherboard, since we don't need system time to be that stable (it gives two or three more orders of magnitude of stability). We do use the kernel FLL, and our custom driver provides the absolute phase.

We get slightly better results in 6.x than we did in 4.x because of a subtle bug in the kernel that phk fixed in the calculation of error, where two terms that were almost the same had the wrong signs and almost cancelled out...

Warner
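To put the frequency-error figures from this exchange into more familiar units, the following Python sketch converts a clock's frequency error in ppm into accumulated drift per day and checks it against the +/-500 ppm limit mentioned in the thread. The function names are hypothetical and nothing here is taken from ntpd's source.

```python
SECONDS_PER_DAY = 86_400

# The +/-500 ppm limit is the figure discussed in this thread.
NTPD_FREQ_LIMIT_PPM = 500.0


def drift_seconds_per_day(freq_error_ppm):
    """Seconds gained or lost per day by an undisciplined clock."""
    return freq_error_ppm * 1e-6 * SECONDS_PER_DAY


def exceeds_ntpd_limit(freq_error_ppm, limit_ppm=NTPD_FREQ_LIMIT_PPM):
    """True if the frequency error is beyond the tolerated range."""
    return abs(freq_error_ppm) > limit_ppm


if __name__ == "__main__":
    # 10 and 150 ppm bracket the "typical PC clock" range quoted above.
    for ppm in (10, 150, 500, 800):
        print(f"{ppm:4d} ppm -> {drift_seconds_per_day(ppm):6.2f} s/day, "
              f"beyond limit: {exceeds_ntpd_limit(ppm)}")
```

A clock sitting right at the 500 ppm rail drifts by roughly 43 s per day, which gives a sense of how bad a clock must be for the drift file to read 500.000.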
Re: clock problem
On Fri, 2007-05-11 at 08:53 -0600, M. Warner Losh wrote:
> In message: [EMAIL PROTECTED] Martin Dieringer [EMAIL PROTECTED] writes:
> : This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
> : problem is solved by switching to ACPI instead of APM
>
> It is a hardware problem. APM + powerd changes the frequency of the TSC. If the TSC is used as the time source, then you'll get bad timekeeping. ACPI uses its own frequency source that is much more stable and independent of the TSC, so switching to it fixes the problem because you are switching the hardware from using a really bad frequency source with ugly steps to using a good frequency source w/o steps.
>
> Warner

Surely that would imply that it is a software misconfiguration issue. If the TSC is unreliable under fairly standard duties, and there exists an alternate source that is reliable, surely that indicates the manufacturer has identified a problem, and solved it with alternate hardware.

The failure then to use the correct hardware is a software misconfiguration.

Cheers
Tom
Re: panic: spin lock held too long (w/ backtrace)
> > [EMAIL PROTECTED] /usr/src/sys/i386/conf kldstat
> > Id Refs Address    Size     Name
> >  1    3 0xc040     65e308   kernel
> >  2    1 0xc0a5f000 59f20    acpi.ko
>
> So, yes then :) Can you follow the steps for debugging modules and see if it gives a better trace?
>
> Kris

Unfortunately, after prepping and adding the symbol file for acpi.ko, I got the exact same backtrace.

Any other thoughts?

Regards;
Scott
Intel ICH5 UDMA100 controller TIMEOUT - READ_DMA
I am working with a new IBM XSeries 226 server. It worked fine with the original 80 gig drives. After replacing them with 2 new Hitachi 500 gig drives, I get DMA timeouts at random times while using the onboard Intel SATA controller. I put a Promise SATA controller in the machine and everything works great.

Has anyone heard of a problem with Intel controllers?

Thanks in advance
Richard Puga

Here is the info from dmesg and atacontrol:

kernel: ad3: TIMEOUT - READ_DMA retrying (1 retry left) LBA=0
kernel: ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=324524575
kernel: ad2: TIMEOUT - READ_DMA retrying (1 retry left) LBA=3780487
kernel: ad2: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2651511
and so on

atapci1: Intel ICH5 UDMA100 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x14a0-0x14af at device 31.1 on pci0
ad4: 476940MB Hitachi HDT725050VLA360 V56OA73A at ata2-master SATA150
ad6: 476940MB Hitachi HDT725050VLA360 V56OA73A at ata3-master SATA150

atacontrol cap ad4

Protocol              Serial ATA II
device model          Hitachi HDT725050VLA360
serial number         VFD400R40E0EHC
firmware revision     V56OA73A
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       976773168 sectors
dma supported
overlap not supported

Feature                        Support  Enable  Value     Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes      -       31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
SMART                          yes      no
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/0x00
automatic acoustic management  yes      no      254/0xFE  128/0x80
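One quick way to see whether timeouts like the ones in this report cluster around one region of a disk is to pull the LBAs out of the log lines and compare them per device. The sketch below does that in Python for the four lines quoted in the report; the regular expression and the function names are assumptions made for this example and are not part of any FreeBSD tool.

```python
import re
from collections import defaultdict

# The four kernel messages quoted in the report above.
LOG_LINES = [
    "kernel: ad3: TIMEOUT - READ_DMA retrying (1 retry left) LBA=0",
    "kernel: ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=324524575",
    "kernel: ad2: TIMEOUT - READ_DMA retrying (1 retry left) LBA=3780487",
    "kernel: ad2: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2651511",
]

# Hypothetical pattern written for this message format only.
TIMEOUT_RE = re.compile(r"(ad\d+): TIMEOUT - (\w+) retrying .*LBA=(\d+)")


def parse_timeouts(lines):
    """Return (device, command, lba) tuples for every matching line."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            events.append((match.group(1), match.group(2), int(match.group(3))))
    return events


def lba_spread_by_device(events):
    """Map each device to (lowest LBA, highest LBA, difference)."""
    lbas = defaultdict(list)
    for device, _command, lba in events:
        lbas[device].append(lba)
    return {dev: (min(v), max(v), max(v) - min(v)) for dev, v in lbas.items()}


if __name__ == "__main__":
    for dev, (low, high, spread) in sorted(lba_spread_by_device(parse_timeouts(LOG_LINES)).items()):
        print(f"{dev}: lowest LBA {low}, highest LBA {high}, spread {spread}")
```

With only four samples this proves little, but a wide spread on one device points away from a single bad region on the platter.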
Re: Intel ICH5 UDMA100 controller TIMEOUT - READ_DMA
On Fri, May 11, 2007 at 07:20:06AM -1000, Richard Puga wrote:
> I am working with a new IBM XSeries 226 server. It worked fine with the original 80 gig drives. After replacing them with 2 new Hitachi 500 gig drives, I get DMA timeouts at random times while using the onboard Intel SATA controller. I put a Promise SATA controller in the machine and everything works great.

There's no mention of what FreeBSD version and kernel build date you're using. uname -a would be very useful here.

> kernel: ad3: TIMEOUT - READ_DMA retrying (1 retry left) LBA=0
> kernel: ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=324524575
> kernel: ad2: TIMEOUT - READ_DMA retrying (1 retry left) LBA=3780487
> kernel: ad2: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2651511
> and so on

The interesting part is that the LBAs are all over the place; they're not sequential, which means (in my opinion) the drive itself is fine.

> atapci1: Intel ICH5 UDMA100 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x14a0-0x14af at device 31.1 on pci0
> ad4: 476940MB Hitachi HDT725050VLA360 V56OA73A at ata2-master SATA150
> ad6: 476940MB Hitachi HDT725050VLA360 V56OA73A at ata3-master SATA150

Some clarification: these drives are not attached to atapci1. They're attached to a different PCI device. UDMA100 is the ATA/IDE port (read: old PATA), not a SATA port. What you should be pointing to is something that looks like this:

atapci0: Intel ICH5 SATA150 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f irq 18 at device 31.2 on pci0

(The above example is from a machine we have sitting around doing heavy I/O work due to MySQL. We have no disk problems there.)

Now... I have seen behaviour similar to what you've described on an Intel-based SATA controller (ICH6) with a Western Digital drive that I have personally used and determined to be reliable on Windows, and verified as such with WD's testing software under DOS too. I've only seen this happen *once* on the system.
That system:

FreeBSD eos.sc1.parodius.com 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Mar 8 10:41:09 PST 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/EOS i386

[EMAIL PROTECTED]:31:2: class=0x010180 card=0x628015d9 chip=0x26528086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FR/FRW ICH6R/ICH6RW SATA Controller'
    class    = mass storage
    subclass = ATA

Master: ad0 WDC WD2500KS-00MJB0/02.01C03 Serial ATA II
Slave:  no device present

ad0: timeout waiting to issue command
ad0: error issuing WRITE_DMA command
ad0: timeout waiting to issue command
ad0: error issuing WRITE_DMA command
ad0: timeout waiting to issue command
ad0: error issuing WRITE_DMA command
ad0: timeout waiting to issue command
ad0: error issuing WRITE_DMA command
ad0: timeout waiting to issue command
ad0: error issuing WRITE_DMA command
g_vfs_done():ad0s1d[WRITE(offset=16821780480, length=16384)]error = 5
g_vfs_done():ad0s1d[WRITE(offset=16826417152, length=16384)]error = 5
g_vfs_done():ad0s1d[WRITE(offset=813531136, length=16384)]error = 5
g_vfs_done():ad0s1d[WRITE(offset=817922048, length=16384)]error = 5
g_vfs_done():ad0s1d[WRITE(offset=870563840, length=16384)]error = 5

And SMART (smartctl) shows absolutely no signs of any problems with the drive (the Temperature_Celsius In_the_past error is how the drive came from the factory; I think Western Digital was doing some testing, who knows).
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0003   214   214   021    Pre-fail  Always   -           4283
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           9
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always   -           0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always   -           4145
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always   -           0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           8
190 Temperature_Celsius     0x0022   063   042   045    Old_age   Always   In_the_past 37
194 Temperature_Celsius     0x0022   113   092   000    Old_age   Always   -           37
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline  -           0

SMART
Re: clock problem
On Fri, 11 May 2007, Tom Evans wrote:
> On Fri, 2007-05-11 at 08:53 -0600, M. Warner Losh wrote:
> > In message: [EMAIL PROTECTED] Martin Dieringer [EMAIL PROTECTED] writes:
> > : This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
> > : problem is solved by switching to ACPI instead of APM
> >
> > It is a hardware problem. APM + powerd changes the frequency of the TSC. If the TSC is used as the time source, then you'll get bad timekeeping. ACPI uses its own frequency source that is much more stable and independent of the TSC, so switching to it fixes the problem because you are switching the hardware from using a really bad frequency source with ugly steps to using a good frequency source w/o steps.
> >
> > Warner

Yes, but Martin already showed it was using the i8254, not TSC; would you expect the same effect using powerd with the i8254 clock? It seems so, unless it's some problem with est and/or p4tcc under APM (canoworms).

Running APM, at least my ol' Compaq 1500c (5.5-S, really too ancient to expect ACPI to work properly) states on a verbose boot:

Calibrating clock(s) ... i8254 clock: 1193216 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254 frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 300011839 Hz
[..]
TSC timecounter disabled: APM enabled.
Timecounter TSC frequency 300011839 Hz quality -1000

i.e.:

kern.timecounter.hardware: i8254
kern.timecounter.choice: TSC(-1000) i8254(0) dummy(-100)

> Surely that would imply that it is a software misconfiguration issue. If the TSC is unreliable under fairly standard duties, and there exists an alternate source that is reliable, surely that indicates the manufacturer has identified a problem, and solved it with alternate hardware.
>
> The failure then to use the correct hardware is a software misconfiguration.

If one considers disabling ACPI and enabling APM misconfiguration .. which in Martin's case it turned out to be, since his ACPI works, but

est0: Enhanced SpeedStep Frequency Control on cpu0
p4tcc0: CPU Frequency Thermal Control on cpu0
apm0: APM BIOS on motherboard
apm0: found APM BIOS v1.2, connected at v1.2

together with powerd appeared to heavily retard time on both laptops, beyond ntpd's ability to cope.

Cheers, Ian
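The kern.timecounter.choice value quoted in this message lists each available timecounter with its quality in parentheses. As a rough illustration, the Python sketch below parses such a string and picks the entry with the highest quality. The parsing helper is hypothetical, and "highest quality wins" is an assumption used here for illustration, not a statement of the kernel's exact selection logic.

```python
import re


def parse_timecounter_choice(value):
    """Turn 'TSC(-1000) i8254(0) dummy(-100)' into {name: quality}."""
    pairs = re.findall(r"(\S+?)\((-?\d+)\)", value)
    return {name: int(quality) for name, quality in pairs}


def highest_quality(choices):
    """Return the timecounter name with the largest quality value.

    Assumption: a higher quality number means a preferred source.
    """
    if not choices:
        raise ValueError("no timecounters listed")
    return max(choices, key=choices.get)


if __name__ == "__main__":
    # Value copied from the sysctl output quoted above.
    choice = "TSC(-1000) i8254(0) dummy(-100)"
    parsed = parse_timecounter_choice(choice)
    print(parsed)
    print("highest quality:", highest_quality(parsed))
```

For the string quoted above this selects i8254, which agrees with the kern.timecounter.hardware value reported on that machine.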
Re: panic: spin lock held too long (w/ backtrace)
On Fri, May 11, 2007 at 10:43:54AM -0600, Scott Swanson wrote:

[EMAIL PROTECTED] /usr/src/sys/i386/conf kldstat
Id Refs Address    Size     Name
 1    3 0xc040     65e308   kernel
 2    1 0xc0a5f000 59f20    acpi.ko

So, yes then :) Can you follow the steps for debugging modules and see if it gives a better trace?

Kris

Unfortunately, after prepping and adding the symbol file for acpi.ko, I got the exact same backtrace. Any other thoughts?

Sometimes -O2 can confuse gdb...unfortunately there is no way to repair it after the fact. Maybe someone else has ideas.

Kris
Re: clock problem
In message: [EMAIL PROTECTED] Tom Evans [EMAIL PROTECTED] writes:
: On Fri, 2007-05-11 at 08:53 -0600, M. Warner Losh wrote:
: In message: [EMAIL PROTECTED]
: Martin Dieringer [EMAIL PROTECTED] writes:
: : This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
: : problem is solved by switching to ACPI instead of APM
:
: It is a hardware problem. APM + powerd changes the frequency of the
: TSC. If the TSC is used as the time source, then you'll get bad
: timekeeping. ACPI uses its own frequency source that is much more
: stable and independent of the TSC, so switching to it fixes the
: problem because you are switching the hardware from using a really bad
: frequency source with ugly steps to using a good frequency source w/o
: steps.
:
: Warner
:
: Surely that would imply that it is a software misconfiguration issue. If
: the TSC is unreliable under fairly standard duties, and there exists an
: alternate source that is reliable, surely that indicates the
: manufacturer has identified a problem, and solved it with alternate
: hardware.
:
: The failure then to use the correct hardware is a software
: misconfiguration.

TSC is very accurate if you don't have the clock frequency slammed around, which is why its quality is listed as 800 and the i8254 is listed as 0. If you do anything that slams the TSC frequency, then you need to reconfigure the timecounter used. It is hard for the timekeeping part of the software to know if you are on a sane system (TSC-wise) or an insane one.

Warner
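The quality numbers in this thread can be read as a simple ranking. The following is only a sketch in Python of how such a ranking might behave, assuming that the counter with the highest quality is preferred and that a negative quality means "never pick this automatically". The helper name pick_timecounter is hypothetical and is not a FreeBSD interface; the numbers are the ones quoted in the thread.

```python
# Illustrative model only: pick_timecounter is a hypothetical helper, not a
# FreeBSD API. Assumption: highest quality wins, negative quality is skipped.

def pick_timecounter(choices):
    """Return the name with the highest non-negative quality, or None."""
    usable = {name: quality for name, quality in choices.items() if quality >= 0}
    if not usable:
        return None
    return max(usable, key=usable.get)


# kern.timecounter.choice as quoted from the APM laptop in this thread.
apm_laptop = {"TSC": -1000, "i8254": 0, "dummy": -100}

# Qualities Warner mentions for a system where the TSC is left alone.
sane_system = {"TSC": 800, "i8254": 0}

print(pick_timecounter(apm_laptop))   # i8254
print(pick_timecounter(sane_system))  # TSC
```

Under this reading, enabling APM demotes the TSC to -1000, so the i8254 ends up as the active counter even though its own quality is only 0.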
Re: UNIX domain sockets MFC's
On 5/11/07, Robert Watson [EMAIL PROTECTED] wrote:

On Tue, 8 May 2007, Robert Watson wrote:

Right now I am tracking two known issues with UNIX domain sockets in RELENG_6:

- Reported NULL pointer dereference in unp_connect(), which occurs due to the dropping of locks around sonewconn(). This is fixed in HEAD, and I am preparing an MFC of this patch.

The fix for this has now been merged as 1.155.2.22. As there have been no new reports of UNIX domain socket problems in the last couple of days, it sounds like the MFC of the last batch of fixes and cleanups has not led to problems.

Robert N M Watson
Computer Laboratory
University of Cambridge

I updated my server which has dual CPU with SMP one hour ago, and I didn't see any speed improvement in the MySQL. Hints?

--
Regards,
-Abdullah Ibn Hamad Al-Marri
Arab Portal
http://www.WeArab.Net/
Re: 6.2-R on Dell Poweredge 2950 with Dell PERC 5/i [mfi(4)]
On Thu, May 10, 2007 at 05:14:01PM -0600, Scott Long wrote:

...

Not sure that this impression is entirely accurate. The biggest problem with MFI machines is online RAID management. The storage driver itself matured very quickly and has been very reliable.

Ah; good to know: thank you.

Well, now a colleague is trying to run 6.2-R on one of these 2950s; dmesg says the controller is:

mfi0: Dell PERC 5/i mem 0xd80f-0xd80f,0xfc4e-0xfc4f irq 78 at device 14.0 on pci2

... and the disks look like:

mfid0: MFI Logical Disk on mfi0
mfid0: 418176MB (856424448 sectors) RAID volume '' is optimal

Looks A OK to me.

Even better. :-)

The intended production workload involves creation and deletion of a large number of files rather rapidly.

...

sysctl vfs.ffs.doasyncfree=0 might help. Running the syncer more frequently might also help, but I don't recall the sysctl node for that.

OK; I've relayed your suggestion to my colleague, but haven't heard back from her yet.

...

Very strange. No chance that it was due to files that were deleted but still referenced by open apps?

I don't think so. She's deployed 13 other boxen over the last few years with -- naturally! -- different hardware specs, but all running essentially the same application. The big question for her is whether or not the Dell 2950, as specified, will do the job.

...

This sounds purely like a filesystem issue, not an MFI driver issue.

Hmmm... I'll admit to knowing little about RAID configurations; is it possible that some RAID configurations might exacerbate problems with such a workload -- or that others might be more amenable to it?

Thanks again!

Peace,
david
--
David H. Wolfskill [EMAIL PROTECTED]
Believe SORBS at your own risk: 63.193.123.122 has been static since Aug 1999.
See http://www.catwhisker.org/~david/publickey.gpg for my public key.
Re: clock problem
In message: [EMAIL PROTECTED] Ian Smith [EMAIL PROTECTED] writes:
: On Fri, 11 May 2007, Tom Evans wrote:
: On Fri, 2007-05-11 at 08:53 -0600, M. Warner Losh wrote:
: In message: [EMAIL PROTECTED]
: Martin Dieringer [EMAIL PROTECTED] writes:
: : This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
: : problem is solved by switching to ACPI instead of APM
:
: It is a hardware problem. APM + powerd changes the frequency of the
: TSC. If the TSC is used as the time source, then you'll get bad
: timekeeping. ACPI uses its own frequency source that is much more
: stable and independent of the TSC, so switching to it fixes the
: problem because you are switching the hardware from using a really bad
: frequency source with ugly steps to using a good frequency source w/o
: steps.
:
: Warner
:
: Yes, but Martin already showed it was using the i8254, not TSC; would
: you expect the same effect using powerd with the i8254 clock? It seems
: so, unless it's some problem with est and/or p4tcc under APM (canoworms)

No. I would not have expected it at all. I would have expected the i8254 to not be able to provide time much better than a microsecond or two, but I'd expect time to be relatively stable, modulo the normal walking due to thermal variation you'd see given the relatively low quality oscillators that feed it. However, see below.

: Surely that would imply that it is a software misconfiguration issue. If
: the TSC is unreliable under fairly standard duties, and there exists an
: alternate source that is reliable, surely that indicates the
: manufacturer has identified a problem, and solved it with alternate
: hardware.
:
: The failure then to use the correct hardware is a software
: misconfiguration.
:
: If one considers disabling ACPI and enabling APM misconfiguration ..
: which in Martin's case it turned out to be, since his ACPI works, but
: est0: Enhanced SpeedStep Frequency Control on cpu0
: p4tcc0: CPU Frequency Thermal Control on cpu0
: apm0: APM BIOS on motherboard
: apm0: found APM BIOS v1.2, connected at v1.2
: together with powerd appeared to heavily retard time on both laptops,
: beyond ntpd's ability to cope.

The i8254 time counter has a frequency of about 1.19 MHz, but it wraps about 18 times a second (or once every ~55ms). I think that if the clock speed was slow enough, there might be situations where interrupts are disabled long enough to blow past that 55ms mark, especially on a 300MHz laptop that might be running at a very slow clock rate when idle. If it misses the wrap, then you'll see time slip away.

Maybe you can experiment to find the lowest frequency at which the system can run and still keep accurate time. debug.cpufreq.verbose=1 might be helpful. You can override the lowest setting of powerd by using the sysctl debug.cpufreq.lowest.

Warner
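The "about 18 times a second, or once every ~55ms" figure can be checked with a little arithmetic. The sketch below, in Python, assumes the i8254 counter is 16 bits wide and counts through its full range, and uses the default frequency from the boot messages quoted in this thread. The time_lost_ms function is a deliberately simplified, hypothetical model of a missed wrap, not a description of the kernel's actual timecounter code.

```python
import math

I8254_HZ = 1193182          # default i8254 frequency from the quoted boot messages
COUNTER_STATES = 1 << 16    # assumption: 16-bit counter using its full range

wrap_period_ms = COUNTER_STATES / I8254_HZ * 1000.0
wraps_per_second = I8254_HZ / COUNTER_STATES


def time_lost_ms(blocked_ms):
    """Simplified model: every wrap that goes unnoticed loses one full period."""
    missed_wraps = math.floor(blocked_ms / wrap_period_ms)
    return missed_wraps * wrap_period_ms


print(f"wrap period: {wrap_period_ms:.2f} ms")
print(f"wraps per second: {wraps_per_second:.1f}")
print(f"lost if nothing looks at the counter for 60 ms: {time_lost_ms(60):.2f} ms")
```

In this model, anything that keeps the system from looking at the counter for longer than one wrap period silently drops a whole period, which is the kind of time slip described above.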
Re: UNIX domain sockets MFC's
On Fri, 11 May 2007, Abdullah Ibn Hamad Al-Marri wrote:

On 5/11/07, Robert Watson [EMAIL PROTECTED] wrote:

On Tue, 8 May 2007, Robert Watson wrote:

Right now I am tracking two known issues with UNIX domain sockets in RELENG_6:

- Reported NULL pointer dereference in unp_connect(), which occurs due to the dropping of locks around sonewconn(). This is fixed in HEAD, and I am preparing an MFC of this patch.

The fix for this has now been merged as 1.155.2.22. As there have been no new reports of UNIX domain socket problems in the last couple of days, it sounds like the MFC of the last batch of fixes and cleanups has not led to problems.

I updated my server which has dual CPU with SMP one hour ago, and I didn't see any speed improvement in the MySQL.

The speed improvements associated with these MFC's are minor; primarily they are stability improvements under high load. There are major performance improvements in the 7.x implementation, especially for multi-core systems, but I have no current plans to MFC them, as they interact with other system components and may depend on other changes that also haven't been MFC'd.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: 6.2-R on Dell Poweredge 2950 with Dell PERC 5/i [mfi(4)]
David Wolfskill wrote:

On Thu, May 10, 2007 at 05:14:01PM -0600, Scott Long wrote:

...

Not sure that this impression is entirely accurate. The biggest problem with MFI machines is online RAID management. The storage driver itself matured very quickly and has been very reliable.

Ah; good to know: thank you.

Well, now a colleague is trying to run 6.2-R on one of these 2950s; dmesg says the controller is:

mfi0: Dell PERC 5/i mem 0xd80f-0xd80f,0xfc4e-0xfc4f irq 78 at device 14.0 on pci2

... and the disks look like:

mfid0: MFI Logical Disk on mfi0
mfid0: 418176MB (856424448 sectors) RAID volume '' is optimal

Looks A OK to me.

Even better. :-)

The intended production workload involves creation and deletion of a large number of files rather rapidly.

...

sysctl vfs.ffs.doasyncfree=0 might help. Running the syncer more frequently might also help, but I don't recall the sysctl node for that.

OK; I've relayed your suggestion to my colleague, but haven't heard back from her yet.

...

Very strange. No chance that it was due to files that were deleted but still referenced by open apps?

I don't think so. She's deployed 13 other boxen over the last few years with -- naturally! -- different hardware specs, but all running essentially the same application. The big question for her is whether or not the Dell 2950, as specified, will do the job.

...

This sounds purely like a filesystem issue, not an MFI driver issue.

Hmmm... I'll admit to knowing little about RAID configurations; is it possible that some RAID configurations might exacerbate problems with such a workload -- or that others might be more amenable to it?

If anything, a fast RAID controller will help reduce the lag that you get when the syncer does its periodic run. But beyond that, I can't think of anything that would cause problems.

Scott
Re: clock problem
On 2007-May-11 12:11:29 +0200, Oliver Fromme [EMAIL PROTECTED] wrote:

Of course, the best solution is to buy a GPS or DCF radio receiver and set up a stratum-1 yourself.

One of our customers has 6 GPS-locked NTP servers. Only problem is that two of them are reporting a time that is exactly one second different to the other four. You shouldn't rely solely on your GPS or DCF receiver - use it as the primary source but have some secondary sources for sanity checks. (From experience, I can state that ntpd does not behave well when presented with two stratum 1 servers that differ by 1 second).

--
Peter Jeremy
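As a rough illustration of the kind of sanity check suggested here, the following Python sketch compares each source against the median of all sources and flags the ones that disagree. This is not how ntpd selects its sources; the function name flag_outliers, the 0.1 second tolerance, the server names and the offsets are all made up for the example, with two of six servers placed one second away from the rest.

```python
from statistics import median


def flag_outliers(offsets, tolerance=0.1):
    """Return the sorted names whose offset differs from the median by more than tolerance seconds."""
    mid = median(offsets.values())
    return sorted(name for name, off in offsets.items() if abs(off - mid) > tolerance)


# Hypothetical measured offsets in seconds for six GPS-locked servers.
offsets = {
    "gps1": 0.0002,
    "gps2": -0.0001,
    "gps3": 0.0003,
    "gps4": 0.0000,
    "gps5": 1.0001,
    "gps6": 0.9998,
}

print(flag_outliers(offsets))  # ['gps5', 'gps6']
```

A median only works as a referee while the good sources are in the majority, which is one more reason to have several independent secondary sources rather than one.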
Hard Hang, nothing in logs / no panics
Good day everyone.

I have this one system set up with if_bridge to filter traffic. It works quite well. I am running FreeBSD 6.2, but as a TinyBSD image. The one problem I have is when I place the machine at the perimeter of our network with 27 seats. At that point, anywhere between 15 minutes and 24 hours later, the entire system goes into a hard lock (physical reboot needed). The thing is, there are no logs or kernel panics or anything. No IRQ conflicts exist. I am looking for any input on how to even go about diagnosing this.

Here is a copy of my dmesg.

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE #0: Wed May 9 17:48:30 UTC 2007 root@:/usr/obj/usr/src/sys/TINYBSD
WARNING: MPSAFE network stack disabled, expect reduced performance.
ACPI APIC Table: IntelR AWRDACPI
Timecounter i8254 frequency 1193520 Hz quality 0
CPU: Intel(R) Celeron(R) CPU 2.80GHz (2792.85-MHz 686-class CPU)
  Origin = GenuineIntel Id = 0xf34 Stepping = 4
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x441d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,b14>
real memory = 535691264 (510 MB)
avail memory = 51964 (494 MB)
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
acpi0: IntelR AWRDACPI on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x408-0x40b on acpi0
cpu0: ACPI CPU on acpi0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib1: ACPI PCI-PCI bridge at device 3.0 on pci0
pci1: ACPI PCI bus on pcib1
em0: Intel(R) PRO/1000 Network Connection Version - 6.3.9 port 0xc000-0xc01f mem 0xf200-0xf201 irq 18 at device 1.0 on pci1
em0: Ethernet address: 00:30:48:86:97:62
em0: [GIANT-LOCKED]
pcib2: ACPI PCI-PCI bridge at device 28.0 on pci0
pci2: ACPI PCI bus on pcib2
pci0: serial bus, USB at device 29.0 (no driver attached)
pci0: serial bus, USB at device 29.1 (no driver attached)
pci0: base peripheral at device 29.4 (no driver attached)
pci0: base peripheral, interrupt controller at device 29.5 (no driver attached)
pci0: serial bus, USB at device 29.7 (no driver attached)
pcib3: ACPI PCI-PCI bridge at device 30.0 on pci0
pci3: ACPI PCI bus on pcib3
pci3: display, VGA at device 9.0 (no driver attached)
em1: Intel(R) PRO/1000 Network Connection Version - 6.3.9 port 0xd100-0xd13f mem 0xf100-0xf101 irq 19 at device 10.0 on pci3
em1: Ethernet address: 00:30:48:86:97:63
em1: [GIANT-LOCKED]
isab0: PCI-ISA bridge at device 31.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel 6300ESB SATA150 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f irq 18 at device 31.2 on pci0
ata0: ATA channel 0 on atapci0
ata1: ATA channel 1 on atapci0
pci0: serial bus, SMBus at device 31.3 (no driver attached)
acpi_tz0: Thermal Zone on acpi0
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
atkbd0: AT Keyboard irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: ISA Option ROM at iomem 0xc-0xc7fff on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x300
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
Timecounter TSC frequency 2792849472 Hz quality 800
Timecounters tick every 1.000 msec
IP Filter: v4.1.13 initialized. Default = pass all, Logging = enabled
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to accept, logging limited to 100 packets/entry by default
ad0: 977MB SanDisk SDCFH-1024 HDX 3.19 at ata0-master PIO4
ad2: 76319MB Seagate ST380811AS 3.AAB at ata1-master SATA150
Trying to mount root from ufs:/dev/ad0s1a
em1: link state changed to DOWN
em0: link state changed to DOWN
bridge0: Ethernet address: 46:e0:af:c9:e6:b7
em0: promiscuous mode enabled
em1: promiscuous mode enabled
em0: link state changed to UP
em1: link state changed to UP
em1: link state changed to DOWN
em1: link state changed to UP
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...1 0 0 done
All buffers synced.
Uptime: 47m5s
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved.
FreeBSD
Re: UNIX domain sockets MFC's
On 5/11/07, Robert Watson [EMAIL PROTECTED] wrote:

On Fri, 11 May 2007, Abdullah Ibn Hamad Al-Marri wrote:

On 5/11/07, Robert Watson [EMAIL PROTECTED] wrote:

On Tue, 8 May 2007, Robert Watson wrote:

Right now I am tracking two known issues with UNIX domain sockets in RELENG_6:

- Reported NULL pointer dereference in unp_connect(), which occurs due to the dropping of locks around sonewconn(). This is fixed in HEAD, and I am preparing an MFC of this patch.

The fix for this has now been merged as 1.155.2.22. As there have been no new reports of UNIX domain socket problems in the last couple of days, it sounds like the MFC of the last batch of fixes and cleanups has not led to problems.

I updated my server which has dual CPU with SMP one hour ago, and I didn't see any speed improvement in the MySQL.

The speed improvements associated with these MFC's are minor; primarily they are stability improvements under high load. There are major performance improvements in the 7.x implementation, especially for multi-core systems, but I have no current plans to MFC them, as they interact with other system components and may depend on other changes that also haven't been MFC'd.

Robert N M Watson
Computer Laboratory
University of Cambridge

Robert,

Thanks for clearing this up. How safe is using 7.0 for MySQL server now?

--
Regards,
-Abdullah Ibn Hamad Al-Marri
Arab Portal
http://www.WeArab.Net/
Re: Hard Hang, nothing in logs / no panics
On Fri, May 11, 2007 at 02:42:51PM -0500, Roger Miranda wrote:

Good day everyone.

I have this one system set up with if_bridge to filter traffic. It works quite well. I am running FreeBSD 6.2, but as a TinyBSD image. The one problem I have is when I place the machine at the perimeter of our network with 27 seats. At that point, anywhere between 15 minutes and 24 hours later, the entire system goes into a hard lock (physical reboot needed). The thing is, there are no logs or kernel panics or anything. No IRQ conflicts exist. I am looking for any input on how to even go about diagnosing this.

See the developers handbook chapter on kernel debugging.

Kris (I really need to make a keyboard shortcut for typing this phrase)
Re: Hard Hang, nothing in logs / no panics
On Friday 11 May 2007 16:00, Kris Kennaway wrote:

See the developers handbook chapter on kernel debugging.

Kris,

I have gone through the kernel debugging sections of the developers handbook. The one problem is that when I get a hard hang, I do not get any errors or panics, and there is no crash or dump data on reboot. Yes, I have enabled DDB and KDB and a dumpdir (in /etc/rc.conf).

Am I missing something in the kernel debugging section? I see that 11.9 Debugging Deadlocks talks about deadlocks, but at the time of the lock I have no way of doing a ps or really anything, as the system is locked up solid.

Roger
Re: clock problem
:One of our customers has 6 GPS-locked NTP servers. Only problem is
:that two of them are reporting a time that is exactly one second
:different to the other four. You shouldn't rely solely on your
:GPS or DCF receiver - use it as the primary source but have some
:secondary sources for sanity checks. (From experience, I can state
:that ntpd does not behave well when presented with two stratum 1
:servers that differ by 1 second).
:
:--
:Peter Jeremy

Ntp will also become really unhappy when chunky time slips occur or if the skew rate is more than a few hundred ppm. Ntp will also blow up if it loses the network link for a long period of time. It will just give up and stop making corrections entirely, even after the link is restored. This is particularly true when it is used over a dialup (me having done that for over a year in 1997, so I can tell you how badly it works).

A slow time slip over a day could still be chunky, which would imply lost interrupts. Determining whether the problem is due to an 8254 rollover or lost hardclock interrupts is easy... just set 'hz' to something really high, like 2, and see if your time goes crazy. If it does, then you have your culprit.

I don't know if those bugs are still present in FreeBSD, but I do remember that I had to redo all the timekeeping in DragonFly because lost interrupts from high 'hz' settings were causing timekeeping to go nuts. That turned out to mainly be due to the same 8254 timer being used to generate the hardclock interrupt AND handle time keeping. i.e. at high hz settings one was not getting the full 1/18 second benefit from the timer. You just can't do that... it doesn't work. It is almost 100% guaranteed to result in a bad time base.

It is easy to test.. just set your kern.hz in the boot env, reboot, and see if things blow up or not. Time keeping should be stable regardless of what hz is set to (proviso: never set hz less than 100).

Unfortunately, all the timebases in the system have their own quirks. Blame the hardware manufacturers. The 8254 timer 0 is actually the MOST consistent of the lot, with the ACPI timer coming a close second.

TSC
    Haha. Good luck. Nice wide timer, easy to read, but any power savings mode, including the failsafe modes that intel has when a cpu overheats, will probably blow it up. Because of that it is not really a good idea to use it as a timebase. I shake my fist at Intel! $#%$#%$#%

ACPI timer
    Despite the hardware bugs this almost always works as a timebase, but sometimes the frequency changes when the cpu goes into power savings mode or EST, and sometimes the frequency is something other than what it is supposed to be.

8254 timer 0
    Almost always works as a timebase, but only if not also used to generate high-speed interrupts (because interrupts are lost easily). Set it to a full cycle (1/18 second) and you will be fine. Set it to anything else and you will lose interrupts. The BIOS will sometimes mess with timer 0, but not as often as it messes with timer 2.

8254 timer 1
    Sometimes works as a time base, but can lock older machines up. Can even lock up newer machines. Why? Because hardware manufacturers are idiots.

8254 timer 2
    Often can be used as a time base, but video bios calls often try to use it too. [EMAIL PROTECTED] bios makers! Still, this is better than losing interrupts when timer 0 is set to high speed so DragonFly uses timer 2 for its timebase as a default until the ACPI timer becomes available, with a boot option to use timer 1 instead. Using timer 2 as a time base means you don't get motherboard speaker sound (the old beep beep BEEP!). Do I care? No.

LAPIC timer
    Dunno. Probably best to use it as a high speed clock interrupt which would free 8254 timer 0 to use as a time base.

RTC interrupt
    Basically unusable. Stable, but doesn't have sufficient resolution to be helpful and takes forever to read.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: Hard Hang, nothing in logs / no panics
On Fri, May 11, 2007 at 04:21:25PM -0500, Roger Miranda wrote:

On Friday 11 May 2007 16:00, Kris Kennaway wrote:

See the developers handbook chapter on kernel debugging.

Kris,

I have gone through the kernel debugging sections of the developers handbook. The one problem is that when I get a hard hang, I do not get any errors or panics, and there is no crash or dump data on reboot. Yes, I have enabled DDB and KDB and a dumpdir (in /etc/rc.conf).

Am I missing something in the kernel debugging section? I see that 11.9 Debugging Deadlocks talks about deadlocks, but at the time of the lock I have no way of doing a ps or really anything, as the system is locked up solid.

You missed that the debugger is there to debug bugs (including deadlocks). Break to the debugger and obtain the necessary debugging :)

Kris
Re: Hard Hang, nothing in logs / no panics
You missed that the debugger is there to debug bugs (including deadlocks). Break to the debugger and obtain the necessary debugging

I should've been more clear. I can not break to the debugger (CTRL-ALT-ESC) when the system locks up.

Am I possibly looking at a hardware issue? If so, what is the best way to test for it?
Re: Hard Hang, nothing in logs / no panics
On Fri, May 11, 2007 at 04:43:06PM -0500, Roger Miranda wrote:

You missed that the debugger is there to debug bugs (including deadlocks). Break to the debugger and obtain the necessary debugging

I should've been more clear. I can not break to the debugger (CTRL-ALT-ESC) when the system locks up.

You may need the KDB_STOP_NMI option, especially if it is an SMP system. I forget if you also need to enable a sysctl on 6.x (look at sysctl -a | grep nmi for the obvious one)

Am I possibly looking at a hardware issue? If so, what is the best way to test for it?

If that doesn't work then in my experience it is likely to be hardware-related. The usual debugging procedure then involves trying to replicate on an unrelated machine, and/or swapping out hardware components.

Kris
Re: clock problem
Another idea to help track down timebase problems. Port dntpd to FreeBSD. You need like three sysctls (because the ntp API and the original sysctl API are both insufficient). Alternatively you could probably hack dntpd to run in debug mode without having to implement any new sysctls, as long as you are sure to clean out any active kernel timebase adjustments in the kernel before you run it.

Here's some sample output:

http://apollo.backplane.com/DFlyMisc/dntpd.sample01.txt

Dntpd in debug mode will print out the results from two staggered continuously running linear regressions (resets after 30 samples, staggered by 15 samples). For anyone who understands how linear regressions work, finding kernel timekeeping bugs is really easy with this sort of output. You get the slope, y-intercept, correlation, and standard deviation, and then you get calculated frequency drift and time offset based on those numbers. The correlation is accurate after around 10 samples.

Note that frequency drift calculations require longer intervals to get better results. The forced 30 second interval set in the sample output is way too short, hence the errors (it has to be in 90th percentile to even have a chance of producing a reasonable PPM calculation). But also remember we are talking parts per million here. If you throw away iteration numbers 15 or so you will get very nice output and kernel bugs will show up in fairly short order.

Kernel bugs will show up as non-trivial y-intercept calculations over multiple samples, large jumps in the offset, inability to get a good correlation (proviso: sample interval has to be at least 120 seconds, not the 30 in my example), and so on and so forth.

Also be sure to use a locked ntp source, otherwise running corrections on the source will show up as problems in the debug output. pool.ntp.org is usually good enough.

It's fun checking various time sources with an idle box with a good timebase. hhahahhaha. OMG.

-Matt
Matthew Dillon [EMAIL PROTECTED]
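For readers who want to see what a regression over offset samples gives you, here is a minimal sketch in Python of an ordinary least-squares fit of clock offset against elapsed time. It is not dntpd's code and makes no claim about how dntpd computes its numbers; the function name, the 120 second sample spacing and the synthetic data (a constant 50 ppm drift on top of a 1 ms offset) are assumptions chosen so that the expected result is obvious.

```python
import math


def linear_regression(xs, ys):
    """Least-squares fit of ys against xs; returns (slope, intercept, correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    corr = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, corr


# Synthetic samples: one every 120 s, clock drifting by a steady 50 ppm.
times = [i * 120.0 for i in range(10)]
offsets = [0.001 + 50e-6 * t for t in times]

slope, intercept, corr = linear_regression(times, offsets)
print(f"frequency drift: {slope * 1e6:.1f} ppm")
print(f"offset at t=0:   {intercept * 1e3:.3f} ms")
print(f"correlation:     {corr:.3f}")
```

With clean data like this the slope is the frequency drift and the correlation sits at 1. A clock that jumps or loses interrupts would show up as a poor correlation or as an intercept that keeps moving from one regression window to the next, which matches the symptoms listed above.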
Re: Hard Hang, nothing in logs / no panics
On Fri, May 11, 2007 at 05:47:35PM -0400, Kris Kennaway wrote:

On Fri, May 11, 2007 at 04:43:06PM -0500, Roger Miranda wrote:

You missed that the debugger is there to debug bugs (including deadlocks). Break to the debugger and obtain the necessary debugging

I should've been more clear. I can not break to the debugger (CTRL-ALT-ESC) when the system locks up.

You may need the KDB_STOP_NMI option, especially if it is an SMP

He can also try a serial console, if he can scare up something to use as a serial console. Serial ports are becoming legacy but if he can do it, it might help him.

- Diane
--
- [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.db.net/~db
Re: Hard Hang, nothing in logs / no panics
On Saturday 12 May 2007 07:34, Diane Bruce wrote:

You may need the KDB_STOP_NMI option, especially if it is an SMP

He can also try a serial console, if he can scare up something to use as a serial console. Serial ports are becoming legacy but if he can do it, it might help him.

Or Firewire, you can do that with an uncooperative system - unless the PCI bus is hung (which would be useful information in and of itself)

Might be worth trying a BIOS update too.

--
Daniel O'Connor
software and network engineer for Genesis Software - http://www.gsoft.com.au
The nice thing about standards is that there are so many of them to choose from. -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
Re: UNIX domain sockets MFC's
On Fri, May 11, 2007 at 11:26:37PM +0300, Abdullah Ibn Hamad Al-Marri wrote:

How safe is using 7.0 for MySQL server now?

-CURRENT is never recommended for mission-critical applications.

mcl