date:20070511

Re: [releng_5 tinderbox] failure on sparc64/sparc64

2007-05-11 Thread Matthew Jacob


Sorry- my bad. I'll fix shortly.

On 5/10/07, FreeBSD Tinderbox [EMAIL PROTECTED] wrote:

TB --- 2007-05-10 14:42:44 - tinderbox 2.3 running on freebsd-stable.sentex.ca
TB --- 2007-05-10 14:42:44 - starting RELENG_5 tinderbox run for sparc64/sparc64
TB --- 2007-05-10 14:42:44 - cleaning the object tree
TB --- 2007-05-10 14:43:10 - checking out the source tree
TB --- 2007-05-10 14:43:10 - cd /tinderbox/RELENG_5/sparc64/sparc64
TB --- 2007-05-10 14:43:10 - /usr/bin/cvs -f -R -q -d/home/ncvs update -Pd 
-rRELENG_5 src
TB --- 2007-05-10 14:56:20 - building world (CFLAGS=-O -pipe)
TB --- 2007-05-10 14:56:20 - cd /src
TB --- 2007-05-10 14:56:20 - /usr/bin/make -B buildworld
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
 stage 4.1: building includes
 stage 4.2: building libraries
 stage 4.3: make dependencies
 stage 4.4: building everything
TB --- 2007-05-10 15:40:18 - generating LINT kernel config
TB --- 2007-05-10 15:40:18 - cd /src/sys/sparc64/conf
TB --- 2007-05-10 15:40:18 - /usr/bin/make -B LINT
TB --- 2007-05-10 15:40:18 - building LINT kernel (COPTFLAGS=-O -pipe)
TB --- 2007-05-10 15:40:18 - cd /src
TB --- 2007-05-10 15:40:18 - /usr/bin/make buildkernel KERNCONF=LINT
 Kernel build for LINT started on Thu May 10 15:40:18 UTC 2007
 stage 1: configuring the kernel
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3.1: making dependencies
 stage 3.2: building everything
[...]
cc -c -O -pipe  -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -fformat-extensions 
-std=c99  -nostdinc -I-  -I. -I/src/sys -I/src/sys/contrib/dev/acpica 
-I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf 
-I/src/sys/contrib/dev/ath -I/src/sys/contrib/dev/ath/freebsd 
-I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -include opt_global.h 
-fno-common -finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float 
-ffreestanding -Werror  /src/sys/dev/isp/isp.c
cc -c -O -pipe  -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -fformat-extensions 
-std=c99  -nostdinc -I-  -I. -I/src/sys -I/src/sys/contrib/dev/acpica 
-I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf 
-I/src/sys/contrib/dev/ath -I/src/sys/contrib/dev/ath/freebsd 
-I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -include opt_global.h 
-fno-common -finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float 
-ffreestanding -Werror  /src/sys/dev/isp/isp_freebsd.c
cc -c -O -pipe  -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -fformat-extensions 
-std=c99  -nostdinc -I-  -I. -I/src/sys -I/src/sys/contrib/dev/acpica 
-I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf 
-I/src/sys/contrib/dev/ath -I/src/sys/contrib/dev/ath/freebsd 
-I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -include opt_global.h 
-fno-common -finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float 
-ffreestanding -Werror  /src/sys/dev/isp/isp_library.c
cc -c -O -pipe  -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -fformat-extensions 
-std=c99  -nostdinc -I-  -I. -I/src/sys -I/src/sys/contrib/dev/acpica 
-I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf 
-I/src/sys/contrib/dev/ath -I/src/sys/contrib/dev/ath/freebsd 
-I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -include opt_global.h 
-fno-common -finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float 
-ffreestanding -Werror  /src/sys/dev/isp/isp_target.c
cc -c -O -pipe  -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -fformat-extensions 
-std=c99  -nostdinc -I-  -I. -I/src/sys -I/src/sys/contrib/dev/acpica 
-I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf 
-I/src/sys/contrib/dev/ath -I/src/sys/contrib/dev/ath/freebsd 
-I/src/sys/contrib/ngatm -I/src/sys/dev/twa -D_KERNEL -include opt_global.h 
-fno-common -finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float 
-ffreestanding -Werror  /src/sys/dev/isp/isp_pci.c
cc -c -O -pipe  -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual

Re: clock problem

2007-05-11 Thread Oliver Fromme

M. Warner Losh wrote:
  Peter Jeremy wrote:
  : There seems to be a bug in ntpd where the PLL can saturate at
  : +/-500ppm and will not recover.  This problem seems to occur mostly
  : where the reference servers have lots of jitter (ie a fairly congested
  : link to them).
  
  Yes.  This is a rather interesting misfeature of ntpd.  Its rails are
  at +/- 500ppm, and when it hits the rail it assumes that things are
  too bad to continue and it stops.

I think it is related to the maximum slew rate of 1/2000,
which is equivalent to 500 ppm.  The ntpd(8) manpage says:

Since the slew rate of typical Unix kernels is limited to
0.5 ms/s, each second of adjustment requires an amortization
interval of 2000 s.

And a bit further down:

The maximum slew rate possible is limited to 500 parts-per-
million (PPM) as a consequence of the correctness principles
on which the NTP protocol and algorithm design are based.
As a result, the local clock can take a long time to converge
to an acceptable offset, about 2,000 s for each second the
clock is outside the acceptable range.
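
The 2000 s figure follows directly from the 500 ppm limit. A minimal sketch of that arithmetic, assuming only the slew limit quoted above (the function name is made up for illustration):

```python
MAX_SLEW_PPM = 500  # slew limit quoted from the ntpd(8) manpage


def amortization_seconds(offset_s, slew_ppm=MAX_SLEW_PPM):
    """Time needed to slew away an offset of offset_s seconds at slew_ppm."""
    return abs(offset_s) * 1_000_000 / slew_ppm


print(amortization_seconds(1.0))  # 2000.0
print(amortization_seconds(3.0))  # 6000.0
```

So a clock that is 3 seconds off would need well over an hour and a half of continuous slewing at the maximum rate.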

  Most PC clocks have a frequency error on the order of 10-150ppm, so it
  doesn't take a whole lot of jitter from a congested remote network
  to exceed the limits...

I think the burst and iburst options for the server lines
in ntp.conf might help in such cases.

Of course, the best solution is to buy a GPS or DCF radio
receiver and set up a stratum-1 server yourself.  But last time
I tried to do that with a cheap DCF plug, it wasn't very
well supported on FreeBSD.  Even an expensive Meinberg
receiver ( http://www.meinberg.de/english/ ) with an RS232
output worked much more accurately with a Solaris machine
than with FreeBSD.  (Unfortunately, the Meinberg model
available to us did not have NTP support via ethernet
itself, only serial output.)  I have to admit that that
was in FreeBSD 4.x days.  The situation might have
improved in the meantime (I don't know).

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH  Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

C++ is to C as Lung Cancer is to Lung.
-- Thomas Funke
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]

freebsd and securelevel question

2007-05-11 Thread Gót András

Hi,

So. The simple question is: Why does FreeBSD have securelevel 0 if init sets
it to 1 when it sees at boot that the level is 0? :) It's OK that it's in the
manual, but there are also two default ways to set securelevel at boot time.
I don't really get the point of this forced change from 0 to 1.

We'd like to use our machines with securelevel 0 by default, so I had to
comment out the relevant two lines from init.c.

Regards,
Andras


Re: freebsd and securelevel question

2007-05-11 Thread Thomas Hurst

* Gót András ([EMAIL PROTECTED]) wrote:

 So. The simple question is: Why does FreeBSD have securelevel 0 if init sets
 it to 1 when it sees at boot that the level is 0? :)

So when you boot to single user mode you can turn off immutable/append
only flags etc, without letting those capabilities propagate into
multiuser mode?

 We'd like to use our machines with securelevel 0 by default, so I had to
 comment out the relevant two lines from init.c.

init(8):
  -1    Permanently insecure mode - always run the system in level 0 mode.
  This is the default initial value.

-- 
Thomas 'Freaky' Hurst
http://hur.st/

Re: freebsd and securelevel question

2007-05-11 Thread Oliver Fromme

Gót András [EMAIL PROTECTED] wrote:
  So. The simple question is: Why does FreeBSD have securelevel 0 if init sets
  it to 1 when it sees at boot that the level is 0? :) It's OK that it's in the
  manual, but there are also two default ways to set securelevel at boot time.
  I don't really get the point of this forced change from 0 to 1.

The reason is so that /etc/rc and all of the related
startup scripts can run at level 0, which might be
necessary for various reasons, and afterwards the
level is automatically increased to 1.

If you don't want that, you should leave the level
at the default of -1.

  We'd like to use our machines with securelevel 0 by default, so I had to
  comment out the relevant two lines from init.c.

Uhm, could you please explain why you want to do that?
It doesn't make sense.

Note that level -1 behaves exactly the same as level 0
(i.e. no restrictions at all), the only difference is
that -1 prevents the automatic increase to level 1 when
the system goes multi-user.

So, if you want to run permanently without restrictions,
then you should leave the secure level at the default
value of -1.

It's all explained in the init(8) manual page.
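
The behaviour described above can be summarised in a few lines. This is only a simplified model of the transition when the system goes multi-user, not the actual code in init.c, and the function name is invented for illustration:

```python
def securelevel_after_multiuser(level):
    """Simplified model of the securelevel once the system is multi-user."""
    # Level 0 is raised to 1 when going multi-user; -1 (permanently
    # insecure) and levels that are already above 0 are left unchanged.
    return 1 if level == 0 else level


for boot_level in (-1, 0, 1, 2):
    print(boot_level, "->", securelevel_after_multiuser(boot_level))
```

In this model the only way to stay unrestricted after boot is to start at -1, which is the point made above.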

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH  Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

Documentation is like sex; when it's good, it's very, very good,
and when it's bad, it's better than nothing.
-- Dick Brandon

Re: UNIX domain sockets MFC's

2007-05-11 Thread Robert Watson


On Tue, 8 May 2007, Robert Watson wrote:

Right now I am tracking two known issues with UNIX domain sockets in 
RELENG_6:


- Reported NULL pointer dereference in unp_connect(), which occurs due to the
 dropping of locks around sonewconn().  This is fixed in HEAD, and I am
 preparing an MFC of this patch.


The fix for this has now been merged as 1.155.2.22.  As there have been no new 
reports of UNIX domain socket problems in the last couple of days, it sounds 
like the MFC of the last batch of fixes and cleanups has not led to problems.


Robert N M Watson
Computer Laboratory
University of Cambridge

Re: clock problem

2007-05-11 Thread Martin Dieringer


On Thu, 10 May 2007, M. Warner Losh wrote:


In message: [EMAIL PROTECTED]
   Martin Dieringer [EMAIL PROTECTED] writes:
: well now it works without restrict:
: # ntpq -p
:   remote   refid  st t when poll reach   delay offset  jitter
: ==
: *time192.53.103.108   2 u   19   64   77   91.454 301.926 860.104
:
:
: and the clock is only 3 seconds late now...

only 300ms late, or .3s you mean.



Well it says so, but I meant 3 seconds, compared by eye to a radio
clock.


This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
problem is solved by switching to ACPI instead of APM

m.

[releng_6 tinderbox] failure on sparc64/sparc64

2007-05-11 Thread FreeBSD Tinderbox

TB --- 2007-05-11 13:24:38 - tinderbox 2.3 running on freebsd-stable.sentex.ca
TB --- 2007-05-11 13:24:38 - starting RELENG_6 tinderbox run for sparc64/sparc64
TB --- 2007-05-11 13:24:38 - cleaning the object tree
TB --- 2007-05-11 13:25:06 - checking out the source tree
TB --- 2007-05-11 13:25:06 - cd /tinderbox/RELENG_6/sparc64/sparc64
TB --- 2007-05-11 13:25:06 - /usr/bin/cvs -f -R -q -d/home/ncvs update -Pd 
-rRELENG_6 src
TB --- 2007-05-11 13:35:38 - building world (CFLAGS=-O2 -pipe)
TB --- 2007-05-11 13:35:38 - cd /src
TB --- 2007-05-11 13:35:38 - /usr/bin/make -B buildworld
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
 stage 4.1: building includes
 stage 4.2: building libraries
 stage 4.3: make dependencies
 stage 4.4: building everything
TB --- 2007-05-11 14:40:11 - generating LINT kernel config
TB --- 2007-05-11 14:40:11 - cd /src/sys/sparc64/conf
TB --- 2007-05-11 14:40:11 - /usr/bin/make -B LINT
TB --- 2007-05-11 14:40:12 - building LINT kernel (COPTFLAGS=-O2 -pipe)
TB --- 2007-05-11 14:40:12 - cd /src
TB --- 2007-05-11 14:40:12 - /usr/bin/make buildkernel KERNCONF=LINT
 Kernel build for LINT started on Fri May 11 14:40:12 UTC 2007
 stage 1: configuring the kernel
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3.1: making dependencies
[...]
awk -f /src/sys/tools/makeobjops.awk /src/sys/kern/linker_if.m -h
awk -f /src/sys/tools/makeobjops.awk /src/sys/libkern/iconv_converter_if.m -h
awk -f /src/sys/tools/makeobjops.awk /src/sys/dev/ofw/ofw_bus_if.m -h
awk -f /src/sys/tools/makeobjops.awk /src/sys/sparc64/pci/ofw_pci_if.m -h
rm -f .newdep
/usr/bin/make -V CFILES -V SYSTEM_CFILES -V GEN_CFILES |  MKDEP_CPP=cc -E 
CC=cc xargs mkdep -a -f .newdep -O2 -pipe -fno-strict-aliasing  -Wall 
-Wredundant-decls -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes 
-Wpointer-arith -Winline -Wcast-qual  -fformat-extensions -std=c99  -nostdinc 
-I-  -I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter 
-I/src/sys/contrib/pf -I/src/sys/dev/ath -I/src/sys/contrib/ngatm 
-I/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h 
-fno-common -finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mcmodel=medlow -msoft-float 
-ffreestanding
/src/sys/dev/isp/isp_sbus.c:487:67: macro isp_dma_tag_create requires 12 
arguments, but only 1 given
mkdep: compile failed
*** Error code 1

Stop in /obj/sparc64/src/sys/LINT.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2007-05-11 14:41:34 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2007-05-11 14:41:34 - ERROR: failed to build lint kernel
TB --- 2007-05-11 14:41:34 - tinderbox aborted
TB --- 0.95 user 2.65 system 4616.05 real


http://tinderbox.des.no/tinderbox-releng_6-RELENG_6-sparc64-sparc64.full

Re: clock too slow - big time offset with ntpdate

2007-05-11 Thread Martin Dieringer


On Wed, 9 May 2007, Ian Smith wrote:


Bottom line might be: if it hurts when you run powerd with APM,
don't.

If you want powerd to work, I'd suggest trying ACPI again




ok, using ACPI solved the clock problem, the suspend problem
has to be solved later


m.

Re: UNIX domain sockets MFC's

2007-05-11 Thread Marc G. Fournier




--On Friday, May 11, 2007 12:49:32 +0100 Robert Watson [EMAIL PROTECTED] wrote:

 On Tue, 8 May 2007, Robert Watson wrote:

 Right now I am tracking two known issues with UNIX domain sockets in
 RELENG_6:

 - Reported NULL pointer dereference in unp_connect(), which occurs due to the
  dropping of locks around sonewconn().  This is fixed in HEAD, and I am
  preparing an MFC of this patch.

 The fix for this has now been merged as 1.155.2.22.  As there have been no
 new reports of UNIX domain socket problems in the last couple of days, it
 sounds like the MFC of the last batch of fixes and cleanups has not led to
 problems.

I will work on upgrading that system right now to the latest -STABLE and let
you know ... did you happen to receive my email concerning that java process in
a soclose state?

-- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664


Re: clock problem

2007-05-11 Thread M. Warner Losh

In message: [EMAIL PROTECTED]
Martin Dieringer [EMAIL PROTECTED] writes:
: This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
: problem is solved by switching to ACPI instead of APM

It is a hardware problem.  APM + powerd changes the frequency of the
TSC.  If the TSC is used as the time source, then you'll get bad
timekeeping.  ACPI uses its own frequency source that is much more
stable and independent of the TSC, so switching to it fixes the
problem because you are switching the hardware from using a really bad
frequency source with ugly steps to using a good frequency source w/o
steps.
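
A toy illustration of why a counter that follows the CPU clock makes a poor time source once the frequency is changed behind the kernel's back. The frequency and interval are hypothetical numbers, and the model simply assumes the kernel keeps dividing the tick count by the originally calibrated frequency:

```python
def reported_elapsed(ticks, assumed_hz):
    """Elapsed seconds computed from a tick count and an assumed frequency."""
    return ticks / assumed_hz


NOMINAL_HZ = 300_000_000  # hypothetical calibrated counter frequency
real_seconds = 10.0       # true wall-clock interval

# The CPU runs at half speed for the whole interval, so the counter
# only advances half as many ticks as the kernel expects.
ticks = int(real_seconds * NOMINAL_HZ / 2)

print(reported_elapsed(ticks, NOMINAL_HZ))  # 5.0
```

In this model the clock loses half of every second spent at the reduced frequency, which is far outside what ntpd can correct for.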

Warner

Re: clock problem

2007-05-11 Thread M. Warner Losh

In message: [EMAIL PROTECTED]
Oliver Fromme [EMAIL PROTECTED] writes:
: M. Warner Losh wrote:
:   Peter Jeremy wrote:
:   : There seems to be a bug in ntpd where the PLL can saturate at
:   : +/-500ppm and will not recover.  This problem seems to occur mostly
:   : where the reference servers have lots of jitter (ie a fairly congested
:   : link to them).
:   
:   Yes.  This is a rather interesting misfeature of ntpd.  Its rails are
:   at +/- 500ppm, and when it hits the rail it assumes that things are
:   too bad to continue and it stops.
: 
: I think it is related to the maximum slew rate of 1/2000,
: which is equivalent to 500 ppm.  The ntpd(8) manpage says:
: 
: Since the slew rate of typical Unix kernels is limited to
: 0.5 ms/s, each second of adjustment requires an amortization
: interval of 2000 s.
: 
: And a bit further down:
: 
: The maximum slew rate possible is limited to 500 parts-per-
: million (PPM) as a consequence of the correctness principles
: on which the NTP protocol and algorithm design are based.
: As a result, the local clock can take a long time to converge
: to an acceptable offset, about 2,000 s for each second the
: clock is outside the acceptable range.

I think you are confusing two things here.  One is the maximum
frequency error of the system clock that ntpd can tolerate.  The other
is the maximum slew rate of the system clock.

The actual error in nominal frequency of the system clock is what is
recorded.  When ntpd slams the system clock to do its 2ms/s
adjustment, it still records the actual error.  Since the original
drift file was 500.000, this indicates a very bad clock.
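
To put a drift value of 500.000 in perspective, a frequency error in ppm can be converted into accumulated time error per day. This is plain arithmetic, sketched with a helper name chosen for illustration:

```python
SECONDS_PER_DAY = 86_400


def drift_seconds_per_day(ppm):
    """Time error accumulated in one day by a clock that is off by ppm."""
    return ppm * SECONDS_PER_DAY / 1_000_000


print(drift_seconds_per_day(500))  # 43.2
print(drift_seconds_per_day(150))  # 12.96
```

A clock pinned at the 500 ppm rail is therefore gaining or losing at least 43 seconds a day, compared with roughly 13 seconds a day at the upper end of the 10-150 ppm range mentioned for typical PC clocks.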

:   Most PC clocks have a frequency error on the order of 10-150ppm, so it
:   doesn't take a whole lot of jitter from a congested remote network
:   to exceed the limits...
: 
: I think the burst and iburst options for the server lines
: in ntp.conf might help in such cases.

It might.

: Of course, the best solution is to buy a GPS or DCF radio
: receiver and set up a stratum-1 server yourself.  But last time
: I tried to do that with a cheap DCF plug, it wasn't very
: well supported on FreeBSD.  Even an expensive Meinberg
: receiver ( http://www.meinberg.de/english/ ) with an RS232
: output worked much more accurately with a Solaris machine
: than with FreeBSD.  (Unfortunately, the Meinberg model
: available to us did not have NTP support via ethernet
: itself, only serial output.)  I have to admit that that
: was in FreeBSD 4.x days.  The situation might have
: improved in the meantime (I don't know).

My company has used FreeBSD's ntpd since 3.x with a small, custom
driver that I wrote.  It turns out to work very well in practice.  I'd
suggest that it is well supported, even in FreeBSD 4.x.  It isn't well
documented.

As for working better on Solaris, I've not done measurements there.  I
do know that our custom clock drivers typically stay within a
microsecond of the reference clock when the unit has good temperature
stability, and three or five microseconds in a diurnal swing when the
units aren't well thermally regulated.

Of course, we are using what is effectively a GPS disciplined Rubidium
(Rb) oscillator's PPS as our time base.  We haven't gone the extra
step of using the Rb's 10MHz to synthesize frequencies for the
motherboard, since we don't need system time to be that stable (it
gives two or three more orders of magnitude of stability).  We do use
the kernel FLL, and our custom driver provides the absolute phase.

We get slightly better results in 6.x than we did in 4.x because of a
subtle bug in the kernel that phk fixed in the calculation of error
where two terms that were almost the same had the wrong signs and
almost cancelled out...

Warner

Re: clock problem

2007-05-11 Thread Tom Evans

On Fri, 2007-05-11 at 08:53 -0600, M. Warner Losh wrote:
 In message: [EMAIL PROTECTED]
 Martin Dieringer [EMAIL PROTECTED] writes:
 : This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
 : problem is solved by switching to ACPI instead of APM
 
 It is a hardware problem.  APM + powerd changes the frequency of the
 TSC.  If the TSC is used as the time source, then you'll get bad
 timekeeping.  ACPI uses its own frequency source that is much more
 stable and independent of the TSC, so switching to it fixes the
 problem because you are switching the hardware from using a really bad
 frequency source with ugly steps to using a good frequency source w/o
 steps.
 
 Warner

Surely that would imply that it is a software misconfiguration issue. If
the TSC is unreliable under fairly standard duties, and there exists an
alternate source that is reliable, surely that indicates the
manufacturer has identified a problem, and solved it with alternate
hardware.

The failure then to use the correct hardware is a software
misconfiguration.

Cheers

Tom



Re: panic: spin lock held too long (w/ backtrace)

2007-05-11 Thread Scott Swanson

 [EMAIL PROTECTED] /usr/src/sys/i386/conf  kldstat
 Id Refs AddressSize Name
  13 0xc040 65e308   kernel
  21 0xc0a5f000 59f20acpi.ko
 
 So, yes then :) Can you follow the steps for debugging modules and see
 if it gives a better trace?
 
 Kris
 

Unfortunately, after prepping and adding the symbol file for acpi.ko, I
got the exact same backtrace.  Any other thoughts?

Regards;
Scott

Intel ICH5 UDMA100 controller TIMEOUT - READ_DMA

2007-05-11 Thread Richard Puga

I am working with a new IBM XSeries 226 server.

It worked fine with the original 80 gig drives.

Upon replacing them with 2 new Hitachi 500 gig drives I get DMA timeouts
at random times while using the on board Intel SATA controller.

I put a Promise SATA controller in the machine and everything works
great.


Has anyone heard of a problem with Intel controllers?


Thanks in advance

Richard Puga

Here is the info from dmesg and atacontrol;



 kernel: ad3: TIMEOUT - READ_DMA retrying (1 retry left) LBA=0
 kernel: ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=324524575
 kernel: ad2: TIMEOUT - READ_DMA retrying (1 retry left) LBA=3780487
 kernel: ad2: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2651511
and so on

atapci1: Intel ICH5 UDMA100 controller port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x14a0-0x14af at device 31.1 on pci0

ad4: 476940MB Hitachi HDT725050VLA360 V56OA73A at ata2-master SATA150
ad6: 476940MB Hitachi HDT725050VLA360 V56OA73A at ata3-master SATA150


 atacontrol cap ad4

Protocol  Serial ATA II
device model  Hitachi HDT725050VLA360
serial number VFD400R40E0EHC
firmware revision V56OA73A
cylinders 16383
heads 16
sectors/track 63
lba supported 268435455 sectors
lba48 supported   976773168 sectors
dma supported
overlap not supported

Feature                        Support  Enable  Value     Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes      -       31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
SMART                          yes      no
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/0x00
automatic acoustic management  yes      no      254/0xFE  128/0x80
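
The sector counts above can be cross-checked against the dmesg size and the command names in the timeout messages. A rough sketch, assuming 512-byte sectors and using a simplified rule for when a 48-bit command is needed (a real driver also takes the transfer length into account); the helper names are made up:

```python
SECTOR_BYTES = 512           # assumed logical sector size
LBA28_LIMIT = 268435455      # "lba supported" figure, i.e. 2**28 - 1
LBA48_SECTORS = 976773168    # "lba48 supported" figure


def capacity_mib(sectors, sector_bytes=SECTOR_BYTES):
    """Capacity in MiB (2**20 bytes), rounded down."""
    return sectors * sector_bytes // 2**20


def needs_lba48(lba):
    """Simplified check: the address does not fit in 28-bit addressing."""
    return lba >= LBA28_LIMIT


print(capacity_mib(LBA48_SECTORS))  # 476940
for lba in (0, 324524575, 3780487, 2651511):
    print(lba, needs_lba48(lba))
```

The computed size matches the 476940MB reported by dmesg, and only the LBA 324524575 request falls outside the 28-bit range, which is consistent with it being the one logged as WRITE_DMA48.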





Re: Intel ICH5 UDMA100 controller TIMEOUT - READ_DMA

2007-05-11 Thread Jeremy Chadwick

On Fri, May 11, 2007 at 07:20:06AM -1000, Richard Puga wrote:
 I am working with a new IBM XSeries 226 server.
 
 It worked fine with the original 80 gig drives.
 
 Upon replacing them with 2 new Hitachi 500 gig drives I get DMA timeouts
 at random times while using the on board Intel SATA controller.
 
 I put a Promise SATA controller in the machine and everything works
 great.

There's no mention of what FreeBSD version and kernel build date
you're using.  uname -a would be very useful here.

  kernel: ad3: TIMEOUT - READ_DMA retrying (1 retry left) LBA=0
  kernel: ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=324524575
  kernel: ad2: TIMEOUT - READ_DMA retrying (1 retry left) LBA=3780487
  kernel: ad2: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2651511
 and so on

The interesting part is that the LBAs are all over the place; they're
not sequential, which means (in my opinion) the drive itself is fine.

 atapci1: Intel ICH5 UDMA100 controller port
 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x14a0-0x14af at device 31.1 on pci0
 
 ad4: 476940MB Hitachi HDT725050VLA360 V56OA73A at ata2-master SATA150
 ad6: 476940MB Hitachi HDT725050VLA360 V56OA73A at ata3-master SATA150

Some clarification:

These drives are not attached to atapci1.  They're attached to a
different PCI device.  UDMA100 is the ATA/IDE port (read: old PATA), not
an SATA port.  What you should be pointing to is something that looks
like this:

atapci0: Intel ICH5 SATA150 controller port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f irq 18 at device 31.2 on pci0

(The above example is from a machine we have sitting around doing
heavy I/O work due to MySQL.  We have no disk problems there.)

Now...

I have seen similar behaviour to what you've described on an Intel-based
SATA controller (ICH6) with a Western Digital drive that I have
personally used and determined to be reliable on Windows and verified as
such with WD's testing software under DOS too.  I've only seen this
happen *once* on the system.  That system:

FreeBSD eos.sc1.parodius.com 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Mar 8 
10:41:09 PST 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/EOS  i386

[EMAIL PROTECTED]:31:2:  class=0x010180 card=0x628015d9 chip=0x26528086 
rev=0x03 hdr=0x00
vendor = 'Intel Corporation'
device = '82801FR/FRW ICH6R/ICH6RW SATA Controller'
class  = mass storage
subclass   = ATA

Master:  ad0 WDC WD2500KS-00MJB0/02.01C03 Serial ATA II
Slave:   no device present

ad0: timeout waiting to issue command
ad0: error issuing WRITE_DMA command
ad0: timeout waiting to issue command
ad0: error issuing WRITE_DMA command
ad0: timeout waiting to issue command
ad0: error issuing WRITE_DMA command
ad0: timeout waiting to issue command
ad0: error issuing WRITE_DMA command
ad0: timeout waiting to issue command
ad0: error issuing WRITE_DMA command
g_vfs_done():ad0s1d[WRITE(offset=16821780480, length=16384)]error = 5
g_vfs_done():ad0s1d[WRITE(offset=16826417152, length=16384)]error = 5
g_vfs_done():ad0s1d[WRITE(offset=813531136, length=16384)]error = 5
g_vfs_done():ad0s1d[WRITE(offset=817922048, length=16384)]error = 5
g_vfs_done():ad0s1d[WRITE(offset=870563840, length=16384)]error = 5

And SMART (smartctl) shows absolutely no signs of any problems with the
drive (the Temperature_Celsius in_the_past error is how the drive came
from the factory -- I think Western Digital was doing some testing, who
knows.)

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0003   214   214   021    Pre-fail  Always   -           4283
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           9
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always   -           0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always   -           4145
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always   -           0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           8
190 Temperature_Celsius     0x0022   063   042   045    Old_age   Always   In_the_past 37
194 Temperature_Celsius     0x0022   113   092   000    Old_age   Always   -           37
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline  -           0

SMART

Re: clock problem

2007-05-11 Thread Ian Smith

On Fri, 11 May 2007, Tom Evans wrote:
  On Fri, 2007-05-11 at 08:53 -0600, M. Warner Losh wrote:
   In message: [EMAIL PROTECTED]
   Martin Dieringer [EMAIL PROTECTED] writes:
   : This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
   : problem is solved by switching to ACPI instead of APM
   
   It is a hardware problem.  APM + powerd changes the frequency of the
   TSC.  If the TSC is used as the time source, then you'll get bad
   timekeeping.  ACPI uses its own frequency source that is much more
   stable and independent of the TSC, so switching to it fixes the
   problem because you are switching the hardware from using a really bad
   frequency source with ugly steps to using a good frequency source w/o
   steps.
   
   Warner

Yes, but Martin already showed it was using the i8254, not TSC; would
you expect the same effect using powerd with the i8254 clock?  It seems
so, unless it's some problem with est and/or p4tcc under APM (canoworms)

Running APM, at least on my ol' Compaq 1500c (5.5-S), which is really
too ancient to expect ACPI to work properly, a verbose boot states:

Calibrating clock(s) ... i8254 clock: 1193216 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254 frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 300011839 Hz
[..]
TSC timecounter disabled: APM enabled. 
Timecounter TSC frequency 300011839 Hz quality -1000

ie:
kern.timecounter.hardware: i8254
kern.timecounter.choice: TSC(-1000) i8254(0) dummy(-100)
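
As a rough illustration of how to read the kern.timecounter.choice line, here is a minimal Python sketch. It assumes the kernel simply prefers the listed counter with the highest quality value, which is a simplification, and the function names are made up for this example.

```python
import re


def parse_timecounter_choice(choice):
    """Turn 'TSC(-1000) i8254(0) dummy(-100)' into a name -> quality dict."""
    pairs = re.findall(r"(\S+?)\((-?\d+)\)", choice)
    return {name: int(quality) for name, quality in pairs}


def best_timecounter(choice):
    """Return the counter with the highest quality (assumed selection rule)."""
    qualities = parse_timecounter_choice(choice)
    return max(qualities, key=qualities.get)


if __name__ == "__main__":
    # The choice string reported by sysctl on the APM laptop above.
    print(best_timecounter("TSC(-1000) i8254(0) dummy(-100)"))
```

With the string from this machine the sketch picks i8254, which matches the kern.timecounter.hardware value shown.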

  Surely that would imply that it is a software misconfiguration issue. If
  the TSC is unreliable under fairly standard duties, and there exists an
  alternate source that is reliable, surely that indicates the
  manufacturer has identified a problem, and solved it with alternate
  hardware.
  
  The failure then to use the correct hardware is a software
  misconfiguration.

If one considers disabling ACPI and enabling APM misconfiguration ..
which in Martin's case it turned out to be, since his ACPI works, but

 est0: Enhanced SpeedStep Frequency Control on cpu0
 p4tcc0: CPU Frequency Thermal Control on cpu0
 apm0: APM BIOS on motherboard
 apm0: found APM BIOS v1.2, connected at v1.2

together with powerd appeared to heavily retard time on both laptops,
beyond ntpd's ability to cope.

Cheers, Ian

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]

Re: panic: spin lock held too long (w/ backtrace)

2007-05-11 Thread Kris Kennaway

On Fri, May 11, 2007 at 10:43:54AM -0600, Scott Swanson wrote:
  [EMAIL PROTECTED] /usr/src/sys/i386/conf  kldstat
  Id Refs Address    Size     Name
   1    3 0xc040 65e308   kernel
   2    1 0xc0a5f000 59f20    acpi.ko
  
  So, yes then :) Can you follow the steps for debugging modules and see
  if it gives a better trace?
  
  Kris
  
 
 Unfortunately, after prepping and adding the symbol file for acpi.ko, I
 got the exact same backtrace.  Any other thoughts?

Sometimes -O2 can confuse gdb...unfortunately there is no way to
repair it after the fact.  Maybe someone else has ideas.

Kris


Re: clock problem

2007-05-11 Thread M. Warner Losh

In message: [EMAIL PROTECTED]
Tom Evans [EMAIL PROTECTED] writes:
: On Fri, 2007-05-11 at 08:53 -0600, M. Warner Losh wrote:
:  In message: [EMAIL PROTECTED]
:  Martin Dieringer [EMAIL PROTECTED] writes:
:  : This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
:  : problem is solved by switching to ACPI instead of APM
:  
:  It is a hardware problem.  APM + powerd changes the frequency of the
:  TSC.  If the TSC is used as the time source, then you'll get bad
:  timekeeping.  ACPI uses its own frequency source that is much more
:  stable and independent of the TSC, so switching to it fixes the
:  problem because you are switching the hardware from using a really bad
:  frequency source with ugly steps to using a good frequency source w/o
:  steps.
:  
:  Warner
: 
: Surely that would imply that it is a software misconfiguration issue. If
: the TSC is unreliable under fairly standard duties, and there exists an
: alternate source that is reliable, surely that indicates the
: manufacturer has identified a problem, and solved it with alternate
: hardware.
: 
: The failure then to use the correct hardware is a software
: misconfiguration.

TSC is very accurate if you don't have the clock frequency slammed
around, which is why its quality is listed as 800 and the i8254 is
listed as 0.  If you do anything that slams the TSC frequency, then
you need to reconfigure the timecounter used.

It is hard for the timekeeping part of the software to know if you are
on a sane system (TSC-wise) or an insane one.

Warner

Re: UNIX domain sockets MFC's

2007-05-11 Thread Abdullah Ibn Hamad Al-Marri


On 5/11/07, Robert Watson [EMAIL PROTECTED] wrote:

On Tue, 8 May 2007, Robert Watson wrote:

 Right now I am tracking two known issues with UNIX domain sockets in
 RELENG_6:

 - Reported NULL pointer dereference in unp_connect(), which occurs due to the
  dropping of locks around sonewconn().  This is fixed in HEAD, and I am
  preparing an MFC of this patch.

The fix for this has now been merged as 1.155.2.22.  As there have been no new
reports of UNIX domain socket problems in the last couple of days, it sounds
like the MFC of the last batch of fixes and cleanups has not led to problems.

Robert N M Watson
Computer Laboratory
University of Cambridge


I updated my dual-CPU SMP server one hour ago, and I didn't see any
speed improvement in MySQL.

Hints?

--
Regards,

-Abdullah Ibn Hamad Al-Marri
Arab Portal
http://www.WeArab.Net/

Re: 6.2-R on Dell Poweredge 2950 with Dell PERC 5/i [mfi(4)]

2007-05-11 Thread David Wolfskill

On Thu, May 10, 2007 at 05:14:01PM -0600, Scott Long wrote:
 ...
 Not sure that this impression is entirely accurate.  The biggest problem
 with MFI machines is online RAID management.  The storage driver itself
 matured very quickly and has been very reliable.

Ah; good to know:  thank you.

 Well, now a colleague is trying to run 6.2-R on one of these 2950s; dmesg
 says the controller is:
 
 mfi0: Dell PERC 5/i mem 0xd80f-0xd80f,0xfc4e-0xfc4f irq 
 78 at device 14.0 on pci2
 ...
 and the disks looks like:
 
 mfid0: MFI Logical Disk on mfi0
 mfid0: 418176MB (856424448 sectors) RAID volume '' is optimal
 
 
 Looks A OK to me.

Even better.  :-)

 The intended production workload involves creation and deletion of
 a large number of files rather rapidly.
 ...
 sysctl vfs.ffs.doasyncfree=0 might help.  Running the syncer more 
 frequently might also help, but I don't recall the sysctl node for
 that.

OK; I've relayed your suggestion to my colleague, but haven't heard back
from her yet.

 ...
 Very strange.  No chance that it was due to files that were deleted but
 still referenced by open apps?

I don't think so.  She's deployed 13 other boxen over the last few years
with -- naturally! -- different hardware specs, but all running
essentially the same application.

The big question for her is whether or not the Dell 2950, as specified,
will do the job.

 ...
 This sounds purely like a filesystem issue, not an MFI driver issue.

Hmmm... I'll admit to knowing little about RAID configurations; is it
possible that some RAID configurations might exacerbate problems with
such a workload -- or that others might be more amenable to it?

Thanks again!

Peace,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
Believe SORBS at your own risk: 63.193.123.122 has been static since Aug 1999.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.



Re: clock problem

2007-05-11 Thread M. Warner Losh

In message: [EMAIL PROTECTED]
Ian Smith [EMAIL PROTECTED] writes:
: On Fri, 11 May 2007, Tom Evans wrote:
:   On Fri, 2007-05-11 at 08:53 -0600, M. Warner Losh wrote:
:In message: [EMAIL PROTECTED]
:Martin Dieringer [EMAIL PROTECTED] writes:
:: This is NOT a hardware problem. 1. I have this on 2 machines, 2. the
:: problem is solved by switching to ACPI instead of APM
:
:It is a hardware problem.  APM + powerd changes the frequency of the
:TSC.  If the TSC is used as the time source, then you'll get bad
:timekeeping.  ACPI uses its own frequency source that is much more
:stable and independent of the TSC, so switching to it fixes the
:problem because you are switching the hardware from using a really bad
:frequency source with ugly steps to using a good frequency source w/o
:steps.
:
:Warner
: 
: Yes, but Martin already showed it was using the i8254, not TSC; would
: you expect the same effect using powerd with the i8254 clock?  It seems
: so, unless it's some problem with est and/or p4tcc under APM (canoworms)

No.  I would not have expected it at all.  I would have expected the
i8254 to not be able to provide time much better than a microsecond or
two, but I'd expect time to be relatively stable, modulo the normal
walking due to thermal variation you'd see given the relatively low
quality oscillators that feed it.  However, see below.

:   Surely that would imply that it is a software misconfiguration issue. If
:   the TSC is unreliable under fairly standard duties, and there exists an
:   alternate source that is reliable, surely that indicates the
:   manufacturer has identified a problem, and solved it with alternate
:   hardware.
:   
:   The failure then to use the correct hardware is a software
:   misconfiguration.
: 
: If one considers disabling ACPI and enabling APM misconfiguration ..
: which in Martin's case it turned out to be, since his ACPI works, but
:  est0: Enhanced SpeedStep Frequency Control on cpu0
:  p4tcc0: CPU Frequency Thermal Control on cpu0
:  apm0: APM BIOS on motherboard
:  apm0: found APM BIOS v1.2, connected at v1.2
: together with powerd appeared to heavily retard time on both laptops,
: beyond ntpd's ability to cope.

The i8254 time counter has a frequency of about 1.19 MHz, but it wraps
about 18 times a second (or once every ~55ms).  I think that if the
clock speed was slow enough, there might be situations where
interrupts are disabled long enough to blow past that 55ms mark,
especially on a 300MHz laptop that might be running at a very slow
clock rate when idle.  If it misses the wrap, then you'll see time
slip away.
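
The wrap arithmetic above can be checked with a short Python sketch. It assumes the i8254 counter is 16 bits wide (65536 counts per wrap), which is not stated above, and it uses the 1193182 Hz default frequency from the boot messages quoted earlier in this thread. The function names are made up for illustration.

```python
I8254_HZ = 1193182   # default i8254 frequency from the boot messages
COUNTER_BITS = 16    # assumption: the counter wraps after 2**16 counts


def wrap_period_seconds(freq_hz=I8254_HZ, bits=COUNTER_BITS):
    """Time the counter takes to run through its full range once."""
    return (1 << bits) / freq_hz


def lost_seconds(missed_wraps, freq_hz=I8254_HZ, bits=COUNTER_BITS):
    """Time that silently disappears if this many wraps go unnoticed."""
    return missed_wraps * wrap_period_seconds(freq_hz, bits)


if __name__ == "__main__":
    period = wrap_period_seconds()
    print("wrap period: %.2f ms" % (period * 1000.0))
    print("wraps per second: %.2f" % (1.0 / period))
    print("lost after 10 missed wraps: %.3f s" % lost_seconds(10))
```

Under these assumptions every missed wrap costs about 55 ms, so a machine that misses wraps regularly will fall behind faster than ntpd can correct.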

Maybe you can experiment to find the lowest frequency at which the
system can run and still keep accurate time.  debug.cpufreq.verbose=1
might be helpful.  You can override the lowest setting of powerd by
using the sysctl debug.cpufreq.lowest.

Warner

Re: UNIX domain sockets MFC's

2007-05-11 Thread Robert Watson


On Fri, 11 May 2007, Abdullah Ibn Hamad Al-Marri wrote:


On 5/11/07, Robert Watson [EMAIL PROTECTED] wrote:

On Tue, 8 May 2007, Robert Watson wrote:

 Right now I am tracking two known issues with UNIX domain sockets in
 RELENG_6:

 - Reported NULL pointer dereference in unp_connect(), which occurs due to the
  dropping of locks around sonewconn().  This is fixed in HEAD, and I am
  preparing an MFC of this patch.

The fix for this has now been merged as 1.155.2.22.  As there have been no 
new reports of UNIX domain socket problems in the last couple of days, it 
sounds like the MFC of the last batch of fixes and cleanups has not led to 
problems.


I updated my server which has dual CPU with SMP one hour ago, and I didn't 
see any speed improvement in the MySQL.


The speed improvements associated with these MFC's are minor; primarily they 
are stability improvements under high load.  There are major performance 
improvements in the 7.x implementation, especially for multi-core systems, but 
I have no current plans to MFC them, as they interact with other system 
components and may depend on other changes that also haven't been MFC'd.


Robert N M Watson
Computer Laboratory
University of Cambridge

Re: 6.2-R on Dell Poweredge 2950 with Dell PERC 5/i [mfi(4)]

2007-05-11 Thread Scott Long


David Wolfskill wrote:

On Thu, May 10, 2007 at 05:14:01PM -0600, Scott Long wrote:

...
Not sure that this impression is entirely accurate.  The biggest problem
with MFI machines is online RAID management.  The storage driver itself
matured very quickly and has been very reliable.


Ah; good to know:  thank you.


Well, now a colleague is trying to run 6.2-R on one of these 2950s; dmesg
says the controller is:

mfi0: Dell PERC 5/i mem 0xd80f-0xd80f,0xfc4e-0xfc4f irq 
78 at device 14.0 on pci2

...

and the disks looks like:

mfid0: MFI Logical Disk on mfi0
mfid0: 418176MB (856424448 sectors) RAID volume '' is optimal


Looks A OK to me.


Even better.  :-)


The intended production workload involves creation and deletion of
a large number of files rather rapidly.

...
sysctl vfs.ffs.doasyncfree=0 might help.  Running the syncer more 
frequently might also help, but I don't recall the sysctl node for
that.


OK; I've relayed your suggestion to my colleague, but haven't heard back
from her yet.


...
Very strange.  No chance that it was due to files that were deleted but
still referenced by open apps?


I don't think so.  She's deployed 13 other boxen over the last few years
with -- naturally! -- different hardware specs, but all running
essentially the same application.

The big question for her is whether or not the Dell 2950, as specified,
will do the job.


...
This sounds purely like a filesystem issue, not an MFI driver issue.


Hmmm... I'll admit to knowing little about RAID configurations; is it
possible that some RAID configurations might exacerbate problems with
such a workload -- or that others might be more amenable to it?



If anything, a fast RAID controller will help reduce the lag that you 
get when the syncer does its periodic run.  But beyond that, I can't 
think of anything that would cause problems.


Scott

Re: clock problem

2007-05-11 Thread Peter Jeremy

On 2007-May-11 12:11:29 +0200, Oliver Fromme [EMAIL PROTECTED] wrote:
Of course, the best solution is to buy a GPS or DCF radio
receiver and set up a stratum-1 server yourself.

One of our customers has 6 GPS-locked NTP servers.  Only problem is
that two of them are reporting a time that is exactly one second
different to the other four.  You shouldn't rely solely on your
GPS or DCF receiver - use it as the primary source but have some
secondary sources for sanity checks.  (From experience, I can state
that ntpd does not behave well when presented with two stratum 1
servers that differ by 1 second).
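
The sanity-check idea can be sketched in a few lines of Python. This is not how ntpd selects its sources; it is only a simple median comparison, and the server names, offsets, and 0.5 second tolerance are invented for illustration.

```python
from statistics import median


def flag_outliers(offsets, tolerance=0.5):
    """Return the names of sources whose offset is far from the median.

    offsets maps a source name to its measured offset in seconds.
    """
    middle = median(offsets.values())
    return sorted(name for name, off in offsets.items()
                  if abs(off - middle) > tolerance)


if __name__ == "__main__":
    # Hypothetical readings: four sources agree, two are one second out.
    readings = {"gps1": 0.001, "gps2": -0.002, "gps3": 0.000,
                "gps4": 0.003, "gps5": 1.001, "gps6": 0.999}
    print(flag_outliers(readings))
```

With only two sources that disagree by one second, the median sits halfway between them and neither can be singled out, which is one way to see why extra secondary sources are worth having.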

-- 
Peter Jeremy



Hard Hang, nothing in logs / no panics

2007-05-11 Thread Roger Miranda

Good Day everyone.

I have one system set up with if_bridge to filter traffic, and it works 
quite well.  I am running FreeBSD 6.2 as a TinyBSD image.  The one 
problem is that when I place the machine at the perimeter of our 27-seat 
network, the entire system goes into a hard lock (physical reboot needed) 
anywhere from 15 minutes to 24 hours later.  There are no logs, kernel 
panics, or anything else, and no IRQ conflicts exist.

I am looking for any input on how to even begin diagnosing this.

Here is a copy of my dmesg.

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE #0: Wed May  9 17:48:30 UTC 2007
root@:/usr/obj/usr/src/sys/TINYBSD
WARNING: MPSAFE network stack disabled, expect reduced performance.
ACPI APIC Table: IntelR AWRDACPI
Timecounter i8254 frequency 1193520 Hz quality 0
CPU: Intel(R) Celeron(R) CPU 2.80GHz (2792.85-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0xf34  Stepping = 4
  
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x441d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,b14>
real memory  = 535691264 (510 MB)
avail memory = 51964 (494 MB)
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
acpi0: IntelR AWRDACPI on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x408-0x40b on acpi0
cpu0: ACPI CPU on acpi0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib1: ACPI PCI-PCI bridge at device 3.0 on pci0
pci1: ACPI PCI bus on pcib1
em0: Intel(R) PRO/1000 Network Connection Version - 6.3.9 port 0xc000-0xc01f 
mem 0xf200-0xf201 irq 18 at device 1.0 on pci1
em0: Ethernet address: 00:30:48:86:97:62
em0: [GIANT-LOCKED]
pcib2: ACPI PCI-PCI bridge at device 28.0 on pci0
pci2: ACPI PCI bus on pcib2
pci0: serial bus, USB at device 29.0 (no driver attached)
pci0: serial bus, USB at device 29.1 (no driver attached)
pci0: base peripheral at device 29.4 (no driver attached)
pci0: base peripheral, interrupt controller at device 29.5 (no driver 
attached)
pci0: serial bus, USB at device 29.7 (no driver attached)
pcib3: ACPI PCI-PCI bridge at device 30.0 on pci0
pci3: ACPI PCI bus on pcib3
pci3: display, VGA at device 9.0 (no driver attached)
em1: Intel(R) PRO/1000 Network Connection Version - 6.3.9 port 0xd100-0xd13f 
mem 0xf100-0xf101 irq 19 at device 10.0 on pci3
em1: Ethernet address: 00:30:48:86:97:63
em1: [GIANT-LOCKED]
isab0: PCI-ISA bridge at device 31.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel 6300ESB SATA150 controller port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f irq 18 at device 31.2 on 
pci0
ata0: ATA channel 0 on atapci0
ata1: ATA channel 1 on atapci0
pci0: serial bus, SMBus at device 31.3 (no driver attached)
acpi_tz0: Thermal Zone on acpi0
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
atkbd0: AT Keyboard irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: ISA Option ROM at iomem 0xc-0xc7fff on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x300
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
Timecounter TSC frequency 2792849472 Hz quality 800
Timecounters tick every 1.000 msec
IP Filter: v4.1.13 initialized.  Default = pass all, Logging = enabled
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, 
default to accept, logging limited to 100 packets/entry by default
ad0: 977MB SanDisk SDCFH-1024 HDX 3.19 at ata0-master PIO4
ad2: 76319MB Seagate ST380811AS 3.AAB at ata1-master SATA150
Trying to mount root from ufs:/dev/ad0s1a
em1: link state changed to DOWN
em0: link state changed to DOWN
bridge0: Ethernet address: 46:e0:af:c9:e6:b7
em0: promiscuous mode enabled
em1: promiscuous mode enabled
em0: link state changed to UP
em1: link state changed to UP
em1: link state changed to DOWN
em1: link state changed to UP
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...1 0 0 done
All buffers synced.
Uptime: 47m5s
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD

Re: UNIX domain sockets MFC's

2007-05-11 Thread Abdullah Ibn Hamad Al-Marri


On 5/11/07, Robert Watson [EMAIL PROTECTED] wrote:

On Fri, 11 May 2007, Abdullah Ibn Hamad Al-Marri wrote:

 On 5/11/07, Robert Watson [EMAIL PROTECTED] wrote:
 On Tue, 8 May 2007, Robert Watson wrote:

  Right now I am tracking two known issues with UNIX domain sockets in
  RELENG_6:
 
  - Reported NULL pointer dereference in unp_connect(), which occurs due to the
   dropping of locks around sonewconn().  This is fixed in HEAD, and I am
   preparing an MFC of this patch.

 The fix for this has now been merged as 1.155.2.22.  As there have been no
 new reports of UNIX domain socket problems in the last couple of days, it
 sounds like the MFC of the last batch of fixes and cleanups has not led to
 problems.

 I updated my server which has dual CPU with SMP one hour ago, and I didn't
 see any speed improvement in the MySQL.

The speed improvements associated with these MFC's are minor; primarily they
are stability improvements under high load.  There are major performance
improvements in the 7.x implementation, especially for multi-core systems, but
I have no current plans to MFC them, as they interact with other system
components and may depend on other changes that also haven't been MFC'd.

Robert N M Watson
Computer Laboratory
University of Cambridge



Robert,

Thanks for clearing this up.

How safe is it to use 7.0 for a MySQL server now?
--
Regards,

-Abdullah Ibn Hamad Al-Marri
Arab Portal
http://www.WeArab.Net/

Re: Hard Hang, nothing in logs / no panics

2007-05-11 Thread Kris Kennaway

On Fri, May 11, 2007 at 02:42:51PM -0500, Roger Miranda wrote:
 Good Day everyone.
 
 I have one system set up with if_bridge to filter traffic, and it works 
 quite well.  I am running FreeBSD 6.2 as a TinyBSD image.  The one 
 problem is that when I place the machine at the perimeter of our 27-seat 
 network, the entire system goes into a hard lock (physical reboot needed) 
 anywhere from 15 minutes to 24 hours later.  There are no logs, kernel 
 panics, or anything else, and no IRQ conflicts exist.
 
 I am looking for any input on how to even begin diagnosing this.

See the developers handbook chapter on kernel debugging.

Kris (I really need to make a keyboard shortcut for typing this phrase)

Re: Hard Hang, nothing in logs / no panics

2007-05-11 Thread Roger Miranda

On Friday 11 May 2007 16:00, Kris Kennaway wrote:
 See the developers handbook chapter on kernel debugging.

Kris, I have gone through the kernel debugging sections of the developers 
handbook.  The problem is that when I get a hard hang, I do not get any 
errors or panics, and there is no crash or dump data on reboot.  Yes, I have 
enabled DDB and KDB and a dumpdir (in /etc/rc.conf).

Am I missing something in the kernel debugging section?  I see that section 
11.9, Debugging Deadlocks, talks about deadlocks.  But at the time of the 
lock I have no way of doing a ps or really anything else, as the system is 
locked up solid.

Roger


Re: clock problem

2007-05-11 Thread Matthew Dillon


:One of our customers has 6 GPS-locked NTP servers.  Only problem is
:that two of them are reporting a time that is exactly one second
:different to the other four.  You shouldn't rely solely on your
:GPS or DCF receiver - use it as the primary source but have some
:secondary sources for sanity checks.  (From experience, I can state
:that ntpd does not behave well when presented with two stratum 1
:servers that differ by 1 second).
:
:-- 
:Peter Jeremy

Ntp will also become really unhappy when chunky time slips occur
or if the skew rate is more than a few hundred ppm.  Ntp will also blow
up if it loses the network link for a long period of time.  It will just
give up and stop making corrections entirely, even after the link is
restored.  This is particularly true when it is used over a dialup
(I did that for over a year in 1997, so I can tell you how
badly it works).

A slow time slip over a day could still be chunky, which would imply
lost interrupts.  Determining whether the problem is due to an 8254
rollover or lost hardclock interrupts is easy... just set 'hz' to
something really high, like 2, and see if your time goes crazy.
If it does, then you have your culprit.

I don't know if those bugs are still present in FreeBSD, but I do
remember that I had to redo all the timekeeping in DragonFly because
lost interrupts from high 'hz' settings were causing timekeeping to
go nuts.  That turned out to mainly be due to the same 8254 timer being
used to generate the hardclock interrupt AND handle time keeping.
i.e. at high hz settings one was not getting the full 1/18 second
benefit from the timer.  You just can't do that... it doesn't work.
It is almost 100% guaranteed to result in a bad time base.
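
To put numbers on the "full 1/18 second benefit" point, here is a small Python sketch. It assumes that when timer 0 also drives the hardclock interrupt it is reloaded with roughly frequency/hz counts, so each interrupt has to be serviced within one hz period or a rollover is lost. That reading of the description above, and the function names, are assumptions for illustration.

```python
I8254_HZ = 1193182   # nominal 8254 input frequency in Hz


def reload_count(hz, freq_hz=I8254_HZ):
    """Counts loaded into timer 0 to interrupt hz times per second."""
    return round(freq_hz / hz)


def interrupt_window_ms(hz, freq_hz=I8254_HZ):
    """How long the counter runs before it rolls over again, in ms."""
    return reload_count(hz, freq_hz) / freq_hz * 1000.0


if __name__ == "__main__":
    full_cycle_ms = 65536 / I8254_HZ * 1000.0
    print("full cycle: %.2f ms" % full_cycle_ms)
    for hz in (100, 1000, 10000):
        print("hz=%d: reload=%d, window=%.3f ms"
              % (hz, reload_count(hz), interrupt_window_ms(hz)))
```

The higher hz goes, the smaller the window compared with the full cycle of about 55 ms, which is the margin that gets given up.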

It is easy to test.. just set your kern.hz in the boot env, reboot,
and see if things blow up or not.  Time keeping should be stable
regardless of what hz is set to (proviso: never set hz less than 100).

Unfortunately, all the timebases in the system have their own quirks.
Blame the hardware manufacturers.  The 8254 timer 0 is actually the
MOST consistent of the lot, with the ACPI timer coming a close second.

TSC             Haha.  Good luck.  Nice wide timer, easy to read,
                but any power savings mode, including the failsafe
                modes that intel has when a cpu overheats, will
                probably blow it up.  Because of that it is not
                really a good idea to use it as a timebase.  I shake
                my fist at Intel! $#%$#%$#%

ACPI timer      Despite the hardware bugs this almost always works
                as a timebase, but sometimes the frequency changes
                when the cpu goes into power savings mode or EST,
                and sometimes the frequency is something other
                than what it is supposed to be.

8254 timer 0    Almost always works as a timebase, but only if
                not also used to generate high-speed interrupts
                (because interrupts are lost easily).  Set it to
                a full cycle (1/18 second) and you will be fine.
                Set it to anything else and you will lose interrupts.

                The BIOS will sometimes mess with timer 0, but not
                as often as it messes with timer 2.

8254 timer 1    Sometimes works as a time base, but can lock older
                machines up.  Can even lock up newer machines.
                Why?  Because hardware manufacturers are idiots.

8254 timer 2    Often can be used as a time base, but video bios
                calls often try to use it too.  [EMAIL PROTECTED] bios makers!
                Still, this is better than losing interrupts when
                timer 0 is set to high speed so DragonFly uses
                timer 2 for its timebase as a default until the
                ACPI timer becomes available, with a boot option
                to use timer 1 instead.  Using timer 2 as a time
                base means you don't get motherboard speaker sound
                (the old beep beep BEEP!).  Do I care?  No.

LAPIC timer     Dunno.  Probably best to use it as a high speed
                clock interrupt which would free 8254 timer 0 to
                use as a time base.

RTC interrupt   Basically unusable.  Stable, but doesn't have
                sufficient resolution to be helpful and takes
                forever to read.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

Re: Hard Hang, nothing in logs / no panics

2007-05-11 Thread Kris Kennaway

On Fri, May 11, 2007 at 04:21:25PM -0500, Roger Miranda wrote:
 On Friday 11 May 2007 16:00, Kris Kennaway wrote:
  See the developers handbook chapter on kernel debugging.
 
 Kris,  I have gone through the kernel debugging sections of the developers 
 handbook.  The one problem is when I get a hard hang, I do not get any error 
 or panics.  And there is no crash or dump data on reboot. Yes. I have enabled 
 DDB and KDB and a dumpdir (in /etc/rc.conf)
 
 Am I missing something in the kernel debugging section?  I see 11.9 
 Debugging 
 Deadlocks talk about Deadlocks.  But at the time of the lock I have no way 
 of doing a ps or really anything as the system is locked up solid.

You missed that the debugger is there to debug bugs (including
deadlocks).  Break to the debugger and obtain the necessary debugging
:)

Kris


Re: Hard Hang, nothing in logs / no panics

2007-05-11 Thread Roger Miranda


 You missed that the debugger is there to debug bugs (including
 deadlocks).  Break to the debugger and obtain the necessary debugging

I should've been more clear.  I cannot break to the debugger (CTRL-ALT-ESC) 
when the system locks up.  Am I possibly looking at a hardware issue?  If so, 
what is the best way to test for it?

Re: Hard Hang, nothing in logs / no panics

2007-05-11 Thread Kris Kennaway

On Fri, May 11, 2007 at 04:43:06PM -0500, Roger Miranda  wrote:
 
  You missed that the debugger is there to debug bugs (including
  deadlocks).  Break to the debugger and obtain the necessary debugging
 
 I should've been more clear.  I can not break to the debugger (CTRL-ALT-ESC) 
 when the system locks up.

You may need the KDB_STOP_NMI option, especially if it is an SMP
system.  I forget if you also need to enable a sysctl on 6.x (look at
sysctl -a | grep nmi for the obvious one)

 Am I possibly looking at a hardware issue?  If so 
 what is the best way to test for it?

If that doesn't work then in my experience it is likely to be
hardware-related.  The usual debugging procedure then involves trying
to replicate on an unrelated machine, and/or swapping out hardware
components.

Kris

Re: clock problem

2007-05-11 Thread Matthew Dillon

Another idea to help track down timebase problems.  Port dntpd to
FreeBSD.  You need like three sysctls (because the ntp API and the
original sysctl API are both insufficient).  Alternatively you could
probably hack dntpd to run in debug mode without having to implement
any new sysctls, as long as you are sure to clean out any active
timebase adjustments in the kernel before you run it.

Here's some sample output:

http://apollo.backplane.com/DFlyMisc/dntpd.sample01.txt

Dntpd in debug mode will print out the results from two staggered
continuously running linear regressions (resets after 30 samples,
staggered by 15 samples).

For anyone who understands how linear regressions work, finding kernel
timekeeping bugs is really easy with this sort of output.  You get the
slope, y-intercept, correlation, and standard deviation, and then you
get calculated frequency drift and time offset based on those numbers.
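
For readers who want to see the numbers behind such output, here is a minimal least-squares sketch in Python. It is not dntpd's code; the function name, the 30 second sample spacing, and the synthetic 50 ppm drift are assumptions made for illustration.

```python
import math


def linear_regression(xs, ys):
    """Least-squares fit of ys against xs.

    Returns (slope, intercept, correlation, stddev of residuals).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    stddev = math.sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, correlation, stddev


if __name__ == "__main__":
    # Synthetic data: 30 samples, 30 s apart, clock drifting at 50 ppm
    # with a constant 10 ms offset.
    elapsed = [30.0 * i for i in range(30)]
    offsets = [50e-6 * t + 0.010 for t in elapsed]
    slope, intercept, corr, stddev = linear_regression(elapsed, offsets)
    print("drift: %.3f ppm" % (slope * 1e6))
    print("offset: %.6f s  correlation: %.6f  stddev: %.9f"
          % (intercept, corr, stddev))
```

The slope of offset against elapsed time is the frequency drift (multiply by 1e6 for ppm), and the intercept is the time offset at the start of the sampling window.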

The correlation is accurate after around 10 samples.  Note that
frequency drift calculations require longer intervals to get better
results.  The forced 30 second interval set in the sample output is
way too short, hence the errors (it has to be in 90th percentile to
even have a chance of producing a reasonable PPM calculation).  But
also remember we are talking parts per million here.

If you throw away iteration numbers < 15 or so you will get very nice
output and kernel bugs will show up in fairly short order.  Kernel
bugs will show up as non-trivial y-intercept calculations over
multiple samples, large jumps in the offset, inability to get a good
correlation (proviso: sample interval has to be at least 120 seconds,
not the 30 in my example), and so on and so forth.

Also be sure to use a locked ntp source, otherwise running corrections on
the source will show up as problems in the debug output.  pool.ntp.org
is usually good enough.  It's fun checking various time sources with
an idle box with a good timebase. hhahahhaha. OMG.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

Re: Hard Hang, nothing in logs / no panics

2007-05-11 Thread Diane Bruce

On Fri, May 11, 2007 at 05:47:35PM -0400, Kris Kennaway wrote:
> On Fri, May 11, 2007 at 04:43:06PM -0500, Roger Miranda wrote:
> >
> > > You missed that the debugger is there to debug bugs (including
> > > deadlocks).  Break to the debugger and obtain the necessary debugging
> >
> > I should've been more clear.  I can not break to the debugger (CTRL-ALT-ESC)
> > when the system locks up.
>
> You may need the KDB_STOP_NMI option, especially if it is an SMP

He can also try a serial console, if he can scare up something to use
as a serial console. Serial ports are becoming legacy but if he can
do it, it might help him.

- Diane
--
- [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.db.net/~db

Re: Hard Hang, nothing in logs / no panics

2007-05-11 Thread Daniel O'Connor

On Saturday 12 May 2007 07:34, Diane Bruce wrote:
> > You may need the KDB_STOP_NMI option, especially if it is an SMP
>
> He can also try a serial console, if he can scare up something to use
> as a serial console. Serial ports are becoming legacy but if he can
> do it, it might help him.

Or Firewire, you can do that with an uncooperative system - unless the
PCI bus is hung (which would be useful information in and of itself).

Might be worth trying a BIOS update too.

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
The nice thing about standards is that there
are so many of them to choose from.
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C


Re: UNIX domain sockets MFC's

2007-05-11 Thread Mark Linimon

On Fri, May 11, 2007 at 11:26:37PM +0300, Abdullah Ibn Hamad Al-Marri wrote:
> How safe is using 7.0 for MySQL server now?

-CURRENT is never recommended for mission-critical applications.

mcl
