Re: RELENG7 using lpt causes panic

2008-01-08 Thread Adrian Wontroba
On Tue, Jan 08, 2008 at 01:52:40PM +0700, Eugene Grosbein wrote:
> On Tue, Jan 08, 2008 at 03:08:57AM +, Adrian Wontroba wrote:
>
> > I've recently switched some of my home systems to RELENG7.
> >
> > All seemed fairly well until I tried printing a CUPS test page on my
> > backup and print server to an elderly Laserjet IIIp, where I seem to
> > have a reproducible panic. It has happened twice.  This is painful, as
> > I have a big home filesystem (striped over two mirrors over most of two
> > 500 GB disks). The gmirror synchronisation and background fsck leave the
> > system close to unusable for hours while they fight over the disks.
> >
> > I was somewhat startled that something so basic as printing causes a
> > panic. There have been no hardware changes since I last printed under
> > RELENG6, but I don't print often, so hardware decay is a possibility.
> >
> > Is this a known problem? If not, I'll take the time to try various tests
> > (with /home unmounted) and raise a PR.
>
> There is a PR about this problem with a workaround:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/117973
>
> Eugene Grosbein

My thanks to Eugene for the pointer to a known workaround
(switching the printer driver to extended mode before printing with
'/usr/sbin/lptcontrol -e -d /dev/lpt0.ctl') and to John for explaining
the underlying issue and intentions for fixing it.

-- 
Adrian Wontroba
Heisenberg may have done it.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: RELENG7 using lpt causes panic

2008-01-08 Thread Scott Long

John Baldwin wrote:

On Monday 07 January 2008 10:08:57 pm Adrian Wontroba wrote:

I've recently switched some of my home systems to RELENG7.

All seemed fairly well until I tried printing a CUPS test page on my
backup and print server to an elderly Laserjet IIIp, where I seem to
have a reproducible panic. It has happened twice.  This is painful, as
I have a big home filesystem (striped over two mirrors over most of two
500 GB disks). The gmirror synchronisation and background fsck leave the
system close to unusable for hours while they fight over the disks.

I was somewhat startled that something so basic as printing causes a
panic. There have been no hardware changes since I last printed under
RELENG6, but I don't print often, so hardware decay is a possibility.

Is this a known problem? If not, I'll take the time to try various tests
(with /home unmounted) and raise a PR.

I envisage tests such as:
* Does switching to a kernel without SMP and apic make a difference?
* Does direct output cause a crash?
* Does polling make a difference?
* Does the parallel port mode (I think extended at present) make a
  difference?

Some detail below.


This is a known issue and it has to do with some changes in the interrupt
code in 7.x that interact badly with the lpt(4) driver (which tears down its
interrupt handler and sets it back up again for each character, and the
panic you see is because an interrupt came when it wasn't expecting it).
The lpt(4) driver does this weird dance to allow coexisting with vpo so you
can have lpd running and unplug your printer and plug up a Zip drive w/o
having to stop lpd.  I think the way I want to fix it is to change the lpt
driver to not release the bus (and thus remove its interrupt handler) for
every char but to keep the bus while /dev/lpt0 is open (which would be all
the time with lpd running).



My guess is that LPT ZIP drives all died a clicking death many, many
years ago.  I usually don't advocate for the removal of hardware
support, but these drives were so pitiful and so poorly engineered that
I honestly doubt there is any value in keeping the driver around.

Scott


Re: RELENG7 using lpt causes panic

2008-01-08 Thread John Baldwin
On Tuesday 08 January 2008 10:08:05 am Scott Long wrote:
> My guess is that LPT ZIP drives all died a clicking death many, many
> years ago.  I usually don't advocate for the removal of hardware
> support, but these drives were so pitiful and so poorly engineered that
> I honestly doubt there is any value in keeping the driver around.

It's not a matter of removing vpo(4) so much as reworking the various ppbus
drivers to not be so fancy with trying to drop and release the ppbus but
only do that in open/close.
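
To illustrate the difference between the two locking patterns, here is a toy Python model. It is only a sketch and is not the lpt(4) or ppbus code: `ToyPort`, `print_per_char` and `print_hold_while_open` are hypothetical names, and the model simply assumes that the printer's interrupt can arrive after the handler has been removed.

```python
class ToyPort:
    """Toy stand-in for a parallel port; not the real ppbus API."""

    def __init__(self):
        self.handler = None          # installed interrupt handler, if any
        self.stray_interrupts = 0    # interrupts that arrived with no handler

    def acquire(self, handler):
        self.handler = handler

    def release(self):
        self.handler = None

    def interrupt(self):
        if self.handler is None:
            self.stray_interrupts += 1   # nobody is expecting this interrupt
        else:
            self.handler()


def print_per_char(port, data):
    """Acquire and release the bus (and handler) around every character."""
    acks = []
    for _ch in data:
        port.acquire(lambda: acks.append(1))
        # ... the character would be written to the port here ...
        port.release()
        port.interrupt()             # assumed: the ack arrives after release
    return len(acks)


def print_hold_while_open(port, data):
    """Acquire once at open, release once at close."""
    acks = []
    port.acquire(lambda: acks.append(1))
    for _ch in data:
        port.interrupt()             # the handler is installed the whole time
    port.release()
    return len(acks)
```

In this model the per-character pattern leaves every interrupt unhandled, while holding the bus from open to close delivers each one to the handler.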

-- 
John Baldwin


6.3-PRERELEASE desktop system periodically freezes momentarily

2008-01-08 Thread Wayne Sierke
FreeBSD 6.3-PRERELEASE #5: Sat Dec 29 19:25:43 CST 2007 i386

I've noticed that this system is freezing periodically, for anywhere
from around a fraction of a second up to perhaps 1.5 seconds, and
occurring on average about twice a minute, but with no particular
pattern - i.e. it could happen twice in fairly quick succession, or not
for 40-50 seconds.

By running glxgears the problem is easily witnessed, also by maintaining
the mouse in constant motion, and it was originally noticed during
normal use when, e.g. the system momentarily becomes unresponsive to
keyboard input. I've now also witnessed it by running top with a delay
setting of zero and displaying system processes. The shorter delays are
not easily spotted but the longer delays are. This was seen both in a
terminal session and at a console.
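
A rough way to put numbers on such pauses, instead of watching glxgears or top, is to record timestamps at a fixed interval and look for gaps. The Python sketch below is an illustration only: `sample` and `find_stalls` are hypothetical helper names, the 50 ms interval and 100 ms threshold are arbitrary, and it only measures delays seen by this one process, which may also be ordinary scheduling delay.

```python
import time


def find_stalls(timestamps, interval, threshold):
    """Return (start_time, extra_delay) for gaps longer than interval + threshold."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        extra = (cur - prev) - interval
        if extra > threshold:
            stalls.append((prev, extra))
    return stalls


def sample(duration, interval=0.05):
    """Record a monotonic timestamp every `interval` seconds for `duration` seconds."""
    stamps = [time.monotonic()]
    while stamps[-1] - stamps[0] < duration:
        time.sleep(interval)
        stamps.append(time.monotonic())
    return stamps
```

Calling `find_stalls(sample(60.0), 0.05, 0.1)` would list every point in a one-minute run where the process woke up more than 100 ms late, which could then be compared against what else the system was doing at that time.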

Using top also revealed another interesting phenomenon - the 'top'
display freezes for very brief periods, probably less than 0.1s, but
quite regularly, at around once per second. Seeing this prompted me to
realise that glxgears stutters in what appears to be a corresponding
way. This behaviour (the freezing itself) is not entirely consistent,
either. Just while I've been writing these paragraphs I've noticed
glxgears' behaviour range from stuttering at around once per second, to
around twice per second, to not doing it (perceptibly) at all.

I now realise that I've actually witnessed this behaviour in glxgears
for quite some time, extending back into the distant history of this
system, which may include 6.0-RELEASE but would certainly include a few
different revisions in the 6.x series. It has been always
source-upgraded since the original install. I'd always just thought that
it was some hiccough in the graphics system.

Ports were recently upgraded so it is now running xorg-7.3_1,
gnome-2.20.2, nvidia-driver-96.43.01. The system has been up for 5 days
now and is quite loaded up with applications at the moment and is
normally left running. Currently it has running (on gnome): evolution,
firefox-devel (with some dozens of windows and tabs open), xmms, gimp,
xchat, eclipse-devel, gedit, a handful of terminal sessions. It's
running a custom kernel - mainly raid and unused network drivers removed
- mods listed below.

The hardware is a Gigabyte GA-8SQ800 motherboard (SiS), with a 2.4GHz
P4, 1.5GB DDR, nVidia MX440 AGP graphics, RTL-8139 network, AHA-2940
SCSI controller (for a scanner), 2 x 80GB ATA disks.

This system has also had a tendency at times to crash when switching VTs
between console and xorg. This problem has varied in consistency with
the different incarnations of OS and nvidia-driver versions and hasn't
yet occurred with this latest update.

I'd really appreciate any suggestions to help identify what's going on
here.



Thanks,

Wayne


diff of kernel config with GENERIC:
-# cpu  I486_CPU
-# cpu  I586_CPU
-# device   eisa
-# device   ataraid # ATA RAID drives
-# device   atapifd # ATAPI floppy drives
-# device   amr # AMI MegaRAID
-# device   arcmsr  # Areca SATA II RAID
-# device   asr # DPT SmartRAID V, VI and Adaptec SCSI RAID
-# device   ciss # Compaq Smart RAID 5*
-# device   dpt # DPT Smartcache III, IV - See NOTES for options
-# device   hptmv   # Highpoint RocketRAID 182x
-# device   hptrr   # Highpoint RocketRAID 17xx, 22xx, 23xx, 25xx
-# device   rr232x  # Highpoint RocketRAID 232x
-# device   iir # Intel Integrated RAID
-# device   ips # IBM (Adaptec) ServeRAID
-# device   mly # Mylex AcceleRAID/eXtremeRAID
-# device   twa # 3ware 9000 series PATA/SATA RAID
-# device   aac # Adaptec FSA RAID
-# device   aacp # SCSI passthrough for aac (requires CAM)
-# device   ida # Compaq Smart RAID
-# device   mfi # LSI MegaRAID SAS
-# device   mlx # Mylex DAC960 family
-# device   pst # Promise Supertrak SX6000
-# device   twe # 3ware ATA RAID
-# device   plip # TCP/IP over parallel
-# device   cs  # Crystal Semiconductor CS89x0 NIC
-# device   ed  # NE[12]000, SMC Ultra, 3c503, DS8390 cards
-# device   ex  # Intel EtherExpress Pro/10 and Pro/10+
-# device   ep  # Etherlink III based cards
-# device   fe  # Fujitsu MB8696x based cards
-# device   ie  # EtherExpress 8/16, 3C507, StarLAN 10 etc.
-# device   lnc # NE2100, NE32-VL Lance Ethernet cards
-# device   sn  # SMC's 9000 series of 

Re: 6.3-PRERELEASE desktop system periodically freezes momentarily

2008-01-08 Thread Julian H. Stacey

 By running glxgears the problem is easily witnessed, also by maintaining

To save others hunting sources which have migrated between FreeBSD releases:

FreeBSD/releases/4.11-RELEASE/ports
x11/XFree86-4-clients/pkg-plist:bin/glxgears
x11/xorg-clients/pkg-plist:bin/glxgears

FreeBSD/releases/6.2-RELEASE/ports
x11/XFree86-4-clients/files/manpages: glxgears.1 \
x11/XFree86-4-clients/pkg-plist:bin/glxgears
x11/xorg-clients/files/manpages:  glxgears.1 \
x11/xorg-clients/pkg-plist:bin/glxgears

FreeBSD/branches/-current/ports
graphics/mesa-demos/Makefile:XDEMO_PROGS= glthreads glxcontexts glxdemo glxgears glxgears_fbconfig \
builds OK on 7-Stable
x11/XFree86-4-clients/files/manpages: glxgears.1 \
x11/XFree86-4-clients/pkg-plist:bin/glxgears
XFree86-clients-4.5.0_4 is part of XFree86 and you
have xorg set for X11 distribution. 
-- 
Julian Stacey. Munich Computer Consultant, BSD Unix C Linux. http://berklix.com


What current Dell Systems are supported/work

2008-01-08 Thread Richard Bates

Sorry for the repost...
I don't think the first one posted..

posted to freebsd.stable, freebsd-current, Freebsd-hardware

I checked the hardware in the online documentation manual/hardware

It only lists the bits and pieces of the machine, say the hard drive
controller and so forth, but doesn't give you a particular system to
look at as a working machine with FreeBSD 6.2.


Does anybody know if a Dell PowerEdge 1950 with the following works?
• Quad-Core Intel Xeon Processors 5400 series 3.16GHz
• 4GB RAM

I am looking to attach 2 machines to a SAN to make a constantly up
system. Is there a Dell SAN and SAN switch that will work with this
version of BSD?


Thank you for your help



Re: 6.3-PRERELEASE desktop system periodically freezes momentarily

2008-01-08 Thread Toomas Aas

Wayne Sierke wrote:


FreeBSD 6.3-PRERELEASE #5: Sat Dec 29 19:25:43 CST 2007 i386

I've noticed that this system is freezing periodically, for anywhere
from around a fraction of a second up to perhaps 1.5 seconds, and
occurring on average about twice a minute, but with no particular
pattern - i.e. it could happen twice in fairly quick succession, or not
for 40-50 seconds.


I'm very surprised to find out that maybe I'm not alone in the universe. I 
have a 6.3-PRERELEASE system (last cvsupped Dec 12), which always freezes 
*once* a few minutes after booting up. This happens typically at the time 
when I've just logged in using xdm, fluxbox has started up and I'm opening 
my first xterm. The freeze lasts approximately 1-3 seconds. Once this 
freeze is over, it doesn't happen again until I shut down the machine for 
the night. But it does happen every time after starting up.



Ports were recently upgraded so it is now running xorg-7.3_1,
gnome-2.20.2, nvidia-driver-96.43.01. 


I have xorg 7.3. I don't have the full Gnome installed, but I do have gtk 
2.12 and lots of other gtk-based stuff from gnome 2.20


The system has been up for 5 days now and is quite loaded up with 
applications at the moment and is normally left running. 


Mine gets shut down every night.


The hardware is a Gigabyte GA-8SQ800 motherboard (SiS), with a 2.4GHz
P4, 1.5GB DDR, nVidia MX440 AGP graphics, RTL-8139 network, AHA-2940
SCSI controller (for a scanner), 2 x 80GB ATA disks.


Intel D865GLC mobo (i865), 2.8 GHz P4, 768 MB DDR, integrated i865 
graphics, integrated Intel Pro/100 Ethernet, additional VIA Fire II 
(VT6306) PCI FireWire adapter, 1x80 GB ATA disk. Arbitrary PCI 
SoundBlaster workalike which calls itself AudioPCI ES1373-B.



I'd really appreciate any suggestions to help identify what's going on
here.


I've always just thought it to be some random hardware glitch that is 
hopeless to track down, but maybe I'm wrong. This machine was installed 
with FreeBSD just last autumn and initially it didn't have this problem 
running 6.2-STABLE. Unfortunately I can't pinpoint which source update or 
ports update or any other change it was that triggered this behaviour.


My kernel config diff from GENERIC:

-cpu I486_CPU
-cpu I586_CPU
-makeoptions DEBUG=-g # (- the irony)
-options INET6
-options NFSCLIENT
-options NFSSERVER
-options NFS_ROOT
-options COMPAT_FREEBSD4
-options COMPAT_FREEBSD5

-device ataraid
-device atapifd
-device atapist

-device ahb
-device ahc
-options AHC_REG_PRETTY_PRINT
-device ahd
-options AHD_REG_PRETTY_PRINT
-device amd
-device isp
-device mpt
-device sym
-device trm
-device adv
-device adw
-device aha
-device aic
-device bt
-device ncv
-device nsp
-device stg

-device ch
-device sa
-device ses

-device amr
-device arcmsr
-device asr
-device ciss
-device dpt
-device hptmv
-device rr232x
-device iir
-device ips
-device mly
-device twa

-device aac
-device aacp
-device ida
-device mfi
-device mlx
-device pst
-device twe

-device cbb
-device pccard
-device cardbus

-device ppc
-device ppbus
-device lpt
-device plip
-device ppi

-device de
-device em
-device ixgb
-device txp
-device vx

-device bce
-device bfe
-device bge
-device dc

-device lge
-device msk
-device nge
-device nve
-device pcn
-device re
-device rl
-device sf
-device sis
-device sk
-device ste
-device stge
-device ti
-device tl
-device tx
-device vge
-device vr
-device wb
-device xl

-device cs
-device ed
-device ex
-device ep
-device fe
-device ie
-device lnc
-device sn
-device xe

-device wlan
-device wlan_wep
-device wlan_ccmp
-device wlan_tkip
-device an
-device ath
-device ath_hal
-device ath_rate_sample
-device awi
-device ral
-device wi

-device sl
-device ppp
-device gif
-device faith

-device ohci
-device ural
-device urio
-device uscanner
-device aue
-device axe
-device cdce
-device cue
-device kue
-device rue

-device fwe

+device sound
+device snd_es137x

+device smb
+device smbus
+device ichsmb

+device atapicam


Re: 6.3-PRERELEASE desktop system periodically freezes momentarily

2008-01-08 Thread Kris Kennaway

Toomas Aas wrote:

I'm very surprised to find out that maybe I'm not alone in the universe. 
I have a 6.3-PRERELEASE system (last cvsupped Dec 12), which always 
freezes *once* a few minutes after booting up. This happens typically at 
the time when I've just logged in using xdm, fluxbox has started up and 
I'm opening my first xterm. The freeze lasts approximately 1-3 seconds. 
Once this freeze is over, it doesn't happen again until I shut down the 
machine for the night. But it does happen every time after starting up.


This is usually when your system accesses swap because you ran out of 
free RAM.


Kris


Re: 6.3-PRERELEASE desktop system periodically freezes momentarily

2008-01-08 Thread Toomas Aas

Kris Kennaway wrote:


Toomas Aas wrote:
I have a 6.3-PRERELEASE system (last cvsupped Dec 12), which 
always freezes *once* a few minutes after booting up. 


This is usually when your system accesses swap because you ran out of 
free RAM.


Hmm, I thought that 768 MB RAM (minus 16 for integrated video) should be 
enough for anyone ;) at least when just running fluxbox with no apps 
started yet. Right now 'top' shows that no swap is being used and I have 
Thunderbird, Firefox, xchat, xmms, xcalc and 5 xterms with ssh sessions:


last pid: 1264;  load averages:  1.04,  1.04,  1.01 up 0+02:07:15 21:59:30
82 processes:  2 running, 80 sleeping
CPU states: 0.4% user, 98.5% nice, 1.1% system, 0.0% interrupt,  0.0% idle
Mem: 149M Active, 227M Inact, 100M Wired, 1388K Cache, 84M Buf, 251M Free
Swap: 512M Total, 512M Free

but of course I can't claim that swap wasn't being used when the problem 
happened.
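
To check swap use from saved top output rather than by eye, the "Swap:" line can be parsed. This is a small Python sketch: `parse_swap_line` and `swap_used_mb` are hypothetical helpers, and they assume the line looks like the one above, optionally with a "Used" field.

```python
import re

# Multipliers to convert top's K/M/G suffixes to megabytes.
_UNITS = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_swap_line(line):
    """Parse a top 'Swap:' line into a dict of megabytes keyed by field name."""
    fields = {}
    for amount, unit, name in re.findall(r"(\d+)([KMG])\s+(\w+)", line):
        fields[name.lower()] = int(amount) * _UNITS[unit]
    return fields


def swap_used_mb(line):
    """Return swap in use, from a 'Used' field if present, else Total - Free."""
    fields = parse_swap_line(line)
    if "used" in fields:
        return fields["used"]
    return fields["total"] - fields["free"]
```

For the line shown above, `swap_used_mb("Swap: 512M Total, 512M Free")` gives 0, which matches the observation that no swap was in use at that moment.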


I do understand that there is virtually nothing in my initial posting that 
could possibly help track down the problem. I just got this me too feeling.


--
Toomas Aas

... Windws is ine for bckgroun comunicaions - Bll Gats, 192


Re: 6.3-PRERELEASE desktop system periodically freezes momentarily

2008-01-08 Thread Kris Kennaway

OK, it is not in use now, but check again immediately after the pause.

Kris



Re: 6.3-PRERELEASE desktop system periodically freezes momentarily

2008-01-08 Thread Toomas Aas

Kris Kennaway wrote:

OK, it is not in use now, but check again immediately after the pause.


I just did that. Rebooted, logged back in using xdm, quickly started an 
xterm and top in it. Then moved around the mouse pointer. When the freeze 
happened, 'top' (which itself froze too) showed that 609 MB memory was 
free and no swap was being used.


--
Toomas Aas

... Kindred: Fear that relatives are coming to stay.


Re: 6.3-PRERELEASE desktop system periodically freezes momentarily

2008-01-08 Thread Kris Kennaway

Toomas Aas wrote:

I just did that. Rebooted, logged back in using xdm, quickly started an 
xterm and top in it. Then moved around the mouse pointer. When the 
freeze happened, 'top' (which itself froze too) showed that 609 MB 
memory was free and no swap was being used.


OK, you may need to set up hwpmc or LOCK_PROFILING to figure out what 
your system is doing at that moment.


Kris


Re: What current Dell Systems are supported/work

2008-01-08 Thread Paul Macdonald

Hi Richard,

I run FreeBSD 6.2 on a 1950 and the only issue I had was with the
on-board Broadcom Ethernet.


my workaround is detailed here
http://www.ifdnrg.com/freebsd_broadcom_dell_1950.htm

hope that helps
Paul.




Paul Macdonald
Director
[EMAIL PROTECTED]
IFDNRG, web and video services
http://www.ifdnrg.com

Please note new address!
127 Rose St South Lane, Edinburgh, EH2 4BB
+44.131.2257470





Re: 6.3-PRERELEASE desktop system periodically freezes momentarily

2008-01-08 Thread Kris Kennaway

Toomas Aas wrote:

> Kris Kennaway wrote:
>
> > OK, you may need to set up hwpmc or LOCK_PROFILING to figure out what
> > your system is doing at that moment.
>
> I set up a kernel with hwpmc support, but am feeling a bit (to put it
> mildly) lost, since I haven't really done anything with hwpmc ever
> before. I was hoping to maybe figure out whether this freezing is caused
> by some device interrupt, so I tried:
>
> # pmcstat -S interrupts -o /tmp/sample.txt
>
> But all I get is this:
> pmcstat: ERROR: Cannot allocate system-mode pmc with specification
> interrupts: Invalid argument
>
> Yet interrupts is listed in pmc(3).
>
> I'm sure I'm making some ridiculous error.
>
> Having now revealed my ignorance, I go to sleep(1).

Probably you want 'instructions'.

Kris



Re: What current Dell Systems are supported/work

2008-01-08 Thread Vlad GALU
On 1/8/08, Richard Bates [EMAIL PROTECTED] wrote:
> Sorry for the repost...
> I don't think the first one posted..
>
> posted to freebsd.stable, freebsd-current, Freebsd-hardware
>
> I checked the hardware in the online documentation manual/hardware
>
> It only lists the bits and pieces of the machine say the hard drive
> controller and so forth. but doesn't give you a particular system to
> look at as a working machine with FreeBSD 6.2
>
> does anybody know if a Dell PowerEdge 1950
>  • Quad-Core Intel Xeon Processors 5400 series 3.16GHz
>  • 4GB Ram
>
> I am looking to attach 2 machines to a SAN to make a constantly up
> system. Is there a Dell San and San Switch that will work with this
> version of BSD?

I'm using a newer version of the PE2950, which has the PERC 6/i
controller. Older ones use the PERC 5/i, which is supported by 6.2.
Dell machines are pretty well supported.

> Thank you for your help




-- 
Mahnahmahnah!


RELENG_7: zfs mirror causes ata timeout

2008-01-08 Thread Stephen M. Rumble

Hi all,

I'm having a bit of trouble with a new machine running the latest  
RELENG_7 code. I have two 500GB WD Caviar GP disks on a mini-itx  
GM965-based board (MSI fuzzy) running amd64 with 4GB of ram. The  
disks are:


ad4: 476940MB WDC WD5000AACS-00ZUB0 01.01B01 at ata2-master SATA150
ad6: 476940MB WDC WD5000AACS-00ZUB0 01.01B01 at ata3-master SATA150

Both appear to work great alone with UFS and ZFS and separate  
filesystems/pools. However, soon after I create a ZFS mirror between  
the two I run into the following sort of trouble:


ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad6: FAILURE - READ_DMA timed out LBA=

Usually these continue on ad infinitum. Sometimes the machine  
recovers, only to fail soon after. These errors also aren't trivial to  
reproduce. They seem to happen at random, especially when the system  
is under low utilisation. Sometimes, however, they occur immediately  
upon boot.


I've tried different power supplies and cables. I've enabled and  
disabled spread spectrum clocking and tried both SATA300 and SATA150  
rates. I've also tried switching drives between ports so that what was  
ad4 is ad6 and what was ad6 is ad4. The problems persist, but seem to  
follow the same drive (ad6 originally, then ad4 when swapped). This  
seems to indicate a drive problem, but it works great on its own, even  
when exercising both disks simultaneously. SMART reports no problems  
and ZFS reports no issues when ad6 is used on its own outside of a zfs  
mirror. It seems like it's the drive, but it works fine when not in a  
mirror. I'm stumped. Any ideas?


The only interesting bit of evidence I could find is that when these  
errors do occur, smartctl reports an increase in the Start_Stop_Count  
field on ad6. ad4, which appears to work fine, doesn't demonstrate  
this and has a much lower value.


Any input would be appreciated. I've tried disabling ACPI, but the  
kernel cannot find the controller (ICH8M). I'm using AHCI, but  
compatibility mode doesn't appear to alter the behaviour. I don't know  
if it's important, but I'm not using ZFS on the whole drive, just  
ad{4,6}s1d.


Any help would be appreciated.

Thanks,
Steve

P.S. Please cc me on replies as I'm not subscribed.



Re: RELENG_7: zfs mirror causes ata timeout

2008-01-08 Thread Jeremy Chadwick
On Tue, Jan 08, 2008 at 05:28:46PM -0500, Stephen M. Rumble wrote:
> I'm having a bit of trouble with a new machine running the latest RELENG_7
> code. I have two 500GB WD Caviar GP disks on a mini-itx GM965-based board
> (MSI fuzzy) running amd64 with 4GB of ram. The disks are:

Could be related to a PR that I submitted long ago, but was not specific to
ZFS -- instead, it appeared to be specific to the motherboard I was
using.  There's also some tidbits posted by others which appeared to
help them, although performance was impacted:

http://www.freebsd.org/cgi/query-pr.cgi?pr=103435

Another related PR, which seems to indicate motherboard problems:

http://www.freebsd.org/cgi/query-pr.cgi?pr=93885

> ad4: 476940MB WDC WD5000AACS-00ZUB0 01.01B01 at ata2-master SATA150
> ad6: 476940MB WDC WD5000AACS-00ZUB0 01.01B01 at ata3-master SATA150
>
> I've tried different power supplies and cables. I've enabled and disabled
> spread spectrum clocking and tried both SATA300 and SATA150 rates. I've
> also tried switching drives between ports so that what was ad4 is ad6 and
> what was ad6 is ad4. The problems persist, but seem to follow the same
> drive (ad6 originally, then ad4 when swapped). This seems to indicate a
> drive problem, but it works great on its own, even when exercising both
> disks simultaneously. SMART reports no problems and ZFS reports no issues
> when ad6 is used on its own outside of a zfs mirror. It seems like it's the
> drive, but it works fine when not in a mirror. I'm stumped. Any ideas?

Have you tried running long SMART tests (smartctl -t long) on both of
these drives, ditto with an offline test (smartctl -t offline)?
Statistics that are labelled Offline as their type won't get updated
until an offline test is performed.  It's possible those statistics may
provide some answers, but no guarantees.

> The only interesting bit of evidence I could find is that when these errors
> do occur, smartctl reports an increase in the Start_Stop_Count field on
> ad6. ad4, which appears to work fine, doesn't demonstrate this and has a
> much lower value.

Start_Stop_Count indicates the drive is actually stopping then spinning
back up (usually caused by a reset of some kind; equivalent of powering
down then back up but without the loss of power).  It's possible that
your drive has actual problems -- this is supported by the fact that the
problem follows the disk (when moving the disk to another SATA port).
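
If you want to watch that counter without eyeballing the whole table each
time, something like the following might do.  It is only a rough sketch: it
assumes you have saved the output of 'smartctl -A' before and after an
error, and that the attribute rows use the usual ten-column layout with the
raw value in the last column.  The sample rows and the function names are
made up for illustration; they are not readings from ad4 or ad6.

```python
# Sketch only: pulls one attribute's raw value out of saved `smartctl -A` text.
# The sample rows below are invented for illustration, not real readings.

def raw_value(smart_output, attribute="Start_Stop_Count"):
    """Return the RAW_VALUE column for `attribute`, or None if it is absent."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Assumed row layout:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == attribute:
            return int(fields[9])
    return None


def restarts_between(before, after):
    """How many start/stop cycles were recorded between two snapshots."""
    return raw_value(after) - raw_value(before)


BEFORE = """\
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       53
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       212
"""

AFTER = """\
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       55
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       213
"""

if __name__ == "__main__":
    print("Start_Stop_Count increased by", restarts_between(BEFORE, AFTER))
```

Taking one snapshot per drive before and after a timeout should show
whether the count only moves on the suspect disk.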

Tracking down the source of this problem usually requires a lot of time,
money, and trial-and-error techniques.  This is what I'd go with:

1) See if there's a BIOS update.  I know at least in the case of Intel
manufactured boards BIOS updates have solved weird problems like this in
the past.

2) Try an Advanced RMA with Western Digital (which guarantees you get a
brand new drive rather than chancing that they repair the one you send
them) and see if a new drive helps.

3) Try replacing the motherboard with a different brand (non-MSI).  I
have nothing against MSI, but switching vendors usually means that you
rule out a cross-model h/w bug (e.g. something the vendor does in the BIOS
or engineering which is suspect).  Try Asus or Gigabyte.  Obviously this
will cost money to do and will very likely set you back the cost of the
motherboard you have currently, but it's a viable option since you've
already tried replacing SATA cables.

I'm not sure why ZFS would cause something like this to happen vs. UFS.
I happen to run ZFS at home (same machine as what's mentioned in PR
103435, with the replaced motherboard of course) doing very heavy disk
I/O across two disks, and I have never seen problems of this sort.  That
doesn't mean there isn't a problem, just that I haven't encountered it
with ZFS.

My box at home is an Asus A8N-E w/ 2GB, running RELENG_7 i386.  I don't
use any of the on-board RAID garbage; I use FreeBSD for it.  Relevant
SATA stuff:

atapci1: <nVidia nForce CK804 SATA300 controller> port 
0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xd800-0xd80f mem 
0xd3002000-0xd3002fff irq 23 at device 7.0 on pci0
atapci1: [ITHREAD]
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
atapci2: <nVidia nForce CK804 SATA300 controller> port 
0x9e0-0x9e7,0xbe0-0xbe3,0x960-0x967,0xb60-0xb63,0xc400-0xc40f mem 
0xd3001000-0xd3001fff irq 21 at device 8.0 on pci0
atapci2: [ITHREAD]
ata4: <ATA channel 0> on atapci2
ata4: [ITHREAD]
ata5: <ATA channel 1> on atapci2
ata5: [ITHREAD]
ad4: 476940MB <WDC WD5000AAKS-00TMA0 12.01C01> at ata2-master SATA300
ad6: 476940MB <WDC WD5000AAKS-00TMA0 12.01C01> at ata3-master SATA300
ad8: 190782MB <WDC WD2000JD-00HBB0 08.02D08> at ata4-master SATA150
ad10: 476940MB <Seagate ST3500630AS 3.AAE> at ata5-master SATA300

Disks ad4/ad6 are in a ZFS pool (RAID-0, not mirror), and ad8/ad10 are
UFS.  All are on the same physical SATA controller, as you can see.

icarus# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ 

Re: Performance!

2008-01-08 Thread Kris Kennaway

Kris Kennaway wrote:

Krassimir Slavchev wrote:

Hello Kris,

Here are the lock profiling results; see the attachment.

Please let me know if you want ssh access to this machine.


Thanks, this is very interesting.  The problem is already fixed in 8.0 
but we were not seeing it being this much of a factor on our test 
hardware.  Possibly it is due to your CPUs being moderately faster and 
causing a timing difference.  Anyway, try this patch.  If there is still 
a performance deficit then repeating the lock profiling will be useful.


Kris




Now with actual patch.

Kris
  FreeBSD src repository

  Modified files:
    sys/kern             kern_event.c kern_thread.c sys_generic.c
                         sys_pipe.c uipc_sockbuf.c
    sys/netncp           ncp_rq.c ncp_sock.c ncp_sock.h
    sys/netsmb           smb_trantcp.c
    sys/sys              proc.h selinfo.h socketvar.h systm.h
  Log:
  Refactor select to reduce contention and hide internal implementation
  details from consumers.

   - Track individual selecters on a per-descriptor basis such that there
     are no longer collisions and after sleeping for events only those
     descriptors which triggered events must be rescanned.
   - Protect the selinfo (per descriptor) structure with a mtx pool mutex.
     mtx pool mutexes were chosen to preserve api compatibility with
     existing code which does nothing but bzero() to setup selinfo
     structures.
   - Use a per-thread wait channel rather than a global wait channel.
   - Hide select implementation details in a seltd structure which is
     opaque to the rest of the kernel.
   - Provide a 'selsocket' interface for those kernel consumers who wish to
     select on a socket when they have no fd so they no longer have to
     be aware of select implementation details.

  Tested by:      kris
  Reviewed on:    arch

  Revision  Changes    Path
  1.114     +6 -3      src/sys/kern/kern_event.c
  1.264     +2 -0      src/sys/kern/kern_thread.c
  1.160     +414 -168  src/sys/kern/sys_generic.c
  1.194     +6 -3      src/sys/kern/sys_pipe.c
  1.173     +2 -1      src/sys/kern/uipc_sockbuf.c
  1.16      +9 -3      src/sys/netncp/ncp_rq.c
  1.20      +0 -105    src/sys/netncp/ncp_sock.c
  1.7       +0 -3      src/sys/netncp/ncp_sock.h
  1.27      +1 -79     src/sys/netsmb/smb_trantcp.c
  1.498     +2 -3      src/sys/sys/proc.h
  1.19      +8 -8      src/sys/sys/selinfo.h
  1.159     +2 -0      src/sys/sys/socketvar.h
  1.263     +0 -4      src/sys/sys/systm.h
http://cvsweb.FreeBSD.org/src/sys/kern/kern_event.c.diff?r1=1.113&r2=1.114
--- src/sys/kern/kern_event.c   2007/07/14 21:23:30 1.113
+++ src/sys/kern/kern_event.c   2007/12/16 06:21:19 1.114
@@ -1400,7 +1400,8 @@ kqueue_poll(struct file *fp, int events,
 			revents |= events & (POLLIN | POLLRDNORM);
 		} else {
 			selrecord(td, &kq->kq_sel);
-			kq->kq_state |= KQ_SEL;
+			if (SEL_WAITING(&kq->kq_sel))
+				kq->kq_state |= KQ_SEL;
 		}
 	}
 	kqueue_release(kq, 1);
@@ -1486,8 +1487,9 @@ kqueue_close(struct file *fp, struct thr
 	}
 
 	if ((kq->kq_state & KQ_SEL) == KQ_SEL) {
-		kq->kq_state &= ~KQ_SEL;
 		selwakeuppri(&kq->kq_sel, PSOCK);
+		if (!SEL_WAITING(&kq->kq_sel))
+			kq->kq_state &= ~KQ_SEL;
 	}
 
 	KQ_UNLOCK(kq);
@@ -1522,8 +1524,9 @@ kqueue_wakeup(struct kqueue *kq)
 		wakeup(kq);
 	}
 	if ((kq->kq_state & KQ_SEL) == KQ_SEL) {
-		kq->kq_state &= ~KQ_SEL;
 		selwakeuppri(&kq->kq_sel, PSOCK);
+		if (!SEL_WAITING(&kq->kq_sel))
+			kq->kq_state &= ~KQ_SEL;
 	}
 	if (!knlist_empty(&kq->kq_sel.si_note))
 		kqueue_schedtask(kq);
http://cvsweb.FreeBSD.org/src/sys/kern/kern_thread.c.diff?r1=1.263&r2=1.264

diff -u -r1.255.2.1 kern_thread.c
--- src/sys/kern/kern_thread.c  14 Dec 2007 13:41:09 -  1.255.2.1
+++ src/sys/kern/kern_thread.c  8 Jan 2008 23:47:00 -
@@ -40,6 +40,7 @@
 #include <sys/sysctl.h>
 #include <sys/sched.h>
 #include <sys/sleepqueue.h>
+#include <sys/selinfo.h>
 #include <sys/turnstile.h>
 #include <sys/ktr.h>
 #include <sys/umtx.h>
@@ -207,6 +208,7 @@
 	turnstile_free(td->td_turnstile);
 	sleepq_free(td->td_sleepqueue);
 	umtx_thread_fini(td);
+	seltdfini(td);
 	vm_thread_dispose(td);
 }
 
http://cvsweb.FreeBSD.org/src/sys/kern/sys_generic.c.diff?r1=1.159&r2=1.160
--- src/sys/kern/sys_generic.c  2007/11/14 06:21:23 1.159
+++ src/sys/kern/sys_generic.c  2007/12/16 06:21:19 1.160
@@ -69,17 +69,59 @@ __FBSDID("$FreeBSD: /usr/local/www/cvsro
 #include <sys/ktrace.h>
 #endif
 
+#include <sys/ktr.h>
+
 static MALLOC_DEFINE(M_IOCTLOPS, "ioctlops", "ioctl data buffer");
 static MALLOC_DEFINE(M_SELECT, "select", "select() buffer");
 MALLOC_DEFINE(M_IOV, 

RE: RELENG_7: zfs mirror causes ata timeout

2008-01-08 Thread Stephen M. Rumble

Quoting Daniel Eriksson [EMAIL PROTECTED]:


Stephen M. Rumble wrote:


The only interesting bit of evidence I could find is that when these
errors do occur, smartctl reports an increase in the
Start_Stop_Count
field on ad6. ad4, which appears to work fine, doesn't demonstrate
this and has a much lower value.


This looks a lot like the drive momentarily shutting down due to a power
outage/dip, only to immediately start again.


Well, there's usually a sort of click, perhaps as though the drive is  
parking itself, near when the errors occur. I guess this is it  
resetting.



Are you sure the power supplies you've tested are good and powerful
enough to power your box?


I've tried three supplies. One old, two new. The current one is 300  
watts, the largest was 400 watts and the system uses about 40 idle, 60  
loaded (it's a mobile cpu/chipset). I doubt supplied power is the issue.



Have you tried replacing the SATA power cables (as well as the actual
data cables)? Are you using the SATA power connectors that shipped with
the PSU or a Y-cable with a molex plug? Molex-molex connections are
notoriously unreliable (either the plug breaks allowing one of the
connecting cables to halfway slip out, or the connection is simply not
electrically sound due to bad tolerances).


I've used both molex-SATA adapters for the old power supply, as well  
as SATA connectors for the new one. The issues are always the same, it  
seems.


I'm starting to lose track of everything I've tried. Just to be sure,  
I'll swap power connections between the drives and see what happens.


Thanks for the input,
Steve

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Performance!

2008-01-08 Thread Kris Kennaway

Krassimir Slavchev wrote:

Hello Kris,

Here are the lock profiling results; see the attachment.

Please let me know if you want ssh access to this machine.


Thanks, this is very interesting.  The problem is already fixed in 8.0 
but we were not seeing it being this much of a factor on our test 
hardware.  Possibly it is due to your CPUs being moderately faster and 
causing a timing difference.  Anyway, try this patch.  If there is still 
a performance deficit then repeating the lock profiling will be useful.


Kris



RE: RELENG_7: zfs mirror causes ata timeout

2008-01-08 Thread Daniel Eriksson
Stephen M. Rumble wrote:

 The only interesting bit of evidence I could find is that when these  
 errors do occur, smartctl reports an increase in the 
 Start_Stop_Count  
 field on ad6. ad4, which appears to work fine, doesn't demonstrate  
 this and has a much lower value.

This looks a lot like the drive momentarily shutting down due to a power
outage/dip, only to immediately start again.

Are you sure the power supplies you've tested are good and powerful
enough to power your box?

Have you tried replacing the SATA power cables (as well as the actual
data cables)? Are you using the SATA power connectors that shipped with
the PSU or a Y-cable with a molex plug? Molex-molex connections are
notoriously unreliable (either the plug breaks allowing one of the
connecting cables to halfway slip out, or the connection is simply not
electrically sound due to bad tolerances).

/Daniel Eriksson


Re: What current Dell Systems are supported/work

2008-01-08 Thread Tom Judge

Richard Bates wrote:

Sorry for the repost...
I don't think the first one posted..

posted to freebsd.stable, freebsd-current, Freebsd-hardware

I checked the hardware in the online documentation manual/hardware

It only lists the bits and pieces of the machine, say the hard drive 
controller and so forth, but doesn't give you a particular system to 
look at as a working machine with FreeBSD 6.2


does anybody know if a Dell PowerEdge 1950
• Quad-Core Intel Xeon Processors 5400 series 3.16GHz
• 4GB Ram



We have ~20 PE [12]950 systems here, all running 6.2 with a backported 
bce driver from RELENG_6.


Tom

I am looking to attach 2 machines to a SAN to make a constantly up 
system. Is there a Dell San and San Switch that will work with this 
version of BSD?


Thank you for your help



Re: RELENG7 using lpt causes panic

2008-01-08 Thread John Baldwin
On Tuesday 08 January 2008 10:32:56 am John Baldwin wrote:
 On Tuesday 08 January 2008 10:08:05 am Scott Long wrote:
  John Baldwin wrote:
   On Monday 07 January 2008 10:08:57 pm Adrian Wontroba wrote:
   I've recently switched some of my home systems to RELENG7.
  
   All seemed fairly well until I tried printing a CUPS test page on my
   backup and print server to an elderly Laserjet IIIp, where I seem to
   have a reproducible panic. It has happened twice.  This is painful, as
   I have a big home filesystem (striped over two mirrors over most of two
   500 GB disks). The gmirror synchronisation and background fsck leave the
   system close to unusable for hours while they fight over the disks.
  
   I was somewhat startled that something so basic as printing causes a
   panic. There have been no hardware changes since I last printed under
   RELENG6, but I don't print often, so hardware decay is a possibility.
  
   Is this a known problem? If not, I'll take the time to try various tests
   (with /home unmounted) and raise a PR.
  
   I envisage tests such as:
   * Does switching to a kernel without SMP and apic make a difference?
   * Does direct output cause a crash?
   * Does polling make a difference?
   * Does the parallel port mode (I think extended at present) make a
 difference?
  
   Some detail below.
   
   This is a known issue and it has to do with some changes in the interrupt
   code in 7.x that interact badly with the lpt(4) driver (which tears down
   its interrupt handler and sets it back up again for each character, and
   the panic you see is because an interrupt came when it wasn't expecting
   it).  The lpt(4) driver does this weird dance to allow coexisting with
   vpo so you can have lpd running and unplug your printer and plug in a Zip
   drive w/o having to stop lpd.  I think the way I want to fix it is to
   change the lpt driver to not release the bus (and thus remove its
   interrupt handler) for every char but to keep the bus while /dev/lpt0 is
   open (which would be all the time with lpd running).
   
  
  My guess is that LPT ZIP drives all died a clicking death many, many 
  years ago.  I usually don't advocate for the removal of hardware 
  support, but these drives were so pitiful and so poorly engineered that
  I honestly doubt there is any value in keeping the driver around.
 
 It's not a matter of removing vpo(4) so much as reworking the various ppbus
 drivers to not be so fancy with trying to drop and release the ppbus but
 only do that in open/close.

As mentioned on current@, there is a sort of quick-hack patch available for
testing at http://www.freebsd.org/~jhb/patches/ppbus_intr.patch and you can
see the post to current@ for more details.  I chose a quick-hack approach
for now as it is less risky than more proper fixes for ppbus/lpt/etc.
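
For anyone who wants a picture of the two approaches discussed above, here
is a toy model.  It is not kernel code and not what the patch does; every
name in it (ToyBus, print_per_char, print_while_open, StrayInterrupt) is
invented for illustration.  It assumes only what was described earlier:
today the handler is set up and torn down around each character, and the
proposed fix keeps it in place from open to close.

```python
# Toy model only; none of these names exist in the FreeBSD ppbus/lpt code.

class StrayInterrupt(Exception):
    """Stands in for the panic: an interrupt arrived with no handler set up."""


class ToyBus:
    def __init__(self):
        self.handler = None
        self.handled = 0

    def request(self, handler):
        # Take the bus and install an interrupt handler.
        self.handler = handler

    def release(self):
        # Give the bus back, which also removes the handler.
        self.handler = None

    def interrupt(self):
        if self.handler is None:
            raise StrayInterrupt("interrupt with no handler installed")
        self.handler()


def print_per_char(bus, data, late_irq_after=None):
    """Described current behaviour: own the bus for one character at a time."""
    def on_irq():
        bus.handled += 1

    for i, _ch in enumerate(data):
        bus.request(on_irq)
        bus.interrupt()          # interrupt for this character
        bus.release()
        if i == late_irq_after:
            bus.interrupt()      # a late interrupt lands in the gap


def print_while_open(bus, data, late_irq_after=None):
    """Proposed behaviour: own the bus from open to close."""
    def on_irq():
        bus.handled += 1

    bus.request(on_irq)          # open
    try:
        for i, _ch in enumerate(data):
            bus.interrupt()
            if i == late_irq_after:
                bus.interrupt()  # same late interrupt, handler still there
    finally:
        bus.release()            # close
```

In this model a late interrupt raises StrayInterrupt in the per-character
variant, because it arrives between release and the next request, while the
open/close variant simply handles it.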

-- 
John Baldwin


Re: RELENG_7: zfs mirror causes ata timeout

2008-01-08 Thread Stephen M. Rumble

Quoting Jeremy Chadwick [EMAIL PROTECTED]:


On Tue, Jan 08, 2008 at 05:28:46PM -0500, Stephen M. Rumble wrote:

I'm having a bit of trouble with a new machine running the latest RELENG_7
code. I have two 500GB WD Caviar GP disks on a mini-itx GM965-based board
(MSI fuzzy) running amd64 with 4GB of ram. The disks are:


Could be related to a PR that I submitted long ago, but it was not
specific to ZFS -- instead, it appeared to be specific to the motherboard
I was using.  There are also some tidbits posted by others which appeared
to help them, although performance was impacted:

http://www.freebsd.org/cgi/query-pr.cgi?pr=103435

Another related PR, which seems to indicate motherboard problems:

http://www.freebsd.org/cgi/query-pr.cgi?pr=93885


Thanks. I'm not sure they apply, but I'll keep them in mind. The Intel  
chipsets seem to be rather bug-free; at least, I didn't see any  
mention of quirks or workarounds when glancing over code. The problems  
I'm seeing also seem to occur during low utilisation, not high  
(remarkably, keeping the system active seems to postpone issues!). I'm  
not sure PCI bus issues would be a likely culprit and I don't see any  
obviously relevant BIOS settings.



ad4: 476940MB <WDC WD5000AACS-00ZUB0 01.01B01> at ata2-master SATA150
ad6: 476940MB <WDC WD5000AACS-00ZUB0 01.01B01> at ata3-master SATA150

I've tried different power supplies and cables. I've enabled and disabled
spread spectrum clocking and tried both SATA300 and SATA150 rates. I've
also tried switching drives between ports so that what was ad4 is ad6 and
what was ad6 is ad4. The problems persist, but seem to follow the same
drive (ad6 originally, then ad4 when swapped). This seems to indicate a
drive problem, but it works great on its own, even when exercising both
disks simultaneously. SMART reports no problems and ZFS reports no issues
when ad6 is used on its own outside of a zfs mirror. It seems like it's the
drive, but it works fine when not in a mirror. I'm stumped. Any ideas?


Have you tried running long SMART tests (smartctl -t long) on both of
these drives, ditto with an offline test (smartctl -t offline)?
Statistics that are labelled Offline as their type won't get updated
until an offline test is performed.  It's possible those statistics may
provide some answers, but no guarantees.


Nope, but I'm going to do that right now!


The only interesting bit of evidence I could find is that when these errors
do occur, smartctl reports an increase in the Start_Stop_Count field on
ad6. ad4, which appears to work fine, doesn't demonstrate this and has a
much lower value.


Start_Stop_Count indicates the drive is actually stopping then spinning
back up (usually caused by a reset of some kind; equivalent of powering
down then back up but without the loss of power).  It's possible that
your drive has actual problems -- this is supported by the fact that the
problem follows the disk (when moving the disk to another SATA port).


I'm leaning ever closer to blaming the disk. I still can't explain why  
I couldn't make it misbehave with it on its own zfs pool and UFS  
filesystems. However, shortly after setting the dubious disk offline  
using zpool, I poked at it with 'atacontrol cap' and managed to wedge  
it. Upon issuing the command it sounded like it was spinning up (it  
should never spin down, although these GP drives are supposed to lower  
their RPM while idle) and atacontrol hung. I couldn't kill it and top  
listed the state as 'ata re'. The rest of the system was responsive,  
but the machine wouldn't shutdown properly, presumably on account of  
that stuck channel.



Tracking down the source of this problem usually requires a lot of time,
money, and trial-and-error techniques.  This is what I'd go with:

1) See if there's a BIOS update.  I know at least in the case of Intel
manufactured boards BIOS updates have solved weird problems like this in
the past.


None. BIOS version 1.0 doesn't leave me convinced it's bug-free, though ;)


2) Try an Advanced RMA with Western Digital (which guarantees you get a
brand new drive rather than chancing that they repair the one you send
them) and see if a new drive helps.


I'll definitely look into that.


3) Try replacing the motherboard with a different brand (non-MSI).  I
have nothing against MSI, but switching vendors usually means that you
rule out a cross-model h/w bug (e.g. something the vendor does in the BIOS
or engineering which is suspect).  Try Asus or Gigabyte.  Obviously this
will cost money to do and will very likely set you back the cost of the
motherboard you have currently, but it's a viable option since you've
already tried replacing SATA cables.


I suppose I could always stick the disks in another box, boot it up,  
and see what happens. Actually, I may just do that next.



I'm not sure why ZFS would cause something like this to happen vs. UFS.
I happen to run ZFS at home (same machine as what's mentioned in PR
103435, with the replaced motherboard of course) 

Re: Performance!

2008-01-08 Thread Krassimir Slavchev

Hello,

I can't see the patch?

Best Regards

Kris Kennaway wrote:
 Krassimir Slavchev wrote:
 Hello Kris,

 Here are the lock profiling results; see the attachment.

 Please let me know if you want ssh access to this machine.
 
 Thanks, this is very interesting.  The problem is already fixed in 8.0
 but we were not seeing it being this much of a factor on our test
 hardware.  Possibly it is due to your CPUs being moderately faster and
 causing a timing difference.  Anyway, try this patch.  If there is still
 a performance deficit then repeating the lock profiling will be useful.
 
 Kris
 
 
