date:20080125

snd_maestro regression

2008-01-25 Thread Dominic Fandrey

I have stumbled upon a small regression in a RELENG_7 build from yesterday.

http://www.freebsd.org/cgi/query-pr.cgi?pr=119973

The PR is not yet showing up.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]

Re: T7200 CPU not detected by est

2008-01-25 Thread Krassimir Slavchev


John Baldwin wrote:
 On Wednesday 23 January 2008 02:42:52 am Krassimir Slavchev wrote:
 John Baldwin wrote:
 On Monday 21 January 2008 11:16:06 am Gerrit Kühn wrote:
 Hi folks,

 I have several systems using T7200 mobile CPUs running under 7-stable.
 However, EST does not recognize the cpus. When loading cpufreq I get:
 You can try this patch.  It won't add support for all of the levels, but it
 will support the current level and the highest level (IIRC).


 It works now on my T7700:

 dev.est.0.%desc: Enhanced SpeedStep Frequency Control
 dev.est.0.%driver: est
 dev.est.0.%parent: cpu0
 dev.est.0.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000
 1200/16000
 dev.est.1.%desc: Enhanced SpeedStep Frequency Control
 dev.est.1.%driver: est
 dev.est.1.%parent: cpu1
 dev.est.1.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000
 1200/16000
 
 Odd, it shouldn't have provided that many settings.  It also doesn't
 provide power info.  I wonder if you are getting the settings from
 ACPI.
 

That is the output of 'sysctl -a | grep dev.est' and I don't have any
additional settings.
Maybe something is wrong with the ACPI on this Acer notebook. There
were errors in the DSDT table, but after fixing them the output is the
same. Actually, I have problems with the bge card: it does not work with
ACPI enabled because it can't map memory...
Let me know if you want any additional information.


Best Regards

Re: T7200 CPU not detected by est

2008-01-25 Thread Ian Smith

On Fri, 25 Jan 2008, John Baldwin wrote:
  On Wednesday 23 January 2008 02:42:52 am Krassimir Slavchev wrote:
   John Baldwin wrote:
On Monday 21 January 2008 11:16:06 am Gerrit Kühn wrote:
Hi folks,
   
I have several systems using T7200 mobile CPUs running under 7-stable.
However, EST does not recognize the cpus. When loading cpufreq I get:
   
You can try this patch.  It won't add support for all of the levels, but it
will support the current level and the highest level (IIRC).
   
   
   
   It works now on my T7700:
   
   dev.est.0.%desc: Enhanced SpeedStep Frequency Control
   dev.est.0.%driver: est
   dev.est.0.%parent: cpu0
   dev.est.0.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000
   1200/16000
   dev.est.1.%desc: Enhanced SpeedStep Frequency Control
   dev.est.1.%driver: est
   dev.est.1.%parent: cpu1
   dev.est.1.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000
   1200/16000
  
  Odd, it shouldn't have provided that many settings.  It also doesn't
  provide power info.  I wonder if you are getting the settings from
  ACPI.

Assuming so, wouldn't this seem to be an instance needing the recent:

 kern/114722: [acpi] [patch] Nearly duplicate p-state entries reported 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=114722

?

cheers, Ian


Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Julian H. Stacey

Jeremy Chadwick wrote:
  wondering if this is a known issue.  Note that smartctl does not report
  errors logged and gives a PASSED to the drive.  I am running at
  UDMA100 ATA.  Also, if it matters, I am using ZFS.

 Can you please provide output of the following:
 
 * smartctl -a /dev/ad0

From ports/sysutils/smartmontools I presume ?
( Asking as I also have a DMA prob. to solve, at present
needing hw.ata.ata_dma=0 in /boot/loader.conf to boot,
( interruptions on sound on 7-stable, though no ZFS here)).
smartctl:
Not installed by /usr/src-7
No /usr/ports/*/smartctl
Clues found with locate for ports:
  sysutils/munin-node/files/patch-hddtemp_smartctl.in
  sysutils/sensors-applet/files/smartctl-helper.c
  sysutils/sensors-applet/files/smartctl-sensors-interface.c
  sysutils/sensors-applet/files/smartctl-sensors-interface.h

  sysutils/munin-main   # Not really ?
  ports/sysutils/sensors-applet - ports/sysutils/smartmontools

-- 
Julian Stacey. Munich Computer Consultant, BSD Unix C Linux. http://berklix.com

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Peter Jeremy

On Fri, Jan 25, 2008 at 12:46:08PM -0800, Chuck Swiger wrote:
On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:
 ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
   1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
 [ ... ]
   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
 [ ... ]
 195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300

These numbers are quite worrisome-- they should be zero or nearly so in a
healthy drive.

I see similarly weird values from a basically new drive.  I'm not sure
that there's a requirement that the raw values start from 0 and increment
on each detected event.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.



Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Jeremy Chadwick

On Fri, Jan 25, 2008 at 06:17:24PM -0700, Joe Peterson wrote:
 Glad you got it back!  Yes, when I was first playing with ZFS, I noticed
 that booting between single and multi user mode could make the pools
 invisible.  Import seemed to bring them back...

I did go into single-user mode and attempt to do ZFS-related commands,
which might explain the "no datasets available" message once I was back in
multi-user mode!  I would classify that as a bug, and one which is going to
cause all sorts of hair-pulling for administrators in the future.  I
wonder what it's caused by.

The import technique I found on a forum somewhere, or possibly on a
Solaris mailing list.  I was really sweating there for a moment...

 So, is the disk toast, or can you still read anything from it (part
 table, etc.)?

The ad6 disk (/backups) fsck'd cleanly without any missing files or
anomalies.

The ZFS pool that has two striped disks (ad8 and ad10) is fully intact
too, with no loss of data that I can see.  I'll have to run a scrub
after I'm done copying data over to ad6, just to make sure though.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Ian Smith

On Fri, 25 Jan 2008, Jeremy Chadwick wrote:
  On Fri, Jan 25, 2008 at 06:03:33PM -0700, Joe Peterson wrote:
   Wow, pretty crazy!  Hmm, and yes, those LBAs do look close together.
   Well, let me know how the smartctl output looks.  I'd be curious if your
   bad sector count rises.
  
  Absolutely nada on the SMART statistics.  Nothing incremented or changed
  in any way.  My short and long tests did not change any of the data in
  the fields either.  Full output is below my .sig.
[..]
  It is interesting to note that we both have Seagate disks...  :-) I'll
  have to run SeaTools on my disk to see if anything comes back, or run a
  selective LBA test in smartctl (since the drive supports it).
[..]
  smartctl version 5.37 [i386-portbld-freebsd7.0] Copyright (C) 2002-6 Bruce Allen

  === START OF INFORMATION SECTION ===
  Model Family: Seagate Barracuda 7200.10 family
  Device Model: ST3500630AS
  Serial Number:9QG1YWNL
  Firmware Version: 3.AAE

Same firmware as Joe's, too, though his ad1 was a bit later (3.AAG or H?)

  ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
    1 Raw_Read_Error_Rate     0x000f   114   094   006    Pre-fail  Always       -       131599973
    3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
    4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       6
    5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
    7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       200325271
    9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2970
   10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
   12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       9
  187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
  189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
  190 Temperature_Celsius     0x0022   063   050   045    Old_age   Always       -       773849125
  194 Temperature_Celsius     0x0022   037   050   000    Old_age   Always       -       37 (Lifetime Min/Max 0/29)

I noticed Joe's Temp readings look similarly borked too - attribute 190
is likely something else, despite same flag value as 194, which then
shows clearly wrong values for min/max, though raw temp is reasonable: 

  190 Temperature_Celsius     0x0022   065   056   045    Old_age   Always       -       605749283
  194 Temperature_Celsius     0x0022   035   044   000    Old_age   Always       -       35 (Lifetime Min/Max 0/15)

.. which only goes to show, as I've seen with other attributes on other
drive brands, that smartctl's database isn't necessarily reliable over
all versions / revisions of a given drive.  Add salt to taste ..

Cheers, Ian


Well-supported SAS RAID card for 6.3?

2008-01-25 Thread Josh Endries


Hello,

I'm buying a new server and will put 6.3 on it and would like to use SAS; 
normally I use 3ware SATA. I've been reading a lot including man pages but can't 
seem to find definitive information on SAS cards that are well-supported and 
work well. I've found reports that cards listed in man pages don't seem to work, 
and confusion about chips/cards. Does anyone have experience with recent SAS 
cards or machines with integrated chips?


Thanks!

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Andrew MacIntyre


Jeremy Chadwick wrote:

* Getting a larger power supply (usually when lots of disks are involved)

I only have two drives, so I think the PS has enough capacity in my case.


Agreed; even a 350W PSU should handle 2 disks without a problem.


I've seen power supplies with a sagging 12V rail cause these sorts of
problems.

--
-
Andrew I MacIntyre These thoughts are mine alone...
E-mail: [EMAIL PROTECTED]  (pref) | Snail: PO Box 370
   [EMAIL PROTECTED] (alt) |Belconnen ACT 2616
Web:http://www.andymac.org/   |Australia

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Jeremy Chadwick

On Fri, Jan 25, 2008 at 05:00:54PM -0800, Jeremy Chadwick wrote:
 icarus# zfs list
 no datasets available
 
 This doesn't bode well, and doesn't make me happy.  At all.

Pshew!  I was able to get ZFS to start seeing the pool again by doing
the following:  (Supposedly zpool import by itself will show you a
list of pools which it manages to see...)

icarus# zpool import -f storage
icarus# df -k /storage
Filesystem  1024-blocks      Used     Avail Capacity  Mounted on
storage       957873024 106124032 851748992    11%    /storage
icarus# zfs list
NAME      USED  AVAIL  REFER  MOUNTPOINT
storage   101G   812G   101G  /storage
icarus# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	storage     ONLINE       0     0     0
	  ad8       ONLINE       0     0     0
	  ad10      ONLINE       0     0     0

errors: No known data errors

Back to the drawing board.


Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Jeremy Chadwick

Joe, I wanted to send you a note about something that I'm still in the
process of dealing with.  The timing couldn't be more ironic.

I decided it would be worthwhile to migrate from my two-disk ZFS stripe
with a non-ZFS disk for nightly backups, to a RAIDZ pool of all 3
disks combined (since they're all the same size).  I had another
terminal with gstat -I500ms running in it, so I could see overall I/O.

All was going well until about the 81GB mark of the copy.  gstat started
showing 0KB in/out on all the drives, and the rsync was stalled.  ^Z did
nothing, which is usually a bad sign.  :-)  I ssh'd in and did a dmesg
(summarised):

ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
ad6: FAILURE - WRITE_DMA timed out LBA=13951071
ad6: FAILURE - WRITE_DMA timed out LBA=13951327
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
ad6: FAILURE - WRITE_DMA timed out LBA=13951583
ad6: FAILURE - WRITE_DMA timed out LBA=13951839
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5

It appears my /dev/ad6 (a Seagate -- more irony) must have some bad
blocks.  Actually, after letting things go for a while, I realised the
box just locked up.  Probably kernel panic'd due to the I/O problem.
I'll have to poke at SMART stats later to see what showed up.


Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Jeremy Chadwick

On Fri, Jan 25, 2008 at 04:38:46PM -0800, Jeremy Chadwick wrote:
 I'll have to poke at SMART stats later to see what showed up.

So the box did indeed panic.  The backtrace contained about 1.5 screens
of function calls from the stack, which makes taking a photo of the
screen a bit worthless.  All the functions shown were predominantly I/O
related, and since a disk had locked up (or something), this didn't surprise me.

SMART stats showed absolutely nothing wrong with ad6, or any of the
other drives on the system.

Worse: my ZFS pool appears *completely* gone -- that's about 170GB of
data.  I don't even know how that happened, because there were
absolutely no issues reported on either of the disks on the ZFS pool.
It's like the situation somehow caused ZFS to go crazy and lose all of
its metadata.

icarus# zfs list
no datasets available

This doesn't bode well, and doesn't make me happy.  At all.


Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson

Glad you got it back!  Yes, when I was first playing with ZFS, I noticed
that booting between single and multi user mode could make the pools
invisible.  Import seemed to bring them back...

So, is the disk toast, or can you still read anything from it (part
table, etc.)?

-Joe


Jeremy Chadwick wrote:
 On Fri, Jan 25, 2008 at 05:00:54PM -0800, Jeremy Chadwick wrote:
 icarus# zfs list
 no datasets available

 This doesn't bode well, and doesn't make me happy.  At all.
 
 Pshew!  I was able to get ZFS to start seeing the pool again by doing
 the following:  (Supposedly zpool import by itself will show you a
 list of pools which it manages to see...)
 
 icarus# zpool import -f storage
 icarus# df -k /storage
 Filesystem  1024-blocks      Used     Avail Capacity  Mounted on
 storage       957873024 106124032 851748992    11%    /storage
 icarus# zfs list
 NAME      USED  AVAIL  REFER  MOUNTPOINT
 storage   101G   812G   101G  /storage
 icarus# zpool status
   pool: storage
  state: ONLINE
  scrub: none requested
 config:
 
 	NAME        STATE     READ WRITE CKSUM
 	storage     ONLINE       0     0     0
 	  ad8       ONLINE       0     0     0
 	  ad10      ONLINE       0     0     0
 
 errors: No known data errors
 
 Back to the drawing board.
 

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson

Jeremy Chadwick wrote:
 Joe, I wanted to send you a note about something that I'm still in the
 process of dealing with.  The timing couldn't be more ironic.
 
 I decided it would be worthwhile to migrate from my two-disk ZFS stripe
 with a non-ZFS disk for nightly backups, to a RAIDZ pool of all 3
 disks combined (since they're all the same size).  I had another
 terminal with gstat -I500ms running in it, so I could see overall I/O.
 
 All was going well until about the 81GB mark of the copy.  gstat started
 showing 0KB in/out on all the drives, and the rsync was stalled.  ^Z did
 nothing, which is usually a bad sign.  :-)  I ssh'd in and did a dmesg
 (summarised):
 
 ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
 ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
 ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
 ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
 ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
 ad6: FAILURE - WRITE_DMA timed out LBA=13951071
 ad6: FAILURE - WRITE_DMA timed out LBA=13951327
 ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
 ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
 ad6: FAILURE - WRITE_DMA timed out LBA=13951583
 ad6: FAILURE - WRITE_DMA timed out LBA=13951839
 ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
 ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
 g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
 g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
 g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
 g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
 g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5
 
 It appears my /dev/ad6 (a Seagate -- more irony) must have some bad
 blocks.  Actually, after letting things go for a while, I realised the
 box just locked up.  Probably kernel panic'd due to the I/O problem.
 I'll have to poke at SMART stats later to see what showed up.

Wow, pretty crazy!  Hmm, and yes, those LBAs do look close together.
Well, let me know how the smartctl output looks.  I'd be curious if your
bad sector count rises.  I had noticed that 1

BTW, I tried:

crater# dd if=/dev/ad1s4 of=/dev/null bs=64k
^C1408596+0 records in
1408596+0 records out
92313747456 bytes transferred in 1415.324362 secs (65224446 bytes/sec)

(I let it go for 92GB or so) - no messages about ad1.  So I wonder if
this points at either the cable connector on ad0 or the drive itself.  I
guess I'd rather have a failing drive than motherboard...

I originally was wondering if somehow something peculiar about ZFS's
disk access pattern was making it happen...

Thanks for the recommendations.  I'll keep an eye on it, and I'll let you
know what a cable change does for me.  Still, I have not had any ad0
messages since this morning (I haven't been using the system today much),
but maybe the cron processes are more likely to trigger it...

-Joe

Re: 6.3-RELEASE can not mount root on Cyrix 5530 ATA33 controller

2008-01-25 Thread Søren Schmidt


On 25Jan, 2008, at 15:05 , John Baldwin wrote:


On Wednesday 23 January 2008 03:52:39 pm Søren Schmidt wrote:

On 23Jan, 2008, at 21:09 , Xin LI wrote:



Yoshihiko Sarumaru wrote:

Hello,
I updated my Geode GX1 PC from RELENG_6_2 to RELENG_6_3 and found
root mount failed after reboot.

This problem was caused by a change to ata-pci.c to pick up a wider range of
old ATA controllers as ata-pci devices in the ata_legacy() function, and
rolling back that file resolved this problem for me.


Which revision?


Actually, it's the fix to pci/pci.c that hasn't been backported to 6.x
yet...


Rev 1.343?  It should apply to 6.x cleanly.  Patch below:


Yep, that one exactly.

-Søren



Index: pci.c
===
RCS file: /host/cvs/usr/cvs/src/sys/dev/pci/pci.c,v
retrieving revision 1.292.2.23
diff -u -r1.292.2.23 pci.c
--- pci.c   10 Jan 2008 21:17:12 -  1.292.2.23
+++ pci.c   25 Jan 2008 14:05:20 -
@@ -1898,7 +1898,9 @@
 	/* ATA devices needs special map treatment */
 	if ((pci_get_class(dev) == PCIC_STORAGE) &&
 	    (pci_get_subclass(dev) == PCIS_STORAGE_IDE) &&
-	    (pci_get_progif(dev) & PCIP_STORAGE_IDE_MASTERDEV))
+	    ((pci_get_progif(dev) & PCIP_STORAGE_IDE_MASTERDEV) ||
+	     (!pci_read_config(dev, PCIR_BAR(0), 4) &&
+	      !pci_read_config(dev, PCIR_BAR(2), 4))) )
 		pci_ata_maps(pcib, bus, dev, b, s, f, rl, force, prefetchmask);
 	else
 		for (i = 0; i < cfg->nummaps;)


--
John Baldwin





Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1

2008-01-25 Thread Joe Peterson

Sam Leffler wrote:
 Sigh, you are correct.  I backrev'd the machine where I ran schedgraph 
 to RELENG_7 and didn't notice the old version mis-parses the ktr file.  
 The graph is totally different w/ schedgraph from HEAD.
 
 Sorry Joe for misleading you.

No problem, Sam, but the question I have for you now is: do you see
anything with the updated schedgraph that indicates any freezes that
look funny?  The length of the ones I saw with mouse movement were
mostly some portion of a second, from maybe 1/8 to 1/2 sec.  And there
should be a lot of them in quick succession.

Thanks, Joe


Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Chuck Swiger


On Jan 25, 2008, at 1:05 PM, Thomas Hurst wrote:
These numbers are quite worrisome-- they should be zero or nearly so in a
healthy drive.


No, these are perfectly reasonable for a Seagate.  I have about 12
7200.X's and all show the same sort of behavior.  If they're nearly zero,
it's probably a sign your manufacturer isn't actually counting them
(marketroids hate accurate SMART readings).

Try graphing them as counters; with an idle disk you'll see periodic
sawtooth patterns as the heads crawl from one side of the disk to the
other.


SMART attributes which end with _Ct or _Count are supposed to  
increment with every event; things which end with _Rate (i.e.,
Raw_Read_Error_Rate, Seek_Error_Rate) are supposed to indicate the  
frequency of such errors over time.  It would be reasonable for  
Hardware_ECC_Recovered to keep the incremental count, but not the  
other two.


I agree that minor periodic errors happen over time and are not a  
great concern, but a happy drive will show zero reallocated sectors,  
or perhaps a few over the span of a year or two, and will have an ECC
recovered or UDMA_CRC count which is much smaller than was reported by  
Joe.


YMMV, of course...

--
-Chuck


Highpoint drivers on 7.0

2008-01-25 Thread Eirik Øverby


Hi all,

did anyone try the Highpoint RocketRAID drivers (hptmv6.ko) on 7-RC1 or
later? I'm considering upgrading one of my servers here, but I need to
know if my RAID controller will work after a reinstall.


A shame HPT doesn't release the driver to the community...

Thanks,
/Eirik

Re: Highpoint drivers on 7.0

2008-01-25 Thread Alfred Perlstein

* Eirik Øverby [EMAIL PROTECTED] [080125 12:53] wrote:
 Hi all,
 
 did anyone try the Highpoint RocketRAID drivers (hptmv6.ko) on 7-RC1 or
 later? I'm considering upgrading one of my servers here, but I need to
 know if my RAID controller will work after a reinstall.
 
 A shame HPT doesn't release the driver to the community...

I would try the following:

1. get 7.x source tree.
2. make buildworld && make buildkernel
3. mv /boot/kernel /boot/kernel6
4. make installkernel
5. reboot, boot with -s
6. mount your filesystems, do some io testing.

If everything looks well, you can then make installworld and hopefully
things proceed safely.

-Alfred

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Chuck Swiger


On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
[ ... ]
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
[ ... ]
195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300


These numbers are quite worrisome-- they should be zero or nearly so
in a healthy drive.


--
-Chuck


Re: FreeBSD 7.0-PRE + amd64 + Areca controller = probe12 warning

2008-01-25 Thread Mike Tancsa


At 09:00 AM 1/25/2008, Steven Hartland wrote:

When booting 7.0-PRERELEASE amd64 on our machines with Areca controllers,
we get the following odd message, which doesn't appear on i386. Is this
something to worry about, or is it harmless?

(probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step



I get the same thing, and I am running the latest BIOS from Areca.
It doesn't seem to impact my limited testing so far.


arcmsr0: Areca SATA Host Adapter RAID Controller
 mem 0xe860-0xe8600fff,0xe800-0xe83f irq 18 at device 
14.0 on pci2

ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.43 2007-4-17
arcmsr0: [ITHREAD]

Timecounters tick every 1.000 msec
Waiting 5 seconds for SCSI devices to settle
ad5: 238475MB Seagate ST3250310AS 3.AAC at ata2-slave SATA150
(probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step
da0 at arcmsr0 bus 0 target 0 lun 0
da0: Areca ARC-1210-VOL#00 R001 Fixed Direct Access SCSI-5 device
da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
da0: 305175MB (624999424 512 byte sectors: 255H 63S/T 38904C)
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ad5s1a

0[dbtest]% uname -a
FreeBSD dbtest.sentex.ca 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #4: Thu Jan 17 08:27:50 EST 2008     [EMAIL PROTECTED]:/usr/obj/usr/src/sys/db  amd64

0[dbtest]%



Full dmesg attached.

   Regards
   Steve





Re: snd_emu10k1.ko after 6.2 to 6.3 upgrade

2008-01-25 Thread John Baldwin

On Friday 25 January 2008 11:16:19 am Petr Holub wrote:
  Do you have an error message in the dmesg after this?
 
 Yes, I do - sorry, haven't thought it will end up in dmesg
 and not in the terminal. It says:
 
 KLD snd_emu10k1.ko: depends on midi - not available

Did you kldload sound.ko before snd_emu10k1.ko?  It may be that freebsd-update
didn't run kldxref on your kernel dir to update the /boot/kernel/linker.hints
file that is used to autoload dependencies.

You can try doing a 'kldxref /boot/kernel' to see if that fixes the dependency 
loading.

-- 
John Baldwin

Re: ipfwpcap in 6.3 ?

2008-01-25 Thread Bruce A. Mah


If memory serves me right, Kurt Jaeger wrote:


In

http://www.freebsd.org/releases/6.3R/relnotes-i386.html

ipfwpcap(8) is mentioned, but I can't find it after the upgrade ?


Argh.  My bad.  It got merged to RELENG_6 *just* after RELENG_6_3 was 
branched, by about a day or so.  Somehow I must have gotten confused and 
thought that it happened pre-branch (and thus had gotten included), thus 
it ended up in the release notes for 6.3 when it shouldn't have.  :-( 
I'll make a note in the post-release errata for this.


Very sorry for the confusion.  ipfwpcap(8) will appear in 6.4-RELEASE or 
in any 6-STABLE snapshot made after about 25 November 2007.


Bruce.






Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Jeremy Chadwick

On Fri, Jan 25, 2008 at 06:42:04PM +0100, Julian H. Stacey wrote:
 Jeremy Chadwick wrote:
   wondering if this is a known issue.  Note that smartctl does not report
   errors logged and gives a PASSED to the drive.  I am running at
   UDMA100 ATA.  Also, if it matters, I am using ZFS.
 
  Can you please provide output of the following:
  
  * smartctl -a /dev/ad0
 
 From ports/sysutils/smartmontools I presume ?
   ( Asking as I also have a DMA prob. to solve, at present
   needing hw.ata.ata_dma=0 in /boot/loader.conf to boot,
   ( interruptions on sound on 7-stable, though no ZFS here)).

Yep!  smartctl comes with ports/sysutils/smartmontools.

-- 
| Jeremy Chadwick    jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Jeremy Chadwick

On Fri, Jan 25, 2008 at 08:58:41AM -0700, Joe Peterson wrote:
 I've seen mention of this kind of issue before, but I never saw a
 solution, except that someone reported that a certain version of 6.x
 seemed to make it go away - accounts of this problem are a bit vague.  I
 am running 7.0-RC1, and I am seeing the errors periodically, and I am
 wondering if this is a known issue.  Note that smartctl does not report
 errors logged and gives a PASSED to the drive.  I am running at
 UDMA100 ATA.  Also, if it matters, I am using ZFS.

What you've shown is usually the sign of a disk-related problem.  It's
very obvious when it's just one disk reporting DMA errors.  You use ZFS,
so chances are you have more than one disk in a pool/volume -- there's
no indication ad1, ad4, ad6, etc. are failing, so this seems to indicate
something specific to ad0.

Manufacturers pick very passive (non-aggressive) thresholds for error
conditions on disks, so disks which are failing very commonly show
PASSED during SMART analysis.  To make matters worse, most users I
know read SMART stats incorrectly (they're easy to misinterpret).

Can you please provide output of the following:

* smartctl -a /dev/ad0
* atacontrol cap ad0
* atacontrol info ata0, ata1, etc. -- any controller used by ZFS
* Relevant dmesg output that indicates what kind of ATA controller
  these disks are attached to.  Start with output from 'ad0:' and
  work backwards.  For example, ad0 on this machine is using an Intel
  ICH6 controller:
  atapci0: Intel ICH6 SATA150 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.2 on pci0
  ata0: ATA channel 0 on atapci0
  ad0: 238475MB WDC WD2500KS-00MJB0 02.01C03 at ata0-master SATA150

Other stuff:

SMART stats which are labelled Offline are only updated when a short
or long offline test is performed.  Have you tried using smartctl -t
short /dev/ad0 and smartctl -t long /dev/ad0 to see if any of the raw
values on the far right column increment?

Have you tried using zpool scrub on the ZFS pool, then zpool status
to see if READ/WRITE/CHKSUM counters increment or if the scrub line
states there were errors?

Other things which have fixed problems in the past for others:

* BIOS updates
* Change of motherboards (sometimes replacing board with same model,
  other times going with a completely different vendor (implies weird
  implementation issues or BIOS problems))
* Changing SATA cables
* Getting a larger power supply (usually when lots of disks are involved)

-- 
| Jeremy Chadwick    jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


sysinstall: weird ui problem

2008-01-25 Thread Andriy Gapon


FreeBSD 6.3-RELEASE amd64
Running sysinstall for post-installation configuration of sorts in
xterm/konsole/gnome-terminal.

Very strange issue: arrow keys work quite well throughout the sysinstall
menus, but in the Fdisk and Label screens both the up and down arrow keys
are interpreted as the down key. This is not fatal in the Label screen
because navigation cycles, but in the Fdisk menu you cannot move up from
the bottom entry (slice).

In system console everything is OK, though.

-- 
Andriy Gapon

Re: panic in vm_page_splay

2008-01-25 Thread John Baldwin

On Thursday 24 January 2008 09:31:24 am Mikhail T. wrote:
 Hello!
 
 The machine is running 6.3-PRERELEASE as of Dec 30th. It just panicked in
 the middle of a web session as I was browsing for a file to upload via a
 web form... The firefox in use is native (amd64), not a Linux binary.
 
 The firefox process had over 550MB of memory to its name -- it was
 running for many days. The box has 2GB of RAM and was performing fine
 despite 4 SETI processes in the background.
 
 Please, advise. Thanks!

Is this the same box that you got the bad PTE panics on?  If so, have you run
memtest or the like to rule out bad RAM?

-- 
John Baldwin

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Jeremy Chadwick

On Fri, Jan 25, 2008 at 06:03:33PM -0700, Joe Peterson wrote:
 Wow, pretty crazy!  Hmm, and yes, those LBAs do look close together.
 Well, let me know how the smartctl output looks.  I'd be curious if your
 bad sector count rises.

Absolutely nada on the SMART statistics.  Nothing incremented or changed
in any way.  My short and long tests did not change any of the data in
the fields either.  Full output is below my .sig.

 BTW, I tried:
 
 crater# dd if=/dev/ad1s4 of=/dev/null bs=64k
 ^C1408596+0 records in
 1408596+0 records out
 92313747456 bytes transferred in 1415.324362 secs (65224446 bytes/sec)
 
 (I let it go for 92GB or so) - no messages about ad1.  So I wonder if
 this points at either the cable connector on ad0 or the drive itself.  I
 guess I'd rather have a failing drive than motherboard...
 
 I originally was wondering if somehow something peculiar about ZFS's
 disk access pattern was making it happen...

Since I'm used to dealing with disk issues (at work and personally), I'm
left wondering if this is some strange ATA subsystem quirk, or
ultimately something with ZFS (your "something peculiar about ZFS's disk
access pattern" theory is starting to look more plausible).

This may sound suicidal, but I'm hoping to recreate the scenario
somehow, and then punt the details to Soren or Xin Li for further
investigation -- if it looks like an ATA subsystem thing, that is.

It is interesting to note that we both have Seagate disks...  :-) I'll
have to run SeaTools on my disk to see if anything comes back, or run a
selective LBA test in smartctl (since the drive supports it).

I've restarted my rsync since, and it's happily chomping away without an
issue.  If my problem was TRULY a bad block or something causing
mechanical lock-up on the disk, I'd have expected my latest rsync to
induce it.

There's always the chance of some bizarre drive firmware bug too.

 Thanks for the recommendations.  I'll keep an eye on it, and I'll let you
 know what a cable change does for me.  Still, I have not had any ad0
 messages since this morning (I haven't been using the system much today,
 but maybe the cron processes are more likely to trigger it)...

Understood.  In my case, I *know* the cables are fine, because the box
itself I just built and migrated to a few days ago (change of
motherboard, chassis, and addition of SATA hot-swap backplane).  We use
the same motherboard (Supermicro PDSMI+) in all of our production
servers in our datacenter, and they're rock-solid.

I've done hot-swapping without any issue on those systems too, and I've
never seen any SATA system issues -- one of the systems is our
datacenter backup server, which holds nightly backups for all the other
boxes (about 6).  Due to the heavy disk I/O that occurs for hours at a
time, if this was some weird system quirk, motherboard problem, or SATA
bus/cable issue, we would've seen it by now.  FWIW: all our systems,
including the backup box, use UFS2 exclusively -- no ZFS in the picture.

-- 
| Jeremy Chadwick    jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


icarus# smartctl -a /dev/ad6
smartctl version 5.37 [i386-portbld-freebsd7.0] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630AS
Serial Number:    9QG1YWNL
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan 25 17:10:31 2008 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.

RE: snd_emu10k1.ko after 6.2 to 6.3 upgrade

2008-01-25 Thread Petr Holub

 Did you kldload sound.ko before snd_emu10k1.ko?  It maybe that freebsd-upgrade
 didn't run kldxref on your kernel dir to update the /boot/kernel/linker.hints
 file that is used to autoload dependencies.

Yes I did, sound.ko is loaded. Actually, I can use sound.ko from the
freebsd-update'd tree together with snd_emu10k1.ko built from source
and it works without any apparent problem.

 You can try doing a 'kldxref /boot/kernel' to see if that fixes the dependency
 loading.

That hasn't helped:
# kldload snd_emu10k1
kldload: can't load snd_emu10k1: No such file or directory

dmesg says again:
KLD snd_emu10k1.ko: depends on midi - not available


Petr Holub
CESNET z.s.p.o.   Supercomputing Center Brno
Zikova 4 Institute of Compt. Science
162 00 Praha 6, CZMasaryk University
Czech Republic Botanicka 68a, 60200 Brno, CZ 
e-mail: [EMAIL PROTECTED]   phone: +420-549493944
 fax: +420-541212747
   e-mail: [EMAIL PROTECTED]


 -Original Message-
 From: John Baldwin [mailto:[EMAIL PROTECTED]
 Sent: Friday, January 25, 2008 9:10 PM
 To: Petr Holub
 Cc: freebsd-stable@freebsd.org
 Subject: Re: snd_emu10k1.ko after 6.2 to 6.3 upgrade
 
 On Friday 25 January 2008 11:16:19 am Petr Holub wrote:
   Do you have an error message in the dmesg after this?
 
  Yes, I do - sorry, haven't thought it will end up in dmesg
  and not in the terminal. It says:
 
  KLD snd_emu10k1.ko: depends on midi - not available
 
 
 --
 John Baldwin


Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Jeremy Chadwick

On Fri, Jan 25, 2008 at 12:24:20PM -0700, Joe Peterson wrote:
 In my case, I am using only one disk (ad0) for FreeBSD, and I am only
 using one partition on this disk in my ZFS pool.  So, in this case,
 unfortunately, it's not possible to tell from the fact that only ad0 is
 listed that it is specific to this drive.

Ah ha.  Well, in the example below, you may only be using one drive for
FreeBSD (ad0), but you do have a 2nd drive (ad1) which is installed.
I would try doing some I/O on /dev/ad1 to see if you can get the
timeouts to occur on that drive as well.  You don't have to do anything
risky with ad1 either: 'dd if=/dev/ad1 of=/dev/null bs=64k' would probably
suffice.
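The dd suggestion above is a pure read test. As a minimal C sketch of what it exercises (read_discard() is a hypothetical helper, not a FreeBSD utility), the loop below opens the device read-only, pulls it through a 64 KiB buffer and throws the data away, so nothing on ad1 is ever modified. An I/O error ends the loop with -1; an ATA timeout would likely surface to userland as EIO, but that is an assumption.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `path` from start to end in `bs`-byte chunks
 * and discard the data, roughly what `dd if=PATH of=/dev/null bs=64k`
 * does. The file or device is only ever opened O_RDONLY.
 * Returns the number of bytes read, or -1 if `bs` is 0, the open fails,
 * or any read() fails.
 */
int64_t read_discard(const char *path, size_t bs)
{
    if (bs == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *buf = malloc(bs);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    int64_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, bs);
        if (n == 0)             /* end of device or file */
            break;
        if (n < 0) {            /* I/O error: stop and report it */
            total = -1;
            break;
        }
        total += n;
    }

    free(buf);
    close(fd);
    return total;
}
```

On FreeBSD, read_discard("/dev/ad1", 65536) should behave much like the dd command; any timeouts would still be logged by the kernel in dmesg rather than by the program.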

 Yep, I am also always skeptical of smart reports.  That's one reason I
 am very interested in ZFS.  I don't trust the drive to be completely
 reliable, and the fact that ZFS does end-to-end data integrity is very
 intriguing.

I agree entirely -- and I also use ZFS myself (across two drives in a
RAID0-like fashion, with a completely separate drive which is used for
nightly backups of the ZFS pool).  I'm absolutely thrilled with it;
finally something clean, reliable, and simple -- something I've always
wanted in an LVM or LVM-like implementation.

  * smartctl -a /dev/ad0
 
 OK, I've attached this to the end of this email.

 atapci0: Intel ICH4 UDMA100 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.1 on pci0
 ata0: ATA channel 0 on atapci0
 ata0: [ITHREAD]
 ad0: 476940MB Seagate ST3500630A 3.AAE at ata0-master UDMA100

The smartctl output for /dev/ad0 looks good, minus the one uncorrected
sector.  I'm ignoring that since it's proof that the drive knew of it
and remapped it.  If that number starts incrementing over time, though,
replace the drive ASAP, of course.

The atacontrol cap output looks fine too; nothing wonky, and the LBA
capabilities look fine.

The controller is nothing out-of-the-ordinary; it's reliable under
FreeBSD (I've had many a motherboard which used it).  Of course I
haven't used an ICH4 since FreeBSD 3.x, and the ATA layer has changed
substantially, numerous times.

 {regarding -t short and -t long}
 Also, none of the numbers that were zero incremented, esp:
 
 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
 
 Also, no more errors were reported in the system log during the self-tests.

That seems to indicate that the drive considers itself healthy.

Another test I could recommend at this point would be one that would
require a few hours of downtime: download Seagate's SeaTools (will
require a CD burner or floppies) and consider doing both the quick and
long scans.  The quick scan checks some of the stuff we've looked at here,
but it also looks at some vendor-specific stuff within the drive.
The long scan will read every block on the disk looking for errors (and
will not destroy data).

 OK, I started a scrub, and it will take some more time to complete...
 But I get the following with status.  Could this be due to the timeouts
 and failures?  I suspect so, so maybe this is not surprising.

It depends on whether or not you saw more timeouts and cache errors spit
out by the kernel while zpool scrub ran.  If so, then yes, I would
definitely say they're related.

 I'd also guess that this doesn't necessarily point to the drive, but
 anything in the chain of events...  I do not have a mirror or RAID-Z,
 so I guess the reason there was no data loss (yet) is because the
 checksum passed, and maybe it just had to retry...?

I'm still new to ZFS myself, so I don't have an answer for you.  Your
conclusion is the same thing I'd conclude, though.

 I've been using this same motherboard/BIOS for a long time (as well as
 this drive), so no changes have happened to the HW recently.  The BIOS
 is the newest, available, I believe (It's a Tyan Trinity S2099, so it's
 a few years old)

I'd say the BIOS is probably not responsible at this point; I'd expect
other weird things to be going on with the system if the BIOS was broken
in some way (or possibly bit rot in the flash).

It's going to be difficult to determine if maybe something on the
mainboard has decided to start failing (some transistor within the ICH4,
etc...) though.  :-(

 I'm using regular ATA 80-pin cables.  Also, these seem to have been
 working fine for quite a while now.  But, yes, I have also witnessed bad
 cable issues on older systems in the past.  I certainly could try a new
 cable and see if it helps.

I'd try that for sure.  It's just one more thing to rule out.

  * Getting a larger power supply (usually when lots of disks are involved)
 
 I only have two drives, so I think the PS has enough capacity in my case.

Agreed; even a 350W PSU should handle 2 disks without a problem.

Here's something to ponder:

The LBAs being reported as having errors are scattered all over.  They
aren't lumped together (usually the sign of part of a platter going
bad); instead, they're all over the drive.

This would indicate either cable problems,

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Thomas Hurst

* Chuck Swiger ([EMAIL PROTECTED]) wrote:

 On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:
 ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
   1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
 [ ... ]
 
   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
 [ ... ]
 195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300
 
 These numbers are quite worrisome -- they should be zero or nearly so in a
 healthy drive.

No, these are perfectly reasonable for a Seagate.  I have about 12
7200.X's and all show the same sort of behavior.  If they're nearly zero
it's probably a sign your manufacturer isn't actually counting them
(marketroids hate accurate SMART readings).

Try graphing them as counters; with an idle disk you'll see periodic
sawtooth patterns as the heads crawl from one side of the disk to the
other.
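One widely repeated reading of these large Seagate raw numbers, offered here as an assumption and not as anything Seagate documents, is that the 48-bit raw value of attributes 1 (Raw_Read_Error_Rate) and 7 (Seek_Error_Rate) packs two counters: errors in the upper 16 bits and operations in the lower 32 bits. Under that reading, both raw values Chuck quoted (82422948 and 286126605) are below 2^32, so their error half is zero. A minimal C sketch of the split; seagate_split_raw48() is a hypothetical name:

```c
#include <stdint.h>

/* Assumed layout of the 48-bit raw value for Seagate attributes 1 and 7. */
typedef struct {
    uint16_t errors;      /* upper 16 bits (bits 32..47) */
    uint32_t operations;  /* lower 32 bits (bits 0..31) */
} seagate_rate;

seagate_rate seagate_split_raw48(uint64_t raw)
{
    seagate_rate r;

    r.errors = (uint16_t)((raw >> 32) & 0xFFFFu);
    r.operations = (uint32_t)(raw & 0xFFFFFFFFu);
    return r;
}
```

Attribute 195 (Hardware_ECC_Recovered) may not follow the same packing, so the sketch only covers attributes 1 and 7.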

-- 
Thomas 'Freaky' Hurst
http://hur.st/

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Jeremy Chadwick

On Fri, Jan 25, 2008 at 12:46:08PM -0800, Chuck Swiger wrote:
 On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:
 ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
   1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
 [ ... ]

   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
 [ ... ]
 195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300

 These numbers are quite worrisome -- they should be zero or nearly so in a
 healthy drive.

On some drives, yes, but not all drives.  His is a Seagate drive --
Seagate uses some of the bits in the raw data section for some sort of
internal use by the drive firmware.  So although the raw values may
appear very high, the drive appears to function normally, and the
normalized SMART value (the field under VALUE) doesn't fluctuate.

I have Seagate drives all over the place which exhibit identical stats
to the above.  I've included some for comparison below; each listed is
on a different system.  Look at attribute 190 (Temperature_Celsius) for
an example; I don't think any drive can reach 773849124C.
Or, well, I sure hope not.  :-)  I believe in the case of attribute 190,
that's why they present a human-readable value in attribute 194.
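The 773849124 figure decodes cleanly if the raw value is read as packed bytes rather than as one number: in hex it is 0x2E200024. Assuming a layout of current temperature in byte 0, minimum in byte 2 and maximum in byte 3 (an assumption, though a consistent one here, since byte 0 matches the 36 reported by attribute 194), that is 36C now with a 32/46 min/max. decode_temp_raw() is a hypothetical helper illustrating the byte extraction:

```c
#include <stdint.h>

typedef struct {
    unsigned current;  /* degrees C, byte 0 (assumed) */
    unsigned min;      /* degrees C, byte 2 (assumed) */
    unsigned max;      /* degrees C, byte 3 (assumed) */
} temp_triplet;

temp_triplet decode_temp_raw(uint64_t raw)
{
    temp_triplet t;

    t.current = (unsigned)(raw & 0xFFu);
    t.min = (unsigned)((raw >> 16) & 0xFFu);
    t.max = (unsigned)((raw >> 24) & 0xFFu);
    return t;
}
```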

-- 
| Jeremy Chadwick    jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |

==SNIP==

ad6: 476940MB Seagate ST3500630AS 3.AAE at ata3-master SATA300

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   112   094   006    Pre-fail  Always       -       221374987
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       6
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       29014
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2967
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       9
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   064   050   045    Old_age   Always       -       773849124
194 Temperature_Celsius     0x0022   036   050   000    Old_age   Always       -       36 (Lifetime Min/Max 0/29)
195 Hardware_ECC_Recovered  0x001a   066   059   000    Old_age   Always       -       36458075
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       18
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       18
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x       100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0


ad4: 114473MB Seagate ST3120827AS 3.42 at ata2-master SATA150

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   063   052   006    Pre-fail  Always       -       57703728
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       169005025
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3536
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       24
194 Temperature_Celsius     0x0022   027   040   000    Old_age   Always       -       27 (Lifetime Min/Max 0/15)
195 Hardware_ECC_Recovered  0x001a   063   052   000    Old_age   Always       -       57703728
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100

Re: panic: vm_fault: fault on nofualt entry, addr: 81423000

2008-01-25 Thread Pete French

 Hmm, so that's fine.  What pointer is returned by madt_map_table?

0x800e7610

I also put some prints in afterwards to try and see how far through
the loop it was getting:

count = (xsdt->Header.Length - sizeof(ACPI_TABLE_HEADER)) /
    sizeof(UINT64);
printf("DEBUG: count is %d\n", count);
for (i = 0; i < count; i++) {
    printf("DEBUG: probing %d - offset %p\n",
        i, xsdt->TableOffsetEntry[i]);
    if (madt_probe_table(xsdt->TableOffsetEntry[i]))
        break;
}

The output is interesting - I get count printed as 6, but then nothing
else, just the panic. Which leads me to believe that it is the access
to xsdt->TableOffsetEntry[0] which is causing the panic.
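The loop bound in the debug code above is plain arithmetic on the table length. Assuming the standard 36-byte ACPI table header and 8-byte XSDT entries, a count of 6 means Header.Length was 84, so TableOffsetEntry[0] is only 36 bytes into the table. A minimal standalone sketch of that arithmetic; xsdt_entry_count() and xsdt_entry_offset() are hypothetical names, while the real code works on ACPICA's table structures:

```c
#include <stdint.h>

/* Size of the standard ACPI system description table header, in bytes. */
#define ACPI_SDT_HEADER_LEN 36u

/* Number of 64-bit table pointers following the header in an XSDT. */
uint32_t xsdt_entry_count(uint32_t table_length)
{
    if (table_length < ACPI_SDT_HEADER_LEN)
        return 0;   /* malformed table: no room for any entries */
    return (uint32_t)((table_length - ACPI_SDT_HEADER_LEN) / sizeof(uint64_t));
}

/* Byte offset of entry i from the start of the table. */
uint32_t xsdt_entry_offset(uint32_t i)
{
    return ACPI_SDT_HEADER_LEN + i * (uint32_t)sizeof(uint64_t);
}
```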

-pete.

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson

Jeremy Chadwick wrote:
 What you've shown is usually the sign of a disk-related problem.  It's
 very obvious when it's just one disk reporting DMA errors.  You use ZFS,
 so chances are you have more than one disk in a pool/volume -- there's
 no indication ad1, ad4, ad6, etc. are failing, so this seems to indicate
 something specific to ad0.

Jeremy, thanks for the response - I have tried to answer all of your
questions below...

In my case, I am using only one disk (ad0) for FreeBSD, and I am only
using one partition on this disk in my ZFS pool.  So, in this case,
unfortunately, it's not possible to tell from the fact that only ad0 is
listed that it is specific to this drive.

 Manufacturers pick very passive (non-aggressive) thresholds for error
 conditions on disks, so disks which are failing very commonly show
 PASSED during SMART analysis.  To make matters worse, most users I
 know read SMART stats incorrectly (they're easy to misinterpret).

Yep, I am also always skeptical of smart reports.  That's one reason I
am very interested in ZFS.  I don't trust the drive to be completely
reliable, and the fact that ZFS does end-to-end data integrity is very
intriguing.

 Can you please provide output of the following:
 
 * smartctl -a /dev/ad0

OK, I've attached this to the end of this email.

 * atacontrol cap ad0

Protocol  ATA/ATAPI revision 7
device model  ST3500630A
serial number 9QG0DG03
firmware revision 3.AAE
cylinders 16383
heads 16
sectors/track 63
lba supported 268435455 sectors
lba48 supported   976773168 sectors
dma supported
overlap not supported

Feature                        Support  Enable  Value           Vendor
write cache                    yes      yes
read ahead                     yes      yes
Tagged Command Queuing (TCQ)   no       no      0/0x00
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/0xFEFE
automatic acoustic management  no       no      0/0x00          208/0xD0

 * atacontrol info ata0, ata1, etc. -- any controller used by ZFS

Master:  ad0 ST3500630A/3.AAE ATA/ATAPI revision 7
Slave:   ad1 ST3160812A/3.AAH ATA/ATAPI revision 7

(but note that ad1 is not used by FreeBSD)

 * Relevant dmesg output that indicates what kind of ATA controller
   these disks are attached to.  Start with output from 'ad0:' and
   work backwards.  For example, ad0 on this machine is using an Intel
   ICH6 controller:
   atapci0: Intel ICH6 SATA150 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.2 on pci0
   ata0: ATA channel 0 on atapci0
   ad0: 238475MB WDC WD2500KS-00MJB0 02.01C03 at ata0-master SATA150

atapci0: Intel ICH4 UDMA100 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.1 on pci0
ata0: ATA channel 0 on atapci0
ata0: [ITHREAD]
ad0: 476940MB Seagate ST3500630A 3.AAE at ata0-master UDMA100

 SMART stats which are labelled Offline are only updated when a short
 or long offline test is performed.  Have you tried using smartctl -t
 short /dev/ad0 and smartctl -t long /dev/ad0 to see if any of the raw
 values on the far right column increment?

I just tried one:

# 1  Short offline       Completed without error       00%      5252         -
# 2  Short offline       Completed without error       00%      5252         -

Also, none of the numbers that were zero incremented, esp:

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

Also, no more errors were reported in the system log during the self-tests.

 Have you tried using zpool scrub on the ZFS pool, then zpool status
 to see if READ/WRITE/CHKSUM counters increment or if the scrub line
 states there were errors?

OK, I started a scrub, and it will take some more time to complete...
But I get the following with status.  Could this be due to the timeouts
and failures?  I suspect so, so maybe this is not surprising.  I'd also
guess that this doesn't necessarily point to the drive, but anything in
the chain of events...  I do not have a mirror or RAID-Z, so I guess the
reason there was no data loss (yet) is because the checksum passed,
and maybe it just had to retry...?  Anyway, here's the output so far:

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 2.50% done, 1h58m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       1     3     0
          ad0s1d    ONLINE       1     3     0

errors: No known data errors

 Other

FreeBSD 7.0-PRE + amd64 + Areca controller = probe12 warning

2008-01-25 Thread Steven Hartland


When booting 7.0-PRERELEASE amd64 on our machines with Areca controllers
we get the following odd message, which doesn't appear on i386. Is this
something to worry about, or is it harmless?

(probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step

Full dmesg attached.

   Regards
   Steve




Re: panic: vm_fault: fault on nofualt entry, addr: 81423000

2008-01-25 Thread John Baldwin

On Friday 25 January 2008 07:55:46 am Pete French wrote:
  Hmm, so that's fine.  What pointer is returned by madt_map_table?
 
 0x800e7610

That isn't page-aligned, which is unexpected, though it should still
work fine.
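Whether an unaligned pointer matters comes down to page arithmetic. A minimal C sketch, assuming the 4 KiB base page size of i386/amd64 (page_offset() and fits_in_one_page() are hypothetical helpers): 0x800e7610 sits 0x610 bytes into its page, and an 84-byte table starting there (the size implied by count = 6, presumably for the XSDT, assuming the 36-byte header) ends at offset 0x663, still inside that page.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u   /* i386/amd64 base page size */

/* Offset of addr within its 4 KiB page. */
uint64_t page_offset(uint64_t addr)
{
    return addr & (PAGE_SIZE_4K - 1);
}

/* True if the byte range [addr, addr + len) lies entirely within one page. */
bool fits_in_one_page(uint64_t addr, uint64_t len)
{
    if (len == 0)
        return true;
    return (addr / PAGE_SIZE_4K) == ((addr + len - 1) / PAGE_SIZE_4K);
}
```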

 I also put some prints in afterwards to try and see how far through
 the loop it was getting:
 
  count = (xsdt->Header.Length - sizeof(ACPI_TABLE_HEADER)) /
      sizeof(UINT64);
  printf("DEBUG: count is %d\n", count);
  for (i = 0; i < count; i++) {
      printf("DEBUG: probing %d - offset %p\n",
          i, xsdt->TableOffsetEntry[i]);
      if (madt_probe_table(xsdt->TableOffsetEntry[i]))
          break;
  }
 
 The output is interesting - I get count printed as 6, but then nothing
 else, just the panic. Which leads me to believe that it is the access
 to xsdt->TableOffsetEntry[0] which is causing the panic.

Hmm, that is odd.  The Header.Length and the actual table should all be
in the same page, so you shouldn't be getting a page fault.  Can you add
some printfs to madt_map() to see what the final starting (pa, length) are
before the call to pmap_kenter_temporary() and then add a printf for each
iteration of the while loop showing the (pa, la, remaining length)?
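The trace requested here follows a common pattern: round the physical range out to whole pages, map them one page at a time, and count the remaining length down. The userland model below is only a hypothetical illustration of that pattern, not the actual madt_map() source; trace_map_pages() and its made-up virtual base `la` are assumptions. It prints one (pa, la, remaining) line per page and returns how many pages would be mapped.

```c
#include <stdint.h>
#include <stdio.h>

#define PG_SIZE 4096u   /* assumed 4 KiB pages */

unsigned trace_map_pages(uint64_t pa, uint64_t length, uint64_t la)
{
    uint64_t off = pa & (PG_SIZE - 1);
    /* Round the request out to whole pages, starting at a page boundary. */
    uint64_t remaining = ((length + off + PG_SIZE - 1) / PG_SIZE) * PG_SIZE;
    unsigned pages = 0;

    pa -= off;
    printf("start: pa=%#llx length=%#llx\n",
        (unsigned long long)pa, (unsigned long long)remaining);
    while (remaining > 0) {
        printf("  map pa=%#llx la=%#llx remaining=%#llx\n",
            (unsigned long long)pa, (unsigned long long)la,
            (unsigned long long)remaining);
        pa += PG_SIZE;
        la += PG_SIZE;
        remaining -= PG_SIZE;
        pages++;
    }
    return pages;
}
```

A table that fits within one page takes a single iteration; one that straddles a page boundary takes two.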

-- 
John Baldwin

Re: NO_ knobs in /etc/make.conf

2008-01-25 Thread John Baldwin

On Wednesday 23 January 2008 02:37:24 pm Doug Barton wrote:
 Vivek Khera wrote:
 
  I guess I wasn't clear about my confusion. What was broken about putting 
  all this in make.conf that necessitated a src.conf file too?
 
 One could argue that they didn't need to be moved at all. One of the 
 rationales at the time was that we didn't want the knobs for the base to 
 affect the ports.

Correct, and /etc/src.conf is optional.  It is a good place to put settings
that you want to only affect compiles in /usr/src and not affect building
apps from ports, standalone compiles, etc.

-- 
John Baldwin

Re: T7200 CPU not detected by est

2008-01-25 Thread John Baldwin

On Wednesday 23 January 2008 02:42:52 am Krassimir Slavchev wrote:
 John Baldwin wrote:
  On Monday 21 January 2008 11:16:06 am Gerrit Kühn wrote:
  Hi folks,
 
  I have several systems using T7200 mobile CPUs running under 7-stable.
  However, EST does not recognize the cpus. When loading cpufreq I get:
 
  You can try this patch.  It won't add support for all of the levels, but it
  will support the current level and the highest level (IIRC).
 
 
 
 It works now on my T7700:
 
 dev.est.0.%desc: Enhanced SpeedStep Frequency Control
 dev.est.0.%driver: est
 dev.est.0.%parent: cpu0
 dev.est.0.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000
 1200/16000
 dev.est.1.%desc: Enhanced SpeedStep Frequency Control
 dev.est.1.%driver: est
 dev.est.1.%parent: cpu1
 dev.est.1.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000
 1200/16000

Odd, it shouldn't have provided that many settings.  It also doesn't
provide power info.  I wonder if you are getting the settings from
ACPI.
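To see how many levels are actually being exported, the freq_settings string can be split into its space-separated entries. This is a minimal C sketch, assuming each entry is `frequency/power` with frequency in MHz and power in milliwatts (the convention cpufreq(4) documents for freq_levels; I am assuming est's freq_settings follows it). The type and function names are hypothetical.

```c
#include <stdio.h>

/* One parsed "freq/power" entry (hypothetical helper type). */
struct freq_setting {
	int mhz;	/* assumed: frequency in MHz */
	int mw;		/* assumed: power in milliwatts */
};

/*
 * Parse a space-separated list such as "2401/35000 2400/35000 ..."
 * into 'out' (at most 'max' entries).  Returns the number of entries
 * parsed, or -1 if an entry is malformed.
 */
static int
parse_freq_settings(const char *s, struct freq_setting *out, int max)
{
	int n = 0, used;

	while (*s != '\0' && n < max) {
		while (*s == ' ')
			s++;
		if (*s == '\0')
			break;
		/* %n records how many characters this entry consumed. */
		if (sscanf(s, "%d/%d%n", &out[n].mhz, &out[n].mw, &used) != 2)
			return (-1);
		s += used;
		n++;
	}
	return (n);
}
```

Fed the T7700 string above, this yields five entries, each with a non-zero power figure, which is consistent with John's suspicion that the table is coming from ACPI rather than from the patched est probe.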

-- 
John Baldwin

Re: 6.3-RELEASE can not mount root on Cyrix 5530 ATA33 controller

2008-01-25 Thread John Baldwin

On Wednesday 23 January 2008 03:52:39 pm Søren Schmidt wrote:
 On 23Jan, 2008, at 21:09 , Xin LI wrote:
 
  Yoshihiko Sarumaru wrote:
  Hello,
  I updated my Geode GX1 PC from RELENG_6_2 to RELENG_6_3 and found
  root mount failed after reboot.
 
  This problem was caused by a change to ata-pci.c that makes the
  ata_legacy() function pick up a wider range of old ATA controllers as
  ata-pci devices, and rolling back that file resolved this problem for me.
 
  Which revision?
 
 Actually, it's the fix to pci/pci.c that hasn't been backported to 6.x
 yet...

Rev 1.343?  It should apply to 6.x cleanly.  Patch below:

Index: pci.c
===================================================================
RCS file: /host/cvs/usr/cvs/src/sys/dev/pci/pci.c,v
retrieving revision 1.292.2.23
diff -u -r1.292.2.23 pci.c
--- pci.c	10 Jan 2008 21:17:12 -0000	1.292.2.23
+++ pci.c	25 Jan 2008 14:05:20 -0000
@@ -1898,7 +1898,9 @@
 	/* ATA devices needs special map treatment */
 	if ((pci_get_class(dev) == PCIC_STORAGE) &&
 	    (pci_get_subclass(dev) == PCIS_STORAGE_IDE) &&
-	    (pci_get_progif(dev) & PCIP_STORAGE_IDE_MASTERDEV))
+	    ((pci_get_progif(dev) & PCIP_STORAGE_IDE_MASTERDEV) ||
+	     (!pci_read_config(dev, PCIR_BAR(0), 4) &&
+	      !pci_read_config(dev, PCIR_BAR(2), 4))) )
 		pci_ata_maps(pcib, bus, dev, b, s, f, rl, force, prefetchmask);
 	else
 		for (i = 0; i < cfg->nummaps;)

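The patched condition is easier to reason about as a standalone predicate. This is a minimal userland sketch, not the kernel code: the class, subclass and progif values are passed in directly instead of coming from pci_get_class() and friends, and bar0/bar2 stand in for the pci_read_config(dev, PCIR_BAR(0), 4) and PCIR_BAR(2) reads. The constant values are what I believe sys/dev/pci/pcireg.h defines; treat them as assumptions. wants_ata_maps() is a hypothetical name.

```c
#include <stdint.h>

/* Assumed to match the definitions in sys/dev/pci/pcireg.h. */
#define PCIC_STORAGE			0x01
#define PCIS_STORAGE_IDE		0x01
#define PCIP_STORAGE_IDE_MASTERDEV	0x80

/*
 * Return 1 if the device would take the pci_ata_maps() path under
 * the patched test, 0 if its BARs would be probed normally.
 */
static int
wants_ata_maps(uint8_t class, uint8_t subclass, uint8_t progif,
    uint32_t bar0, uint32_t bar2)
{
	if (class != PCIC_STORAGE || subclass != PCIS_STORAGE_IDE)
		return (0);
	/* Old rule: progif advertises a bus-master capable controller. */
	if (progif & PCIP_STORAGE_IDE_MASTERDEV)
		return (1);
	/* Added rule: BAR 0 and BAR 2 both read back as zero. */
	return (bar0 == 0 && bar2 == 0);
}
```

Under the old condition only a controller with the bus-master bit set took the ATA path; the added clause also catches IDE controllers whose first and third BARs are unprogrammed (presumably ones running at the legacy port addresses).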
-- 
John Baldwin

Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1

2008-01-25 Thread John Baldwin

On Thursday 24 January 2008 06:22:57 am Sam Leffler wrote:
 Joe Peterson wrote:
  In an attempt to track down this mouse freezing/stuttering (i.e. jerky
  mouse movement) behavior in FreeBSD 7.0-RC1, I have come up with a
  reliable way to cause it to happen, and I have created a longer trace
  showing the results.  Note that I am using the ULE scheduler.
 
  In general, it becomes easier to see the effect if there is CPU
  activity.  I have noticed it during kernel compiles, while at the same
  time loading web pages in firefox that contain images (and moving the
  mouse while this is happening).  But a more controlled way to see it is
  to run something that uses some CPU and then generate lots of X events.
 
  In my case, I start xtrs (TRS-80 emulator) in Model IV mode, which
  happens to poll for input, using the CPU.  Then I move the mouse back
  and forth quickly between windows in focus under mouse mode (in my
  case, a KDE focus mode), which causes many focus events quickly.  In
  about 15 or 20 seconds, the mouse reliably starts to show erratic
  movement, not moving smoothly.
 
  I really hope this can shed more light on what might be going on.  Here
  is the trace:
 
  http://www.skyrush.com/downloads/ktr_ule_4.out
 

 
 This is an interesting trace.  It appears that something is blocking 
 threads in the runq from running for 2 seconds!  I don't see what it is 
 from the trace data.  It sort of looks like the last thing that ran is 
 the swi4 which is likely a callout (need to check the log file contents 
 to be certain).  If the callback function does something it wouldn't 
 necessarily be visible in the schedgraph plot.  If you could stick a 
 dmesg from booting out in the same spot it might be worthwhile.  Also if 
 you rebuild the kernel with DIAGNOSTIC then softclock() will
 complain about callouts that take longer than 2ms to run.  This might 
 generate too much noise in which case you can adjust the threshold by 
 editing the code in sys/kern/kern_timeout.c.

Hmm, when I look at that graph using schedgraph from HEAD it just looks
like xtrs is using up all the CPU.  I didn't see the 2 second window where
nothing was running.
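Sam's DIAGNOSTIC suggestion quoted above amounts to timing each callout handler and complaining when it runs past a threshold. The following is only a userland analogue of that idea, not the code in sys/kern/kern_timeout.c: run_timed(), spin() and the 2 ms constant are illustrative, and it assumes POSIX clock_gettime(CLOCK_MONOTONIC).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Illustrative threshold, matching the 2ms figure mentioned above. */
#define SLOW_CALLOUT_NS	(2 * 1000 * 1000L)

/* Nanoseconds elapsed between two monotonic timestamps. */
static long long
elapsed_ns(const struct timespec *a, const struct timespec *b)
{
	return ((long long)(b->tv_sec - a->tv_sec) * 1000000000LL +
	    (b->tv_nsec - a->tv_nsec));
}

/* Example handler: busy-wait for *(long long *)arg nanoseconds. */
static void
spin(void *arg)
{
	struct timespec start, now;
	long long target = *(long long *)arg;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do
		clock_gettime(CLOCK_MONOTONIC, &now);
	while (elapsed_ns(&start, &now) < target);
}

/*
 * Run fn(arg), warn on stderr if it took longer than threshold_ns,
 * and return the measured duration in nanoseconds.
 */
static long long
run_timed(void (*fn)(void *), void *arg, const char *name,
    long long threshold_ns)
{
	struct timespec start, end;
	long long ns;

	clock_gettime(CLOCK_MONOTONIC, &start);
	fn(arg);
	clock_gettime(CLOCK_MONOTONIC, &end);
	ns = elapsed_ns(&start, &end);
	if (ns > threshold_ns)
		fprintf(stderr, "%s took %lld us (limit %lld us)\n",
		    name, ns / 1000, threshold_ns / 1000);
	return (ns);
}
```

Raising threshold_ns is the userland counterpart of editing the threshold in kern_timeout.c when the 2ms warnings turn out to be too noisy.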

-- 
John Baldwin

Re: snd_emu10k1.ko after 6.2 to 6.3 upgrade

2008-01-25 Thread John Baldwin

On Wednesday 23 January 2008 03:08:41 pm Petr Holub wrote:
 Hi,
 
 I've found a problem after updating from 6.2-RELEASE to 6.3-RELEASE using
 freebsd-update as described on the daemonology blog. It looks like
 snd_emu10k1.ko, along with a few others, was not updated properly:
 # ls -l snd_*
 -r-xr-xr-x  1 root  wheel  16566 Feb 20  2007 snd_ad1816.ko
 -r-xr-xr-x  1 root  wheel  17731 Feb 20  2007 snd_als4000.ko
 -r-xr-xr-x  1 root  wheel  20004 Jan 21 15:42 snd_atiixp.ko
 -r-xr-xr-x  1 root  wheel  19192 Feb 20  2007 snd_cmi.ko
 -r-xr-xr-x  1 root  wheel  18594 Feb 20  2007 snd_cs4281.ko
 -r-xr-xr-x  1 root  wheel  30814 Feb 20  2007 snd_csa.ko
 -r-xr-xr-x  1 root  wheel  11098 Jan 21 15:42 snd_driver.ko
 -r-xr-xr-x  1 root  wheel  45839 Feb 20  2007 snd_ds1.ko
 -r-xr-xr-x  1 root  wheel  30008 Feb 20  2007 snd_emu10k1.ko
 -r-xr-xr-x  1 root  wheel  59398 Feb 20  2007 snd_emu10kx.ko
 -rwxr-xr-x  1 root  wheel  31223 Jan 21 15:42 snd_envy24.ko
 -rwxr-xr-x  1 root  wheel  30504 Jan 21 15:42 snd_envy24ht.ko
 -r-xr-xr-x  1 root  wheel  32005 Feb 20  2007 snd_es137x.ko
 -r-xr-xr-x  1 root  wheel  20075 Feb 20  2007 snd_ess.ko
 -r-xr-xr-x  1 root  wheel  15636 Feb 20  2007 snd_fm801.ko
 -rwxr-xr-x  1 root  wheel  77423 Jan 21 15:42 snd_hda.ko
 -r-xr-xr-x  1 root  wheel  23812 Jan 21 15:42 snd_ich.ko
 -r-xr-xr-x  1 root  wheel  31117 Feb 20  2007 snd_maestro.ko
 -r-xr-xr-x  1 root  wheel  42945 Jan 21 15:42 snd_maestro3.ko
 -r-xr-xr-x  1 root  wheel  46976 Feb 20  2007 snd_mss.ko
 -r-xr-xr-x  1 root  wheel  68790 Feb 20  2007 snd_neomagic.ko
 -r-xr-xr-x  1 root  wheel  14783 Feb 20  2007 snd_null.ko
 -r-xr-xr-x  1 root  wheel  16934 Feb 20  2007 snd_sb16.ko
 -r-xr-xr-x  1 root  wheel  15418 Feb 20  2007 snd_sb8.ko
 -r-xr-xr-x  1 root  wheel  15397 Feb 20  2007 snd_sbc.ko
 -r-xr-xr-x  1 root  wheel  19397 Feb 20  2007 snd_solo.ko
 -r-xr-xr-x  1 root  wheel   7240 Jan 21 15:42 snd_spicds.ko
 -r-xr-xr-x  1 root  wheel  18856 Jan 21 15:42 snd_t4dwave.ko
 -r-xr-xr-x  1 root  wheel  36300 Jan 21 15:42 snd_uaudio.ko
 -r-xr-xr-x  1 root  wheel  21918 Jan 21 15:42 snd_via8233.ko
 -r-xr-xr-x  1 root  wheel  16075 Jan 21 15:42 snd_via82c686.ko
 -r-xr-xr-x  1 root  wheel  18707 Feb 20  2007 snd_vibes.ko
 
 The modules dated Feb 20 2007 actually cannot be loaded:
 # kldload snd_emu10k1.ko
 kldload: can't load snd_emu10k1.ko: No such file or directory

Do you have an error message in the dmesg after this?
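When an upgrade leaves some modules behind, comparing modification times is a quick way to list them. This is a minimal POSIX C sketch, assuming the modules live in one directory (for example /boot/kernel) and that a file older than a chosen cutoff (such as the time of the upgrade) was not replaced. older_than() and list_stale_modules() are hypothetical helpers, and an old timestamp is only a hint, not proof that a module is incompatible.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Return 1 if 'path' was last modified before 'cutoff', 0 if not,
 * and -1 if it cannot be stat()ed.
 */
static int
older_than(const char *path, time_t cutoff)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return (-1);
	return (sb.st_mtime < cutoff);
}

/*
 * Print every "snd_*.ko" file in 'dir' older than 'cutoff'.
 * Returns the number of files printed, or -1 if 'dir' can't be opened.
 */
static int
list_stale_modules(const char *dir, time_t cutoff)
{
	DIR *d;
	struct dirent *de;
	char path[1024];
	size_t len;
	int count = 0;

	if ((d = opendir(dir)) == NULL)
		return (-1);
	while ((de = readdir(d)) != NULL) {
		len = strlen(de->d_name);
		/* Only consider names of the form snd_*.ko. */
		if (strncmp(de->d_name, "snd_", 4) != 0 || len < 3 ||
		    strcmp(de->d_name + len - 3, ".ko") != 0)
			continue;
		snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
		if (older_than(path, cutoff) == 1) {
			printf("%s\n", de->d_name);
			count++;
		}
	}
	closedir(d);
	return (count);
}
```

Called as list_stale_modules("/boot/kernel", cutoff) with a cutoff between Feb 20 2007 and Jan 21 15:42, it would print the same set of modules that the ls output above shows with the older date.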

-- 
John Baldwin

Re: panic in vm_page_splay

2008-01-25 Thread Mikhail Teterin

  The machine is running 6.3-PRERELEASE as of Dec 30th. It just
  panicked in the middle of a web session as I was browsing for a file to
  upload via a web form... The firefox in use is native (amd64), not a
  Linux binary.
 
  The firefox process had over 550MB of memory to its name -- it had been
  running for many days. The box has 2GB of RAM and was performing
  fine despite 4 SETI processes in the background.
 
  Please, advise. Thanks!

 Is this the same box that you got the bad PTE panics on? If so, have
 you run memtest or the like to rule out bad RAM?

No. This would be my own desktop...

-mi

RE: snd_emu10k1.ko after 6.2 to 6.3 upgrade

2008-01-25 Thread Petr Holub

 Do you have an error message in the dmesg after this?

Yes, I do - sorry, I didn't think it would end up in dmesg
rather than in the terminal. It says:

KLD snd_emu10k1.ko: depends on midi - not available

Petr


upcoming bugathon this weekend

2008-01-25 Thread Mark Linimon

If you're interested in helping out on our PR database problems, please
see my posting on freebsd-bugbusters@:

  http://docs.FreeBSD.org/cgi/mid.cgi?20080125182651.GA9914

We're having a bugathon this weekend, with the agenda being mostly
to figure out where we are and who would like to help, and to come up
with ways that they can do so.

Followups to freebsd-bugbusters@, please.

mcl

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson

Jeremy Chadwick wrote:
 What you've shown is usually the sign of a disk-related problem.  It's
 very obvious when it's just one disk reporting DMA errors.  You use ZFS,
 so chances are you have more than one disk in a pool/volume -- there's
 no indication ad1, ad4, ad6, etc. are failing, so this seems to indicate
 something specific to ad0.

Jeremy, thanks for the response - I have tried to answer all of your
questions below...

In my case, I am using only one disk (ad0) for FreeBSD, and I am only
using one partition on this disk in my ZFS pool.  So, in this case,
unfortunately, it's not possible to tell from the fact that only ad0 is
listed that it is specific to this drive.

 Manufacturers pick very passive (non-aggressive) thresholds for error
 conditions on disks, so disks which are failing very commonly show
 PASSED during SMART analysis.  To make matters worse, most users I
 know read SMART stats incorrectly (they're easy to misinterpret).

Yep, I am also always skeptical of SMART reports.  That's one reason I
am very interested in ZFS.  I don't trust the drive to be completely
reliable, and the fact that ZFS does end-to-end data integrity is very
intriguing.

 Can you please provide output of the following:
 
 * smartctl -a /dev/ad0

OK, I've attached this to the end of this email.

 * atacontrol cap ad0

Protocol              ATA/ATAPI revision 7
device model          ST3500630A
serial number         9QG0DG03
firmware revision     3.AAE
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       976773168 sectors
dma supported
overlap not supported

Feature                        Support  Enable  Value          Vendor
write cache                    yes      yes
read ahead                     yes      yes
Tagged Command Queuing (TCQ)   no       no      0/0x00
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/0xFEFE
automatic acoustic management  no       no      0/0x00         208/0xD0

 * atacontrol info ata0, ata1, etc. -- any controller used by ZFS

Master:  ad0 ST3500630A/3.AAE ATA/ATAPI revision 7
Slave:   ad1 ST3160812A/3.AAH ATA/ATAPI revision 7

(but note that ad1 is not used by FreeBSD)

 * Relevant dmesg output that indicates what kind of ATA controller
   these disks are attached to.  Start with output from 'ad0:' and
   work backwards.  For example, ad0 on this machine is using an Intel
   ICH6 controller:
   atapci0: Intel ICH6 SATA150 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.2 on pci0
   ata0: ATA channel 0 on atapci0
   ad0: 238475MB WDC WD2500KS-00MJB0 02.01C03 at ata0-master SATA150

atapci0: Intel ICH4 UDMA100 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.1 on pci0
ata0: ATA channel 0 on atapci0
ata0: [ITHREAD]
ad0: 476940MB Seagate ST3500630A 3.AAE at ata0-master UDMA100
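The capacity figures above are consistent with one another. Assuming 512-byte sectors and that the driver reports MB as 2^20 bytes, the lba48 sector count from atacontrol reproduces the 476940MB in dmesg, and the 268435455 "lba supported" figure is the 28-bit LBA ceiling, 2^28 - 1. A small C check of that arithmetic (the helper names are hypothetical):

```c
#include <stdint.h>

#define SECTOR_SIZE	512ULL			/* assumed bytes per sector */
#define LBA28_MAX	((1ULL << 28) - 1)	/* 28-bit LBA limit, 0x0FFFFFFF */

/* Capacity in MB (2^20 bytes), rounded down, as dmesg prints it. */
static uint64_t
sectors_to_mb(uint64_t sectors)
{
	return (sectors * SECTOR_SIZE / (1ULL << 20));
}

/* Capacity in decimal GB (10^9 bytes). */
static uint64_t
sectors_to_gb(uint64_t sectors)
{
	return (sectors * SECTOR_SIZE / 1000000000ULL);
}
```

976773168 sectors come to 500107862016 bytes, i.e. 476940MB in binary units or about 500 decimal GB. The 28-bit field simply saturates on a drive this large, so the full capacity is only reachable through LBA48.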

 SMART stats which are labelled Offline are only updated when a short
 or long offline test is performed.  Have you tried using smartctl -t
 short /dev/ad0 and smartctl -t long /dev/ad0 to see if any of the raw
 values on the far right column increment?

I just tried one:

# 1  Short offline       Completed without error       00%      5252         -
# 2  Short offline       Completed without error       00%      5252         -

Also, none of the numbers that were zero incremented, esp:

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

Also, no more errors were reported in the system log during the self-tests.

 Have you tried using zpool scrub on the ZFS pool, then zpool status
 to see if READ/WRITE/CHKSUM counters increment or if the scrub line
 states there were errors?

OK, I started a scrub, and it will take some more time to complete...
But I get the following with status.  Could this be due to the timeouts
and failures?  I suspect so, so maybe this is not surprising.  I'd also
guess that this doesn't necessarily point to the drive, but to anything in
the chain of events...  I do not have a mirror or RAID-Z, so I guess the
reason there was no data loss (yet) is because the checksum passed,
and maybe it just had to retry...?  Anyway, here's the output so far:

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 2.50% done, 1h58m to go
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       1     3     0
	  ad0s1d    ONLINE       1     3     0

errors: No known data errors
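Jeremy's check comes down to watching the READ, WRITE and CKSUM columns across scrubs. A minimal C sketch that parses one device line of the config section shown above and flags non-zero counters; the struct and function names are hypothetical, and it assumes plain numeric counters as seen here (zpool may abbreviate large counts, which this does not handle).

```c
#include <stdio.h>
#include <string.h>

/* Error counters from one line of `zpool status` config output. */
struct vdev_errors {
	char name[64];
	char state[16];
	unsigned long read, write, cksum;
};

/*
 * Parse a line such as "  ad0s1d    ONLINE   1   3   0".
 * Returns 1 on success, 0 if the line does not have five fields
 * (for example the NAME/STATE header line).
 */
static int
parse_vdev_line(const char *line, struct vdev_errors *v)
{
	memset(v, 0, sizeof(*v));
	return (sscanf(line, "%63s %15s %lu %lu %lu", v->name, v->state,
	    &v->read, &v->write, &v->cksum) == 5);
}

/* Nonzero if any error counter on the device is non-zero. */
static int
has_errors(const struct vdev_errors *v)
{
	return (v->read != 0 || v->write != 0 || v->cksum != 0);
}
```

Comparing the parsed counters before and after a scrub (and after `zpool clear`) shows whether new errors are accumulating or whether the 1/3/0 counts above are left over from the earlier timeouts.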

 Other

Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1

2008-01-25 Thread Sam Leffler


John Baldwin wrote:
 On Thursday 24 January 2008 06:22:57 am Sam Leffler wrote:
  Joe Peterson wrote:
   In an attempt to track down this mouse freezing/stuttering (i.e. jerky
   mouse movement) behavior in FreeBSD 7.0-RC1, I have come up with a
   reliable way to cause it to happen, and I have created a longer trace
   showing the results.  Note that I am using the ULE scheduler.
  
   In general, it becomes easier to see the effect if there is CPU
   activity.  I have noticed it during kernel compiles, while at the same
   time loading web pages in firefox that contain images (and moving the
   mouse while this is happening).  But a more controlled way to see it is
   to run something that uses some CPU and then generate lots of X events.
  
   In my case, I start xtrs (TRS-80 emulator) in Model IV mode, which
   happens to poll for input, using the CPU.  Then I move the mouse back
   and forth quickly between windows in focus under mouse mode (in my
   case, a KDE focus mode), which causes many focus events quickly.  In
   about 15 or 20 seconds, the mouse reliably starts to show erratic
   movement, not moving smoothly.
  
   I really hope this can shed more light on what might be going on.  Here
   is the trace:
  
   http://www.skyrush.com/downloads/ktr_ule_4.out
  
  This is an interesting trace.  It appears that something is blocking
  threads in the runq from running for 2 seconds!  I don't see what it is
  from the trace data.  It sort of looks like the last thing that ran is
  the swi4 which is likely a callout (need to check the log file contents
  to be certain).  If the callback function does something it wouldn't
  necessarily be visible in the schedgraph plot.  If you could stick a
  dmesg from booting out in the same spot it might be worthwhile.  Also if
  you rebuild the kernel with DIAGNOSTIC then softclock() will
  complain about callouts that take longer than 2ms to run.  This might
  generate too much noise in which case you can adjust the threshold by
  editing the code in sys/kern/kern_timeout.c.
 
 Hmm, when I look at that graph using schedgraph from HEAD it just looks
 like xtrs is using up all the CPU.  I didn't see the 2 second window where
 nothing was running.

Sigh, you are correct.  I backrev'd the machine where I ran schedgraph
to RELENG_7 and didn't notice the old version mis-parses the ktr file.
The graph is totally different w/ schedgraph from HEAD.

Sorry, Joe, for misleading you.

   Sam

Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1

2008-01-25 Thread Joe Peterson

Chuck Swiger wrote:
 On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:
 ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
   1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
 [ ... ]
   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
 [ ... ]
 195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300
 
 These numbers are quite worrisome-- they should be zero or nearly so
 in a healthy drive.

It seems to depend on the drive manufacturer.  E.g. this is a Seagate.  Every
Seagate I've ever had (or heard about on the web via smartctl dumps) reports
very large numbers for these values.  I've heard it described that Seagate
shows you the raw numbers (and correctable errors do happen all the time in
all drives).

In Western Digital drives (IIRC), the numbers shown are the ones that *should*
be zero, thereby hiding the low-level errors.

Hard to say if my numbers are too high, but these corrected error counts
are always frighteningly high in Seagates.
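One commonly repeated interpretation of Seagate's raw values for attributes 1 and 7, which to my knowledge is not officially documented by Seagate, is that the 48-bit raw field packs an error count into the upper 16 bits and an operation count into the lower 32 bits. If that assumption holds, the quoted numbers can be decoded as below; treat this as a hypothesis, not a vendor specification, and note it says nothing about attribute 195.

```c
#include <stdint.h>

/*
 * ASSUMPTION (unofficial): raw = (errors << 32) | operations for
 * Seagate attributes 1 (Raw_Read_Error_Rate) and 7 (Seek_Error_Rate).
 */
static uint32_t
seagate_raw_errors(uint64_t raw)
{
	/* Upper 16 bits of the 48-bit raw field. */
	return ((uint32_t)((raw >> 32) & 0xFFFF));
}

static uint32_t
seagate_raw_operations(uint64_t raw)
{
	/* Lower 32 bits of the raw field. */
	return ((uint32_t)(raw & 0xFFFFFFFFULL));
}
```

Under that reading, both 82422948 and 286126605 are below 2^32 (4294967296), so the upper-bits error count is 0 for each, which would fit Joe's view that the large Seagate figures are not alarming on their own.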

-Joe


Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1

2008-01-25 Thread Joe Peterson

John Baldwin wrote:
 Hmm, when I look at that graph using schedgraph from HEAD it just looks
 like xtrs is using up all the CPU.

Yeah, xtrs is eating a lot of CPU, but I've never seen this affect the
mouse movement (making it really jerky) the same way on, e.g., Linux.
And the xtrs test is just a way to *reliably* make it happen.  It
happens intermittently all of the time (at least every few minutes, and
often in small batches) even when the system is pretty idle...

-Joe

Re: Highpoint drivers on 7.0

2008-01-25 Thread Steven Hartland


I would advise contacting them. Their support was helpful when I last contacted
them, and for the card that was involved they did release the driver code,
which enabled us to fix the issues.

   Regards
   Steve

----- Original Message -----
From: Alfred Perlstein [EMAIL PROTECTED]

* Eirik Øverby [EMAIL PROTECTED] [080125 12:53] wrote:
 Hi all,
 
 did anyone try the Highpoint RocketRAID drivers (hptmv6.ko) on 7-RC1 or
 later? I'm considering upgrading one of my servers here, but I need to
 know if my RAID controller will work after a reinstall.

A shame HPT doesn't release the driver to the community...




This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to [EMAIL PROTECTED]
