snd_maestro regression
I have stumbled upon a small regression in a RELENG_7 build from yesterday. http://www.freebsd.org/cgi/query-pr.cgi?pr=119973 The PR is not yet showing up. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: T7200 CPU not detected by est
John Baldwin wrote:
On Wednesday 23 January 2008 02:42:52 am Krassimir Slavchev wrote:
John Baldwin wrote:
On Monday 21 January 2008 11:16:06 am Gerrit Kühn wrote:

Hi folks, I have several systems using T7200 mobile CPUs running under 7-stable. However, EST does not recognize the CPUs. When loading cpufreq I get:

You can try this patch. It won't add support for all of the levels, but it will support the current level and the highest level (IIRC).

It works now on my T7700:

dev.est.0.%desc: Enhanced SpeedStep Frequency Control
dev.est.0.%driver: est
dev.est.0.%parent: cpu0
dev.est.0.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000 1200/16000
dev.est.1.%desc: Enhanced SpeedStep Frequency Control
dev.est.1.%driver: est
dev.est.1.%parent: cpu1
dev.est.1.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000 1200/16000

Odd, it shouldn't have provided that many settings. It also doesn't provide power info. I wonder if you are getting the settings from ACPI.

That is the output of 'sysctl -a | grep dev.est' and I don't have any additional settings. Maybe something is wrong with the ACPI on this Acer notebook. There were errors in the DSDT table, but after fixing them the output is the same. Actually I have problems with the bge card: it does not work with ACPI enabled because it can't map memory...

Let me know if you want any additional information.

Best Regards
Re: T7200 CPU not detected by est
On Fri, 25 Jan 2008, John Baldwin wrote:
On Wednesday 23 January 2008 02:42:52 am Krassimir Slavchev wrote:
John Baldwin wrote:
On Monday 21 January 2008 11:16:06 am Gerrit Kühn wrote:

Hi folks, I have several systems using T7200 mobile CPUs running under 7-stable. However, EST does not recognize the cpus. When loading cpufreq I get:

You can try this patch. It won't add support for all of the levels, but it will support the current level and the highest level (IIRC).

It works now on my T7700:

dev.est.0.%desc: Enhanced SpeedStep Frequency Control
dev.est.0.%driver: est
dev.est.0.%parent: cpu0
dev.est.0.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000 1200/16000
dev.est.1.%desc: Enhanced SpeedStep Frequency Control
dev.est.1.%driver: est
dev.est.1.%parent: cpu1
dev.est.1.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000 1200/16000

Odd, it shouldn't have provided that many settings. It also doesn't provide power info. I wonder if you are getting the settings from ACPI.

Assuming so, wouldn't this seem to be an instance needing the recent:

kern/114722: [acpi] [patch] Nearly duplicate p-state entries reported
http://www.freebsd.org/cgi/query-pr.cgi?pr=114722

?

cheers, Ian
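The freq_settings list quoted above contains both 2401/35000 and 2400/35000, which looks like the kind of near-duplicate pair the PR title describes. Below is a minimal Python sketch of spotting such pairs. It assumes each entry is a frequency/power pair separated by a slash, and it uses an arbitrary 1% tolerance; the function names and the tolerance are illustrative and are not taken from the PR's patch.

```python
def parse_freq_settings(value):
    """Split a freq_settings string into (frequency, power) integer pairs."""
    pairs = []
    for entry in value.split():
        freq, power = entry.split("/")
        pairs.append((int(freq), int(power)))
    return pairs


def near_duplicates(pairs, tolerance=0.01):
    """Return neighbouring entries whose frequencies differ by less than
    `tolerance` (relative to the higher frequency). The 1% default is an
    arbitrary choice for illustration."""
    found = []
    ordered = sorted(pairs, reverse=True)
    for (f1, p1), (f2, p2) in zip(ordered, ordered[1:]):
        if (f1 - f2) / f1 < tolerance:
            found.append(((f1, p1), (f2, p2)))
    return found


# Value copied from the dev.est.0.freq_settings line in the thread.
settings = "2401/35000 2400/35000 2000/28000 1600/22000 1200/16000"
pairs = parse_freq_settings(settings)
dupes = near_duplicates(pairs)
print(dupes)
```

With the values from the thread, only the 2401/2400 pair is flagged; the remaining levels are far enough apart to be left alone.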
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Jeremy Chadwick wrote:

wondering if this is a known issue. Note that smartctl does not report errors logged and gives a PASSED to the drive. I am running at UDMA100 ATA. Also, if it matters, I am using ZFS.

Can you please provide output of the following:

* smartctl -a /dev/ad0

From ports/sysutils/smartmontools I presume? (Asking as I also have a DMA problem to solve, at present needing hw.ata.ata_dma=0 in /boot/loader.conf to boot (interruptions on sound on 7-stable, though no ZFS here).)

smartctl:
Not installed by /usr/src-7
No /usr/ports/*/smartctl
Clues found with locate for ports:
sysutils/munin-node/files/patch-hddtemp_smartctl.in
sysutils/sensors-applet/files/smartctl-helper.c
sysutils/sensors-applet/files/smartctl-sensors-interface.c
sysutils/sensors-applet/files/smartctl-sensors-interface.h
sysutils/munin-main # Not really ?
ports/sysutils/sensors-applet - ports/sysutils/smartmontools

--
Julian Stacey. Munich Computer Consultant, BSD Unix C Linux. http://berklix.com
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Fri, Jan 25, 2008 at 12:46:08PM -0800, Chuck Swiger wrote:
On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
[ ... ]
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
[ ... ]
195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300

These numbers are quite worrisome-- they should be zero or nearly so in a healthy drive.

I see similarly weird values from a basically new drive. I'm not sure that there's a requirement that the raw values start from 0 and increment on each detected event.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Fri, Jan 25, 2008 at 06:17:24PM -0700, Joe Peterson wrote:

Glad you got it back! Yes, when I was first playing with ZFS, I noticed that booting between single and multi user mode could make the pools invisible. Import seemed to bring them back...

I did go into single-user mode and attempt to do ZFS-related commands, which might explain the "no datasets available" once I was back in multiuser! I would classify that as a bug, and one which is going to cause all sorts of hair-pulling for administrators in the future. I wonder what it's caused by.

The import technique I found on a forum somewhere, or possibly on a Solaris mailing list. I was really sweating there for a moment...

So, is the disk toast, or can you still read anything from it (part table, etc.)?

The ad6 disk (/backups) fsck'd cleanly without any missing files or anomalies. The ZFS pool that has two striped disks (ad8 and ad10) is fully intact too, with no loss of data that I can see. I'll have to run a scrub after I'm done copying data over to ad6, just to make sure though.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Fri, 25 Jan 2008, Jeremy Chadwick wrote:
On Fri, Jan 25, 2008 at 06:03:33PM -0700, Joe Peterson wrote:

Wow, pretty crazy! Hmm, and yes, those LBAs do look close together. Well, let me know how the smartctl output looks. I'd be curious if your bad sector count rises.

Absolutely nada on the SMART statistics. Nothing incremented or changed in any way. My short and long tests did not change any of the data in the fields either. Full output is below my .sig.

[..]

It is interesting to note that we both have Seagate disks... :-)

I'll have to run SeaTools on my disk to see if anything comes back, or run a selective LBA test in smartctl (since the drive supports it).

[..]

smartctl version 5.37 [i386-portbld-freebsd7.0] Copyright (C) 2002-6 Bruce Allen

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630AS
Serial Number:    9QG1YWNL
Firmware Version: 3.AAE

Same firmware as Joe's, too, though his ad1 was a bit later (3.AAG or H?)

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   094   006    Pre-fail  Always       -       131599973
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       6
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       200325271
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2970
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       9
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   063   050   045    Old_age   Always       -       773849125
194 Temperature_Celsius     0x0022   037   050   000    Old_age   Always       -       37 (Lifetime Min/Max 0/29)

I noticed Joe's Temp readings look similarly borked too - attribute 190 is likely something else, despite same flag value as 194, which then shows clearly wrong values for min/max, though raw temp is reasonable:

190 Temperature_Celsius     0x0022   065   056   045    Old_age   Always       -       605749283
194 Temperature_Celsius     0x0022   035   044   000    Old_age   Always       -       35 (Lifetime Min/Max 0/15)

.. which only goes to show, as I've seen with other attributes on other drive brands, that smartctl's database isn't necessarily reliable over all versions / revisions of a given drive. Add salt to taste ..

Cheers, Ian
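One way to look at the odd attribute 190 raw values quoted above: written in hex, their low byte equals the attribute 194 raw temperature on both drives. The Python sketch below only performs that arithmetic. What the upper bytes hold is not stated anywhere in the thread, so reading the low byte as degrees Celsius is an assumption, and the helper name is illustrative.

```python
def low_byte(raw):
    """Return the least significant byte of a SMART raw value."""
    return raw & 0xFF


# (attribute 190 raw value, attribute 194 raw temperature), copied from
# the two tables quoted in the message above.
drives = {
    "first table": (773849125, 37),
    "Joe's drive": (605749283, 35),
}

results = {}
for label, (raw190, temp194) in drives.items():
    results[label] = (hex(raw190), low_byte(raw190), temp194)
    print(label, hex(raw190), low_byte(raw190), temp194)
```

For these two samples the low byte is 37 and 35 respectively, matching attribute 194 in each case. Two data points are thin evidence, so this is a hint about the encoding rather than a confirmed layout.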
Well-supported SAS RAID card for 6.3?
Hello, I'm buying a new server and will put 6.3 on it and would like to use SAS; normally I use 3ware SATA. I've been reading a lot including man pages but can't seem to find definitive information on SAS cards that are well-supported and work well. I've found reports that cards listed in man pages don't seem to work, and confusion about chips/cards. Does anyone have experience with recent SAS cards or machines with integrated chips? Thanks!
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Jeremy Chadwick wrote:

* Getting a larger power supply (usually when lots of disks are involved)

I only have two drives, so I think the PS has enough capacity in my case.

Agreed; even a 350W PSU should handle 2 disks without a problem.

I've seen power supplies with a sagging 12V rail cause these sorts of problems.

--
Andrew I MacIntyre               These thoughts are mine alone...
E-mail: [EMAIL PROTECTED] (pref) | Snail: PO Box 370
        [EMAIL PROTECTED] (alt)  |        Belconnen ACT 2616
Web:    http://www.andymac.org/  |        Australia
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Fri, Jan 25, 2008 at 05:00:54PM -0800, Jeremy Chadwick wrote:

icarus# zfs list
no datasets available

This doesn't bode well, and doesn't make me happy. At all.

Pshew! I was able to get ZFS to start seeing the pool again by doing the following:

(Supposedly zpool import by itself will show you a list of pools which it manages to see...)

icarus# zpool import -f storage
icarus# df -k /storage
Filesystem  1024-blocks       Used      Avail Capacity  Mounted on
storage       957873024  106124032  851748992    11%    /storage
icarus# zfs list
NAME      USED  AVAIL  REFER  MOUNTPOINT
storage   101G   812G   101G  /storage
icarus# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          ad8       ONLINE       0     0     0
          ad10      ONLINE       0     0     0

errors: No known data errors

Back to the drawing board.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Joe,

I wanted to send you a note about something that I'm still in the process of dealing with. The timing couldn't be more ironic.

I decided it would be worthwhile to migrate from my two-disk ZFS stripe with a non-ZFS disk for nightly backups, to a RAIDZ pool of all 3 disks combined (since they're all the same size). I had another terminal with gstat -I500ms running in it, so I could see overall I/O.

All was going well until about the 81GB mark of the copy. gstat started showing 0KB in/out on all the drives, and the rsync was stalled. ^Z did nothing, which is usually a bad sign. :-)

I ssh'd in and did a dmesg (summarised):

ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
ad6: FAILURE - WRITE_DMA timed out LBA=13951071
ad6: FAILURE - WRITE_DMA timed out LBA=13951327
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
ad6: FAILURE - WRITE_DMA timed out LBA=13951583
ad6: FAILURE - WRITE_DMA timed out LBA=13951839
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5

It appears my /dev/ad6 (a Seagate -- more irony) must have some bad blocks.

Actually, after letting things go for a while, I realised the box just locked up. Probably kernel panic'd due to the I/O problem. I'll have to poke at SMART stats later to see what showed up.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Fri, Jan 25, 2008 at 04:38:46PM -0800, Jeremy Chadwick wrote:

I'll have to poke at SMART stats later to see what showed up.

So the box did indeed panic. The backtrace contained about 1.5 screens of function calls from the stack, which makes taking a photo of the screen a bit worthless. All the functions shown were predominantly I/O related; since a disk locked up (or something), this didn't surprise me.

SMART stats showed absolutely nothing wrong with ad6, or any of the other drives on the system.

Worse: my ZFS pool appears *completely* gone -- that's about 170GB of data. I don't even know how that happened, because there were absolutely no issues reported on either of the disks on the ZFS pool. It's like the situation somehow caused ZFS to go crazy and lose all of its metadata.

icarus# zfs list
no datasets available

This doesn't bode well, and doesn't make me happy. At all.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Glad you got it back! Yes, when I was first playing with ZFS, I noticed that booting between single and multi user mode could make the pools invisible. Import seemed to bring them back...

So, is the disk toast, or can you still read anything from it (part table, etc.)?

-Joe

Jeremy Chadwick wrote:
On Fri, Jan 25, 2008 at 05:00:54PM -0800, Jeremy Chadwick wrote:

icarus# zfs list
no datasets available

This doesn't bode well, and doesn't make me happy. At all.

Pshew! I was able to get ZFS to start seeing the pool again by doing the following:

(Supposedly zpool import by itself will show you a list of pools which it manages to see...)

icarus# zpool import -f storage
icarus# df -k /storage
Filesystem  1024-blocks       Used      Avail Capacity  Mounted on
storage       957873024  106124032  851748992    11%    /storage
icarus# zfs list
NAME      USED  AVAIL  REFER  MOUNTPOINT
storage   101G   812G   101G  /storage
icarus# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          ad8       ONLINE       0     0     0
          ad10      ONLINE       0     0     0

errors: No known data errors

Back to the drawing board.
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Jeremy Chadwick wrote:

Joe,

I wanted to send you a note about something that I'm still in the process of dealing with. The timing couldn't be more ironic.

I decided it would be worthwhile to migrate from my two-disk ZFS stripe with a non-ZFS disk for nightly backups, to a RAIDZ pool of all 3 disks combined (since they're all the same size). I had another terminal with gstat -I500ms running in it, so I could see overall I/O.

All was going well until about the 81GB mark of the copy. gstat started showing 0KB in/out on all the drives, and the rsync was stalled. ^Z did nothing, which is usually a bad sign. :-)

I ssh'd in and did a dmesg (summarised):

ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
ad6: FAILURE - WRITE_DMA timed out LBA=13951071
ad6: FAILURE - WRITE_DMA timed out LBA=13951327
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
ad6: FAILURE - WRITE_DMA timed out LBA=13951583
ad6: FAILURE - WRITE_DMA timed out LBA=13951839
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5

It appears my /dev/ad6 (a Seagate -- more irony) must have some bad blocks.

Actually, after letting things go for a while, I realised the box just locked up. Probably kernel panic'd due to the I/O problem. I'll have to poke at SMART stats later to see what showed up.

Wow, pretty crazy! Hmm, and yes, those LBAs do look close together. Well, let me know how the smartctl output looks. I'd be curious if your bad sector count rises. I had noticed that 1

BTW, I tried:

crater# dd if=/dev/ad1s4 of=/dev/null bs=64k
^C1408596+0 records in
1408596+0 records out
92313747456 bytes transferred in 1415.324362 secs (65224446 bytes/sec)

(I let it go for 92GB or so) - no messages about ad1. So I wonder if this points at either the cable connector on ad0 or the drive itself. I guess I'd rather have a failing drive than motherboard... I originally was wondering if somehow something peculiar about ZFS's disk access pattern was making it happen...

Thanks for the recommendations. I'll keep an eye on it, and I'll let you know what a cable change does for me. Still, I have not had any ad0 messages since this morning (I haven't been using the system today much, but maybe the cron processes are more likely to trigger it...)

-Joe
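The remark that the LBAs look close together can be checked with a little arithmetic. The Python sketch below uses the numbers from the quoted dmesg and assumes 512-byte sectors, which the thread does not state. Under that assumption, consecutive failing LBAs are 256 sectors apart, which is 131072 bytes, the same as the length of each failed g_vfs_done write.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors, not stated in the thread

# LBAs from the TIMEOUT lines and byte offsets from the g_vfs_done lines.
lbas = [13951071, 13951327, 13951583, 13951839, 13952095, 13952351]
offsets = [7142916096, 7143047168, 7143178240, 7143309312, 7143440384]
write_length = 131072

# Distance between consecutive failing LBAs, in sectors and in bytes.
gaps = [b - a for a, b in zip(lbas, lbas[1:])]
gap_bytes = [g * SECTOR_SIZE for g in gaps]

# Difference between each disk LBA and the matching partition-relative offset.
lba_minus_offset = [lba - off // SECTOR_SIZE for lba, off in zip(lbas, offsets)]

print(gaps, gap_bytes, lba_minus_offset)
```

Each LBA also sits exactly 63 sectors past the corresponding ad6s1d offset. That would be consistent with the partition starting at sector 63 of the disk, though this is an inference from the numbers and not something the thread confirms. Either way, the failures line up as one contiguous run of 128 KB writes and not as scattered sectors.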
Re: 6.3-RELEASE can not mount root on Cyrix 5530 ATA33 controller
On 25Jan, 2008, at 15:05 , John Baldwin wrote:
On Wednesday 23 January 2008 03:52:39 pm Søren Schmidt wrote:
On 23Jan, 2008, at 21:09 , Xin LI wrote:
Yoshihiko Sarumaru wrote:

Hello, I updated my Geode GX1 PC from RELENG_6_2 to RELENG_6_3 and found root mount failed after reboot. This problem was caused by a change to ata-pci.c to pick up wider old ata controller as ata-pci devices at ata_legacy() function, and rolling back that file resolved this problem for me.

Which revision?

Actually, it's the fix to pci/pci.c that hasn't been backported to 6.x yet...

Rev 1.343? It should apply to 6.x cleanly. Patch below:

Yep, that one exactly.

-Søren

Index: pci.c
===================================================================
RCS file: /host/cvs/usr/cvs/src/sys/dev/pci/pci.c,v
retrieving revision 1.292.2.23
diff -u -r1.292.2.23 pci.c
--- pci.c   10 Jan 2008 21:17:12 -0000   1.292.2.23
+++ pci.c   25 Jan 2008 14:05:20 -0000
@@ -1898,7 +1898,9 @@
     /* ATA devices needs special map treatment */
     if ((pci_get_class(dev) == PCIC_STORAGE) &&
         (pci_get_subclass(dev) == PCIS_STORAGE_IDE) &&
-        (pci_get_progif(dev) & PCIP_STORAGE_IDE_MASTERDEV))
+        ((pci_get_progif(dev) & PCIP_STORAGE_IDE_MASTERDEV) ||
+         (!pci_read_config(dev, PCIR_BAR(0), 4) &&
+          !pci_read_config(dev, PCIR_BAR(2), 4))) )
         pci_ata_maps(pcib, bus, dev, b, s, f, rl, force, prefetchmask);
     else
         for (i = 0; i < cfg->nummaps;)

--
John Baldwin
Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1
Sam Leffler wrote:

Sigh, you are correct. I backrev'd the machine where I ran schedgraph to RELENG_7 and didn't notice the old version mis-parses the ktr file. The graph is totally different w/ schedgraph from HEAD. Sorry Joe for misleading you.

No problem, Sam, but the question I have for you now is: do you see anything with the updated schedgraph that indicates any freezes that look funny? The length of the ones I saw with mouse movement were mostly some portion of a second, from maybe 1/8 to 1/2 sec. And there should be a lot of them in quick succession.

Thanks, Joe
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Jan 25, 2008, at 1:05 PM, Thomas Hurst wrote:

These numbers are quite worrisome-- they should be zero or nearly so in a healthy drive.

No, these are perfectly reasonable for a Seagate. I have about 12 7200.X's and all show the same sort of behavior. If they're nearly zero it's probably a sign your manufacturer isn't actually counting them (marketroids hate accurate SMART readings). Try graphing them as counters; with an idle disk you'll see periodic sawtooth patterns as the heads crawl from one side of the disk to the other.

SMART attributes which end with _Ct or _Count are supposed to increment with every event; things which end with _Rate (i.e., Raw_Read_Error_Rate, Seek_Error_Rate) are supposed to indicate the frequency of such errors over time. It would be reasonable for Hardware_ECC_Recovered to keep the incremental count, but not the other two.

I agree that minor periodic errors happen over time and are not a great concern, but a happy drive will show zero reallocated sectors, or perhaps a few over the span of a year or two, and will have an ECC recovered or UDMA_CRC count which is much smaller than was reported by Joe.

YMMV, of course...

--
-Chuck
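The naming rule described above (names ending in _Ct or _Count are event counters, names ending in _Rate are frequencies) can be sketched as a small classifier. This Python example assumes the ten-column smartctl attribute layout quoted elsewhere in the thread; the function names are illustrative. The rule itself is only one poster's reading, and other posters in the thread dispute that Seagate raw values follow it.

```python
def parse_attribute(line):
    """Parse one smartctl attribute row (ten whitespace-separated columns)."""
    fields = line.split(None, 9)
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": fields[9].strip(),
    }


def expected_kind(name):
    """Classify an attribute by its name suffix, following the rule above."""
    if name.endswith(("_Ct", "_Count")):
        return "counter"
    if name.endswith("_Rate"):
        return "rate"
    return "other"


# Rows copied from the ST3500630AS table quoted in the thread.
rows = [
    "1 Raw_Read_Error_Rate 0x000f 114 094 006 Pre-fail Always - 131599973",
    "5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0",
    "7 Seek_Error_Rate 0x000f 082 060 030 Pre-fail Always - 200325271",
    "12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 9",
]

for row in rows:
    attr = parse_attribute(row)
    # The normalized VALUE staying above THRESH is what the drive reports as passing.
    healthy = attr["value"] > attr["thresh"]
    print(attr["id"], attr["name"], expected_kind(attr["name"]), attr["raw"], healthy)
```

Note that the large raw numbers belong to the _Rate attributes, while the counters are small, and all four normalized values are still above their thresholds. That fits the view that the raw _Rate figures are not simple error tallies.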
Highpoint drivers on 7.0
Hi all, did anyone try the Highpoint RocketRAID drivers (hptmv6.ko) on 7-RC1 or later? I'm considering upgrading one of my servers here, but I need to know if my RAID controller will work after reinstall. A shame HPT doesn't release the driver to the community...

Thanks, /Eirik
Re: Highpoint drivers on 7.0
* Eirik Øverby [EMAIL PROTECTED] [080125 12:53] wrote:

Hi all, did anyone try the Highpoint RocketRAID drivers (hptmv6.ko) on 7-RC1 or later? I'm considering upgrading one of my servers here, but I need to know if my RAID controller will work after reinstall. A shame HPT doesn't release the driver to the community...

I would try the following:

1. get 7.x source tree.
2. make buildworld && make buildkernel
3. mv /boot/kernel /boot/kernel6
4. make installkernel
5. reboot, boot with -s
6. mount your filesystems, do some io testing.

If everything looks well, you can then make installworld and hopefully things proceed safely.

-Alfred
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
[ ... ]
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
[ ... ]
195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300

These numbers are quite worrisome-- they should be zero or nearly so in a healthy drive.

--
-Chuck
Re: FreeBSD 7.0-PRE + amd64 + Areca controller = probe12 warning
At 09:00 AM 1/25/2008, Steven Hartland wrote:

When booting 7.0-PRERELEASE amd64 on our machines with Areca controllers we get the following odd message, which doesn't appear on i386. Is this something to worry about or harmless?

(probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step

I get the same thing and I am running the latest BIOS from Areca. Doesn't seem to impact my limited testing so far.

arcmsr0: Areca SATA Host Adapter RAID Controller mem 0xe860-0xe8600fff,0xe800-0xe83f irq 18 at device 14.0 on pci2
ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.43 2007-4-17
arcmsr0: [ITHREAD]
Timecounters tick every 1.000 msec
Waiting 5 seconds for SCSI devices to settle
ad5: 238475MB Seagate ST3250310AS 3.AAC at ata2-slave SATA150
(probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step
da0 at arcmsr0 bus 0 target 0 lun 0
da0: Areca ARC-1210-VOL#00 R001 Fixed Direct Access SCSI-5 device
da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
da0: 305175MB (624999424 512 byte sectors: 255H 63S/T 38904C)
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ad5s1a

0[dbtest]% uname -a
FreeBSD dbtest.sentex.ca 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #4: Thu Jan 17 08:27:50 EST 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/db amd64
0[dbtest]%

Full dmesg attached.

Regards
Steve
Re: snd_emu10k1.ko after 6.2 to 6.3 upgrade
On Friday 25 January 2008 11:16:19 am Petr Holub wrote:

Do you have an error message in the dmesg after this?

Yes, I do - sorry, I hadn't thought it would end up in dmesg and not in the terminal. It says:

KLD snd_emu10k1.ko: depends on midi - not available

Did you kldload sound.ko before snd_emu10k1.ko? It may be that freebsd-upgrade didn't run kldxref on your kernel dir to update the /boot/kernel/linker.hints file that is used to autoload dependencies. You can try doing a 'kldxref /boot/kernel' to see if that fixes the dependency loading.

--
John Baldwin
Re: ipfwpcap in 6.3 ?
If memory serves me right, Kurt Jaeger wrote:

In http://www.freebsd.org/releases/6.3R/relnotes-i386.html ipfwpcap(8) is mentioned, but I can't find it after the upgrade ?

Argh. My bad. It got merged to RELENG_6 *just* after RELENG_6_3 was branched, by about a day or so. Somehow I must have gotten confused and thought that it happened pre-branch (and thus had gotten included), thus it ended up in the release notes for 6.3 when it shouldn't have. :-(

I'll make a note in the post-release errata for this. Very sorry for the confusion. ipfwpcap(8) will appear in 6.4-RELEASE or in any 6-STABLE snapshot made after about 25 November 2007.

Bruce.
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Fri, Jan 25, 2008 at 06:42:04PM +0100, Julian H. Stacey wrote:
Jeremy Chadwick wrote:

wondering if this is a known issue. Note that smartctl does not report errors logged and gives a PASSED to the drive. I am running at UDMA100 ATA. Also, if it matters, I am using ZFS.

Can you please provide output of the following:

* smartctl -a /dev/ad0

From ports/sysutils/smartmontools I presume? (Asking as I also have a DMA problem to solve, at present needing hw.ata.ata_dma=0 in /boot/loader.conf to boot (interruptions on sound on 7-stable, though no ZFS here).)

Yep! smartctl comes with ports/sysutils/smartmontools.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Fri, Jan 25, 2008 at 08:58:41AM -0700, Joe Peterson wrote:

I've seen mention of this kind of issue before, but I never saw a solution, except that someone reported that a certain version of 6.x seemed to make it go away - accounts of this problem are a bit vague. I am running 7.0-RC1, and I am seeing the errors periodically, and I am wondering if this is a known issue. Note that smartctl does not report errors logged and gives a PASSED to the drive. I am running at UDMA100 ATA. Also, if it matters, I am using ZFS.

What you've shown is usually the sign of a disk-related problem. It's very obvious when it's just one disk reporting DMA errors. You use ZFS, so chances are you have more than one disk in a pool/volume -- there's no indication ad1, ad4, ad6, etc. are failing, so this seems to indicate something specific to ad0.

Manufacturers pick very passive (non-aggressive) thresholds for error conditions on disks, so disks which are failing very commonly show PASSED during SMART analysis. To make matters worse, most users I know read SMART stats incorrectly (they're easy to misinterpret).

Can you please provide output of the following:

* smartctl -a /dev/ad0
* atacontrol cap ad0
* atacontrol info ata0, ata1, etc. -- any controller used by ZFS
* Relevant dmesg output that indicates what kind of ATA controller these disks are attached to. Start with output from 'ad0:' and work backwards. For example, ad0 on this machine is using an Intel ICH6 controller:

atapci0: Intel ICH6 SATA150 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.2 on pci0
ata0: ATA channel 0 on atapci0
ad0: 238475MB WDC WD2500KS-00MJB0 02.01C03 at ata0-master SATA150

Other stuff:

SMART stats which are labelled Offline are only updated when a short or long offline test is performed. Have you tried using smartctl -t short /dev/ad0 and smartctl -t long /dev/ad0 to see if any of the raw values in the far right column increment?

Have you tried using zpool scrub on the ZFS pool, then zpool status to see if READ/WRITE/CKSUM counters increment or if the scrub line states there were errors?

Other things which have fixed problems in the past for others:

* BIOS updates
* Change of motherboards (sometimes replacing board with same model, other times going with a completely different vendor (implies weird implementation issues or BIOS problems))
* Changing SATA cables
* Getting a larger power supply (usually when lots of disks are involved)

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
sysinstall: weird ui problem
FreeBSD 6.3-RELEASE amd64

Running sysinstall for post-installation configuration in xterm/konsole/gnome-terminal.

Very strange issue: the arrow keys work well throughout the sysinstall menus, but in the Fdisk and Label screens both the up and down arrow keys are interpreted as the down key. This is not fatal in the Label screen because navigation cycles, but in the Fdisk menu you cannot move up from the bottom entry (slice).

On the system console everything is OK, though.

-- 
Andriy Gapon
Re: panic in vm_page_splay
On Thursday 24 January 2008 09:31:24 am Mikhail T. wrote:
> Hello!
>
> The machine is running 6.3-PRERELEASE as of Dec 30th. It just panicked in the middle of a web session as I was browsing for a file to upload via a web form... The firefox in use is native (amd64), not a Linux binary.
>
> The firefox process had over 550Mb of memory to its name -- it had been running for many days. The box has 2Gb of RAM and was performing fine despite 4 SETI processes in the background.
>
> Please, advise. Thanks!

Is this the same box that you got the bad PTE panics on? If so, have you run memtest or the like to rule out bad RAM?

-- 
John Baldwin
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Fri, Jan 25, 2008 at 06:03:33PM -0700, Joe Peterson wrote:
> Wow, pretty crazy! Hmm, and yes, those LBAs do look close together. Well, let me know how the smartctl output looks. I'd be curious if your bad sector count rises.

Absolutely nada on the SMART statistics. Nothing incremented or changed in any way. My short and long tests did not change any of the data in the fields either. Full output is below my .sig.

> BTW, I tried:
>
> crater# dd if=/dev/ad1s4 of=/dev/null bs=64k
> ^C1408596+0 records in
> 1408596+0 records out
> 92313747456 bytes transferred in 1415.324362 secs (65224446 bytes/sec)
>
> (I let it go for 92GB or so) - no messages about ad1. So I wonder if this points at either the cable connector on ad0 or the drive itself. I guess I'd rather have a failing drive than motherboard... I originally was wondering if somehow something peculiar about ZFS's disk access pattern was making it happen...

Since I'm used to dealing with disk issues (at work and personally), I'm left wondering if this is some strange ATA subsystem quirk, or ultimately something with ZFS (your "something peculiar about ZFS's disk access pattern" claim is starting to look more plausible).

This may sound suicidal, but I'm hoping to recreate the scenario somehow, and then punt the details to Soren or Xin Li for further investigation -- if it looks like an ATA subsystem thing, that is.

It is interesting to note that we both have Seagate disks... :-) I'll have to run SeaTools on my disk to see if anything comes back, or run a selective LBA test in smartctl (since the drive supports it).

I've restarted my rsync since, and it's happily chomping away without an issue. If my problem was TRULY a bad block or something causing mechanical lock-up on the disk, I'd have expected my latest rsync to induce it. There's always the chance of some bizarre drive firmware bug too.

> Thanks for the recommendations. I'll keep an eye on it, and I'll let you know what a cable change does for me. Still, I have not had any ad0 messages since this morning (I haven't been using the system today much, but maybe the cron processes are more likely to trigger it...

Understood. In my case, I *know* the cables are fine, because I built this box and migrated to it just a few days ago (change of motherboard, chassis, and addition of a SATA hot-swap backplane).

We use the same motherboard (Supermicro PDSMI+) in all of our production servers in our datacenter, and they're rock-solid. I've done hot-swapping without any issue on those systems too, and I've never seen any SATA system issues -- one of the systems is our datacenter backup server, which holds nightly backups for all the other boxes (about 6). Given the heavy disk I/O that occurs for hours at a time, if this were some weird system quirk, motherboard problem, or SATA bus/cable issue, we would've seen it by now.

FWIW: all our systems, including the backup box, use UFS2 exclusively -- no ZFS in the picture.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

icarus# smartctl -a /dev/ad6
smartctl version 5.37 [i386-portbld-freebsd7.0] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630AS
Serial Number:    9QG1YWNL
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan 25 17:10:31 2008 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
RE: snd_emu10k1.ko after 6.2 to 6.3 upgrade
> Did you kldload sound.ko before snd_emu10k1.ko? It may be that freebsd-update didn't run kldxref on your kernel dir to update the /boot/kernel/linker.hints file that is used to autoload dependencies.

Yes I did, sound.ko is loaded. Actually, I can use sound.ko from the freebsd-update'd tree together with snd_emu10k1.ko built from source and it works without any apparent problem.

> You can try doing a 'kldxref /boot/kernel' to see if that fixes the dependency loading.

That hasn't helped:

# kldload snd_emu10k1
kldload: can't load snd_emu10k1: No such file or directory

dmesg says again:

KLD snd_emu10k1.ko: depends on midi - not available

Petr

Holub
CESNET z.s.p.o.            Supercomputing Center Brno
Zikova 4                   Institute of Compt. Science
162 00 Praha 6, CZ         Masaryk University
Czech Republic             Botanicka 68a, 60200 Brno, CZ
e-mail: [EMAIL PROTECTED]  phone: +420-549493944
                           fax: +420-541212747
                           e-mail: [EMAIL PROTECTED]

-----Original Message-----
From: John Baldwin [mailto:[EMAIL PROTECTED]
Sent: Friday, January 25, 2008 9:10 PM
To: Petr Holub
Cc: freebsd-stable@freebsd.org
Subject: Re: snd_emu10k1.ko after 6.2 to 6.3 upgrade

On Friday 25 January 2008 11:16:19 am Petr Holub wrote:
> > Do you have an error message in the dmesg after this?
>
> Yes, I do - sorry, I hadn't thought it would end up in dmesg and not in the terminal. It says:
>
> KLD snd_emu10k1.ko: depends on midi - not available

-- 
John Baldwin
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Fri, Jan 25, 2008 at 12:24:20PM -0700, Joe Peterson wrote:
> In my case, I am using only one disk (ad0) for FreeBSD, and I am only using one partition on this disk in my ZFS pool. So, in this case, unfortunately, it's not possible to tell from the fact that only ad0 is listed that it is specific to this drive.

Ah ha. Well, in your below example, you may only be using one drive for FreeBSD (ad0), but you do have a 2nd drive (ad1) which is installed. I would try doing some I/O on /dev/ad1 to see if you can get the timeouts to occur on that drive as well. You don't have to do anything risky with ad1 either: dd if=/dev/ad1 of=/dev/null bs=64k would probably suffice.

> Yep, I am also always skeptical of smart reports. That's one reason I am very interested in ZFS. I don't trust the drive to be completely reliable, and the fact that ZFS does end-to-end data integrity is very intriguing.

I agree entirely -- and I also use ZFS myself (across two drives in a RAID0-like fashion, with a completely separate drive which is used for nightly backups of the ZFS pool). I'm absolutely thrilled with it; finally something clean, reliable, and simple -- something I've always wanted in an LVM or LVM-like implementation.

> > * smartctl -a /dev/ad0
>
> OK, I've attached this to the end of this email.
>
> atapci0: Intel ICH4 UDMA100 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.1 on pci0
> ata0: ATA channel 0 on atapci0
> ata0: [ITHREAD]
> ad0: 476940MB Seagate ST3500630A 3.AAE at ata0-master UDMA100

The smartctl output for /dev/ad0 looks good, minus the one uncorrected sector. I'm ignoring that since it's proof that the drive knew of it and remapped it. If that number starts incrementing over time, though, replace the drive ASAP, of course.

The atacontrol cap output looks fine too; nothing wonky, and the LBA capabilities look fine.

The controller is nothing out-of-the-ordinary; it's reliable under FreeBSD (I've had many a motherboard which used it). Of course I haven't used an ICH4 since FreeBSD 3.x, and the ATA layer has changed substantially, numerous times.

> {regarding -t short and -t long}
> Also, none of the numbers that were zero incremented, esp:
>
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
>
> Also, no more errors were reported in the system log during the self-tests.

That seems to indicate that the drive considers itself healthy. Another test I could recommend at this point would be one that would require a few hours of downtime: download Seagate's SeaTools (will require a CD burner or floppies) and consider doing both quick and long scans. Quick checks some of the stuff we've looked at here, but it also looks at some vendor-specific stuff within the drive. Long will scan every block on the disk for errors (and will not destroy data).

> OK, I started a scrub, and it will take some more time to complete... But I get the following with status. Could this be due to the timeouts and failures? I suspect so, so maybe this is not surprising.

It depends on whether or not you saw more timeouts and cache errors spit out by the kernel while zpool scrub ran. If so, then yes, I would definitely say they're related.

> I'd also guess that this doesn't necessarily point to the drive, but anything in the chain of events... I do not have a mirror or RAID-Z, so I guess the reason there was no data loss (yet) is because the checksum passed, and maybe it just had to retry...?

I'm still new to ZFS myself, so I don't have an answer for you. Your conclusion is the same thing I'd conclude, though.

> I've been using this same motherboard/BIOS for a long time (as well as this drive), so no changes have happened to the HW recently. The BIOS is the newest available, I believe (it's a Tyan Trinity S2099, so it's a few years old).

I'd say the BIOS is probably not responsible at this point; I'd expect other weird things to be going on with the system if the BIOS was broken in some way (or possibly bit rot in the flash). It's going to be difficult to determine if maybe something on the mainboard has decided to start failing (some transistor within the ICH4, etc...) though. :-(

> I'm using regular 80-conductor ATA cables. Also, these seem to have been working fine for quite a while now. But, yes, I have also witnessed bad cable issues on older systems in the past. I certainly could try a new cable and see if it helps.

I'd try that for sure. It's just one more thing to rule out.

> > * Getting a larger power supply (usually when lots of disk are involved)
>
> I only have two drives, so I think the PS has enough capacity in my case.

Agreed; even a 350W PSU should handle 2 disks without a problem.

Here's something to ponder:

The LBAs being reported as having errors are scattered all over. They aren't lumped together (usually the sign of part of a platter going bad); instead, they're all over the drive. This would indicate either cable problems,
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
* Chuck Swiger ([EMAIL PROTECTED]) wrote:
> On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> >   1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
> > [ ... ]
> >   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
> > [ ... ]
> > 195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300
>
> These numbers are quite worrisome -- they should be zero or nearly so in a healthy drive.

No, these are perfectly reasonable for a Seagate. I have about 12 7200.X's and all show the same sort of behavior. If they're nearly zero it's probably a sign your manufacturer isn't actually counting them (marketroids hate accurate SMART readings).

Try graphing them as counters; with an idle disk you'll see periodic sawtooth patterns as the heads crawl from one side of the disk to the other.

-- 
Thomas 'Freaky' Hurst
http://hur.st/
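To make the VALUE-versus-RAW_VALUE distinction in this exchange concrete, here is a minimal sketch in C of how the normalized columns could be judged. It assumes the usual smartctl convention that an attribute is failing when its normalized VALUE is at or below THRESH; treating a THRESH of zero as "can never trip" is also an assumption. The struct and function names are made up for illustration and are not part of smartmontools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a "smartctl -a" attribute table (only the fields used here).
 * Note: the table prints 071, 006 etc. with leading zeros; in C source
 * those must be written as 71 and 6, or they would be read as octal. */
struct smart_attr {
    int      id;      /* attribute ID, e.g. 1 = Raw_Read_Error_Rate */
    int      value;   /* normalized VALUE column */
    int      worst;   /* normalized WORST column */
    int      thresh;  /* THRESH column */
    uint64_t raw;     /* vendor-specific RAW_VALUE; deliberately ignored */
};

enum smart_state { SMART_OK, SMART_FAILED_PAST, SMART_FAILING_NOW };

/* Judge an attribute from its normalized columns only. */
static enum smart_state smart_attr_state(const struct smart_attr *a)
{
    assert(a != NULL);
    if (a->thresh == 0)
        return SMART_OK;            /* assumption: zero threshold never trips */
    if (a->value <= a->thresh)
        return SMART_FAILING_NOW;   /* current value at or below threshold */
    if (a->worst <= a->thresh)
        return SMART_FAILED_PAST;   /* was at or below threshold at some point */
    return SMART_OK;
}
```

Fed the three rows quoted above (114/071/006, 084/060/030, 063/046/000), this returns SMART_OK for each, which matches the point being made: the large raw numbers on their own do not put the attributes anywhere near their thresholds.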
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Fri, Jan 25, 2008 at 12:46:08PM -0800, Chuck Swiger wrote:
> On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> >   1 Raw_Read_Error_Rate     0x000f   114   071   006    Pre-fail  Always       -       82422948
> > [ ... ]
> >   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       286126605
> > [ ... ]
> > 195 Hardware_ECC_Recovered  0x001a   063   046   000    Old_age   Always       -       166181300
>
> These numbers are quite worrisome -- they should be zero or nearly so in a healthy drive.

On some drives, yes, but not all drives.

His is a Seagate drive -- Seagate uses some of the bits in the raw data section for some sort of internal use by the drive firmware. So although the raw values may appear very high, the drive appears to function normally, and the actual adjusted SMART value (the field under VALUE) doesn't fluctuate.

I have Seagate drives all over the place which exhibit identical stats to the above. I've included some for comparison below; each listed is on a different system.

Look at attribute 190 (Temperature Celsius) for an example; I don't think any drive can reach 773849124C. Or, well, I sure hope not. :-) I believe in the case of attrib. 190, that's why they present a human-readable value in attribute 194.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

==SNIP==

ad6: 476940MB Seagate ST3500630AS 3.AAE at ata3-master SATA300

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   112   094   006    Pre-fail  Always       -       221374987
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       6
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       29014
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2967
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       9
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   064   050   045    Old_age   Always       -       773849124
194 Temperature_Celsius     0x0022   036   050   000    Old_age   Always       -       36 (Lifetime Min/Max 0/29)
195 Hardware_ECC_Recovered  0x001a   066   059   000    Old_age   Always       -       36458075
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       18
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       18
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x       100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

ad4: 114473MB Seagate ST3120827AS 3.42 at ata2-master SATA150

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   063   052   006    Pre-fail  Always       -       57703728
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       169005025
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3536
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       24
194 Temperature_Celsius     0x0022   027   040   000    Old_age   Always       -       27 (Lifetime Min/Max 0/15)
195 Hardware_ECC_Recovered  0x001a   063   052   000    Old_age   Always       -       57703728
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100
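As a worked illustration of the "bits in the raw data section" remark, the sketch below splits a raw value into bytes. For the attribute 190 raw value quoted for ad6 (773849124, which is 0x2E200024), the lowest byte comes out as 36, the same number attribute 194 reports as its raw value on that drive. Reading the low byte as the temperature is only an assumption drawn from that coincidence; what the upper bytes mean is not stated anywhere in this thread. The helper name raw_byte is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return byte n of a SMART raw value, where n = 0 is the least
 * significant byte.  A raw value is at most 8 bytes wide here. */
static unsigned raw_byte(uint64_t raw, unsigned n)
{
    assert(n < 8);
    return (unsigned)((raw >> (8u * n)) & 0xffu);
}
```

For example, raw_byte(773849124ULL, 0) evaluates to 36, while bytes 2 and 3 evaluate to 32 and 46.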
Re: panic: vm_fault: fault on nofault entry, addr: 81423000
> Hmm, so that's fine. What pointer is returned by madt_map_table?

0x800e7610

I also put some prints in afterwards to try and see how far through the loop it was getting:

	count = (xsdt->Header.Length - sizeof(ACPI_TABLE_HEADER)) /
	    sizeof(UINT64);
	printf("DEBUG: count is %d\n", count);
	for (i = 0; i < count; i++) {
		printf("DEBUG: probing %d - offset %p\n", i,
		    xsdt->TableOffsetEntry[i]);
		if (madt_probe_table(xsdt->TableOffsetEntry[i]))
			break;
	}

The output is interesting - I get count printed as 6, but then nothing else, just the panic. This leads me to believe that it is the access to xsdt->TableOffsetEntry[0] which is causing the panic.

-pete.
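For reference, the count arithmetic in that loop can be checked in isolation. This is a minimal userland sketch, assuming the common ACPI table header is 36 bytes and each XSDT entry is a 64-bit physical address; the function name and the constant name are made up here. The real table length is not given in the thread: a length of 84 bytes is just one value that would produce the reported count of 6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the common ACPI table header in bytes (signature, length,
 * revision, checksum, OEM fields, creator fields). */
#define ACPI_HEADER_SIZE 36u

/* Number of 64-bit table pointers that fit after the header in an XSDT
 * whose header reports table_length bytes in total. */
static size_t xsdt_entry_count(uint32_t table_length)
{
    if (table_length < ACPI_HEADER_SIZE)
        return 0;                   /* malformed: shorter than the header */
    return (table_length - ACPI_HEADER_SIZE) / sizeof(uint64_t);
}
```

The guard against a too-short length is an addition for safety in this sketch; the loop quoted above does not include one.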
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Jeremy Chadwick wrote:
> What you've shown is usually the sign of a disk-related problem. It's very obvious when it's just one disk reporting DMA errors. You use ZFS, so chances are you have more than one disk in a pool/volume -- there's no indication ad1, ad4, ad6, etc. are failing, so this seems to indicate something specific to ad0.

Jeremy, thanks for the response - I have tried to answer all of your questions below...

In my case, I am using only one disk (ad0) for FreeBSD, and I am only using one partition on this disk in my ZFS pool. So, in this case, unfortunately, it's not possible to tell from the fact that only ad0 is listed that it is specific to this drive.

> Manufacturers pick very passive (non-aggressive) thresholds for error conditions on disks, so disks which are failing very commonly show PASSED during SMART analysis. To make matters worse, most users I know read SMART stats incorrectly (they're easy to misinterpret).

Yep, I am also always skeptical of smart reports. That's one reason I am very interested in ZFS. I don't trust the drive to be completely reliable, and the fact that ZFS does end-to-end data integrity is very intriguing.

> Can you please provide output of the following:
>
> * smartctl -a /dev/ad0

OK, I've attached this to the end of this email.

> * atacontrol cap ad0

Protocol              ATA/ATAPI revision 7
device model          ST3500630A
serial number         9QG0DG03
firmware revision     3.AAE
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       976773168 sectors
dma supported
overlap not supported

Feature                        Support  Enable  Value           Vendor
write cache                    yes      yes
read ahead                     yes      yes
Tagged Command Queuing (TCQ)   no       no      0/0x00
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/0xFEFE
automatic acoustic management  no       no      0/0x00          208/0xD0

> * atacontrol info ata0, ata1, etc. -- any controller used by ZFS

Master:  ad0 ST3500630A/3.AAE ATA/ATAPI revision 7
Slave:   ad1 ST3160812A/3.AAH ATA/ATAPI revision 7

(but note that ad1 is not used by FreeBSD)

> * Relevant dmesg output that indicates what kind of ATA controller these disks are attached to. Start with output from 'ad0:' and work backwards. For example, ad0 on this machine is using an Intel ICH6 controller:
>
> atapci0: Intel ICH6 SATA150 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.2 on pci0
> ata0: ATA channel 0 on atapci0
> ad0: 238475MB WDC WD2500KS-00MJB0 02.01C03 at ata0-master SATA150

atapci0: Intel ICH4 UDMA100 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.1 on pci0
ata0: ATA channel 0 on atapci0
ata0: [ITHREAD]
ad0: 476940MB Seagate ST3500630A 3.AAE at ata0-master UDMA100

> SMART stats which are labelled Offline are only updated when a short or long offline test is performed. Have you tried using smartctl -t short /dev/ad0 and smartctl -t long /dev/ad0 to see if any of the raw values on the far right column increment?

I just tried one:

# 1  Short offline       Completed without error       00%      5252         -
# 2  Short offline       Completed without error       00%      5252         -

Also, none of the numbers that were zero incremented, esp:

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

Also, no more errors were reported in the system log during the self-tests.

> Have you tried using zpool scrub on the ZFS pool, then zpool status to see if READ/WRITE/CHKSUM counters increment or if the scrub line states there were errors?

OK, I started a scrub, and it will take some more time to complete... But I get the following with status. Could this be due to the timeouts and failures? I suspect so, so maybe this is not surprising. I'd also guess that this doesn't necessarily point to the drive, but anything in the chain of events... I do not have a mirror or RAID-Z, so I guess the reason there was no data loss (yet) is because the checksum passed, and maybe it just had to retry...? Anyway, here's the output so far:

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 2.50% done, 1h58m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       1     3     0
          ad0s1d    ONLINE       1     3     0

errors: No known data errors

Other
FreeBSD 7.0-PRE + amd64 + Areca controller = probe12 warning
When booting 7.0-PRERELEASE amd64 on our machines with Areca controllers we get the following odd message, which doesn't appear on i386. Is this something to worry about, or is it harmless?

(probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step

Full dmesg attached.

Regards
Steve
Re: panic: vm_fault: fault on nofault entry, addr: 81423000
On Friday 25 January 2008 07:55:46 am Pete French wrote:
> > Hmm, so that's fine. What pointer is returned by madt_map_table?
>
> 0x800e7610

That isn't page-aligned, which is unexpected, though it should still work fine.

> I also put some prints in afterwards to try and see how far through the loop it was getting:
>
> 	count = (xsdt->Header.Length - sizeof(ACPI_TABLE_HEADER)) /
> 	    sizeof(UINT64);
> 	printf("DEBUG: count is %d\n", count);
> 	for (i = 0; i < count; i++) {
> 		printf("DEBUG: probing %d - offset %p\n", i,
> 		    xsdt->TableOffsetEntry[i]);
> 		if (madt_probe_table(xsdt->TableOffsetEntry[i]))
> 			break;
> 	}
>
> The output is interesting - I get count printed as 6, but then nothing else, just the panic. This leads me to believe that it is the access to xsdt->TableOffsetEntry[0] which is causing the panic.

Hmm, that is odd. The Header.Length field and the actual table should all be in the same page, so you shouldn't be getting a page fault. Can you add some printfs to madt_map() to see what the final starting (pa, length) are before the call to pmap_kenter_temporary(), and then add a printf for each iteration of the while loop showing the (pa, la, remaining length)?

-- 
John Baldwin
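The "same page" reasoning above can be sanity-checked with a few lines of C. This is only a sketch under stated assumptions: a 4 KiB page size, the pointer value exactly as printed in the thread (it may have been truncated by the archive), and a hypothetical table length of 84 bytes, which is one length consistent with a count of 6. The helper names are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u    /* assumption: 4 KiB pages */

/* Offset of addr within its page; zero means page-aligned. */
static uint64_t page_offset(uint64_t addr)
{
    return addr & (PAGE_SIZE_ASSUMED - 1u);
}

/* Number of pages touched by the byte range [addr, addr + len). */
static uint64_t pages_spanned(uint64_t addr, uint64_t len)
{
    assert(addr + len >= addr);    /* the range must not wrap around */
    if (len == 0)
        return 0;
    uint64_t first = addr / PAGE_SIZE_ASSUMED;
    uint64_t last = (addr + len - 1) / PAGE_SIZE_ASSUMED;
    return last - first + 1;
}
```

With these assumptions, page_offset(0x800e7610) is 0x610 (so the pointer is indeed not page-aligned) and pages_spanned(0x800e7610, 84) is 1, which agrees with the expectation that the header and the entry array sit in a single page. A table starting near the end of a page, for example at offset 0xfe0, would span two.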
Re: NO_ knobs in /etc/make.conf
On Wednesday 23 January 2008 02:37:24 pm Doug Barton wrote:
> Vivek Khera wrote:
> > I guess I wasn't clear about my confusion. What was broken about putting all this in make.conf that necessitated a src.conf file too?
>
> One could argue that they didn't need to be moved at all. One of the rationales at the time was that we didn't want the knobs for the base to affect the ports.

Correct, and /etc/src.conf is optional. It is a good place to put settings that you want to only affect compiles in /usr/src and not affect building apps from ports, standalone compiles, etc.

-- 
John Baldwin
Re: T7200 CPU not detected by est
On Wednesday 23 January 2008 02:42:52 am Krassimir Slavchev wrote:
> John Baldwin wrote:
> > On Monday 21 January 2008 11:16:06 am Gerrit Kühn wrote:
> > > Hi folks,
> > >
> > > I have several systems using T7200 mobile CPUs running under 7-stable. However, EST does not recognize the cpus. When loading cpufreq I get:
> >
> > You can try this patch. It won't add support for all of the levels, but it will support the current level and the highest level (IIRC).
>
> It works now on my T7700:
>
> dev.est.0.%desc: Enhanced SpeedStep Frequency Control
> dev.est.0.%driver: est
> dev.est.0.%parent: cpu0
> dev.est.0.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000 1200/16000
> dev.est.1.%desc: Enhanced SpeedStep Frequency Control
> dev.est.1.%driver: est
> dev.est.1.%parent: cpu1
> dev.est.1.freq_settings: 2401/35000 2400/35000 2000/28000 1600/22000 1200/16000

Odd, it shouldn't have provided that many settings. It also doesn't provide power info. I wonder if you are getting the settings from ACPI.

-- 
John Baldwin
Re: 6.3-RELEASE can not mount root on Cyrix 5530 ATA33 controller
On Wednesday 23 January 2008 03:52:39 pm Søren Schmidt wrote:
> On 23Jan, 2008, at 21:09 , Xin LI wrote:
> > Yoshihiko Sarumaru wrote:
> > > Hello,
> > >
> > > I updated my Geode GX1 PC from RELENG_6_2 to RELENG_6_3 and found that mounting root failed after reboot. The problem was caused by a change to ata-pci.c that makes the ata_legacy() function pick up a wider range of old ATA controllers as ata-pci devices; rolling that file back resolved the problem for me.
> >
> > Which revision?
>
> Actually, it's the fix to pci/pci.c that hasn't been backported to 6.x yet...

Rev 1.343? It should apply to 6.x cleanly. Patch below:

Index: pci.c
===
RCS file: /host/cvs/usr/cvs/src/sys/dev/pci/pci.c,v
retrieving revision 1.292.2.23
diff -u -r1.292.2.23 pci.c
--- pci.c	10 Jan 2008 21:17:12 -	1.292.2.23
+++ pci.c	25 Jan 2008 14:05:20 -
@@ -1898,7 +1898,9 @@
 	/* ATA devices needs special map treatment */
 	if ((pci_get_class(dev) == PCIC_STORAGE) &&
 	    (pci_get_subclass(dev) == PCIS_STORAGE_IDE) &&
-	    (pci_get_progif(dev) & PCIP_STORAGE_IDE_MASTERDEV))
+	    ((pci_get_progif(dev) & PCIP_STORAGE_IDE_MASTERDEV) ||
+	     (!pci_read_config(dev, PCIR_BAR(0), 4) &&
+	      !pci_read_config(dev, PCIR_BAR(2), 4))) )
 		pci_ata_maps(pcib, bus, dev, b, s, f, rl, force, prefetchmask);
 	else
 		for (i = 0; i < cfg->nummaps;)

-- 
John Baldwin
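The condition changed by the patch can be read as a small predicate, sketched here in standalone C so it can be tried outside the kernel. The constant values (0x01 for the storage class, 0x01 for the IDE subclass, 0x80 for the master-device bit of the programming interface) are given from memory of FreeBSD's pcireg.h and should be treated as assumptions; the function name and the local macro names are invented for this sketch. It takes already-read register values instead of calling pci_read_config().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror PCIC_STORAGE, PCIS_STORAGE_IDE and
 * PCIP_STORAGE_IDE_MASTERDEV; renamed to make clear they are local. */
#define CLASS_STORAGE        0x01
#define SUBCLASS_IDE         0x01
#define PROGIF_IDE_MASTERDEV 0x80

/* True when an IDE controller should get the special ATA map treatment:
 * either the progif master-device bit is set (the pre-patch test), or
 * BAR 0 and BAR 2 both read back as zero (the case the patch adds). */
static bool ide_wants_ata_maps(uint8_t dev_class, uint8_t subclass,
                               uint8_t progif, uint32_t bar0, uint32_t bar2)
{
    if (dev_class != CLASS_STORAGE || subclass != SUBCLASS_IDE)
        return false;
    bool masterdev = (progif & PROGIF_IDE_MASTERDEV) != 0;
    bool bars_empty = (bar0 == 0 && bar2 == 0);
    return masterdev || bars_empty;
}
```

Before the patch, a controller with the master-device bit clear and empty BARs would have fallen through to the generic BAR loop; with the patch it takes the pci_ata_maps() path. Whether the Cyrix 5530 in this report presents exactly that combination is an assumption on my part, not something the thread states.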
Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1
On Thursday 24 January 2008 06:22:57 am Sam Leffler wrote:
> Joe Peterson wrote:
> > In an attempt to track down this mouse freezing/stuttering (i.e. jerky mouse movement) behavior in FreeBSD 7.0-RC1, I have come up with a reliable way to cause it to happen, and I have created a longer trace showing the results. Note that I am using the ULE scheduler.
> >
> > In general, it becomes easier to see the effect if there is CPU activity. I have noticed it during kernel compiles, while at the same time loading web pages in firefox that contain images (and moving the mouse while this is happening). But a more controlled way to see it is to run something that uses some CPU and then generate lots of X events.
> >
> > In my case, I start xtrs (TRS-80 emulator) in Model IV mode, which happens to poll for input, using the CPU. Then I move the mouse back and forth quickly between windows in "focus under mouse" mode (in my case, a KDE focus mode), which causes many focus events quickly. In about 15 or 20 seconds, the mouse reliably starts to show erratic movement, not moving smoothly.
> >
> > I really hope this can shed more light on what might be going on. Here is the trace:
> >
> > http://www.skyrush.com/downloads/ktr_ule_4.out
>
> This is an interesting trace. It appears that something is blocking threads in the runq from running for 2 seconds! I don't see what it is from the trace data. It sort of looks like the last thing that ran is the swi4 which is likely a callout (need to check the log file contents to be certain). If the callback function does something it wouldn't necessarily be visible in the schedgraph plot.
>
> If you could stick a dmesg from booting out in the same spot it might be worthwhile. Also, if you rebuild the kernel with DIAGNOSTIC then softclock() will complain about callouts that take longer than 2ms to run. This might generate too much noise, in which case you can adjust the threshold by editing the code in sys/kern/kern_timeout.c.

Hmm, when I look at that graph using schedgraph from HEAD it just looks like xtrs is using up all the CPU. I didn't see the 2 second window where nothing was running.

-- 
John Baldwin
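To illustrate the kind of check Sam describes (flagging callouts that run longer than a threshold), here is a small userland sketch in C. It is not the code in sys/kern/kern_timeout.c; the struct, the function names and the sample data are hypothetical, and only the 2 ms figure comes from the message above. Timestamps are plain nanosecond counts so the sketch has no dependency on any particular clock API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOW_CALLOUT_NS 2000000LL   /* 2 ms, the threshold mentioned above */

/* One timed callback invocation (hypothetical record layout). */
struct callout_sample {
    const char *name;      /* label for the callback */
    int64_t     start_ns;  /* timestamp taken before the callback ran */
    int64_t     end_ns;    /* timestamp taken after it returned */
};

/* How long the callback ran, in nanoseconds. */
static int64_t callout_duration_ns(const struct callout_sample *s)
{
    assert(s != NULL && s->end_ns >= s->start_ns);
    return s->end_ns - s->start_ns;
}

/* Count the samples that ran strictly longer than limit_ns. */
static size_t count_slow_callouts(const struct callout_sample *samples,
                                  size_t n, int64_t limit_ns)
{
    size_t slow = 0;
    for (size_t i = 0; i < n; i++) {
        if (callout_duration_ns(&samples[i]) > limit_ns)
            slow++;
    }
    return slow;
}
```

Raising the limit passed to count_slow_callouts() plays the same role as adjusting the threshold in the kernel when the default produces too much noise.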
Re: snd_emu10k1.ko after 6.2 to 6.3 upgrade
On Wednesday 23 January 2008 03:08:41 pm Petr Holub wrote:
> Hi,
>
> I've found a problem after updating from 6.2-RELEASE to 6.3-RELEASE using freebsd-update as described on the daemonology blog. It looks as if snd_emu10k1.ko, along with a few others, was not updated properly:
>
> # ls -l snd_*
> -r-xr-xr-x  1 root  wheel  16566 Feb 20  2007 snd_ad1816.ko
> -r-xr-xr-x  1 root  wheel  17731 Feb 20  2007 snd_als4000.ko
> -r-xr-xr-x  1 root  wheel  20004 Jan 21 15:42 snd_atiixp.ko
> -r-xr-xr-x  1 root  wheel  19192 Feb 20  2007 snd_cmi.ko
> -r-xr-xr-x  1 root  wheel  18594 Feb 20  2007 snd_cs4281.ko
> -r-xr-xr-x  1 root  wheel  30814 Feb 20  2007 snd_csa.ko
> -r-xr-xr-x  1 root  wheel  11098 Jan 21 15:42 snd_driver.ko
> -r-xr-xr-x  1 root  wheel  45839 Feb 20  2007 snd_ds1.ko
> -r-xr-xr-x  1 root  wheel  30008 Feb 20  2007 snd_emu10k1.ko
> -r-xr-xr-x  1 root  wheel  59398 Feb 20  2007 snd_emu10kx.ko
> -rwxr-xr-x  1 root  wheel  31223 Jan 21 15:42 snd_envy24.ko
> -rwxr-xr-x  1 root  wheel  30504 Jan 21 15:42 snd_envy24ht.ko
> -r-xr-xr-x  1 root  wheel  32005 Feb 20  2007 snd_es137x.ko
> -r-xr-xr-x  1 root  wheel  20075 Feb 20  2007 snd_ess.ko
> -r-xr-xr-x  1 root  wheel  15636 Feb 20  2007 snd_fm801.ko
> -rwxr-xr-x  1 root  wheel  77423 Jan 21 15:42 snd_hda.ko
> -r-xr-xr-x  1 root  wheel  23812 Jan 21 15:42 snd_ich.ko
> -r-xr-xr-x  1 root  wheel  31117 Feb 20  2007 snd_maestro.ko
> -r-xr-xr-x  1 root  wheel  42945 Jan 21 15:42 snd_maestro3.ko
> -r-xr-xr-x  1 root  wheel  46976 Feb 20  2007 snd_mss.ko
> -r-xr-xr-x  1 root  wheel  68790 Feb 20  2007 snd_neomagic.ko
> -r-xr-xr-x  1 root  wheel  14783 Feb 20  2007 snd_null.ko
> -r-xr-xr-x  1 root  wheel  16934 Feb 20  2007 snd_sb16.ko
> -r-xr-xr-x  1 root  wheel  15418 Feb 20  2007 snd_sb8.ko
> -r-xr-xr-x  1 root  wheel  15397 Feb 20  2007 snd_sbc.ko
> -r-xr-xr-x  1 root  wheel  19397 Feb 20  2007 snd_solo.ko
> -r-xr-xr-x  1 root  wheel   7240 Jan 21 15:42 snd_spicds.ko
> -r-xr-xr-x  1 root  wheel  18856 Jan 21 15:42 snd_t4dwave.ko
> -r-xr-xr-x  1 root  wheel  36300 Jan 21 15:42 snd_uaudio.ko
> -r-xr-xr-x  1 root  wheel  21918 Jan 21 15:42 snd_via8233.ko
> -r-xr-xr-x  1 root  wheel  16075 Jan 21 15:42 snd_via82c686.ko
> -r-xr-xr-x  1 root  wheel  18707 Feb 20  2007 snd_vibes.ko
>
> The modules dated Feb 20 2007 are in fact not loadable:
>
> # kldload snd_emu10k1.ko
> kldload: can't load snd_emu10k1.ko: No such file or directory

Do you have an error message in the dmesg after this?

-- 
John Baldwin
Re: panic in vm_page_splay
The machine is running 6.3-PRERELEASE as of Dec 30th. It just panicked in the middle of a web session as I was browsing for a file to upload via a web form... The firefox in use is native (amd64), not a Linux binary. The firefox process had over 550MB of memory to its name; it had been running for many days. The box has 2GB of RAM and was performing fine despite 4 SETI processes in the background. Please advise. Thanks!

Is this the same box that you got the bad PTE panics on? If so, have you run memtest or the like to rule out bad RAM?

No. This would be my own desktop...

-mi
RE: snd_emu10k1.ko after 6.2 to 6.3 upgrade
Do you have an error message in the dmesg after this?

Yes, I do. Sorry, I didn't think it would end up in dmesg rather than in the terminal. It says:

KLD snd_emu10k1.ko: depends on midi - not available

Petr
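As the exchange above shows, kldload only printed "No such file or directory" while the real cause (a missing dependency) was logged by the kernel. A minimal Python sketch of pulling such lines out of saved dmesg text follows. It assumes the message has exactly the form Petr quoted; the function name `missing_kld_dependencies` is made up for illustration and is not part of any FreeBSD tool.

```python
import re

# Matches the kernel linker message in the form quoted in this thread, e.g.
#   "KLD snd_emu10k1.ko: depends on midi - not available"
# (assumption: other releases may word this message differently)
KLD_DEP_RE = re.compile(
    r"KLD (?P<module>\S+): depends on (?P<dep>\S+) - not available"
)


def missing_kld_dependencies(dmesg_text):
    """Return (module, missing_dependency) pairs found in dmesg text."""
    return [(m.group("module"), m.group("dep"))
            for m in KLD_DEP_RE.finditer(dmesg_text)]


if __name__ == "__main__":
    sample = "KLD snd_emu10k1.ko: depends on midi - not available\n"
    for module, dep in missing_kld_dependencies(sample):
        print(f"{module} needs {dep}")
```

The text could come from saving the output of dmesg to a file; the sketch itself does not run any system command.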
upcoming bugathon this weekend
If you're interested in helping out on our PR database problems, please see my posting on freebsd-bugbusters@: http://docs.FreeBSD.org/cgi/mid.cgi?20080125182651.GA9914

We're having a bugathon this weekend, with the agenda being mostly to figure out where we are, who would like to help, and how they can do so. Followups to freebsd-bugbusters@, please.

mcl
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Jeremy Chadwick wrote:

What you've shown is usually the sign of a disk-related problem. It's very obvious when it's just one disk reporting DMA errors. You use ZFS, so chances are you have more than one disk in a pool/volume -- there's no indication ad1, ad4, ad6, etc. are failing, so this seems to indicate something specific to ad0.

Jeremy, thanks for the response - I have tried to answer all of your questions below... In my case, I am using only one disk (ad0) for FreeBSD, and I am only using one partition on this disk in my ZFS pool. So, in this case, unfortunately, it's not possible to tell from the fact that only ad0 is listed that it is specific to this drive.

Manufacturers pick very passive (non-aggressive) thresholds for error conditions on disks, so disks which are failing very commonly show PASSED during SMART analysis. To make matters worse, most users I know read SMART stats incorrectly (they're easy to misinterpret).

Yep, I am also always skeptical of SMART reports. That's one reason I am very interested in ZFS. I don't trust the drive to be completely reliable, and the fact that ZFS does end-to-end data integrity is very intriguing.

Can you please provide output of the following:

* smartctl -a /dev/ad0

OK, I've attached this to the end of this email.

* atacontrol cap ad0

Protocol              ATA/ATAPI revision 7
device model          ST3500630A
serial number         9QG0DG03
firmware revision     3.AAE
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       976773168 sectors
dma supported
overlap not supported

Feature                        Support  Enable  Value         Vendor
write cache                    yes      yes
read ahead                     yes      yes
Tagged Command Queuing (TCQ)   no       no      0/0x00
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/0xFEFE
automatic acoustic management  no       no      0/0x00        208/0xD0

* atacontrol info ata0, ata1, etc. -- any controller used by ZFS

Master: ad0 ST3500630A/3.AAE ATA/ATAPI revision 7
Slave:  ad1 ST3160812A/3.AAH ATA/ATAPI revision 7

(but note that ad1 is not used by FreeBSD)

* Relevant dmesg output that indicates what kind of ATA controller these disks are attached to. Start with output from 'ad0:' and work backwards. For example, ad0 on this machine is using an Intel ICH6 controller:

atapci0: Intel ICH6 SATA150 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.2 on pci0
ata0: ATA channel 0 on atapci0
ad0: 238475MB WDC WD2500KS-00MJB0 02.01C03 at ata0-master SATA150

atapci0: Intel ICH4 UDMA100 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.1 on pci0
ata0: ATA channel 0 on atapci0
ata0: [ITHREAD]
ad0: 476940MB Seagate ST3500630A 3.AAE at ata0-master UDMA100

SMART stats which are labelled Offline are only updated when a short or long offline test is performed. Have you tried using smartctl -t short /dev/ad0 and smartctl -t long /dev/ad0 to see if any of the raw values on the far right column increment?

I just tried one:

# 1 Short offline Completed without error 00% 5252 -
# 2 Short offline Completed without error 00% 5252 -

Also, none of the numbers that were zero incremented, esp.:

198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0

Also, no more errors were reported in the system log during the self-tests.

Have you tried using zpool scrub on the ZFS pool, then zpool status to see if READ/WRITE/CHKSUM counters increment or if the scrub line states there were errors?

OK, I started a scrub, and it will take some more time to complete... But I get the following with status. Could this be due to the timeouts and failures? I suspect so, so maybe this is not surprising. I'd also guess that this doesn't necessarily point to the drive, but to anything in the chain of events... I do not have a mirror or RAID-Z, so I guess the reason there was no data loss (yet) is because the checksum passed, and maybe it just had to retry...? Anyway, here's the output so far:

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 2.50% done, 1h58m to go
config:

        NAME      STATE   READ WRITE CKSUM
        tank      ONLINE     1     3     0
          ad0s1d  ONLINE     1     3     0

errors: No known data errors

Other
Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1
John Baldwin wrote:
On Thursday 24 January 2008 06:22:57 am Sam Leffler wrote:
Joe Peterson wrote:

In an attempt to track down this mouse freezing/stuttering (i.e. jerky mouse movement) behavior in FreeBSD 7.0-RC1, I have come up with a reliable way to cause it to happen, and I have created a longer trace showing the results. Note that I am using the ULE scheduler.

In general, it becomes easier to see the effect if there is CPU activity. I have noticed it during kernel compiles, while at the same time loading web pages in firefox that contain images (and moving the mouse while this is happening). But a more controlled way to see it is to run something that uses some CPU and then generate lots of X events. In my case, I start xtrs (TRS-80 emulator) in Model IV mode, which happens to poll for input, using the CPU. Then I move the mouse back and forth quickly between windows in focus under mouse mode (in my case, a KDE focus mode), which causes many focus events quickly. In about 15 or 20 seconds, the mouse reliably starts to show erratic movement, not moving smoothly.

I really hope this can shed more light on what might be going on. Here is the trace: http://www.skyrush.com/downloads/ktr_ule_4.out

This is an interesting trace. It appears that something is blocking threads in the runq from running for 2 seconds! I don't see what it is from the trace data. It sort of looks like the last thing that ran is the swi4, which is likely a callout (need to check the log file contents to be certain). If the callback function does something, it wouldn't necessarily be visible in the schedgraph plot. If you could stick a dmesg from booting in the same spot, it might be worthwhile. Also, if you rebuild the kernel with DIAGNOSTIC, then softclock() will complain about callouts that take longer than 2ms to run. This might generate too much noise, in which case you can adjust the threshold by editing the code in sys/kern/kern_timeout.c.

Hmm, when I look at that graph using schedgraph from HEAD it just looks like xtrs is using up all the CPU. I didn't see the 2 second window where nothing was running.

Sigh, you are correct. I backrev'd the machine where I ran schedgraph to RELENG_7 and didn't notice the old version mis-parses the ktr file. The graph is totally different w/ schedgraph from HEAD. Sorry, Joe, for misleading you.

Sam
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Chuck Swiger wrote:
On Jan 25, 2008, at 11:24 AM, Joe Peterson wrote:

ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f  114   071   006    Pre-fail  Always   -           82422948
[ ... ]
  7 Seek_Error_Rate         0x000f  084   060   030    Pre-fail  Always   -           286126605
[ ... ]
195 Hardware_ECC_Recovered  0x001a  063   046   000    Old_age   Always   -           166181300

These numbers are quite worrisome -- they should be zero or nearly so in a healthy drive.

It seems to depend on the drive manufacturer. E.g. this is a Seagate. Every Seagate I've ever had (or heard about on the web via smartctl dumps) reports very large numbers for these values. I've heard it described that Seagate shows you the raw numbers (and correctable errors do happen all the time in all drives). In Western Digital drives (IIRC), the numbers shown are the ones that *should* be zero, thereby hiding the low-level errors. Hard to say if my numbers are too high, but these corrected error counts are always frighteningly high in Seagates.

-Joe
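The disagreement above turns on which column to read. As a rough illustration, here is a minimal Python sketch that parses the three attribute rows quoted in this message and compares the normalized VALUE against THRESH, which is the comparison SMART itself uses to mark an attribute as failed. It deliberately ignores RAW_VALUE, whose encoding is vendor-specific. The function names are invented for this example, and it assumes the ten-column row layout shown in the quoted output.

```python
def parse_smart_row(row):
    """Split one smartctl attribute row (ten whitespace-separated columns)."""
    f = row.split()
    return {
        "id": int(f[0]),
        "name": f[1],
        "value": int(f[3]),   # normalized value, higher is better
        "worst": int(f[4]),   # lowest normalized value seen so far
        "thresh": int(f[5]),  # failure threshold set by the vendor
        "raw": int(f[9]),     # vendor-specific raw counter
    }


def at_or_below_threshold(attrs):
    """Return attributes whose normalized value has reached the threshold."""
    return [a for a in attrs if a["value"] <= a["thresh"]]


if __name__ == "__main__":
    # Rows as posted in this thread.
    rows = [
        "1 Raw_Read_Error_Rate 0x000f 114 071 006 Pre-fail Always - 82422948",
        "7 Seek_Error_Rate 0x000f 084 060 030 Pre-fail Always - 286126605",
        "195 Hardware_ECC_Recovered 0x001a 063 046 000 Old_age Always - 166181300",
    ]
    attrs = [parse_smart_row(r) for r in rows]
    for a in attrs:
        margin = a["value"] - a["thresh"]
        print(f'{a["name"]}: value={a["value"]} thresh={a["thresh"]} margin={margin}')
    print("at or below threshold:", [a["name"] for a in at_or_below_threshold(attrs)])
```

By this measure none of the three attributes has reached its threshold, which fits Joe's reading. As noted earlier in the thread, vendor thresholds are lenient, so a passing comparison does not prove the drive is healthy.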
Re: New KTR trace for mouse freezing/stuttering in 7.0-RC1
John Baldwin wrote:

Hmm, when I look at that graph using schedgraph from HEAD it just looks like xtrs is using up all the CPU.

Yeah, xtrs is eating a lot of CPU, but I've never seen this affect the mouse movement (making it really jerky) the same way on, e.g., Linux. And the xtrs test is just a way to *reliably* make it happen. It happens intermittently all of the time (at least every few minutes, and often in small batches) even when the system is pretty idle...

-Joe
Re: Highpoint drivers on 7.0
I would advise contacting them. Their support was helpful when I last contacted them, and for the card that was involved they did release the code for the driver, which enabled us to fix the issues.

Regards
Steve

----- Original Message -----
From: Alfred Perlstein [EMAIL PROTECTED]

* Eirik Øverby [EMAIL PROTECTED] [080125 12:53] wrote:

Hi all, did anyone try the Highpoint RocketRAID drivers (hptmv6.ko) on 7-RC1 or later? I'm considering upgrading one of my servers here, but I need to know if my RAID controller will work after reinstall..

A shame HPT doesn't release the driver to the community...