Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 09/01/2011 05:50, Randy Bush wrote:

  given i have raid or raidz1, can i move to raidz2?

  # zpool status
    pool: tank
   state: ONLINE
   scrub: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          tank        ONLINE       0     0     0
            raidz1    ONLINE       0     0     0
              ad4s2   ONLINE       0     0     0
              ad8s2   ONLINE       0     0     0
              ad6s1   ONLINE       0     0     0
              ad10s1  ONLINE       0     0     0

  or

  # zpool status
    pool: tank
   state: ONLINE
   scrub: none requested
  config:

          NAME              STATE     READ WRITE CKSUM
          tank              ONLINE       0     0     0
            mirror          ONLINE       0     0     0
              label/disk01  ONLINE       0     0     0
              label/disk00  ONLINE       0     0     0
            mirror          ONLINE       0     0     0
              label/disk02  ONLINE       0     0     0
              label/disk03  ONLINE       0     0     0

Not without backing up your current data, destroying the existing zpool(s) and rebuilding from scratch.

Note: raidz2 on 4 disks doesn't really win you anything over 2 x mirror pairs of disks, and the RAID10 mirror is going to be rather more performant.

Cheers,

Matthew
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
Brill! Thanks :)

Joe

On 8 Jan 2011, at 09:50, Jeremy Chadwick free...@jdc.parodius.com wrote:

  On Sat, Jan 08, 2011 at 09:14:19AM +, Josef Karthauser wrote:

    On 7 Jan 2011, at 17:30, Artem Belevich fbsdl...@src.cx wrote:

      One way to get a specific ratio for *your* pool would be to collect
      record size statistics from your pool using "zdb -L -b pool" and
      then calculate the L2ARC:ARC ratio based on average record size.
      I'm not sure, though, whether L2ARC stores records in compressed or
      uncompressed form.

    Can someone point me to a reference describing the various ZFS caches
    available? What's the ARC and ZIL? I've been running some ZFS for a
    few years now, and must have missed this entire subject :/

  ARC:   http://en.wikipedia.org/wiki/ZFS#Cache_management
  L2ARC: http://en.wikipedia.org/wiki/ZFS#Storage_pools
  L2ARC: http://blogs.sun.com/brendan/entry/test
  Both:  http://www.c0t0d0s0.org/archives/5329-Some-insight-into-the-read-cache-of-ZFS-or-The-ARC.html
  Both:  http://nilesh-joshi.blogspot.com/2010/07/zfs-revisited.html
  ZIL:   http://blogs.sun.com/perrin/entry/the_lumberjack
  ZIL:   http://blogs.sun.com/realneel/entry/the_zfs_intent_log

  Enjoy.
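[For readers following along: the ARC and L2ARC statistics discussed above are exported on FreeBSD through the kstat sysctl tree, so they can be watched on a live box. A minimal check might look like the sketch below; the OID names are as they appear on 8-STABLE, but the exact set varies by ZFS version, so treat the list as an assumption to verify on your system.]

  # Overall ARC size and hit/miss counters
  sysctl kstat.zfs.misc.arcstats.size \
         kstat.zfs.misc.arcstats.hits \
         kstat.zfs.misc.arcstats.misses

  # L2ARC counters (only meaningful if cache devices are attached)
  sysctl kstat.zfs.misc.arcstats.l2_size \
         kstat.zfs.misc.arcstats.l2_hits \
         kstat.zfs.misc.arcstats.l2_misses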
Re: New ZFSv28 patchset for 8-STABLE
On 12/16/2010 01:44 PM, Martin Matuska wrote:

  Hi everyone,

  following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I
  am providing a ZFSv28 testing patch for 8-STABLE.

  Link to the patch:
  http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

I've got an IO hang with dedup enabled (not sure it's related; I've started to rewrite all data on the pool, which makes a heavy load). The processes are in various states:

  65747 1001  1  54  10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
  80383 1001  1  54  10 40616K 30196K select  1   5:38  0.00% rsync
   1501 www   1  44   0  7304K  2504K zio->i  0   2:09  0.00% nginx
   1479 www   1  44   0  7304K  2416K zio->i  1   2:03  0.00% nginx
   1477 www   1  44   0  7304K  2664K zio->i  0   2:02  0.00% nginx
   1487 www   1  44   0  7304K  2376K zio->i  0   1:40  0.00% nginx
   1490 www   1  44   0  7304K  1852K zfs     0   1:30  0.00% nginx
   1486 www   1  44   0  7304K  2400K zfsvfs  1   1:05  0.00% nginx

And everything which wants to touch the pool is/becomes dead. Procstat says about one process:

  # procstat -k 1497
    PID    TID COMM   TDNAME  KSTACK
   1497 100257 nginx  -       mi_switch sleepq_wait __lockmgr_args
                              vop_stdlock VOP_LOCK1_APV null_lock
                              VOP_LOCK1_APV _vn_lock nullfs_root lookup
                              namei vn_open_cred kern_openat syscallenter
                              syscall Xfast_syscall
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
Hi,

On 9 January 2011 19:44, Matthew Seaman m.sea...@infracaninophile.co.uk wrote:

  Not without backing up your current data, destroying the existing
  zpool(s) and rebuilding from scratch.

  Note: raidz2 on 4 disks doesn't really win you anything over 2 x mirror
  pairs of disks, and the RAID10 mirror is going to be rather more
  performant.

I would have thought the probabilities of failure to be slightly different. Sure, out of 4 disks, 2 can fail in both configurations. *But* in raidz2, any two of the four can fail. In RAID10, the two disks that fail must be in different mirror pairs, otherwise you lose it all. As such, the resilience to failure of a raidz2 is far greater than that of a RAID10 system.
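[To make the combinatorics here concrete -- this enumeration is added for illustration and is not from the original mail. Label the four disks 1-4, with the RAID10 mirror pairs being (1,2) and (3,4). The six possible two-disk failures break down as:]

  1+2   raidz2 survives   RAID10 loses the pool (whole mirror gone)
  3+4   raidz2 survives   RAID10 loses the pool (whole mirror gone)
  1+3   raidz2 survives   RAID10 survives
  1+4   raidz2 survives   RAID10 survives
  2+3   raidz2 survives   RAID10 survives
  2+4   raidz2 survives   RAID10 survives

[So raidz2 survives 6 of the 6 two-disk failure combinations; RAID10 survives 4 of 6.]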
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 09/01/2011 09:01, Jean-Yves Avenard wrote:

  On 9 January 2011 19:44, Matthew Seaman wrote:

    Note: raidz2 on 4 disks doesn't really win you anything over 2 x
    mirror pairs of disks, and the RAID10 mirror is going to be rather
    more performant.

  I would have thought the probabilities of failure to be slightly
  different. Sure, out of 4 disks, 2 can fail in both configurations.
  *But* in raidz2, any two of the four can fail. In RAID10, the two disks
  that fail must be in different mirror pairs, otherwise you lose it all.
  As such, the resilience to failure of a raidz2 is far greater than that
  of a RAID10 system.

So you sacrifice performance 100% of the time based on the very unlikely possibility of drives 1+2 or 3+4 failing simultaneously, compared to the similarly unlikely possibility of drives 1+3 or 1+4 or 2+3 or 2+4 failing simultaneously?[*] That's not a trade-off worth making IMHO. If the data is that valuable, you should be making copies of it to some independent machine all the time and backing up at frequent intervals, which backups you keep off-site in disaster-proof storage.

Cheers,

Matthew

[*] All of this mathematics is pretty suspect, because if two drives fail simultaneously in a machine, the chances are the failures are not independent, but due to some external cause [eg. like the case fan breaking and the box toasting itself]. In which case, the comparative chance of whatever it is affecting three or four drives at once renders the whole argument pointless.
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 9 January 2011 21:03, Matthew Seaman m.sea...@infracaninophile.co.uk wrote:

  So you sacrifice performance 100% of the time based on the very unlikely
  possibility of drives 1+2 or 3+4 failing simultaneously, compared to the
  similarly unlikely possibility of drives 1+3 or 1+4 or 2+3 or 2+4
  failing simultaneously?[*] That's not a trade-off worth making IMHO. If
  the data is that valuable, you should be making copies of it to some
  independent machine all the time and backing up at frequent intervals,
  which backups you keep off-site in disaster-proof storage.

But this is not what you first wrote. You said the effects were identical; they are not.

Now, if you want to favour performance over redundancy, that's ultimately up to the user...

Plus, honestly, the difference in performance between raidz and RAID10 is also close to being insignificant.
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
Hi, all,

On 09.01.2011 at 11:03, Matthew Seaman wrote:

  [*] All of this mathematics is pretty suspect, because if two drives
  fail simultaneously in a machine, the chances are the failures are not
  independent, but due to some external cause [...] In which case, the
  comparative chance of whatever it is affecting three or four drives at
  once renders the whole argument pointless.

I assume you are familiar with these papers?

http://queue.acm.org/detail.cfm?id=1317403
http://queue.acm.org/detail.cfm?id=1670144

Short version: as hard disk sizes increase to 2 TB and beyond while the URE rate stays on the order of 1 in 10^14 bits read, the probability of encountering an URE during the rebuild of a single-parity RAID approaches 1.

Best regards,
Patrick

--
punkt.de GmbH * Kaiserallee 13a * 76133 Karlsruhe
Tel. 0721 9109 0 * Fax 0721 9109 100
i...@punkt.de   http://www.punkt.de
Managing Director: Jürgen Egeling   AG Mannheim 108285
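[A back-of-the-envelope calculation, added here for illustration using the commonly quoted consumer-drive spec of one URE per 10^14 bits read -- an assumption about the drives in question. Rebuilding a degraded 4-disk raidz1 of 2 TB drives requires reading the three surviving disks in full:]

  bits read           = 3 disks x 2x10^12 bytes x 8 bits/byte = 4.8x10^13
  expected UREs       = 4.8x10^13 / 10^14                    ~= 0.48
  P(at least one URE) = 1 - exp(-0.48)                       ~= 38%

[With double parity (raidz2), a URE hit during a single-disk rebuild is still correctable from the remaining redundancy, which is the point the papers above make.]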
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 09/01/2011 10:24, Jean-Yves Avenard wrote:

  On 9 January 2011 21:03, Matthew Seaman wrote:

    So you sacrifice performance 100% of the time based on the very
    unlikely possibility of drives 1+2 or 3+4 failing simultaneously,
    compared to the similarly unlikely possibility of drives 1+3 or 1+4
    or 2+3 or 2+4 failing simultaneously?

  But this is not what you first wrote.

What I said was:

  Note: raidz2 on 4 disks doesn't really win you anything over 2 x mirror
  pairs of disks, and the RAID10 mirror is going to be rather more
  performant.

  You said the effects were identical; they are not.

Which is certainly not saying the effects are identical. It's saying the difference is too small to worry about.

  Plus, honestly, the difference in performance between raidz and RAID10
  is also close to being insignificant.

That's not my experience. It depends on what sort of workload you have. If you're streaming very large files, I'd expect RAID10 and RAIDZ to be about equal. If you're doing lots of randomly distributed small IOs, then RAID10 is going to win hands down.

Cheers,

Matthew
Re: New ZFSv28 patchset for 8-STABLE
On 01/09/2011 10:00 AM, Attila Nagy wrote:

  On 12/16/2010 01:44 PM, Martin Matuska wrote:

    following the announcement of Pawel Jakub Dawidek (p...@freebsd.org)
    I am providing a ZFSv28 testing patch for 8-STABLE.

    Link to the patch:
    http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

  I've got an IO hang with dedup enabled (not sure it's related; I've
  started to rewrite all data on the pool, which makes a heavy load).
  [...] And everything which wants to touch the pool is/becomes dead.

No, it's not related. One of the disks in the RAIDZ2 pool went bad:

  (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
  (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
  (da4:arcmsr0:0:4:0): SCSI status: Check Condition
  (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)

and it seems it froze the whole zpool. Removing the disk by hand solved the problem. I've seen this previously on other machines with ciss. I wonder why ZFS didn't throw it out of the pool.
Re: New ZFSv28 patchset for 8-STABLE
On 01/01/2011 08:09 PM, Artem Belevich wrote:

  On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy b...@fsn.hu wrote:

    What I see:
    - increased CPU load
    - decreased L2 ARC hit rate, decreased SSD (ad[46]) hit rate, and
      therefore increased hard disk load (IOPS graph)
    ...
    Any ideas on what could cause these? I haven't upgraded the pool
    version and nothing was changed in the pool or in the file system.

  The fact that L2 ARC is full does not mean that it contains the right
  data. Initial L2ARC warm-up happens at a much higher rate than the rate
  L2ARC is updated after it's been filled initially. Even accelerated
  warm-up took almost a day in your case. In order for L2ARC to warm up
  properly you may have to wait quite a bit longer. My guess is that it
  should slowly improve over the next few days as data goes through L2ARC
  and those bits that are hit more often take up residence there. The
  larger your data set, the longer it will take for L2ARC to catch the
  right data.

  Do you have similar graphs from the pre-patch system just after reboot?
  I suspect that it may show similarly abysmal L2ARC hit rates initially,
  too.

I've finally found the time to read the v28 patch and figured out the problem: the default for vfs.zfs.l2arc_noprefetch was changed to 1, so prefetched data is no longer served from the L2ARC devices. This is a major hit in my case. Setting it back to 0 (i.e. using prefetched data on L2ARC again) restored the previous hit rates and lowered the load on the hard disks significantly.
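[For anyone wanting to try the same change: the knob is a regular ZFS sysctl/loader tunable. A minimal sketch follows; the sysctl exists in the v28 code, though whether it can be flipped at runtime or only at boot should be verified on your particular build.]

  # check the current value
  sysctl vfs.zfs.l2arc_noprefetch

  # revert to the old behaviour of caching prefetched data in L2ARC
  sysctl vfs.zfs.l2arc_noprefetch=0

  # or make it permanent across reboots in /boot/loader.conf:
  #   vfs.zfs.l2arc_noprefetch="0"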
Panic 8.2 PRERELEASE WRITE_DMA48
The last half year I've been installing FreeBSD on several machines. I installed it on my main desktop system a few weeks ago, which normally runs Linux, but I get this panic under heavy disk I/O. It even happened during the initial sysinstall, although I have also completed several buildworlds without problems.

I can trigger it easily by accessing /usr (UFS) and a Linux ext partition simultaneously, e.g. by copying large files to the /usr partition.

Just bought a serial cable to enable the serial console of the various FreeBSD installations, which is of good use for this problem, because a crash dump is not written. Full boot output in the attachment.

Sun Jan  9 10:11:17 CET 2011
unknown: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=274799820
ata2: timeout waiting to issue command
ata2: error issuing WRITE_DMA48 command
g_vfs_done():ad4s2f[WRITE(offset=28915105792, length=131072)]error = 6
/usr: got error 6 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
cpuid = 0
KDB: stack backtrace:
#0 0xc08e0f77 at kdb_backtrace+0x47
#1 0xc08b2037 at panic+0x117
#2 0xc0ae2ecd at softdep_deallocate_dependencies+0x3d
#3 0xc0925590 at brelse+0x90
#4 0xc092829a at bufdone_finish+0x3fa
#5 0xc092830d at bufdone+0x4d
#6 0xc092bdf9 at cluster_callback+0x89
#7 0xc09282f7 at bufdone+0x37
#8 0xc0850ad5 at g_vfs_done+0x85
#9 0xc09224d9 at biodone+0xb9
#10 0xc084da69 at g_io_schedule_up+0x79
#11 0xc084e0a8 at g_up_procbody+0x68
#12 0xc0886fc1 at fork_exit+0x91
#13 0xc0bcc144 at fork_trampoline+0x8
Uptime: 2h56m27s
Physical memory: 1515 MB
Dumping 177 MB: ata2: timeout waiting to issue command
ata2: error issuing WRITE_DMA command

** DUMP FAILED (ERROR 5) **
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 09/01/2011 10:14, Patrick M. Hausen wrote:

  I assume you are familiar with these papers?

  http://queue.acm.org/detail.cfm?id=1317403
  http://queue.acm.org/detail.cfm?id=1670144

  Short version: as hard disk sizes increase to 2 TB and beyond while the
  URE rate stays on the order of 1 in 10^14 bits read, the probability of
  encountering an URE during the rebuild of a single-parity RAID
  approaches 1.

Yes. Rotating magnetic media seems to be bumping up against some intrinsic performance/reliability limits to the year-on-year doubling of capacity. Having to add more and more extra drives to ensure the same level of reliability is not a winning proposition in the long term.

Roll on solid state storage. I particularly like the sound of HP and Hynix's memristor technology. If memristors pan out, then they are going to replace both D-RAM and hard drives, and eventually replace transistors as the basic building block for electronic logic circuits. Five to ten years from now, hardware design is going to be very different, and the software that runs on it will have to be radically redesigned to match. Think what that means.

  * You don't have to *save* a file, ever. If it's in memory, it's in
    persistent storage.

  * The effect on RDBMS performance is going to be awesome -- none of
    that time-consuming waiting for sync-to-disk.

  * A computer should be able to survive a power outage of a few seconds
    and carry on where it left off, without specially going into
    hibernation mode.

  * Similarly, reboot will be at the flick of a switch -- pretty much
    instant-on.

  * Portables will look a lot more like iPads or other tablet devices,
    and will have battery lifetimes of several days. About the only
    significant difference is one will have a hinge down the middle and a
    built-in keyboard, while the other will only have the touch screen.

Oh, and let's not forget the beneficial effects of *no moving parts* and *lower power consumption* on system reliability. Now all we need are the telcos to lay multi-Gb/s capacity fibre to every house and business, and things will start to get very interesting indeed.

Cheers,

Matthew
Re: New ZFSv28 patchset for 8-STABLE
On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:

  No, it's not related. One of the disks in the RAIDZ2 pool went bad:

  (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
  (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
  (da4:arcmsr0:0:4:0): SCSI status: Check Condition
  (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)

  and it seems it froze the whole zpool. Removing the disk by hand solved
  the problem. I've seen this previously on other machines with ciss. I
  wonder why ZFS didn't throw it out of the pool.

Hold on a minute. An unrecoverable read error does not necessarily mean the drive is bad; it could mean that the individual LBA that was attempted to be read resulted in ASC 0x11 (MEDIUM ERROR), e.g. a bad block was encountered. I would check SMART stats on the disk (since these are probably SATA, given use of arcmsr(4)) and provide those. *That* will tell you if the disk is bad. I'll help you decode the attribute values if you provide them.

My understanding is that a single LBA read failure should not warrant ZFS marking the disk UNAVAIL in the pool. It should have incremented the READ error counter and that's it. Did you receive a *single* error for the disk and then things went catatonic? If the entire system got wedged (a soft wedge, e.g. the kernel is still alive but nothing's happening in userland), that could be a different problem -- either with ZFS or arcmsr(4).

Does ZFS have some sort of timeout value internal to itself where it will literally mark a disk UNAVAIL in the case that repeated I/O transactions take too long? What is its error recovery methodology?

Speaking strictly about Solaris 10 and ZFS: I have seen, many times, a system soft-wedge after repeated I/O errors (read or write) are spewed out on the console for a single SATA disk (via AHCI), but only when the disk is used as a sole root filesystem disk (no mirror/raidz). My impression is that ZFS isn't the problem in this scenario. In most cases, post-mortem debugging on my part shows that the disks encountered some CRC errors (indicating cabling issues, etc.), sometimes as few as 2, but something else went crazy -- or possibly ZFS couldn't mark the disk UNAVAIL (if it has that logic) because it's a single disk associated with root.

Hardware in this scenario are Hitachi SATA disks with an ICH ESB2 controller; software is Solaris 10 (Generic_142901-06) with ZFS v15.
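[Since the disks in this thread sit behind an Areca controller, a plain smartctl -a /dev/da4 typically won't reach them; smartmontools documents an Areca pass-through for this. A sketch of what checking the disk in slot 5 might look like -- the -d areca,N syntax is documented smartmontools usage, but whether your smartmontools build supports it on FreeBSD's /dev/arcmsr0, and which slot number applies, are assumptions to verify locally:]

  # SMART attributes for the disk in Areca slot 5 on the first controller
  smartctl -a -d areca,5 /dev/arcmsr0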
Re: Panic 8.2 PRERELEASE WRITE_DMA48
On Sun, Jan 09, 2011 at 12:33:10PM +0100, Tom Vijlbrief wrote:

  The last half year I've been installing FreeBSD on several machines. I
  installed it on my main desktop system a few weeks ago, which normally
  runs Linux, but I get this panic under heavy disk I/O. [...]

  unknown: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=274799820
  ata2: timeout waiting to issue command
  ata2: error issuing WRITE_DMA48 command
  g_vfs_done():ad4s2f[WRITE(offset=28915105792, length=131072)]error = 6
  /usr: got error 6 while accessing filesystem
  panic: softdep_deallocate_dependencies: unrecovered I/O error
  [...]

Can you please provide output from the following commands (after installing ports/sysutils/smartmontools, which should be version 5.40 or later, in case you haven't updated your ports tree):

$ dmesg
$ smartctl -a /dev/ad4

The SMART output should act as a verifier as to whether or not you really do have a bad block on your disk (which is what a READ/WRITE_DMA48 error can sometimes indicate).

You may also want to boot the machine in single-user mode and do a manual fsck /dev/ad4s2f. It's been proven in the past that background_fsck doesn't manage to address all issues.
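[For reference, the single-user procedure being suggested would look roughly like this. This is a sketch: the partition name is taken from the panic message above, and the -y flag, which auto-answers fsck's repair prompts, is an optional convenience.]

  # boot into single-user mode from the loader menu, then run a
  # foreground check on the damaged /usr filesystem (repeat until clean):
  fsck -y /dev/ad4s2f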
Re: New ZFSv28 patchset for 8-STABLE
On 01/09/2011 01:18 PM, Jeremy Chadwick wrote:

  Hold on a minute. An unrecoverable read error does not necessarily mean
  the drive is bad [...] I would check SMART stats on the disk (since
  these are probably SATA, given use of arcmsr(4)) and provide those.
  *That* will tell you if the disk is bad. I'll help you decode the
  attribute values if you provide them.

You are right, and I gave incorrect information. There are a lot more errors for that disk in the logs, and the zpool was frozen.

I tried to offline the given disk. That helped in the ciss case, where the symptom is the same or something similar: there is no IO for ages, then something small, and nothing for long seconds/minutes, and there are no errors logged. zpool status reported no errors, and the dmesg was clear too. There I could find the bad disk by watching gstat output: when the very small amount of IO was done, there was one disk with response times well above a second, while the others responded quickly. There the zpool offline helped. Here it did not; the command just got hung, like everything else.

So what I did then: I got into the areca-cli and searched for errors. One disk was set to failed and it seemed to be the cause. I removed it (and did a camcontrol rescan, but I'm not sure whether it was necessary or not), and suddenly the zpool offline finished and everything went back to normal.

But there are two controllers in the system, and now I see that the above disk is on ctrl 1, while the one I removed is on ctrl 2. I was misled by their identical positions. So now I have an offlined disk (which produces read errors, but I couldn't see them in the zpool output) and another, which is shown as failed in the RAID controller and got removed by hand (and solved the situation):

        NAME                 STATE     READ WRITE CKSUM
        data                 DEGRADED     0     0     0
          raidz2-0           DEGRADED     0     0     0
            label/disk20-01  ONLINE       0     0     0
            label/disk20-02  ONLINE       0     0     0
            label/disk20-03  ONLINE       0     0     0
            label/disk20-04  ONLINE       0     0     0
            label/disk20-05  OFFLINE      0     0     0
            label/disk20-06  ONLINE       0     0     0
            label/disk20-07  ONLINE       0     0     0
            label/disk20-08  ONLINE       0     0     0
            label/disk20-09  ONLINE       0     0     0
            label/disk20-10  ONLINE       0     0     0
            label/disk20-11  ONLINE       0     0     0
            label/disk20-12  ONLINE       0     0     0
          raidz2-1           DEGRADED     0     0     0
            label/disk21-01  ONLINE       0     0     0
            label/disk21-02  ONLINE       0     0     0
            label/disk21-03  ONLINE       0     0     0
            label/disk21-04  ONLINE       0     0     0
Re: New ZFSv28 patchset for 8-STABLE
Once upon a time, this was a known problem with the arcmsr driver not correctly interacting with ZFS, resulting in this behavior. Since I'm presuming that the arcmsr driver update which was intended to fix this behavior (in my case, at least) is in your nightly build, it's probably worth pinging the arcmsr driver maintainer about this.

- Rich

On Sun, Jan 9, 2011 at 7:18 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:

  Hold on a minute. An unrecoverable read error does not necessarily mean
  the drive is bad; it could mean that the individual LBA that was
  attempted to be read resulted in ASC 0x11 (MEDIUM ERROR), e.g. a bad
  block was encountered. [...]

  Does ZFS have some sort of timeout value internal to itself where it
  will literally mark a disk UNAVAIL in the case that repeated I/O
  transactions take too long? What is its error recovery methodology?

  Speaking strictly about Solaris 10 and ZFS: I have seen, many times, a
  system soft-wedge after repeated I/O errors (read or write) are spewed
  out on the console for a single SATA disk (via AHCI), but only when the
  disk is used as a sole root filesystem disk (no mirror/raidz). [...]
  Hardware in this scenario are Hitachi SATA disks with an ICH ESB2
  controller; software is Solaris 10 (Generic_142901-06) with ZFS v15.
Re: New ZFSv28 patchset for 8-STABLE
On Sun, Jan 09, 2011 at 01:42:13PM +0100, Attila Nagy wrote:

  You are right, and I gave incorrect information. There are a lot more
  errors for that disk in the logs, and the zpool was frozen.

  I tried to offline the given disk. That helped in the ciss case, where
  the symptom is the same or something similar [...] There the zpool
  offline helped. Here it did not; the command just got hung, like
  everything else.

  So what I did then: I got into the areca-cli and searched for errors.
  One disk was set to failed and it seemed to be the cause. I removed it
  (and did a camcontrol rescan, but I'm not sure whether it was necessary
  or not), and suddenly the zpool offline finished and everything went
  back to normal. But there are two controllers in the system, and now I
  see that the above disk is on ctrl 1, while the one I removed is on
  ctrl 2. I was misled by their identical positions.

  So now I have an offlined disk (which produces read errors, but I
  couldn't see them in the zpool output) and another, which is shown as
  failed in the RAID controller and got removed by hand (and solved the
  situation):

        NAME                 STATE     READ WRITE CKSUM
        data                 DEGRADED     0     0     0
          raidz2-0           DEGRADED     0     0     0
            label/disk20-01  ONLINE       0     0     0
            [...]
            label/disk20-05  OFFLINE      0     0     0
            [...]
          raidz2-1           DEGRADED     0     0     0
            label/disk21-01  ONLINE       0     0     0
            label/disk21-02  ONLINE       0     0     0
Re: Panic 8.2 PRERELEASE WRITE_DMA48
I've run many fscks on /usr in single-user because I had soft update inconsistencies; no DMA errors during those repairs.

smartctl 5.40 2010-10-16 r3189 [FreeBSD 8.2-PRERELEASE i386] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F1 DT series
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJ9BQC02902
Firmware Version: 1AA01113
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Sun Jan  9 16:40:24 2011 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (11811) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 198) minutes.
Conveyance self-test routine
recommended polling time:        (  21) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   078   078   011    Pre-fail  Always       -       7580
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       399
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       10097
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2375
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       392
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   052   000    Old_age   Always       -       43 (Min/Max 42/45)
194 Temperature_Celsius     0x0022   056   050   000    Old_age   Always       -       44 (Min/Max 42/46)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       20728126
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
Specifying root mount options on diskless boot.
Daniel Braniss writes:

  I have it pxebooting nicely and running with an NFS root, but it then
  reports locking problems: devd, syslogd, moused (and maybe others) lock
  their PID file to protect against multiple instances. Unfortunately,
  these daemons all start before statd/lockd and so the locking fails and
  reports "operation not supported".

Are you mounting /var via NFS? We have been running FreeBSD diskless for several years, and have never run into this problem -- but we use a memory filesystem. The memory filesystem can be quite small. Our methods are documented at

http://www.nber.org/sys-admin/FreeBSD-diskless.html

If that isn't the problem, can you guess what we are doing differently to avoid it?

I note that the response to your message from danny offers the ability to pass arguments to the nfs mount command, but also seems to offer a fix for the fact that classes are not supported under PXE:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/90368

I hope danny will offer a patch to mainline code - it would be an important improvement (and already promised in the documentation).

(I am sorry if this doesn't thread properly - I just joined the list after seeing the message.) The thread is available at:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-January/060854.html

Daniel Feenberg
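[The memory-filesystem approach for /var mentioned above can be configured entirely from rc.conf on a diskless client. A minimal sketch -- varmfs, varsize, tmpmfs and tmpsize are standard rc.conf knobs, but the sizes shown are illustrative choices, not values from the NBER setup:]

  # /etc/rc.conf on the diskless client
  varmfs="YES"        # mount an md(4) memory filesystem on /var
  varsize="32m"       # small is fine; PID files and logs only
  tmpmfs="YES"        # same trick for /tmp
  tmpsize="20m"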
Re: Panic 8.2 PRERELEASE WRITE_DMA48
On Sun, Jan 09, 2011 at 04:41:43PM +0100, Tom Vijlbrief wrote:

  I've run many fscks on /usr in single-user because I had soft update
  inconsistencies; no DMA errors during those repairs.

There's no 1:1 ratio between running fsck on a filesystem and seeing a DMA error. I should explain what I mean by that: just because you receive a read or write error from a disk during operation doesn't mean fsck will induce it. fsck simply checks filesystem tables and so on for integrity; it doesn't do the equivalent of a bad block scan, nor does it check (read) every data block referenced by an inode. So if you have a filesystem which has a bad block somewhere within a data block, fsck almost certainly won't catch this. ZFS, on the other hand (specifically a zpool scrub), would/should induce such.

The reason I advocated booting into single-user and running a fsck manually is because there's confirmation that background fsck doesn't catch/handle all filesystem consistency errors that a foreground fsck does. This is why I continue to advocate background_fsck=no in rc.conf(5). That's for another discussion though.

Let's review the disk:

  === START OF INFORMATION SECTION ===
  Model Family:     SAMSUNG SpinPoint F1 DT series
  Device Model:     SAMSUNG HD103UJ
  [...]

  ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  [...]
  195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       20728126
  [...]
  199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       1
  200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
  201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

Your drive looks fine. Attribute 195 isn't anything to worry about (vendor-specific encoding makes this number appear large). Attribute 199 indicates one CRC error, but again nothing to worry about -- though it could explain a single error during the lifetime of the drive (impossible to determine when it happened).

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        2361        -
# 2  Short offline       Completed without error       00%        2205        -
# 3  Short offline       Completed without error       00%        2138        -
# 4  Extended offline    Completed without error       00%        2109        -
# 5  Short offline       Completed without error       00%        2105        -
# 6  Short offline       Completed without error       00%        2092        -
# 7  Short offline       Completed without error       00%
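[For anyone reproducing this kind of check: the self-test log above is populated by asking the drive to test itself, then reading the results back. A typical sequence with standard smartctl usage, using the device name from this thread:]

  # kick off a short (couple of minutes) or long (full-surface) self-test
  smartctl -t short /dev/ad4
  smartctl -t long /dev/ad4

  # later, read back the self-test log and the attribute table
  smartctl -l selftest /dev/ad4
  smartctl -A /dev/ad4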
Re; NFS performance
It has been suggested that I move this thread to freebsd-stable. The thread so far (deficient NFS performance in FreeBSD 8):

http://lists.freebsd.org/pipermail/freebsd-hackers/2011-January/034006.html

I updated my kernel to FreeBSD 8.2-PRERELEASE. This improved my throughput, but still not to the level I got from 7.3-STABLE. Here's an updated table from my original message:

Observed bytes per second (dd if=filename of=/dev/null bs=65536):

  Destination machine:        Source machine:
                              mattapan   scollay   sullivan
  wonderland/7.3-STABLE         870K       5.2M      1.8M
  wonderland/8.1-STABLE         496K       690K      420K
  wonderland/8.2-PRERELEASE     800K       1.2M      447K

Furthermore, I was still able to induce the "NFS server not responding" message with 8.2-PRERELEASE. So I applied the patch from Rick Macklem. The throughput did not change, but I haven't seen the "NFS server not responding" message yet.

As to an earlier question about NFS options: I'm not setting any, so they are whatever the automounter uses by default.

-- George
Re: Specifying root mount options on diskless boot.
  Daniel Braniss writes:

    I have it pxebooting nicely and running with an NFS root, but it then
    reports locking problems: devd, syslogd, moused (and maybe others)
    lock their PID file to protect against multiple instances.
    Unfortunately, these daemons all start before statd/lockd and so the
    locking fails and reports "operation not supported".

  Are you mounting /var via nfs?

You can use the nolockd mount option to make locking happen locally in the client. (Only a problem if the file(s) being locked are concurrently shared with other clients.) I don't know if this would fix your diskless problem.

rick
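[As a concrete illustration of that suggestion -- nolockd is documented in mount_nfs(8), but the server path and mount point below are made up:]

  # one-off mount with local (client-side) locking
  mount -t nfs -o nolockd server:/export/var /var

  # or the equivalent /etc/fstab line
  server:/export/var  /var  nfs  rw,nolockd  0  0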
Re: Re; NFS performance
  It has been suggested that I move this thread to freebsd-stable. The
  thread so far (deficient NFS performance in FreeBSD 8):

  http://lists.freebsd.org/pipermail/freebsd-hackers/2011-January/034006.html

  I updated my kernel to FreeBSD 8.2-PRERELEASE. This improved my
  throughput, but still not to the level I got from 7.3-STABLE. Here's an
  updated table from my original message:

  Observed bytes per second (dd if=filename of=/dev/null bs=65536):

    Destination machine:        Source machine:
                                mattapan   scollay   sullivan
    wonderland/7.3-STABLE         870K       5.2M      1.8M
    wonderland/8.1-STABLE         496K       690K      420K
    wonderland/8.2-PRERELEASE     800K       1.2M      447K

  Furthermore, I was still able to induce the "NFS server not responding"
  message with 8.2-PRERELEASE. So I applied the patch from Rick Macklem.
  The throughput did not change, but I haven't seen the "NFS server not
  responding" message yet.

So, did the patch get rid of the 1min+ stalls you reported earlier?

Beyond that, all I can suggest is trying to fiddle with some of the options on the net device driver, such as rxcsum, txcsum and tso. (I think tso has had some issues for some drivers, but I don't know any specifics.) When I've seen poor NFS perf. it has usually been a problem at the network device driver level. (If you have a different kind of network card handy, you could try swapping them -- basically one with a different net chipset, so that it uses a different net device driver.)

rick
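[The driver-option fiddling Rick mentions is done with ifconfig. A sketch -- the interface name em0 is an assumption; substitute whatever your dmesg shows:]

  # see which offload features the NIC currently advertises
  ifconfig em0

  # disable TSO and checksum offload one at a time, retesting NFS after each
  ifconfig em0 -tso
  ifconfig em0 -txcsum
  ifconfig em0 -rxcsum

  # re-enable later with: ifconfig em0 tso txcsum rxcsum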
Re: Panic 8.2 PRERELEASE WRITE_DMA48
2011/1/9 Jeremy Chadwick free...@jdc.parodius.com:

  errno 6 is "device not configured". ad4 is on a Silicon Image controller
  (thankfully a reliable model). Sadly AHCI (ahci.ko) isn't in use here; I
  would advocate switching to it (your device names will change however)
  and see if these errors continue (they'll appear as SCSI CAM errors
  though). ahci_load="yes" in /boot/loader.conf should be enough.
  smartmontools does know to talk ATA to /dev/adaX (that's not a typo)
  disks.

Made that change, ahci is loaded:

[...@swanbsd /usr/home/tom]$ kldstat
Id Refs Address    Size     Name
 1   16 0xc040     bd9998   kernel
 2    1 0xc0fda000 88a8     snd_emu10k1.ko
 3    3 0xc0fe3000 579b0    sound.ko
 4    1 0xc103b000 4df90c   nvidia.ko
 5    1 0xc151b000 c108     ahci.ko

[...@swanbsd /usr/home/tom]$ cat /boot/loader.conf
snd_emu10k1_load="YES"
nvidia_load="YES"
ahci_load="YES"
#console="comconsole"

But the device naming is unchanged:

[...@swanbsd /usr/home/tom]$ ls /dev/a*
/dev/acd0   /dev/ad0s2  /dev/ad4s1   /dev/ad4s2e  /dev/apm0
/dev/acd1   /dev/ad0s5  /dev/ad4s2   /dev/ad4s2f  /dev/ata
/dev/acpi   /dev/ad0s6  /dev/ad4s2a  /dev/ad4s3   /dev/atkbd0
/dev/ad0    /dev/ad0s7  /dev/ad4s2b  /dev/ad4s4   /dev/audit
/dev/ad0s1  /dev/ad4    /dev/ad4s2d  /dev/ad4s5
Re: Panic 8.2 PRERELEASE WRITE_DMA48
On Sun, Jan 09, 2011 at 09:02:16PM +0100, Tom Vijlbrief wrote:

  Made that change, ahci is loaded [...] But the device naming is
  unchanged.

I'm sorry, I gave you incorrect advice; I'm used to Intel controllers with AHCI, not Silicon Image controllers. Silicon Image controllers have their own driver: siis(4). Please change ahci_load="yes" to siis_load="yes".
RE: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 6 January 2011 22:26, Chris Forgeron cforge...@acsi.ca wrote: You know, these days I'm not as happy with SSD's for ZIL. I may blog about some of the speed results I've been getting over the last 6mo-1yr that I've been running them with ZFS. I think people should be using hardware RAM drives. You can get old Gigabyte i-RAM drives with 4 gig of memory for the cost of a 60 gig SSD, and it will trounce the SSD for speed. (I'm making an updated comment on my previous comment. Sorry for the topic drift, but I think this is important to consider) I decided to do some tests between my Gigabyte i-RAM and OCZ Vertex 2 SSD. I've found that they are both very similar for Random 4K-aligned Write speed (I was receiving around 17,000 IOPS on both, slightly faster ms access time for the i-RAM). Now, if you're talking 512b aligned writes (which is what ZFS is unless you've tweaked the ashift value) you're going to win with an i-RAM device. The OCZ Drops down to ~6000 IOPS for 512b random writes. Please note, that's on a used Vertex 2. A fresh Vertex 2 was giving me 28,000 IOPS on 4k aligned writes - Faster than the i-RAM. But with more time, it will be slower than the i-RAM due to SSD fade. I'm seriously considering trading in my ZIL SSD's for i-RAM devices, they are around the same price if you can still find them, and they won't degrade like an SSD does. ZIL doesn't need much storage space. I think 12 gig (3 I-RAM's) would do nicely, and would give me an aggregate IOPS close to a ddrdrive for under $500. I did some testing with SSD Fade recently, here's the link to my blog on it if anyone cares for more detail - http://christopher-technicalmusings.blogspot.com/2011/01/ssd-fade-its-real-and-why-you-may-not.html I'm still using SSDs for my ZIL, but I think I'll be switching over to some sort of RAM device shortly. I wish the i-RAM in 3.5 format had proper SATA power connectors on the back so it could plug into my SAS backplane like the OCZ 3.5 SSDs do. As it stands, I'd have to rig something, as my SAN head doesn't have any PCI controller slots for the other i-RAM format. -Original Message- From: owner-freebsd-sta...@freebsd.org [mailto:owner-freebsd-sta...@freebsd.org] On Behalf Of Markiyan Kushnir Sent: Friday, January 07, 2011 8:10 AM To: Jeremy Chadwick Cc: Chris Forgeron; freebsd-stable@freebsd.org; Artem Belevich; Jean-Yves Avenard Subject: Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks 2011/1/7 Jeremy Chadwick free...@jdc.parodius.com: On Fri, Jan 07, 2011 at 12:29:17PM +1100, Jean-Yves Avenard wrote: On 6 January 2011 22:26, Chris Forgeron cforge...@acsi.ca wrote: You know, these days I'm not as happy with SSD's for ZIL. I may blog about some of the speed results I've been getting over the last 6mo-1yr that I've been running them with ZFS. I think people should be using hardware RAM drives. You can get old Gigabyte i-RAM drives with 4 gig of memory for the cost of a 60 gig SSD, and it will trounce the SSD for speed. I'd put your SSD to L2ARC (cache). Where do you find those though. I've looked and looked and all references I could find was that battery-powered RAM card that Sun used in their test setup, but it's not publicly available.. 
DDRdrive:
  http://www.ddrdrive.com/
  http://www.engadget.com/2009/05/05/ddrdrives-ram-based-ssd-is-snappy-costly/

ACard ANS-9010:
  http://techreport.com/articles.x/16255

GC-RAMDISK (i-RAM) products:
  http://us.test.giga-byte.com/Products/Storage/Default.aspx

Be aware that these products are absurdly expensive for what they offer
(the cost isn't justified), not to mention that in some cases a
bottleneck is imposed by the use of a SATA-150 interface. I'm also not
sure whether all of them offer BBU capability.

In some respects you might be better off just buying more RAM for your
system and making md(4) memory disks to be used as L2ARC (cache). I've
mentioned this in the past, specifically back in the days when the ARC
piece of ZFS on FreeBSD was causing havoc, when I asked whether one
could work around the complexity by using L2ARC on md(4) drives
instead.

> Once you have got the extra RAM, why not just reserve it directly for
> the ARC (via vm.kmem_size[_max] and vfs.zfs.arc_max)?
>
> Markiyan.

I tried this, but couldn't get rc.d/mdconfig2 to do what I wanted on
startup WRT the aforementioned.

--
| Jeremy Chadwick                                  j...@parodius.com |
| Parodius Networking                      http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, USA |
| Making life hard for others since 1977.             PGP 4BD6C0CB  |
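Both suggestions can be sketched concretely; the sizes, pool name, and
tunable values below are illustrative, not recommendations:

  # A swap-backed md(4) memory disk attached as L2ARC:
  mdconfig -a -t swap -s 2g -u 0    # creates /dev/md0
  zpool add tank cache md0

  # Markiyan's alternative: hand the RAM straight to the ARC via
  # /boot/loader.conf tunables instead of going through L2ARC:
  vm.kmem_size="8G"
  vfs.zfs.arc_max="6G"

The second route is usually the saner one: an md-backed L2ARC costs a
copy of the data plus ARC-resident headers for every cached buffer,
whereas a larger ARC uses the same RAM with no indirection.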
Re: usb errors with 8 stable
It's looking more like a hardware failure. I connected the phone to a
friend's HP Pavilion dv8xxx running FreeBSD 8 r217175, and it was
detected and da devices were created. I've upgraded to r217175 and it
still doesn't work. When I get back to the lab, I'll test the hardware
and go from there.

Thanks for the suggestions; I'll post an update when I know more.

Beach Geek

On Jan 5, 2011 12:38 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:
> On Tue, Jan 04, 2011 at 11:37:48PM -0600, Beach Geek wrote:
> > Compaq Presario 5xxx 2GHz FreeBSD 8 ...
>
> I would start by reviewing the commits to RELENG_8 between the two
> timeframes and try to narrow down which commit may have caused your
> problem.
>
> http://www.freshbsd.org/?branch=RELENG_8&project=freebsd

--
| Jeremy Chadwick                                  j...@parodius.com |
| Parodius Networking                      http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, USA |
| Making life hard for others since 1977.             PGP 4BD6C0CB  |
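If the hardware checks out and it does come back to bisecting commits,
the same narrowing-down can be done against the svn history; a hedged
sketch (the date range and the sys/dev/usb path are assumptions, not
details from this thread):

  # List stable/8 commits to the USB subtree between the two
  # timeframes being compared:
  svn log -r '{2010-12-01}:{2011-01-05}' \
      svn://svn.freebsd.org/base/stable/8/sys/dev/usb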
Re: Panic 8.2 PRERELEASE WRITE_DMA48
2011/1/9 Jeremy Chadwick free...@jdc.parodius.com:
> I'm sorry, I gave you incorrect advice; I'm used to Intel controllers
> with AHCI, not Silicon Image controllers. Silicon Image controllers
> have their own driver: siis(4). Please change ahci_load=yes to
> siis_load=yes.

Tried it, but no change: according to its man page, the siis(4) driver
does not support the 3512, so I'm probably stuck with the default
driver.

I did the full disk scan (smartctl -t select,0-max /dev/ad4). It
completed overnight with no errors, so the disk is fine.
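For reference, the selective self-test and the commands to read back
its outcome (ad4 as in this thread):

  smartctl -t select,0-max /dev/ad4   # background scan of every LBA
  smartctl -l selftest /dev/ad4       # log of completed tests + results
  smartctl -l selective /dev/ad4      # span table / current progress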
Re: Panic 8.2 PRERELEASE WRITE_DMA48
2011/1/9 Jeremy Chadwick free...@jdc.parodius.com:
> Not to get off topic, but what is causing this? It looks like you have
> a cron job or something very aggressive doing a smartctl -t short
> /dev/ad4 or equivalent. If you have such, please disable this
> immediately. You shouldn't be doing SMART tests with such regularity;
> it accomplishes absolutely nothing, especially the short tests. Let
> the drive operate normally, otherwise run smartd and watch logs
> instead.

I have this default entry (from the author of that file) in smartd.conf
and enabled it on many machines over the years. Is it a bad practice?

  # First (primary) ATA/IDE hard disk. Monitor all attributes, enable
  # automatic online data collection, automatic Attribute autosave, and
  # start a short self-test every day between 2-3am, and a long self test
  # Saturdays between 3-4am.
  /dev/hda -a -o on -S on -s (S/../.././02|L/../../6/03)

Thanks for all your feedback.
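For anyone puzzling over that -s expression: the fields are T/MM/DD/d/HH
(test type, month, day of month, day of week with 6 = Saturday, hour),
and '.' is a wildcard. Decoded, with a FreeBSD-flavoured device name
substituted for the sample's Linux /dev/hda:

  # S/../.././02  -> short self-test every day during the 02:00 hour
  # L/../../6/03  -> long self-test every Saturday during the 03:00 hour
  /dev/ad0 -a -o on -S on -s (S/../.././02|L/../../6/03)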
Re: Panic 8.2 PRERELEASE WRITE_DMA48
On Mon, Jan 10, 2011 at 07:13:57AM +0100, Tom Vijlbrief wrote:
> 2011/1/9 Jeremy Chadwick free...@jdc.parodius.com:
> > Not to get off topic, but what is causing this? [...] Let the drive
> > operate normally, otherwise run smartd and watch logs instead.
>
> I have this default entry (from the author of that file) in
> smartd.conf and enabled it on many machines over the years. Is it a
> bad practice?
>
>   # First (primary) ATA/IDE hard disk. Monitor all attributes, enable
>   # automatic online data collection, automatic Attribute autosave, and
>   # start a short self-test every day between 2-3am, and a long self test
>   # Saturdays between 3-4am.
>   /dev/hda -a -o on -S on -s (S/../.././02|L/../../6/03)

I'll have to talk to Bruce Allen about that. Those entries in
smartd.conf are pretty old (meaning they've existed for a very long
time, and chances are Bruce hasn't gone back to revamp them or
reconsider the logic/justification behind them).

I'm an opponent of running SMART tests automatically, given what some
of them do to drives. It's important to remember that most SMART tests
can be done while the drive is in operation, and some of these tests
stress the drive, which could potentially cause timeouts or other I/O
anomalies (data loss is unlikely, but odd errors may occur; it all
depends on the firmware). This is especially important WRT long tests.

For example, on newer 2TB Western Digital Caviar Black drives, a long
test does something that I haven't heard (yes, heard) any other drive
do: it emits a noise that's almost identical to that of a head crash.
It could be scanning a very specific region of LBAs (possibly
out-of-range sectors, e.g. spares) repetitively, but it sounds nothing
like a selective LBA scan. Honestly, it does sound like a head crash.
Is this something you'd really want to be running every 7 days?

I've always advocated that people run smartd only if they want to
monitor attributes, which ultimately are the most important things to
keep an eye on anyway. It's even more important to know how to read
them. :-) 90% of drives out there update their attributes at set
intervals or when the SMART READ DATA command is encountered.

And honestly, I've never seen a SMART short test do anything useful on
any drive I've used (SATA or SCSI; WD, Seagate, Maxtor, Hitachi,
Fujitsu). Long tests are different in this regard. I'm fully aware
that the terms "short" and "long" are vague and don't really tell a
person what the drive is doing behind the scenes. Sadly, that's the
nature of SMART: the tests are defined on a per-vendor (or
per-disk-model!) basis. But as my second paragraph above implies, the
behaviour is not consistent.

So when people ask me "how do I monitor my disks reliably with SMART,
then?", I tell them to either do it by hand (which is what I do), or
run smartd(8) and keep an eye on their logs. This requires some tuning,
and familiarity with which attribute means what, again on a per-drive
or per-vendor basis. It's great that there's no actual standard for
these, isn't it? :-)

--
| Jeremy Chadwick                                  j...@parodius.com |
| Parodius Networking                      http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, USA |
| Making life hard for others since 1977.             PGP 4BD6C0CB  |
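A sketch of the "by hand" monitoring described above; the attribute IDs
listed are the commonly watched ones, though, as noted, their exact
meaning varies by vendor and model:

  smartctl -A /dev/ad4
  # Raw values worth watching:
  #   5   Reallocated_Sector_Ct   - sectors remapped to spares
  #   197 Current_Pending_Sector  - sectors awaiting reallocation
  #   198 Offline_Uncorrectable   - sectors that failed offline scans
  #   199 UDMA_CRC_Error_Count    - cabling/interface errors, not media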