wd0 read timeouts - how to proceed?
Must be the holiday season *sigh* my OpenBSD server is suddenly giving the occasional read timeout on the /var slice of the main harddisk:

---
wd0(pciide0:0:0): timeout
	type: ata
	c_bcount: 65536
	c_skip: 0
wd0g: device timeout reading fsbn 17002464 of 17002464-17002591 (wd0 bn 67334928; cn 66800 tn 8 sn 24), retrying
wd0: soft error (corrected)
---

Is this the actual disk or the controller/other hardware? Either way it needs a fix.

My problem is that this is a live system that is not close by. I would very much prefer to 'fix' this remotely to buy some time to replace the machine completely. I do have offsite backups of essential data but not a spare system in the rack at this very moment. Not to mention I would like to avoid spending X-mas alone in the datacenter.

There is a second harddisk installed, with OpenBSD formatted slices, but of different proportions. This (larger) disk is unused, so its data and layout may be wiped, and it seems like a smart idea to at least copy the data over.

Can I just copy /var (wd0g) to /var2 (wd1i) and remount, or should I proceed otherwise, or would copying/remounting /var simply not work on a live system?

Or, possibly, I could 'clone' the whole wd0 disk to wd1 and use that instead of wd0? I understood that you need to boot into single user mode for this [1] and/or have identical disks [2], or is there another (remote-safe) way?

Any advice is highly appreciated!

Thanks, and happy holidays,
Matt

[1] http://unixsadm.blogspot.com/2007/08/cloning-disk-in-openbsd.html
[2] http://monkey.org/openbsd/archive/tech/0112/msg00079.html
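[Editor's sketch] One way to get a first hint on the "disk or controller" question is to tally which blocks the timeouts hit. The function below is a minimal sketch, not something from the thread: the name timeout_blocks is made up for illustration, and it assumes the kernel messages look exactly like the one quoted in the post. Timeouts that keep landing on the same fsbn may point at a bad spot on the disk, while timeouts scattered across many blocks may point more towards cabling or the controller; treat that as a rough heuristic only.

```shell
#!/bin/sh
# timeout_blocks [FILE...]: count "device timeout reading fsbn" messages
# per partition and starting block. Reads stdin when no file is given.
# (Hypothetical helper name; the message format is taken from the post.)
timeout_blocks() {
    grep 'device timeout reading fsbn' "$@" |
        sed 's/^.*\(wd[0-9][a-p]\): device timeout reading fsbn \([0-9][0-9]*\).*$/\1 \2/' |
        sort | uniq -c
}

# Demonstration with the single log line quoted in the post.
sample='wd0g: device timeout reading fsbn 17002464 of 17002464-17002591 (wd0 bn 67334928; cn 66800 tn 8 sn 24), retrying'
result=$(printf '%s\n' "$sample" | timeout_blocks)
echo "$result"
```

On the live machine this could be pointed at the system log, for example `timeout_blocks /var/log/messages`, assuming kernel messages are logged there.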
Re: wd0 read timeouts - how to proceed?
On Fri, Dec 24, 2010 at 11:00:48AM +0100, Webcharge wrote:
> Must be the holiday season *sigh* my OpenBSD server is suddenly giving the occasional read-timeout on the /var slice of the main harddisk:
>
> There is a second harddisk installed, with OpenBSD formatted slices, but of different proportions. This (larger) disk is unused, so data / layout may be wiped, so it seems like a smart idea to copy the data at least (I do have offsite backups of essential data but not a spare system in the rack at this very moment)
>
> Can I just copy /var (wd0g) to /var2 (wd1i) and remount or should I proceed otherwise or would copy/remounting /var simply not work on a live system?

If the system is quiet, you can try 'sync; sync; dd ...; fsck', but something like 'tar cpf - | tar xpf -' is more likely to get you a somewhat consistent view. Change /etc/fstab and reboot (you *can* try mounting the new /var over the old one, but you'll want to play with fstat -n to see which processes are still accessing the old /var.)

Of course, this isn't guaranteed to work. In particular, if something is actually writing to /var, your view won't be consistent. Even more in particular, don't try this with running databases.

Joachim
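[Editor's sketch] The 'tar cpf - | tar xpf -' suggestion above, spelled out as a small function. This is a minimal sketch under assumptions: the function name copy_tree and the scratch directories are made up for illustration, and the demo runs on temporary directories rather than on the real /var.

```shell
#!/bin/sh
# copy_tree SRC DST: copy a directory tree through a tar pipe,
# keeping permissions (the 'p' flag). Hypothetical helper name.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # The first tar writes an archive of SRC to stdout,
    # the second tar unpacks it inside DST.
    (cd "$src" && tar cpf - .) | (cd "$dst" && tar xpf -)
}

# Demonstration on scratch directories (stand-ins for /var and the new slice).
demo=$(mktemp -d)
mkdir -p "$demo/old/log"
echo "hello" > "$demo/old/log/messages"
copy_tree "$demo/old" "$demo/new"
cat "$demo/new/log/messages"

# On the real system, as root, the equivalent would be roughly:
#   mount /dev/wd1i /mnt      # /mnt is an assumed mount point
#   copy_tree /var /mnt
# followed by the /etc/fstab change and reboot described above.
```

As the reply warns, anything written to /var while the copy runs may end up inconsistent on the new slice, so stopping busy daemons first is worth considering.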
Re: wd0 read timeouts - how to proceed?
2010/12/24 Joachim Schipper joac...@joachimschipper.nl:
> something like 'tar cpf - | tar xpf -' is more likely to get you a somewhat consistent view.

POSIX pax(1) with the -rw options should work slightly faster (and it's already faster to type ;) ).

--
WBR,
Vadim Zhukov
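[Editor's sketch] The pax(1) variant mentioned above, as a minimal sketch. The function name copy_tree_pax and the scratch directories are made up for illustration; -pe (preserve everything) is added here as an assumption about what one would want when copying /var as root, it is not part of the original suggestion. The demo is skipped when pax is not installed.

```shell
#!/bin/sh
# copy_tree_pax SRC DST: copy a directory tree with pax in copy mode.
# DST must be an absolute path because we cd into SRC first.
# (Hypothetical helper name.)
copy_tree_pax() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # -r and -w together select copy mode; -pe preserves owner,
    # mode and timestamps where the caller is allowed to.
    (cd "$src" && pax -rw -pe . "$dst")
}

# Demonstration on scratch directories, only if pax is available.
if command -v pax >/dev/null 2>&1; then
    demo=$(mktemp -d)
    mkdir -p "$demo/old/db"
    echo "data" > "$demo/old/db/file"
    copy_tree_pax "$demo/old" "$demo/new"
    cat "$demo/new/db/file"
else
    echo "pax not installed, skipping demo"
fi
```

Unlike the tar pipe, pax in copy mode writes straight into the destination directory, so there is no second process to unpack an archive.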
Re: wd0 read timeouts - how to proceed?
On Fri, Dec 24, 2010 at 5:00 AM, Webcharge webcha...@gmx.net wrote:
> Is this the actual disk or the controller/other hardware?

If the hardware is SMART-aware, installing smartmontools and running smartctl may give you a clue.
Re: wd0 read timeouts - how to proceed?
On 12/24/10 17:09, Chris Smith wrote:
> On Fri, Dec 24, 2010 at 5:00 AM, Webcharge webcha...@gmx.net wrote:
>> Is this the actual disk or the controller/other hardware?
>
> If the hardware is SMART-aware, installing smartmontools and running smartctl may give you a clue.

atactl(8) works just fine.
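[Editor's sketch] What a SMART check with the base-system atactl(8) could look like. This is a minimal sketch under assumptions: the disk is taken to be wd0 as in the original post, the drive is assumed to support SMART, and the commands need to run as root on OpenBSD. On any system without atactl the script only prints a note.

```shell
#!/bin/sh
# Assumption: wd0 is the suspect disk (from the original post).
disk=${1:-wd0}

if command -v atactl >/dev/null 2>&1; then
    atactl "$disk" smartenable    # turn the SMART feature set on
    atactl "$disk" smartstatus    # overall health verdict from the drive
    atactl "$disk" readattr       # per-attribute values and thresholds
else
    echo "atactl not found; this sketch targets OpenBSD"
fi
```

A failing smartstatus or attributes at or below their thresholds would suggest the disk itself, while a clean report leaves cabling and the controller as candidates.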