Re: [zfs-discuss] zpool replace not concluding + duplicate drive label
On 28/10/2011, at 3:06 PM, Daniel Carosone wrote:
> On Thu, Oct 27, 2011 at 10:49:22AM +1100, afree...@mac.com wrote:
>> Hi all,
>>
>> I'm seeing some puzzling behaviour with my RAID-Z.
>
> Indeed. Start with zdb -l on each of the disks to look at the labels in
> more detail.
>
> --
> Dan.

I'm reluctant to include a monstrous wall of text, so I've placed the output
at http://dl.dropbox.com/u/19420697/zdb.out. Immediately I'm struck by the
sad dearth of information on da6, the similarity of the da0 + da0/old
subtree to the zpool status information, and my total lack of knowledge of
how to use this data in any beneficial fashion.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
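For anyone reproducing this without the Dropbox link, the output in question
comes from dumping each disk's vdev labels in turn - a sketch, with the
da0..da7 device names assumed from this thread:

```shell
# Sketch: dump the vdev labels of every pool member (FreeBSD device names
# da0..da7 are assumed from the thread; adjust to your system).
# A healthy member prints four consistent labels; a blank or failing disk
# (like da6 here) fails to unpack some or all of them.
for d in da0 da1 da2 da3 da4 da5 da6 da7; do
    echo "== /dev/${d} =="
    zdb -l "/dev/${d}"
done
```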
Re: [zfs-discuss] zpool replace not concluding + duplicate drive label
On Thu, Oct 27, 2011 at 10:49:22AM +1100, afree...@mac.com wrote:
> Hi all,
>
> I'm seeing some puzzling behaviour with my RAID-Z.

Indeed. Start with zdb -l on each of the disks to look at the labels in more
detail.

--
Dan.
Re: [zfs-discuss] zpool replace
Hi Doug,

The "vms" pool was created in a non-redundant way, so there is no way to get
the data off of it unless you can put back the original c0t3d0 disk. If you
can still plug in the disk, you can always do a zpool replace on it
afterwards. If not, you'll need to restore from backup, preferably to a pool
with raidz or mirroring so zfs can repair faults automatically.

On Mon, 15 Aug 2011, Doug Schwabauer wrote:

> Help - I've got a bad disk in a zpool and need to replace it. I've got an
> extra drive that's not being used, although it's still marked like it's in
> a pool. So I need to get the "xvm" pool destroyed, c0t5d0 marked as
> available, and replace c0t3d0 with c0t5d0.
>
> root@kc-x4450a # zpool status -xv
>   pool: vms
>  state: UNAVAIL
> status: One or more devices are faulted in response to IO failures.
> action: Make sure the affected devices are connected, then run
>         'zpool clear'.
>    see: http://www.sun.com/msg/ZFS-8000-HC
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         vms         UNAVAIL      0     3     0  insufficient replicas
>           c0t2d0    ONLINE       0     0     0
>           c0t3d0    UNAVAIL      0     6     0  experienced I/O failures
>           c0t4d0    ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         vms:<0x5>
>         vms:<0xb>
>
> root@kc-x4450a # zpool replace -f vms c0t3d0 c0t5d0
> cannot replace c0t3d0 with c0t5d0: pool I/O is currently suspended
>
> root@kc-x4450a # zpool import
>   pool: xvm
>     id: 14176680653869308477
>  state: DEGRADED
> status: The pool was last accessed by another system.
> action: The pool can be imported despite missing or damaged devices. The
>         fault tolerance of the pool may be compromised if imported.
>    see: http://www.sun.com/msg/ZFS-8000-EY
> config:
>
>         xvm           DEGRADED
>           mirror-0    DEGRADED
>             c0t4d0    FAULTED   corrupted data
>             c0t5d0    ONLINE
>
> Thanks!
> -Doug

Regards,
markm
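One possible sequence for Doug's situation, assuming the stale "xvm" pool's
contents are expendable - a sketch only, not verified on his system, and the
replace can only proceed if 'zpool clear' gets vms out of its suspended
state:

```shell
# SKETCH, assuming xvm's data is expendable: free c0t5d0 by destroying the
# stale pool, then retry the replace. zpool destroy is irreversible.
zpool import -f 14176680653869308477   # import "xvm" by ID (last used elsewhere)
zpool destroy xvm                      # releases its disks, including c0t5d0
zpool clear vms                        # try to clear the suspended-I/O state
zpool replace -f vms c0t3d0 c0t5d0     # should now be accepted
```

If vms stays suspended after the clear, there is little to be done: as noted
above, the pool has no redundancy to rebuild from.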
Re: [zfs-discuss] zpool replace lockup / replace process now stalled, how to fix?
For the record, in case anyone else experiences this behaviour: I tried
various things which failed, and finally, as a last-ditch effort, upgraded
my freebsd, giving me zpool v14 rather than v13 - and now it's resilvering
as it should.

Michael

On Monday 17 May 2010 09:26:23 Michael Donaghy wrote:
> Hi,
>
> I recently moved to a freebsd/zfs system for the sake of data integrity,
> after losing my data on linux. I've now had my first hard disk failure;
> the bios refused to even boot with the failed drive (ad18) connected, so
> I removed it. I have another drive, ad16, which had enough space to
> replace the failed one, so I partitioned it and attempted to use "zpool
> replace" to replace the failed partitions with new ones, i.e. "zpool
> replace tank ad18s1d ad16s4d". This seemed to simply hang, with no
> processor or disk use; any "zpool status" commands also hung. Eventually
> I attempted to reboot the system, which also eventually hung; after
> waiting a while, having no other option, rightly or wrongly, I
> hard-rebooted. Exactly the same behaviour happened with the other zpool
> replace.
>
> Now, my zpool status looks like:
> arcueid ~ $ zpool status
>   pool: tank
>  state: DEGRADED
>  scrub: none requested
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         tank           DEGRADED     0     0     0
>           raidz2       DEGRADED     0     0     0
>             ad4s1d     ONLINE       0     0     0
>             ad6s1d     ONLINE       0     0     0
>             ad9s1d     ONLINE       0     0     0
>             ad17s1d    ONLINE       0     0     0
>             replacing  DEGRADED     0     0     0
>               ad18s1d  UNAVAIL      0 9.62K     0  cannot open
>               ad16s4d  ONLINE       0     0     0
>             ad20s1d    ONLINE       0     0     0
>           raidz2       DEGRADED     0     0     0
>             ad4s1e     ONLINE       0     0     0
>             ad6s1e     ONLINE       0     0     0
>             ad17s1e    ONLINE       0     0     0
>             replacing  DEGRADED     0     0     0
>               ad18s1e  UNAVAIL      0 11.2K     0  cannot open
>               ad16s4e  ONLINE       0     0     0
>             ad20s1e    ONLINE       0     0     0
>
> errors: No known data errors
>
> It looks like the replace has taken in some sense, but ZFS doesn't seem
> to be resilvering as it should.
> Attempting to zpool offline doesn't work:
> arcueid ~ # zpool offline tank ad18s1d
> cannot offline ad18s1d: no valid replicas
> Attempting to scrub causes a similar hang to before. Data is still
> readable (from the zvol which is the only thing actually on this
> filesystem), although slowly.
>
> What should I do to recover this / trigger a proper replace of the
> failed partitions?
>
> Many thanks,
> Michael
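A hedged aside for anyone hitting the same wall: after upgrading the OS, the
pool's on-disk format can also be raised to the newer version - a one-way
step, sketched here with the thread's pool name:

```shell
# Sketch: check and raise the pool's on-disk version after an OS upgrade.
# NOTE: upgrading the pool format is one-way; older kernels can then no
# longer import the pool.
zpool upgrade            # list pools still running older on-disk versions
zpool upgrade tank       # upgrade pool "tank" to the newest supported version
zpool status -v tank     # the resilver should now show progress
```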
Re: [zfs-discuss] zpool replace leaves pool degraded after resilvering
2009.06 is v111b, but you're running v111a. I don't know, but perhaps the
a->b transition addressed this issue, among others?
--
This message posted from opensolaris.org
Re: [zfs-discuss] zpool replace leaves pool degraded after resilvering
I forgot to mention: this is a SunOS biscotto 5.11 snv_111a i86pc i386 i86pc
system.

Maurilio.
Re: [zfs-discuss] zpool replace - choke point
[EMAIL PROTECTED] said:
> Thanks for the tips. I'm not sure if they will be relevant, though. We
> don't talk directly with the AMS1000. We are using a USP-VM to virtualize
> all of our storage and we didn't have to add anything to the drv
> configuration files to see the new disk (mpxio was already turned on). We
> are using the Sun drivers and mpxio and we didn't require any tinkering
> to see the new LUNs.

Yes, the fact that the USP-VM was recognized automatically by the Solaris
drivers is a good sign. I suggest that you check to see what queue-depth and
disksort values you ended up with from the automatic settings:

  echo "*ssd_state::walk softstate |::print -t struct sd_lun un_throttle" \
  | mdb -k

The "ssd_state" would be "sd_state" on an x86 machine (Solaris-10). The
"un_throttle" above will show the current max_throttle (queue depth);
replace it with "un_min_throttle" to see the min, and
"un_f_disksort_disabled" to see the current queue-sort setting.

The HDS docs for the 9500 series suggested 32 as the max_throttle to use,
and the default setting (Solaris-10) was 256 (hopefully with the USP-VM you
get something more reasonable). And while 32 did work for us, i.e. no
operations were ever lost as far as I could tell, the array back-end -- the
drives themselves, and the internal SATA shelf connections -- has an actual
queue depth of four for each array controller. The AMS1000 has the same
limitation for SATA shelves, according to our HDS engineer.

In short, Solaris, especially with ZFS, functions much better if it does not
try to send more FC operations to the array than the actual physical devices
can handle. We were actually seeing NFS client operations hang for minutes
at a time when the SAN-hosted NFS server was making its ZFS devices busy --
and this was true even if clients were using different devices than the busy
ones. We do not see these hangs after making the described changes, and I
believe this is because the OS is no longer waiting around for a response
from devices that aren't going to respond in a reasonable amount of time.

Yes, having the USP between the host and the AMS1000 will affect things;
there's probably some huge cache in there somewhere. But unless you've got a
cache hundreds of GB in size, at some point a resilver operation is going to
end up running at the speed of the actual back-end device.

Regards,
Marion
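The three sd_lun fields Marion names can be read with the same one-liner
pattern - a sketch; on an x86 Solaris 10 box substitute sd_state for
ssd_state, and run as root:

```shell
# Sketch: inspect per-LUN queue settings via the kernel debugger.
# un_throttle            = current max_throttle (queue depth)
# un_min_throttle        = minimum throttle
# un_f_disksort_disabled = whether FC queue-sorting is off
for field in un_throttle un_min_throttle un_f_disksort_disabled; do
    echo "*ssd_state::walk softstate |::print -t struct sd_lun ${field}" | mdb -k
done
```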
Re: [zfs-discuss] zpool replace - choke point
Thanks for the tips. I'm not sure if they will be relevant, though. We don't
talk directly with the AMS1000. We are using a USP-VM to virtualize all of
our storage and we didn't have to add anything to the drv configuration
files to see the new disk (mpxio was already turned on). We are using the
Sun drivers and mpxio and we didn't require any tinkering to see the new
LUNs.
Re: [zfs-discuss] zpool replace - choke point
[EMAIL PROTECTED] said:
> I think we found the choke point. The silver lining is that it isn't the
> T2000 or ZFS. We think it is the new SAN, an Hitachi AMS1000, which has
> 7200RPM SATA disks with the cache turned off. This system has a very
> small cache, and when we did turn it on for one of the replacement LUNs
> we saw a 10x improvement - until the cache filled up about 1 minute later
> (was using zpool iostat). Oh well.

We have experience with a T2000 connected to the HDS 9520V, predecessor to
the AMS arrays, with SATA drives, and it's likely that your AMS1000 SATA has
similar characteristics. I didn't see whether you're using Sun's drivers to
talk to the SAN/array, but we are using Solaris-10 (and Sun drivers +
MPXIO), and since the Hitachi storage isn't automatically recognized
(sd/ssd, scsi_vhci), it took a fair amount of tinkering to get parameters
adjusted to work well with the HDS storage. The combination that has given
us the best results with ZFS is:

  (a) Tell the array to ignore SYNCHRONIZE_CACHE requests from the host.
  (b) Balance drives within each AMS disk shelf across both array
      controllers.
  (c) Set the host's max queue depth to 4 for the SATA LUN's (sd/ssd
      driver).
  (d) Set the host's disable_disksort flag (sd/ssd driver) for HDS LUN's.

Here's the reference we used for setting the parameters in Solaris-10:

  http://wikis.sun.com/display/StorageDev/Parameter+Configuration

Note that the AMS uses read-after-write verification on SATA drives, so you
only have half the IOP's for writes that the drives are capable of handling.
We've found that small RAID volumes (e.g. a two-drive mirror) are
unbelievably slow, so you'd want to go toward having more drives per RAID
group, if possible.

Honestly, if I recall correctly what I saw in your "iostat" listings
earlier, your situation is not nearly as bad as with our older array. You
don't seem to be driving those HDS LUN's to the extreme busy states that we
have seen on our 9520V. It was not unusual for us to see LUN's at 100% busy,
100% wait, with 35 ops total in the "actv" and "wait" columns, and I don't
recall seeing any 100%-busy devices in your logs. But getting the FC
queue-depth (max-throttle) setting to match what the array's back-end I/O
can handle greatly reduced the long "zpool status" and other I/O-related
hangs that we were experiencing. And disabling the host-side FC
queue-sorting greatly improved the overall latency of the system when busy.
Maybe it'll help yours too.

Regards,
Marion
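For items (c) and (d), the host-side tuning ends up as an sd.conf/ssd.conf
entry along these lines. This is a sketch only: the name:value tuple syntax
shown is from later Solaris 10 updates (earlier updates use a different
format), and the vendor/product inquiry string shown for the AMS is an
assumption - verify both against your own LUNs and the wiki page above
before applying anything.

```
# /kernel/drv/sd.conf (or ssd.conf) fragment -- SKETCH, verify before use.
# Matches LUNs whose SCSI inquiry reports the given vendor/product string
# (example shown) and applies queue depth 4 with disksort disabled.
sd-config-list = "HITACHI DF600F", "throttle-max:4, disksort:false";
```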
Re: [zfs-discuss] zpool replace - choke point
I think we found the choke point. The silver lining is that it isn't the
T2000 or ZFS. We think it is the new SAN, an Hitachi AMS1000, which has
7200RPM SATA disks with the cache turned off. This system has a very small
cache, and when we did turn it on for one of the replacement LUNs we saw a
10x improvement - until the cache filled up about 1 minute later (was using
zpool iostat). Oh well.
Re: [zfs-discuss] zpool replace - choke point
It's something we've considered here as well.
Re: [zfs-discuss] zpool replace - choke point
Would any of this have to do with the system being a T2000? Would ZFS
resilvering be affected by single-threadedness, slowish US-T1 clock speed,
or lack of strong FPU performance?

On 12/1/08, Alan Rubin <[EMAIL PROTECTED]> wrote:
> We will be considering it in the new year, but that will not happen in
> time to affect our current SAN migration.

--
Matt Walburn
http://mattwalburn.com
Re: [zfs-discuss] zpool replace - choke point
We will be considering it in the new year, but that will not happen in time
to affect our current SAN migration.
Re: [zfs-discuss] zpool replace - choke point
Have you considered moving to 10/08? ZFS resilver performance is much
improved in that release, and I suspect that code might help you. You can
easily test upgrading with Live Upgrade. I did the transition using LU and
was very happy with the results. For example, I added a disk to a mirror and
resilvering the new disk took about 6 min for almost 300GB, IIRC.

Blake

On Mon, Dec 1, 2008 at 11:04 PM, Alan Rubin <[EMAIL PROTECTED]> wrote:
> I had posted at the Sun forums, but it was recommended to me to try here
> as well. For reference, please see
> http://forums.sun.com/thread.jspa?threadID=5351916&tstart=0.
>
> In the process of a large SAN migration project we are moving many large
> volumes from the old SAN to the new. We are making use of the 'replace'
> function to replace the old volumes with similar or larger new volumes.
> This process is moving very slowly, sometimes as slowly as one percent of
> the data every 10 minutes. Is there any way to streamline this method?
> The system is Solaris 10 08/07. How much is dependent on the activity of
> the box? How about on the architecture of the box? The primary system in
> question at this point is a T2000 with 8GB of RAM and a 4-core CPU. This
> server has 6 4Gb fibre channel connections to our SAN environment. At
> times this server is quite busy because it is our backup server, but
> performance seems no better when backup operations have ceased their
> daily activities.
>
> Our pools are only stripes. Would we expect better performance from a
> mirror or raidz pool? It is worrisome that if the environment were
> compromised by a failed disk, it could take so long to replace and
> restore the usual redundancies (if it was a mirror or raidz pool).
>
> I have previously applied the kernel change described here:
> http://blogs.digitar.com/jjww/?itemid=52
>
> I just moved a 1TB volume which took approx. 27h.
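A quick sanity check on Alan's last figure: 1TB in roughly 27 hours averages
only about 10 MB/s, nowhere near what six 4Gb FC links can carry, which is
consistent with the back-end array being the choke point. Sketch arithmetic,
decimal TB assumed:

```shell
# Back-of-the-envelope throughput for the quoted replace: 1 TB in 27 hours.
bytes=$((1000 * 1000 * 1000 * 1000))   # 1 TB, decimal
secs=$((27 * 3600))                    # 27 hours in seconds
mbps=$((bytes / secs / 1000000))       # integer average MB/s
echo "${mbps} MB/s"                    # prints "10 MB/s"
```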
Re: [zfs-discuss] zpool replace not working
Marc,

Thanks - you were right - I had two identical drives and I mixed them up.
It's going through the resilver process now... I expect it will run all
night.

Breandan

On Jul 27, 2008, at 11:20 PM, Marc Bevand wrote:
> It looks like you *think* you are trying to add the new drive, when you
> are in fact re-adding the old (failing) one. A new drive should never
> show up as ONLINE in a pool with no action from your part, if only
> because it contains no partition and no vdev label with the right pool
> GUID.
>
> If I am right, try to add the other drive.
>
> If I am wrong, you somehow managed to confuse ZFS. You can prevent ZFS
> from thinking c2d1 is already part of the pool by deleting the partition
> table on it:
> $ dd if=/dev/zero of=/dev/rdsk/c2d1p0 bs=512 count=1
> $ zpool import
> (it should show you the pool is now ready to be imported)
> $ zpool import tank
> $ zpool replace tank c2d1
>
> At this point it should be resilvering...
>
> -marc
Re: [zfs-discuss] zpool replace not working
It looks like you *think* you are trying to add the new drive, when you are
in fact re-adding the old (failing) one. A new drive should never show up as
ONLINE in a pool with no action from your part, if only because it contains
no partition and no vdev label with the right pool GUID.

If I am right, try to add the other drive.

If I am wrong, you somehow managed to confuse ZFS. You can prevent ZFS from
thinking c2d1 is already part of the pool by deleting the partition table on
it:

  $ dd if=/dev/zero of=/dev/rdsk/c2d1p0 bs=512 count=1
  $ zpool import
  (it should show you the pool is now ready to be imported)
  $ zpool import tank
  $ zpool replace tank c2d1

At this point it should be resilvering...

-marc
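One caveat on the dd step, hedged: zeroing only sector 0 removes the fdisk
table, but ZFS also keeps two vdev labels at the front of the device and two
at the end, so a stale label can survive. A more thorough wipe is sketched
below; the size value is a placeholder you must replace with the real device
size, and the whole thing destroys data on c2d1.

```shell
# SKETCH: clear all four ZFS vdev labels (two at each end of the device).
# SIZE_MB is the device size in megabytes -- the value below is an EXAMPLE
# only; get the real figure from format/prtvtoc first. Destroys c2d1's data!
SIZE_MB=476940                                          # example value only
dd if=/dev/zero of=/dev/rdsk/c2d1p0 bs=1024k count=2                # front labels
dd if=/dev/zero of=/dev/rdsk/c2d1p0 bs=1024k oseek=$((SIZE_MB - 2)) # back labels
```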