Re: [zfs-discuss] HP Proliant DL360 G7
On Jul 2, 2012, at 7:57 PM, Richard Elling wrote:

FYI, HP also sells an 8-port IT-style HBA (SC-08Ge), but it is hard to locate with their configurators. There might be a more modern equivalent cleverly hidden somewhere difficult to find.
-- richard

Richard,

Do you know if the HBAs in HP controllers can be swapped out for any well-characterized (by Nexenta) HBAs like the 9211-8e, or do they require a specific 'controller HBA' like the SC-08Ge? I.e., does it void the warranty if you open up the controller and stick a third-party card in there? Did you ever try to 'bypass' the controllers at all and just plug into an expander? I prefer HP hardware also, but the controller is getting in the way.

I'll be asking HP the same questions in the next few weeks with any luck, but your opinion and experiences are on another level compared to HP's pre-sales department... not that they're bad, but in this realm you're the man :)

Thanks,
Mark
Re: [zfs-discuss] HP Proliant DL360 G7
Good call, Saso. Sigh... I guess I'll wait to hear from HP on supported IT-mode HBAs in their D2000s or other JBODs.

On Tue, Jan 8, 2013 at 11:40 AM, Sašo Kiselkov skiselkov...@gmail.com wrote:
On 01/08/2013 04:27 PM, mark wrote:
On Jul 2, 2012, at 7:57 PM, Richard Elling wrote:
FYI, HP also sells an 8-port IT-style HBA (SC-08Ge), but it is hard to locate with their configurators. There might be a more modern equivalent cleverly hidden somewhere difficult to find.
-- richard

Richard,

Do you know if the HBAs in HP controllers can be swapped out for any well-characterized (by Nexenta) HBAs like the 9211-8e, or do they require a specific 'controller HBA' like the SC-08Ge? I.e., does it void the warranty if you open up the controller and stick a third-party card in there? Did you ever try to 'bypass' the controllers at all and just plug into an expander? I prefer HP hardware also, but the controller is getting in the way. I'll be asking HP the same questions in the next few weeks with any luck, but your opinion and experiences are on another level compared to HP's pre-sales department... not that they're bad, but in this realm you're the man :)

I know you didn't ask me, but I can tell you my experience: it depends on what you mean by warranty. If you mean warranty on the sale of goods (as mandated by law), then no, sticking a different HBA in your servers does not void your warranty (unless this is expressly labeled on the product - manufacturers typically also put protective labels on screws then). When it comes to support services, though, such as phone support and firmware updates, then yes, using a third-party HBA can make these difficult and/or impossible. HP storage enclosure and drive firmware, for example, can only be flashed through an HP-branded SmartArray card.

Depending on what software you are running on the machines, it can make no difference at all, or a lot of difference. For instance, if you're running proprietary storage controller software on the server (think something like NexentaStor, but from the HW vendor), then your custom HBA might simply be flat-out unsupported, and the only response you'll get from the vendor support team is "stick the card we shipped it with back in". OTOH, if you're running something not HW-vendor-specific (like the aforementioned NexentaStor, or any other Illumos variant), and the HW vendor at least gives lip service to supporting your platform (always tell the support folk you're running Solaris), then chances are that your support contract will be just as valid as before. I've had drives fail on Dell machines, and each time support was happy when I just told them "drive dead, running Solaris, here's the log output, send a new one please."

Cheers,
--
Saso
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 11/16/12 17:15, Peter Jeremy wrote:

I have been tracking down a problem with "zfs diff" that reveals itself variously as a hang (unkillable process), panic or error, depending on the ZFS kernel version, but seems to be caused by corruption within the pool. I am using FreeBSD but the issue looks to be generic ZFS, rather than FreeBSD-specific. The hang and panic are related to the rw_enter() in opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk().

There is probably nothing wrong with the snapshots. This is a bug in zfs diff. The ZPL parent pointer is only guaranteed to be correct for directory objects. What you probably have is a file that was hard linked multiple times, and the parent pointer (i.e. directory) was recycled and is now a file.

The error is:

Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument

A scrub reports no issues:

root@FB10-64:~ # zpool status
  pool: tank
 state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on software that does not support
        feature flags.
  scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
config:

        NAME    STATE   READ WRITE CKSUM
        tank    ONLINE     0     0     0
          ada2  ONLINE     0     0     0

errors: No known data errors

But zdb says that object is the child of a plain file - which isn't sane:

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2128453
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl  iblk   dblk   dsize  lsize  %full   type
   2128453    1   16K  1.50K  1.50K  1.50K  100.00   ZFS plain file
                                       264   bonus   ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    ??? <object#2128453>
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:34:52 2012
        mtime   Sat Oct 22 16:13:42 2011
        ctime   Sun Oct 23 21:09:02 2011
        crtime  Sat Oct 22 16:13:42 2011
        gen     2237174
        mode    100444
        size    1089
        parent  2242171
        links   1
        pflags  4080004
        xattr   0
        rdev    0x0000000000000000

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2242171
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl  iblk   dblk   dsize  lsize  %full   type
   2242171    3   16K   128K  25.4M  25.5M  100.00   ZFS plain file
                                       264   bonus   ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 203
        path    /jashank/Pictures/sch/pdm-a4-11/stereo-pair-2.png
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:41:53 2012
        mtime   Mon Oct 24 21:15:56 2011
        ctime   Mon Oct 24 21:15:56 2011
        crtime  Mon Oct 24 21:15:37 2011
        gen     2286679
        mode    100644
        size    26625731
        parent  7001490
        links   1
        pflags  4080004
        xattr   0
        rdev    0x0000000000000000

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 7001490
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl  iblk   dblk   dsize  lsize  %full   type
   7001490    1   16K    512     1K    512  100.00   ZFS directory
                                       264   bonus   ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /jashank/Pictures/sch/pdm-a4-11
        uid     1000
        gid     1000
        atime   Thu May 17 03:38:32 2012
        mtime   Mon Oct 24 21:15:37 2011
        ctime   Mon Oct 24 21:15:37 2011
        crtime  Fri Oct 14 22:17:44 2011
        gen     2088407
        mode    40755
        size    6
        parent  6370559
        links   2
        pflags  4080144
        xattr   0
        rdev    0x0000000000000000
        microzap: 512 bytes, 4 entries
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 11/19/12 1:14 PM, Jim Klimov wrote:
On 2012-11-19 20:58, Mark Shellenbaum wrote:
There is probably nothing wrong with the snapshots. This is a bug in zfs diff. The ZPL parent pointer is only guaranteed to be correct for directory objects. What you probably have is a file that was hard linked multiple times, and the parent pointer (i.e. directory) was recycled and is now a file.

Interesting... do the ZPL files in ZFS keep pointers to parents?

The parent pointer for hard-linked files is always set to the last link to be created:

$ mkdir dir.1
$ mkdir dir.2
$ touch dir.1/a
$ ln dir.1/a dir.2/a.linked
$ rm -rf dir.2

Now the parent pointer for 'a' will reference a removed directory. The parent pointer is a single 64-bit quantity that can't track all the possible parents a hard-linked file could have. Now, when the original dir.2 object number is recycled, you could have a situation where the parent pointer for 'a' points to a non-directory. The ZPL never uses the parent pointer internally; it is only used by zfs diff and other utility code to translate object numbers to full pathnames. The ZPL has always set the parent pointer, but it is more for debugging purposes.

How, in the COW transactional model, could the parent directory be removed but not the pointer to it from the files inside it? Is this possible in current ZFS, or could this be a leftover in the pool from its history with older releases?

Thanks,
//Jim
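For readers who want to see a stale parent pointer for themselves, a minimal sketch follows (dataset name and object number are hypothetical; ls -i reports a file's inode number, which on ZFS is its object number, and zdb then dumps the znode exactly as shown earlier in this thread):

# ls -i /tank/fs/dir.1/a          (prints the object number, e.g. 12345)
# zdb -vvv tank/fs 12345          (the 'parent' field holds the possibly stale directory object)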
[zfs-discuss] Trick to keeping NFS file references in kernel memory for Dtrace?
Hey all,

So I have a couple of storage boxes (NexentaCore Illumian) and have been playing with some DTrace scripts to monitor NFS usage. Initially I ran into the (seemingly common) problem of basically everything showing up as 'Unknown', and then after some searching online I found a workaround: do a 'find' on the file system from the remote end and it would refresh the kernel's knowledge of the files. This works... however it doesn't stay for good. It seems to sometimes last a couple of hours (and sometimes much less), and then we are back to receiving Unknowns.

Has anyone else come across something similar? Does anyone know what may be causing the kernel to lose the references? There is plenty of memory in the main system (72 GB, with the ARC sitting at ~53 GB and 11 GB free), so I don't think an OOM situation is causing it.

Otherwise, does anyone have any other tips for monitoring usage? I wonder how they have it all working in Fishworks gear, as some of the analytics demos show you being able to drill down through file activity in real time.

Any advice or suggestions greatly appreciated.

Cheers,
Mark
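For reference, a minimal sketch of the kind of per-file accounting being described, using the illumos nfsv3 DTrace provider; the noi_curpath member is exactly what turns up as 'Unknown' (actually "<unknown>") when the kernel has no cached path for the vnode:

# dtrace -n 'nfsv3:::op-read-start,nfsv3:::op-write-start
    { @files[args[1]->noi_curpath] = count(); }'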
[zfs-discuss] Zpool recovery after too many failed disks
RAIDZ set, lost a disk, replaced it... lost another disk during resilver. Replaced it, ran another resilver, and now it shows all disks with too many errors. Safe to say this is getting rebuilt and restored, or is there hope to recover some of the data? I assume this is the case because rpool/filemover has errors; is that fixable?

# zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 4h51m with 190449 errors on Sat Aug 25 05:45:12 2012
config:

        NAME            STATE     READ WRITE CKSUM
        rpool           DEGRADED  455K     0     0
          raidz1        DEGRADED  455K     0     0
            c3t0d0      DEGRADED     0     0     0  too many errors
            c2t1d0      DEGRADED     0     0     0  too many errors
            replacing   UNAVAIL      0     0     0  insufficient replicas
              c2t0d0s0/o FAULTED     0     0     0  too many errors
              c2t0d0    FAULTED      0     0     0  too many errors
            c3t1d0      DEGRADED     0     0     0  too many errors
            c4t0d0      DEGRADED     0     0     0  too many errors
            c4t1d0      DEGRADED     0     0     0  too many errors

errors: Permanent errors have been detected in the following files:

        rpool/filemover:<0x1>

# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
rpool            6.64T      0  29.9K  /rpool
rpool/filemover  6.64T   323G  6.32T  -

Thanks
Mark
Re: [zfs-discuss] Problem with ESX NFS store on ZFS
Thank you, it was the NFS ACL I had wrong! Fixed now and working on all 3 nodes. I changed the property below and it works now; very simple, can't believe I missed that.

Before:
# zfs get sharenfs pool1/nas/vol1
sharenfs  rw,nosuid,root=192.168.1.52  local

After:
# zfs get sharenfs pool1/nas/vol1
sharenfs  rw,nosuid,root=192.168.1.52:192.168.1.51:192.168.1.53  local

-----Original Message-----
From: Jim Klimov [mailto:jimkli...@cos.ru]
Sent: Wednesday, February 29, 2012 1:44 PM
To: Mark Wolek
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Problem with ESX NFS store on ZFS

2012-02-29 21:15, Mark Wolek wrote:
Running Solaris 11 with ZFS, and the VMs on this storage can only be opened and run on 1 ESX host; if I move the files to another host I get access denied, even though root has full permissions to the files. Any ideas, or does it ring any bells for anyone, before I contact VMware or something?

Probably NFS UID mapping is faulty, or the NFS server ACL does not allow for another server. For UID mapping, in particular see the domain name settings:

/etc/defaultdomain
/etc/resolv.conf (search, domain lines)
/etc/default/nfs or appropriate SMF settings (NFSMAPID_DOMAIN)

For the NFS ACL, see the sharenfs property:

# zfs set sharenfs='rw=esx:cvs:.domain.com:.jumbo.domain.com:@192.168.127.0/24,root=esx:cvs:192.168.127.99' pool/esxfiles

Critical fields are 'rw', 'ro' and 'root': lists of hosts or subnets of clients which have the appropriate types of access. For hosts not in the 'root' list, their allowed 'ro' or 'rw' access as the root user will be remapped to nobody. You might also want 'anon=0,sec=sys', which seems to be appended by default on my installations of Solaris; not sure if it is the default in Sol11.

Note that clients' hostnames can be resolved via /etc/hosts, DNS or LDAP, as configured in your /etc/nsswitch.conf, and sometimes via /etc/inet/ipnodes as a fallback mechanism. Your server only gets one shot at resolving the client's name, and if it is not literally the same as in the NFS ACL, access is denied. You might want to fall back to domain-based or subnet-based ACLs (may require the @ character).

For pointers to server-side ACL denials, see the server's dmesg, with entries resembling this:

Feb 29 19:35:01 thumper mountd[10782]: [ID 770583 daemon.error] esx.demo.domain.com denied access to /esxfiles/vm5

In particular, the entry gives the client's hostname as the server resolved it, so you can see if your ACL (or naming service) was misconfigured.

HTH,
//Jim
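For anyone hitting the same thing, the actual one-line fix implied above would look something like this (host list taken from the before/after output shown):

# zfs set sharenfs='rw,nosuid,root=192.168.1.52:192.168.1.51:192.168.1.53' pool1/nas/vol1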
[zfs-discuss] Problem with ESX NFS store on ZFS
Running Solaris 11 with ZFS, and the VMs on this storage can only be opened and run on 1 ESX host; if I move the files to another host I get access denied, even though root has full permissions to the files. Any ideas, or does it ring any bells for anyone, before I contact VMware or something?

Thanks
Mark
Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup
You can see the original ARC case here:
http://arc.opensolaris.org/caselog/PSARC/2009/557/20091013_lori.alt

On 8 Dec 2011, at 16:41, Ian Collins wrote:
On 12/ 9/11 12:39 AM, Darren J Moffat wrote:
On 12/07/11 20:48, Mertol Ozyoney wrote:
Unfortunately the answer is no. Neither L1 nor L2 cache is dedup-aware. The only vendor I know of that can do this is NetApp. In fact, most of our functions, like replication, are not dedup-aware. For example, technically it's possible to optimize our replication so that it does not send data chunks if a data chunk with the same checksum exists in the target, without enabling dedup on target and source.

We already do that with 'zfs send -D':

-D  Perform dedup processing on the stream. Deduplicated streams cannot be received on systems that do not support the stream deduplication feature.

Is there any more published information on how this feature works?
--
Ian.
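For context, a minimal sketch of what 'zfs send -D' replication looks like in practice (dataset and host names are hypothetical; the receiving system must support the stream deduplication feature):

# zfs send -D tank/data@monday | ssh backuphost zfs recv -d backup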
[zfs-discuss] First zone creation - getting ZFS error
I'm running OI 151a. I'm trying to create a zone for the first time, and am getting an error about ZFS. I'm logged in as me, then su - to root before running these commands. I have a pool called datastore, mounted at /datastore. Per the wiki document http://wiki.openindiana.org/oi/Building+in+zones, I first created the zfs file system (note that the command syntax in the document appears to be wrong, so I did the options I wanted separately):

zfs create datastore/zones
zfs set compression=on datastore/zones
zfs set mountpoint=/zones datastore/zones

zfs list shows:

NAME                        USED   AVAIL  REFER  MOUNTPOINT
datastore                   28.5M  7.13T  57.9K  /datastore
datastore/dbdata            28.1M  7.13T  28.1M  /datastore/dbdata
datastore/zones             55.9K  7.13T  55.9K  /zones
rpool                       27.6G   201G    45K  /rpool
rpool/ROOT                  2.89G   201G    31K  legacy
rpool/ROOT/openindiana      2.89G   201G  2.86G  /
rpool/dump                  12.0G   201G  12.0G  -
rpool/export                5.53M   201G    32K  /export
rpool/export/home           5.50M   201G    32K  /export/home
rpool/export/home/mcreamer  5.47M   201G  5.47M  /export/home/mcreamer
rpool/swap                  12.8G   213G   137M  -

Then I went about creating the zone:

zonecfg -z zonemaster
create
set autoboot=true
set zonepath=/zones/zonemaster
set ip-type=exclusive
add net
set physical=vnic0
end
exit

That all goes fine, then...

zoneadm -z zonemaster install

which returns...

ERROR: the zonepath must be a ZFS dataset. The parent directory of the zonepath must be a ZFS dataset so that the zonepath ZFS dataset can be created properly.

Since the zfs dataset datastore/zones is created, I don't understand what the error is trying to get me to do. Do I have to do:

zfs create datastore/zones/zonemaster

before I can create a zone in that path? That's not in the documentation, so I didn't want to do anything until someone can point out my error for me. Thanks for your help!
--
Mark
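No answer appears in this digest, but the experiment the poster proposes would simply be the following (an untested sketch; zoneadm normally creates the zonepath dataset itself when the parent directory is a mounted ZFS dataset, so this only works around the check rather than explaining it):

# zfs create datastore/zones/zonemaster
# zoneadm -z zonemaster install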
[zfs-discuss] Log disk with all ssd pool?
Still kicking around this idea and didn't see it addressed in any of the threads before the forum closed. If one made an all-SSD pool, would a log/cache drive just slow you down? Would the ZIL slow you down? Thinking: rotate MLC drives with SandForce controllers every few years, to avoid losing a drive to "sorry, no more writes allowed" scenarios.

Thanks
Mark
Re: [zfs-discuss] Log disk with all ssd pool?
Having the log disk slowed it down a lot in your tests (when it wasn't an SSD): 30MB/s vs 7. Is this also a 100% write / 100% sequential workload? Forcing sync? It's gotten to the point where I can buy a 120G SSD for less than or the same price as a 146G SAS disk... Sure, the MLC drives have a limited lifetime, but at $150 (and dropping) just replace them every few years to be safe; work out a rotation/rebuild cycle; it's tempting... I suppose if we do end up buying all SSDs it becomes really easy to test whether we should use a log or not!

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Neil Perrin
Sent: Friday, October 28, 2011 11:38 AM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Log disk with all ssd pool?

On 10/28/11 00:54, Neil Perrin wrote:
On 10/28/11 00:04, Mark Wolek wrote:
Still kicking around this idea and didn't see it addressed in any of the threads before the forum closed. If one made an all-SSD pool, would a log/cache drive just slow you down? Would the ZIL slow you down? Thinking: rotate MLC drives with SandForce controllers every few years, to avoid losing a drive to "sorry, no more writes allowed" scenarios.

Thanks
Mark

Interesting question. I don't think there's a straightforward answer. Oracle uses write-optimised log devices and read-optimised cache devices in its appliances. However, assuming all the SSDs are the same, then I suspect neither a log nor a cache device would help:

Log: If there is a log then it is solely used, and can be written to in parallel with periodic TXG-commit writes to the other pool devices. If that log were part of the pool, the ZIL code would spread the load among all pool devices, but would compete with TXG-commit writes. My gut feeling is that this would be the higher-performing option, though. I think, a long time ago, I experimented with designating one disk out of the pool as a log and saw degradation in synchronous performance. That seems to be the equivalent of your SSD question.

Cache: Similarly, for cache devices the reads would compete with TXG-commit writes, but otherwise performance ought to be higher.

Neil.

Did some quick tests with disks to check if my memory was correct. 'sb' is a simple program that spawns a number of threads to fill a file of a certain size with synchronous writes of a specified size, using non-zero data. Bandwidth is also reported.

1. Simple 2-disk system. 32KB synchronous writes filling 1GB with 20 threads:

zpool create whirl <2 disks>; zfs set recordsize=32k whirl
st1 -n /whirl/f -f 1073741824 -b 32768 -t 20
Elapsed time 95s 10.8MB/s

zpool create whirl <disk> log <disk>; zfs set recordsize=32k whirl
st1 -n /whirl/f -f 1073741824 -b 32768 -t 20
Elapsed time 151s 6.8MB/s

2. Higher-end 6-disk system. 32KB synchronous writes filling 1GB with 100 threads:

zpool create whirl <6 disks>; zfs set recordsize=32k whirl
st1 -n /whirl/f -f 1073741824 -b 32768 -t 100
Elapsed time 33s 31MB/s

zpool create whirl <5 disks> log <1 disk>; zfs set recordsize=32k whirl
st1 -n /whirl/f -f 1073741824 -b 32768 -t 100
Elapsed time 147s 7.0MB/s

and for interest:

zpool create whirl <5 disks> log <SSD>; zfs set recordsize=32k whirl
st1 -n /whirl/f -f 1073741824 -b 32768 -t 100
Elapsed time 8s 129MB/s

3. Higher-end, smaller writes. 2K synchronous writes filling 128MB with 100 threads:

zpool create whirl <6 disks>; zfs set recordsize=1k whirl
st1 -n /whirl/f -f 134217728 -b 2048 -t 100
Elapsed time 16s 8.2MB/s

zpool create whirl <5 disks> log <1 disk>; zfs set recordsize=1k whirl
ds8 -n /whirl/f -f 134217728 -b 2048 -t 100
Elapsed time 24s 5.5MB/s
Re: [zfs-discuss] File contents changed with no ZFS error
Why don't you see which byte differs, and how it does? Maybe that would suggest the failure mode. Is it the same byte data in all affected files, for instance?

Mark

Sent from my iPhone

On Oct 22, 2011, at 2:08 PM, Robert Watzlavick rob...@watzlavick.com wrote:
On Oct 22, 2011, at 13:14, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
How can you rule out the possibility that something changed the file? Intentionally, that is, not as a form of filesystem corruption.

I suppose that's possible but seems unlikely. One byte in a file changed on disk with no corresponding change in the mod time seems unlikely. I did access that file for read sometime in the past few months, but again, if it had accidentally been written to, the time would have been updated.

If you have snapshots on your ZFS filesystem, you can use zhist (or whatever technique you want) to see in which snapshot(s) it changed, and find all the unique versions of it. 'Course, that will only give you any valuable information if you have different versions of the file in different snapshots.

I only have one or two snapshots but I'll look.

Thanks,
-Bob
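A quick sketch of the byte-level comparison being suggested (filenames are hypothetical; cmp -l prints the offset and the octal values of every differing byte, which is enough to see whether it is a single-bit flip):

# cmp -l file.from.backup file.on.zfs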
Re: [zfs-discuss] about btrfs and zfs
On Oct 18, 2011, at 11:09 AM, Nico Williams wrote:
On Tue, Oct 18, 2011 at 9:35 AM, Brian Wilson wrote:
I just wanted to add something on fsck on ZFS - because for me that used to make ZFS 'not ready for prime-time' in 24x7, 5+ 9s uptime environments. Where ZFS doesn't have an fsck command - and that really used to bug me - it does now have a -F option on zpool import. To me it's the same functionality for my environment - the ability to try to roll back to a 'hopefully' good state and get the filesystem mounted up, leaving the corrupted data objects corrupted. [...]

Yes, that's exactly what it is. There's no point calling it fsck because fsck fixes individual filesystems, while ZFS fixups need to happen at the volume level (at volume import time). It's true that this should have been in ZFS from the word go. But it's there now, and that's what matters, IMO.

Doesn't a scrub do more than what 'fsck' does?

It's also true that this was never necessary with hardware that doesn't lie, but it's good to have it anyways, and is critical for personal systems such as laptops.

IIRC, fsck was seldom needed at my former site once UFS journalling became available.

Sweet update.
Mark
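For reference, the rewind-style recovery mentioned above is just the following (pool name hypothetical; -F discards the last few transactions if that is what it takes to return the pool to an importable, consistent state):

# zpool import -F tank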
Re: [zfs-discuss] Mirror Gone
On 27 Sep 2011, at 18:29, Edward Ned Harvey wrote:
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Tony MacDoodle
Now:

        mirror-0  ONLINE  0  0  0
          c1t2d0  ONLINE  0  0  0
          c1t3d0  ONLINE  0  0  0
        c1t4d0    ONLINE  0  0  0
        c1t5d0    ONLINE  0  0  0

There is only one way for this to make sense: You did not have mirror-1 in the first place. An easy way to tell is taking a look at the zpool history command for this pool. What does that show?
Re: [zfs-discuss] All drives intact but vdev UNAVAIL in raidz1
On Tue, 6 Sep 2011, Tyler Benster wrote:
It seems quite likely that all of the data is intact, and that something different is preventing me from accessing the pool. What can I do to recover the pool? I have downloaded the Solaris 11 Express livecd if that would be of any use.

Try running zdb -l on the disk and see if the labels are still there. Also, could you show us the output of 'zpool status'? Normally zfs would not hang if one disk of a raidz group is missing, but it might do that if one toplevel is missing. If the zdb command shows all four labels to be correct, then you can try a zpool scrub and see if that resilvers the data for you.
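A sketch of the label check being suggested (device path hypothetical; a healthy member disk prints four identical labels, LABEL 0 through LABEL 3, with the pool and vdev GUIDs):

# zdb -l /dev/dsk/c0t5d0s0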
Re: [zfs-discuss] zpool replace
Hi Doug,

The vms pool was created in a non-redundant way, so there is no way to get the data off of it unless you can put back the original c0t3d0 disk. If you can still plug in the disk, you can always do a zpool replace on it afterwards. If not, you'll need to restore from backup, preferably to a pool with raidz or mirroring so zfs can repair faults automatically.

On Mon, 15 Aug 2011, Doug Schwabauer wrote:
Help - I've got a bad disk in a zpool and need to replace it. I've got an extra drive that's not being used, although it's still marked like it's in a pool. So I need to get the xvm pool destroyed, c0t5d0 marked as available, and replace c0t3d0 with c0t5d0.

root@kc-x4450a # zpool status -xv
  pool: vms
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME      STATE    READ WRITE CKSUM
        vms       UNAVAIL     0     3     0  insufficient replicas
          c0t2d0  ONLINE      0     0     0
          c0t3d0  UNAVAIL     0     6     0  experienced I/O failures
          c0t4d0  ONLINE      0     0     0

errors: Permanent errors have been detected in the following files:

        vms:<0x5>
        vms:<0xb>

root@kc-x4450a # zpool replace -f vms c0t3d0 c0t5d0
cannot replace c0t3d0 with c0t5d0: pool I/O is currently suspended

root@kc-x4450a # zpool import
  pool: xvm
    id: 14176680653869308477
 state: DEGRADED
status: The pool was last accessed by another system.
action: The pool can be imported despite missing or damaged devices. The
        fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        xvm         DEGRADED
          mirror-0  DEGRADED
            c0t4d0  FAULTED  corrupted data
            c0t5d0  ONLINE

Thanks!
-Doug

Regards,
markm
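Not stated explicitly in the thread, but the usual way to free c0t5d0 from the remembered xvm pool would be something like this (a sketch; -f because the pool was last accessed by another system, and destroy only if nothing on xvm is needed):

# zpool import -f xvm
# zpool destroy xvm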
Re: [zfs-discuss] Large scale performance query
Shouldn't the choice of RAID type also be based on the I/O requirements? Anyway, with RAID-10, even a second failed disk is not catastrophic, so long as it is not the counterpart of the first failed disk, no matter the number of disks (with 2-way mirrors). But that's why we do backups, right?

Mark

Sent from my iPhone

On Aug 6, 2011, at 7:01 AM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote:
Ok, so mirrors resilver faster. But it is not uncommon that another disk shows problems during resilver (for instance r/w errors); this scenario would mean your entire raid is gone, right?

If you are using mirrors, and one disk crashes and you start a resilver, and then the other disk shows r/w errors because of the increased load - then you are screwed? Because large disks take a long time to resilver, possibly weeks? In that case, it would be preferable to use mirrors with 3 disks in each vdev: tri-mirrors. Each vdev should be one raidz3.
--
This message posted from opensolaris.org
Re: [zfs-discuss] [illumos-Developer] zfs refratio property
minor quibble: compressratio uses a lowercase x for the description text whereas the new prop uses an uppercase X

On 6 Jun 2011, at 21:10, Eric Schrock wrote:
Webrev has been updated:
http://dev1.illumos.org/~eschrock/cr/zfs-refratio/
- Eric
--
Eric Schrock
Delphix
275 Middlefield Road, Suite 50
Menlo Park, CA 94025
http://www.delphix.com
Re: [zfs-discuss] NFS acl inherit problem
On 6/1/11 12:51 AM, lance wilson wrote:
The problem is that NFS clients that connect to my Solaris 11 Express server are not inheriting the ACLs that are set for the share. They create files that don't have any ACL assigned to them, just the normal unix file permissions. Can someone please provide some additional things to test so that I can get this sorted out?

This is the output of a normal ls -al:

drwxrwxrwx+ 5 root root 11 2011-05-31 11:14 acltest

The compact version is ls -Vd:

drwxrwxrwx+ 5 root root 11 May 31 11:14 /smallstore/acltest
        user:root:rwxpdDaARWcCos:fd-:allow
        everyone@:rwxpdDaARWc--s:fd-:allow

The parent share has the following permissions:

drwxr-xr-x+ 5 root root 5 May 30 22:26 /smallstore/
        user:root:rwxpdDaARWcCos:fd-:allow
        everyone@:r-x---a-R-c---:fd-:allow
        owner@:rwxpdDaARWcCos:fd-:allow

This is the ACL for files created by an Ubuntu client. There is no ACL inheritance occurring:

-rw-r--r-- 1 1000 1000 0 May 31 22:20 /smallstore/acltest/ubuntu_file
        owner@:rw-p--aARWcCos:---:allow
        group@:r-----a-R-c--s:---:allow
        everyone@:r-----a-R-c--s:---:allow

Looks like the Linux client did a chmod(2) after creating the file. What happens when you create a file locally in that directory on the Solaris system?

This is the ACL for files created by a user from a Windows client. There is full ACL inheritance:

-rwxrwxrwx+ 1 ljw staff 0 May 31 22:22 /smallstore/acltest/windows_file
        user:root:rwxpdDaARWcCos:--I:allow
        everyone@:rwxpdDaARWc--s:--I:allow

ACL inheritance is on at both the share and directory levels, so it should be passing them to files that are created:

smallstore          aclinherit  restricted   default
smallstore/acltest  aclinherit  passthrough  local

Again, any help would be most appreciated.
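A sketch of the local-creation test being asked for (run on the Solaris server; filename hypothetical). If the inherited user:root and everyone@ ACEs appear here but not on NFS-created files, the clients are doing a chmod after create:

# touch /smallstore/acltest/local_file
# ls -V /smallstore/acltest/local_file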
Re: [zfs-discuss] Another zfs issue
Yeah, this is a known problem. The DTL on the toplevel shows an outage, and is preventing the removal of the spare even though removing the spare won't make the outage worse. Unfortunately, for opensolaris anyway, there is no workaround. You could try doing a full scrub, replacing any disks that show errors, and waiting for the resilver to complete. That may clean up the DTL enough to detach the spare.

On 1 Jun 2011, at 20:20, Roy Sigurd Karlsbakk wrote:
Hi all

I have this pool that has been suffering from some bad backplanes etc. Currently it's showing up OK, but after a resilver, a spare is stuck.

        raidz2-5     ONLINE  0  0  4
          c4t1d0     ONLINE  0  0  0
          c4t2d0     ONLINE  1  0  0
          c4t3d0     ONLINE  0  0  0
          c4t4d0     ONLINE  0  0  0
          spare-4    ONLINE  0  0  0
            c4t5d0   ONLINE  0  0  0
            c4t44d0  ONLINE  0  0  0
          c4t6d0     ONLINE  0  0  0
          c4t7d0     ONLINE  0  0  0

So the vdev seems OK. The pool reports two data errors, which is sad, but not a showstopper. However, trying to detach the spare from that vdev doesn't seem so easy:

roy@dmz-backup:~$ sudo zpool detach dbpool c4t44d0
cannot detach c4t44d0: no valid replicas

iostat -en shows some issues with drives in that pool, but none on the two in the spare mirror:

  s/w h/w trn tot device
    0   0   0   0 c4t1d0
    0  82 131 213 c4t2d0
    0   0   0   0 c4t3d0
    0   0   0   0 c4t4d0
    0   0   0   0 c4t5d0
    0   0   0   0 c4t6d0
    0   0   0   0 c4t7d0
    0   0   0   0 c4t44d0

Is there a good explanation why I can't detach this mirror from the vdev?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
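A sketch of the suggested workaround, using the pool name from the thread (scrub first, replace anything that shows errors, wait for the resilver to finish cleanly, then retry the detach):

# zpool scrub dbpool
# zpool status dbpool          (wait for the scrub/resilver to complete)
# zpool detach dbpool c4t44d0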
Re: [zfs-discuss] Recommended eSATA PCI cards
Hi Rich,

With the Ultra 20 M2 there is a very cheap/easy alternative that might work for you (until you need to expand past 2 more external devices, anyway). Pick up an eSATA PCI-bracket cable adapter, something like this:

http://www.newegg.com/Product/Product.aspx?Item=N82E16812226003&cm_re=eSATA-_-12-226-003-_-Product

(I haven't used this specific product, but it was the first example I found.) The U20 M2 has slots for just 2 internal SATA drives, but the motherboard has a total of 4 SATA connectors, so there are two that normally go unused. Connect these to the bracket and connect your external eSATA enclosures to them. You'll get two eSATA ports without needing to use any PCI slots, and I believe that if you use the very bottom PCI slot opening you won't even block any of the actual PCI slots from future use.

-Mark D.

On 05/ 6/11 12:04 PM, Rich Teer wrote:
Hi all,

I'm looking at replacing my old D1000 array with some new external drives, most likely these: http://www.g-technology.com/products/g-drive.cfm . In the immediate term, I'm planning to use USB 2.0 connections, but the drive I'm considering also supports eSATA, which is MUCH faster than USB, but also (I think, please correct me if I'm wrong) more reliable. Neither of the machines I'll be using as my server (currently an SB1000, but will be an Ultra 20 M2 soon; this is my home network, very light workload) has an integrated eSATA port, so I must turn to add-on PCI cards. What are people recommending? I need to attach at least two drives (I'll be mirroring them), preferably three or more. The machines are currently running SXCE snv_b130, with an upgrade to Solaris Express 11 not too far away. Thanks!
Re: [zfs-discuss] X4540 no next-gen product?
On Apr 8, 2011, at 2:37 AM, Ian Collins i...@ianshome.com wrote:
On 04/ 8/11 06:30 PM, Erik Trimble wrote:
On 4/7/2011 10:25 AM, Chris Banal wrote:
While I understand everything at Oracle is top secret these days, does anyone have any insight into a next-gen X4500 / X4540? Does some other Oracle / Sun partner make a comparable system that is fully supported by Oracle / Sun?

http://www.oracle.com/us/products/servers-storage/servers/previous-products/index.html

What do X4500 / X4540 owners use if they'd like more comparable zfs-based storage and full Oracle support? I'm aware of Nexenta and other cloned products but am specifically asking about Oracle-supported hardware. However, does anyone know if these types of vendors will be at NAB this year? I'd like to talk to a few if they are...

The move seems to be to the Unified Storage (aka ZFS Storage) line, which is the successor to the 7000-series OpenStorage stuff.
http://www.oracle.com/us/products/servers-storage/storage/unified-storage/index.html

Which is not a lot of use to those of us who use X4540s for what they were intended: storage appliances.

Can you elaborate briefly on what exactly the problem is? I don't follow. What else would an X4540 or a 7xxx box be used for, other than a storage appliance? Guess I'm slow. :-)

Mark
Re: [zfs-discuss] X4540 no next-gen product?
On Apr 8, 2011, at 3:29 AM, Ian Collins i...@ianshome.com wrote:
On 04/ 8/11 08:08 PM, Mark Sandrock wrote:
On Apr 8, 2011, at 2:37 AM, Ian Collins i...@ianshome.com wrote:
On 04/ 8/11 06:30 PM, Erik Trimble wrote:
On 4/7/2011 10:25 AM, Chris Banal wrote:
While I understand everything at Oracle is top secret these days, does anyone have any insight into a next-gen X4500 / X4540? Does some other Oracle / Sun partner make a comparable system that is fully supported by Oracle / Sun?

http://www.oracle.com/us/products/servers-storage/servers/previous-products/index.html

What do X4500 / X4540 owners use if they'd like more comparable zfs-based storage and full Oracle support? I'm aware of Nexenta and other cloned products but am specifically asking about Oracle-supported hardware. However, does anyone know if these types of vendors will be at NAB this year? I'd like to talk to a few if they are...

The move seems to be to the Unified Storage (aka ZFS Storage) line, which is the successor to the 7000-series OpenStorage stuff.
http://www.oracle.com/us/products/servers-storage/storage/unified-storage/index.html

Which is not a lot of use to those of us who use X4540s for what they were intended: storage appliances.

Can you elaborate briefly on what exactly the problem is? I don't follow. What else would an X4540 or a 7xxx box be used for, other than a storage appliance? Guess I'm slow. :-)

No, I just wasn't clear - we use ours as storage/application servers. They run Samba, Apache and various other applications and P2V zones that access the large pool of data. Each also acts as a failover box (both data and applications) for the other.

You have built-in storage failover with an AR cluster; and they do NFS, CIFS, iSCSI, HTTP and WebDAV out of the box. And you have fairly unlimited options for application servers, once they are decoupled from the storage servers. It doesn't seem like much of a drawback, although it may be for some smaller sites. I see AR clusters going in in local high schools and small universities. Anything's a fraction of the price of a SAN, isn't it? :-)

Mark

They replaced several application servers backed by a SAN for a fraction of the price of a new SAN.
--
Ian.
Re: [zfs-discuss] X4540 no next-gen product?
On Apr 8, 2011, at 7:50 AM, Evaldas Auryla evaldas.aur...@edqm.eu wrote:
On 04/ 8/11 01:14 PM, Ian Collins wrote:
You have built-in storage failover with an AR cluster; and they do NFS, CIFS, iSCSI, HTTP and WebDAV out of the box. And you have fairly unlimited options for application servers, once they are decoupled from the storage servers. It doesn't seem like much of a drawback, although it may be for some smaller sites. I see AR clusters going in in local high schools and small universities.

Which is all fine and dandy if you have a green field, or are willing to re-architect your systems. We just wanted to add a couple more X4540s!

Hi, same here, it's sad news that Oracle decided to stop the X4540 production line. Before, ZFS geeks had a choice - buy the 7000 series if you want quick out-of-the-box storage with a nice GUI, or build your own storage with the X4540 line, which by the way has a brilliant engineering design. That choice is gone now.

Okay, so what is the great advantage of an X4540 versus an x86 server plus disk array(s)?

Mark
Re: [zfs-discuss] X4540 no next-gen product?
On Apr 8, 2011, at 9:39 PM, Ian Collins i...@ianshome.com wrote:
On 04/ 9/11 03:20 AM, Mark Sandrock wrote:
On Apr 8, 2011, at 7:50 AM, Evaldas Auryla evaldas.aur...@edqm.eu wrote:
On 04/ 8/11 01:14 PM, Ian Collins wrote:
You have built-in storage failover with an AR cluster; and they do NFS, CIFS, iSCSI, HTTP and WebDAV out of the box. And you have fairly unlimited options for application servers, once they are decoupled from the storage servers. It doesn't seem like much of a drawback, although it may be for some smaller sites. I see AR clusters going in in local high schools and small universities.

Which is all fine and dandy if you have a green field, or are willing to re-architect your systems. We just wanted to add a couple more X4540s!

Hi, same here, it's sad news that Oracle decided to stop the X4540 production line. Before, ZFS geeks had a choice - buy the 7000 series if you want quick out-of-the-box storage with a nice GUI, or build your own storage with the X4540 line, which by the way has a brilliant engineering design. That choice is gone now.

Okay, so what is the great advantage of an X4540 versus an x86 server plus disk array(s)?

One less x86 box (even more of an issue now we have to mortgage the children for support), and a lot less $. Not to mention an existing infrastructure built using X4540s, and me looking a fool explaining to the client that they can't get any more, so the systems we have spent two years building up are a dead end. One size does not fit all; choice is good for business.

I'm not arguing. If it were up to me, we'd still be selling those boxes.

Mark
--
Ian.
Re: [zfs-discuss] X4540 no next-gen product?
On Apr 8, 2011, at 11:19 PM, Ian Collins i...@ianshome.com wrote:
On 04/ 9/11 03:53 PM, Mark Sandrock wrote:
I'm not arguing. If it were up to me, we'd still be selling those boxes.

Maybe you could whisper in the right ear?

I wish. I'd have a long list if I could do that.

Mark :)
--
Ian.
[zfs-discuss] trouble replacing spare disk
Hi,

I have a SunFire X4540 with 19TB in a RAID-Z configuration; here's my zpool status:

  pool: raid
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: resilver in progress for 84h11m, 99.47% done, 0h27m to go
config:

        NAME           STATE    READ WRITE CKSUM
        raid           UNAVAIL     0     0   451  insufficient replicas
          raidz1       UNAVAIL     0     0   902  insufficient replicas
            c0t3d0     ONLINE      0     0     0
            c1t3d0     ONLINE      0     0     0
            c2t3d0     ONLINE      0     0     0
            c3t3d0     ONLINE      0     0     0
            c4t3d0     UNAVAIL 47294     0         cannot open
            c5t3d0     ONLINE      0     0     0
            c0t7d0     ONLINE      0     0     0
            c1t7d0     ONLINE      0     0     0
            c2t7d0     ONLINE      0     0     0
            c3t7d0     ONLINE      0     0     0
            c0t2d0     ONLINE      0     0     0
            c1t2d0     ONLINE      0     0     0
            c2t2d0     ONLINE      0     0     0
            c3t2d0     ONLINE      0     0     0
            c4t2d0     ONLINE      0     0     0
            c4t6d0     ONLINE      0     0     0
            spare      DEGRADED    7     0 66.8M
              c5t2d0   FAULTED    11     2     0  too many errors
              replacing DEGRADED   0     0     0
                c5t7d0 FAULTED    13     0     0  too many errors
                c5t6d0 ONLINE      0     0     0  202G resilvered
            c0t6d0     ONLINE      0     0     0
            c1t6d0     ONLINE      0     0     0
            c2t6d0     ONLINE      0     0     0
            c3t6d0     ONLINE      0     0     0
            spare      DEGRADED    0     0     0
              c0t1d0   FAULTED     0     0     0  too many errors
              c4t7d0   ONLINE      0     0     0
            c1t1d0     ONLINE      0     0     0
            c2t1d0     ONLINE      0     0     0
            c3t1d0     ONLINE      0     0     0
            c4t1d0     ONLINE      0     0     0
            c4t5d0     ONLINE      0     0     0
            c0t5d0     ONLINE      0     0     0
            c1t5d0     ONLINE      0     0     0
            c2t5d0     ONLINE      0     0     0
            c3t5d0     ONLINE      0     0     0
            c5t1d0     ONLINE      0     0     0
            c5t5d0     ONLINE      0     0     0
            c0t4d0     ONLINE      0     0     0
            c2t0d0     ONLINE      0     0     0
            c3t0d0     ONLINE      0     0     0
            c4t0d0     ONLINE      0     0     0
            c5t0d0     ONLINE      0     0     0
            c1t4d0     ONLINE      0     0     0
            c2t4d0     ONLINE      0     0     0
            c3t4d0     ONLINE      0     0     0
            c4t4d0     ONLINE      0     0     0
        spares
          c4t7d0       INUSE    currently in use
          c5t7d0       INUSE    currently in use
          c5t6d0       INUSE    currently in use
          c5t4d0       AVAIL

errors: 911 data errors, use '-v' for a list

It looks like the resilver has got stuck; Oracle have sent out a replacement disk today and are asking me to replace c5t7d0. If I am understanding the documentation correctly, I believe I need to do the following before physically replacing the disk:

zpool offline raid c5t7d0
cfgadm -c unconfigure c5::dsk/c5t7d0

However, I get the following messages when trying to do this:

# zpool offline raid c5t7d0
cannot offline c5t7d0: device is reserved as a hot spare
# cfgadm -c unconfigure c5::dsk/c5t7d0
cfgadm: Hardware specific failure: failed to unconfigure SCSI device: Device busy

I also tried a detach:

# zpool detach raid c5t7d0
cannot detach c5t7d0: pool I/O is currently suspended

And I also tried using the last available spare to try and free up the disk I need to replace:

# zpool replace raid c5t2d0 c5t4d0
Cannot replace c5t2d0 with c5t4d0: device has already been replaced with a spare

I am new to ZFS; how would I go about safely removing the affected drive in the software, before physically replacing it? I'm also not sure at exactly which juncture to do a 'zpool clear' and 'zpool scrub'. I'd appreciate any guidance - thanks in advance,

Mark

Mark Mahabir
Systems Manager, X-Ray and Observational Astronomy
Dept. of Physics & Astronomy, University of Leicester, LE1 7RH
Tel: +44(0)116 252 5652
email: mark.maha...@leicester.ac.uk
Re: [zfs-discuss] Any use for extra drives?
On Mar 24, 2011, at 7:23 AM, Anonymous wrote:
Generally, you choose your data pool config based on data size, redundancy, and performance requirements. If those are all satisfied with your single mirror, the only thing left for you to do is think about splitting your data off onto a separate pool due to better performance etc. (Because there are things you can't do with the root pool, such as striping and raidz.) That's all there is to it. To split, or not to split.

Thanks for the update. I guess there's not much to do for this box, since it's a development machine and doesn't have much need for extra redundancy, although if I had had some extra 500s I would have liked to stripe the root pool. I see from your answer that's not possible anyway. Cheers.

If you plan to generate a lot of data, why use the root pool? You can put the /home and /proj filesystems (/export/...) on a separate pool, thus off-loading the root pool.

My two cents,
Mark
Re: [zfs-discuss] Any use for extra drives?
On Mar 24, 2011, at 5:42 AM, Edward Ned Harvey wrote:
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Nomen Nescio
Hi ladies and gents, I've got a new Solaris 10 development box with ZFS mirror root using 500G drives. I've got several extra 320G drives and I'm wondering if there's any way I can use these to good advantage in this box. I've got enough storage for my needs with the 500G pool. At this point I would be looking for a way to speed things up if possible or add redundancy if necessary but I understand I can't use these smaller drives to stripe the root pool, so what would you suggest? Thanks.

Generally, you choose your data pool config based on data size, redundancy, and performance requirements. If those are all satisfied with your single mirror, the only thing left for you to do is think about splitting your data off onto a separate pool due to better performance etc. (Because there are things you can't do with the root pool, such as striping and raidz.) That's all there is to it. To split, or not to split.

I'd just put /export/home on this second set of drives, as a striped mirror. Same as I would have done in the old days under SDS. :-)

Mark
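A sketch of what that would look like (device names hypothetical; four of the spare 320G drives as a striped pair of mirrors, with the mountpoint set after migrating and unmounting the old /export/home):

# zpool create homepool mirror c2t0d0 c2t1d0 mirror c2t2d0 c2t3d0
# zfs create -o mountpoint=/export/home homepool/home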
Re: [zfs-discuss] cannot replace c10t0d0 with c10t0d0: device is too small
The fix for 6991788 would probably let the 40MB-smaller drive work, but it would depend on the asize of the pool.

On Fri, 4 Mar 2011, Cindy Swearingen wrote:
Hi Robert,

We integrated some fixes that allowed you to replace disks of equivalent sizes, but 40 MB is probably beyond that window. Yes, you can do #2 below and the pool size will be adjusted down to the smaller size. Before you do this, I would check the sizes of both spares. If both spares are equivalent smaller sizes, you could use those to build the replacement pool with the larger disks and then put the extra larger disks on the shelf.

Thanks,
Cindy

On 03/04/11 09:22, Robert Hartzell wrote:
In 2007 I bought 6 WD1600JS 160GB SATA disks and used 4 to create a raidz storage pool, and then shelved the other two for spares. One of the disks failed last night, so I shut down the server and replaced it with a spare. When I tried to zpool replace the disk I get:

zpool replace tank c10t0d0
cannot replace c10t0d0 with c10t0d0: device is too small

The 4 original disk partition tables look like this:

Current partition table (original):
Total disk sectors available: 312560317 + 16384 (reserved sectors)

Part         Tag  Flag  First Sector  Size      Last Sector
  0          usr  wm    34            149.04GB  312560350
  1   unassigned  wm    0             0         0
  2   unassigned  wm    0             0         0
  3   unassigned  wm    0             0         0
  4   unassigned  wm    0             0         0
  5   unassigned  wm    0             0         0
  6   unassigned  wm    0             0         0
  8     reserved  wm    312560351     8.00MB    312576734

The spare disk partition table looks like this:

Current partition table (original):
Total disk sectors available: 312483549 + 16384 (reserved sectors)

Part         Tag  Flag  First Sector  Size      Last Sector
  0          usr  wm    34            149.00GB  312483582
  1   unassigned  wm    0             0         0
  2   unassigned  wm    0             0         0
  3   unassigned  wm    0             0         0
  4   unassigned  wm    0             0         0
  5   unassigned  wm    0             0         0
  6   unassigned  wm    0             0         0
  8     reserved  wm    312483583     8.00MB    312499966

So it seems that two of the disks are slightly different models and are about 40MB smaller than the original disks. I know I can just add a larger disk, but I would rather use the hardware I have if possible.

1) Is there any way to replace the failed disk with one of the spares?
2) Can I recreate the zpool using 3 of the original disks and one of the slightly smaller spares? Will zpool/zfs adjust its size to the smaller disk?
3) If #2 is possible, would I still be able to use the last still-shelved disk as a spare?

If #2 is possible I would probably recreate the zpool as raidz2 instead of the current raidz1. Any info/comments would be greatly appreciated.

Robert
--
Robert Hartzell
b...@rwhartzell.net
RwHartzell.Net, Inc.
[zfs-discuss] Investigating a hung system
Hi,

I'm investigating a hung system. The machine is running snv_159 and was running a full build of Solaris 11. You cannot get any response from the console and you cannot ssh in, but it responds to ping. The output from ::arc shows:

arc_meta_used  = 3836 MB
arc_meta_limit = 3836 MB
arc_meta_max   = 3951 MB

Is it normal for arc_meta_used == arc_meta_limit? Does this explain the hang?

Thanks,
Mark
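For anyone unfamiliar with where that output comes from, it is the ::arc dcmd in mdb; a sketch of both the live-kernel and crash-dump forms (dump filenames are the savecore defaults):

# echo ::arc | mdb -k | grep arc_meta                  (live system)
# echo ::arc | mdb unix.0 vmcore.0 | grep arc_meta     (post-mortem, from a savecore dump)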
[zfs-discuss] ZFS and Virtual Disks
Hi

I wanted to get some expert advice on this. I have an ordinary hardware SAN from Promise Tech that presents the LUNs via iSCSI. I would like to use that if possible with my VMware environment, where I run several Solaris / OpenSolaris virtual machines. My question is regarding the virtual disks.

1. Should I create individual iSCSI LUNs and present those to the VMware ESXi host as iSCSI storage, and then create virtual disks from there on each Solaris VM?

- or -

2. Should I (assuming this is possible) let the Solaris VM mount the iSCSI LUNs directly (that is, NOT show them as VMware storage, but let the VM connect to the iSCSI target across the network)?

Part of the issue is I have no idea if having a hardware RAID 5 or 6 disk set will create a problem if I then create a bunch of virtual disks and then use ZFS to create RAIDZ for the VM to use. Seems like that might be asking for trouble. This environment is completely available to mess with (no data at risk), so I'm willing to try any option you guys would recommend. Thanks!
--
Mark
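For what it's worth, option 2 is certainly possible: the Solaris guest can run the iSCSI initiator itself. A sketch with a hypothetical portal address (the discovered LUNs then show up as ordinary disks, ready for zpool create):

# iscsiadm add discovery-address 192.168.1.20:3260
# iscsiadm modify discovery --sendtargets enable
# devfsadm -i iscsi
# format          (the new LUNs appear here)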
Re: [zfs-discuss] ZFS and spindle speed (7.2k / 10k / 15k)
On Feb 2, 2011, at 8:10 PM, Eric D. Mudama wrote:
All other things being equal, the 15k and the 7200 drive, which share electronics, will have the same max transfer rate at the OD.

Is that true? So the only difference is in the access time?

Mark
Re: [zfs-discuss] Best choice - file system for system
Why do you say fssnap has the same problem? If it write-locks the file system, it is only for a matter of seconds, as I recall. Years ago, I used it on a daily basis to do ufsdumps of large filesystems.

Mark

On Jan 30, 2011, at 5:41 PM, Torrey McMahon wrote:
On 1/30/2011 5:26 PM, Joerg Schilling wrote:
Richard Elling richard.ell...@gmail.com wrote:
ufsdump is the problem, not ufsrestore. If you ufsdump an active file system, there is no guarantee you can ufsrestore it. The only way to guarantee this is to keep the file system quiesced during the entire ufsdump. Needless to say, this renders ufsdump useless for backup when the file system also needs to accommodate writes.

This is why there is a ufs snapshot utility.

You'll have the same problem. fssnap_ufs(1M) write-locks the file system when you run the lock command. See the notes section of the man page.
http://download.oracle.com/docs/cd/E19253-01/816-5166/6mbb1kq1p/index.html#Notes
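For context, the daily workflow being described looked roughly like this (paths and tape device hypothetical; fssnap prints the snapshot device it creates, the brief write-lock happens only while the snapshot is being established, and ufsdump then reads the stable snapshot instead of the live filesystem):

# fssnap -F ufs -o bs=/var/tmp /export/home
/dev/fssnap/0
# ufsdump 0uf /dev/rmt/0 /dev/rfssnap/0
# fssnap -d /export/home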
Re: [zfs-discuss] Best choice - file system for system
IIRC, we would notify the user community that the filesystems were going to hang briefly. Locking the filesystem is the best way to quiesce it when users are worldwide, IMO.

Mark

On Jan 31, 2011, at 9:45 AM, Torrey McMahon wrote:
A matter of seconds is a long time for a running Oracle database. The point is that if you have to keep writing to a UFS filesystem - when the file system also needs to accommodate writes - you're still out of luck. If you can quiesce the apps, great, but if you can't, then you're still stuck. In other words, fssnap_ufs doesn't solve the quiesce problem.

On 1/31/2011 10:24 AM, Mark Sandrock wrote:
Why do you say fssnap has the same problem? If it write-locks the file system, it is only for a matter of seconds, as I recall. Years ago, I used it on a daily basis to do ufsdumps of large filesystems.

Mark

On Jan 30, 2011, at 5:41 PM, Torrey McMahon wrote:
On 1/30/2011 5:26 PM, Joerg Schilling wrote:
Richard Elling richard.ell...@gmail.com wrote:
ufsdump is the problem, not ufsrestore. If you ufsdump an active file system, there is no guarantee you can ufsrestore it. The only way to guarantee this is to keep the file system quiesced during the entire ufsdump. Needless to say, this renders ufsdump useless for backup when the file system also needs to accommodate writes.

This is why there is a ufs snapshot utility.

You'll have the same problem. fssnap_ufs(1M) write-locks the file system when you run the lock command. See the notes section of the man page.
http://download.oracle.com/docs/cd/E19253-01/816-5166/6mbb1kq1p/index.html#Notes
Re: [zfs-discuss] A few questions
On Dec 18, 2010, at 12:23 PM, Lanky Doodle wrote: Now this is getting really complex, but can you have server failover in ZFS, much like DFS-R in Windows - you point clients to a clustered ZFS namespace so if a complete server failed nothing is interrupted. This is the purpose of an Amber Road dual-head cluster (7310C, 7410C, etc.) -- not only the storage pool fails over, but also the server IP address fails over, so that NFS, etc. shares remain active, when one storage head goes down. Amber Road uses ZFS, but the clustering and failover are not related to the filesystem type. Mark ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] A few questions
Erik, just a hypothetical what-if ... In the case of resilvering on a mirrored disk, why not take a snapshot, and then resilver by doing a pure block copy from the snapshot? It would be sequential, so long as the original data was unmodified; and random access in dealing with the modified blocks only, right. After the original snapshot had been replicated, a second pass would be done, in order to update the clone to 100% live data. Not knowing enough about the inner workings of ZFS snapshots, I don't know why this would not be doable. (I'm biased towards mirrors for busy filesystems.) I'm supposing that a block-level snapshot is not doable -- or is it? Mark On Dec 20, 2010, at 1:27 PM, Erik Trimble wrote: On 12/20/2010 9:20 AM, Saxon, Will wrote: -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Edward Ned Harvey Sent: Monday, December 20, 2010 11:46 AM To: 'Lanky Doodle'; zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] A few questions From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Lanky Doodle I believe Oracle is aware of the problem, but most of the core ZFS team has left. And of course, a fix for Oracle Solaris no longer means a fix for the rest of us. OK, that is a bit concerning then. As good as ZFS may be, i'm not sure I want to committ to a file system that is 'broken' and may not be fully fixed, if at all. ZFS is not broken. It is, however, a weak spot, that resilver is very inefficient. For example: On my server, which is made up of 10krpm SATA drives, 1TB each... My drives can each sustain 1Gbit/sec sequential read/write. This means, if I needed to resilver the entire drive (in a mirror) sequentially, it would take ... 8,000 sec = 133 minutes. About 2 hours. In reality, I have ZFS mirrors, and disks are around 70% full, and resilver takes 12-14 hours. So although resilver is broken by some standards, it is bounded, and you can limit it to something which is survivable, by using mirrors instead of raidz. For most people, even using 5-disk, or 7-disk raidzN will still be fine. But you start getting unsustainable if you get up to 21-disk radiz3 for example. This argument keeps coming up on the list, but I don't see where anyone has made a good suggestion about whether this can even be 'fixed' or how it would be done. As I understand it, you have two basic types of array reconstruction: in a mirror you can make a block-by-block copy and that's easy, but in a parity array you have to perform a calculation on the existing data and/or existing parity to reconstruct the missing piece. This is pretty easy when you can guarantee that all your stripes are the same width, start/end on the same sectors/boundaries/whatever and thus know a piece of them lives on all drives in the set. I don't think this is possible with ZFS since we have variable stripe width. A failed disk d may or may not contain data from stripe s (or transaction t). This information has to be discovered by looking at the transaction records. Right? Can someone speculate as to how you could rebuild a variable stripe width array without replaying all the available transactions? I am no filesystem engineer but I can't wrap my head around how this could be handled any better than it already is. I've read that resilvering is throttled - presumably to keep performance degradation to a minimum during the process - maybe this could be a tunable (e.g. priority: low, normal, high)? 
Do we know if resilvers on a mirror are actually handled differently from those on a raidz? Sorry if this has already been explained. I think this is an issue that everyone who uses ZFS should understand completely before jumping in, because the behavior (while not 'wrong') is clearly NOT the same as with more conventional arrays. -Will the problem is NOT the checksum/error correction overhead. that's relatively trivial. The problem isn't really even variable width (i.e. variable number of disks one crosses) slabs. The problem boils down to this: When ZFS does a resilver, it walks the METADATA tree to determine what order to rebuild things from. That means, it resilvers the very first slab ever written, then the next oldest, etc. The problem here is that slab age has nothing to do with where that data physically resides on the actual disks. If you've used the zpool as a WORM device, then, sure, there should be a strict correlation between increasing slab age and locality on the disk. However, in any reasonable case, files get deleted regularly. This means that the probability that for a slab B, written immediately after slab A, it WON'T be physically near slab A. In the end, the problem is that using metadata order, while reducing the total amount of work to do in the resilver
Re: [zfs-discuss] A few questions
On Dec 20, 2010, at 2:05 PM, Erik Trimble wrote: On 12/20/2010 11:56 AM, Mark Sandrock wrote: Erik, just a hypothetical what-if ... In the case of resilvering on a mirrored disk, why not take a snapshot, and then resilver by doing a pure block copy from the snapshot? It would be sequential, so long as the original data was unmodified; and random access in dealing with the modified blocks only, right. After the original snapshot had been replicated, a second pass would be done, in order to update the clone to 100% live data. Not knowing enough about the inner workings of ZFS snapshots, I don't know why this would not be doable. (I'm biased towards mirrors for busy filesystems.) I'm supposing that a block-level snapshot is not doable -- or is it? Mark Snapshots on ZFS are true snapshots - they take a picture of the current state of the system. They DON'T copy any data around when created. So, a ZFS snapshot would be just as fragmented as the ZFS filesystem was at the time. But if one does a raw (block) copy, there isn't any fragmentation -- except for the COW updates. If there were no updates to the snapshot, then it becomes a 100% sequential block copy operation. But even with COW updates, presumably the large majority of the copy would still be sequential i/o. Maybe for the 2nd pass, the filesystem would have to be locked, so the operation would ever complete, but if this is fairly short in relation to the overall resilvering time, then it could still be a win in many cases. I'm probably not explaining it well, and may be way off, but it seemed an interesting notion. Mark The problem is this: Let's say I write block A, B, C, and D on a clean zpool (what kind, it doesn't matter). I now delete block C. Later on, I write block E. There is a probability (increasing dramatically as times goes on), that the on-disk layout will now look like: A, B, E, D rather than A, B, [space], D, E So, in the first case, I can do a sequential read to get A B, but then must do a seek to get D, and a seek to get E. The fragmentation problem is mainly due to file deletion, NOT to file re-writing. (though, in ZFS, being a C-O-W filesystem, re-writing generally looks like a delete-then-write process, rather than a modify process). -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] A few questions
It well may be that different methods are optimal for different use cases. Mechanical disk vs. SSD; mirrored vs. raidz[123]; sparse vs. populated; etc. It would be interesting to read more in this area, if papers are available. I'll have to take a look. ... Or does someone have pointers? Mark On Dec 20, 2010, at 6:28 PM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Erik Trimble In the case of resilvering on a mirrored disk, why not take a snapshot, and then resilver by doing a pure block copy from the snapshot? It would be sequential, So, a ZFS snapshot would be just as fragmented as the ZFS filesystem was at the time. I think Mark was suggesting something like dd copy device 1 onto device 2, in order to guarantee a first-pass sequential resilver. And my response would be: Creative thinking and suggestions are always a good thing. In fact, the above suggestion is already faster than the present-day solution for what I'm calling typical usage, but there are an awful lot of use cases where the dd solution would be worse... Such as a pool which is largely sequential already, or largely empty, or made of high IOPS devices such as SSD. However, there is a desire to avoid resilvering unused blocks, so I hope a better solution is possible... The fundamental requirement for a better optimized solution would be a way to resilver according to disk ordering... And it's just a question for somebody that actually knows the answer ... How terrible is the idea of figuring out the on-disk order? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Problem with a failed replace.
On Mon, 6 Dec 2010, Curtis Schiewek wrote: Hi Mark, I've tried running zpool attach media ad24 ad12 (ad12 being the new disk) and I get no response. I tried leaving the command run for an extended period of time and nothing happens. What version of solaris are you running? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zfs ignoring spares?
On 5 Dec 2010, at 16:06, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote: Hot spares are dedicated spares in the ZFS world. Until you replace the actual bad drives, you will be running in a degraded state. The idea is that spares are only used in an emergency. You are degraded until your spares are no longer in use. --Tim Thanks for the clarification. Wouldn't it be nice if ZFS could fail over to a spare and then allow the replacement to become the new spare, as is done with most commercial hardware RAIDs? If you use zpool detach to remove the disk that went bad, the spare is promoted to a proper member of the pool. Then, when you replace the bad disk, you can use zpool add to add it into the pool as a new spare. Admittedly, this is all a manual procedure. It's unclear if you were asking for this to be fully automated. Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- [In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
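In command form, the manual procedure Roy describes (pool and device names are hypothetical):

    # zpool detach tank c2t5d0        (drop the failed disk; the in-use spare is promoted)
    # zpool add tank spare c2t9d0     (the physical replacement becomes the new hot spare)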
Re: [zfs-discuss] Problem with a failed replace.
On Fri, 3 Dec 2010, Curtis Schiewek wrote:

        NAME          STATE     READ WRITE CKSUM
        media         DEGRADED     0     0     0
          raidz1      ONLINE       0     0     0
            ad8       ONLINE       0     0     0
            ad10      ONLINE       0     0     0
            ad4       ONLINE       0     0     0
            ad6       ONLINE       0     0     0
          raidz1      DEGRADED     0     0     0
            ad22      ONLINE       0     0     0
            ad26      ONLINE       0     0     0
            replacing UNAVAIL      0 66.4K     0  insufficient replicas
              ad24    FAULTED      0 75.1K     0  corrupted data
              ad18    FAULTED      0 75.2K     0  corrupted data
            ad24      ONLINE       0     0     0

What happens if you try zpool detach media ad24? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Problem with a failed replace.
On Fri, 3 Dec 2010, Curtis Schiewek wrote: cannot detach ad24: no valid replicas I bet that's an instance of CR 6909724. If you have another disk you can spare, you can do a zpool attach media ad24 newdisk, wait for it to finish resilvering, and then zfs should automatically clean up ad24 ad18 for you. On Fri, Dec 3, 2010 at 1:38 PM, Mark J Musante mark.musa...@oracle.comwrote: On Fri, 3 Dec 2010, Curtis Schiewek wrote: NAME STATE READ WRITE CKSUM media DEGRADED 0 0 0 raidz1 ONLINE 0 0 0 ad8ONLINE 0 0 0 ad10 ONLINE 0 0 0 ad4ONLINE 0 0 0 ad6ONLINE 0 0 0 raidz1 DEGRADED 0 0 0 ad22 ONLINE 0 0 0 ad26 ONLINE 0 0 0 replacing UNAVAIL 0 66.4K 0 insufficient replicas ad24 FAULTED 0 75.1K 0 corrupted data ad18 FAULTED 0 75.2K 0 corrupted data ad24 ONLINE 0 0 0 What happens if you try zpool detach media ad24? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
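In command form, the suggested workaround (ad28 is a hypothetical unused disk):

    # zpool attach media ad24 ad28    (attach a new mirror side to the stuck vdev)
    # zpool status media              (wait for the resilver to finish; the stale
                                       ad24/ad18 entries should then be cleaned up)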
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
On Fri, 19 Nov 2010 07:16:20 PST, Günther wrote: i have the same problem with my 2HE supermicro server (24x2,5, connected via 6x mini SAS 8087) and no additional mounting possibilities for 2,5 or 3,5 drives. on those machines i use one sas port (4 drives) of an old adaptec 3805 (i have used them in my pre-zfs times) to build a raid-1 + hotfix for esxi to boot from. the other 20 slots are connected to 3 lsi sas controllers for pass-through - so i have 4 sas controllers in these machines. maybe the new ssd-drives mounted on a pci-e card (ex ocz revo drive) may be an alternative. has anyone used them already with esxi? gea Hey - just as a side note.. Depending on what motherboard you use, you may be able to use this: MCP-220-82603-0N - Dual 2.5 fixed HDD tray kit for SC826 (for E-ATX X8 DP MB) I haven't used one yet myself but am currently planning an SMC build and contacted their support, as I really did not want to have my system drives hanging off the controller. As far as I can tell from a picture they sent, it mounts on top of the motherboard itself, somewhere where there is normally open space, and it can hold two 2.5 drives. So maybe get in touch with their support and see if you can use something similar. Cheers, Mark ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
On Nov 2, 2010, at 12:10 AM, Ian Collins wrote: On 11/ 2/10 08:33 AM, Mark Sandrock wrote: I'm working with someone who replaced a failed 1TB drive (50% utilized), on an X4540 running OS build 134, and I think something must be wrong. Last Tuesday afternoon, zpool status reported: scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go and a week being 168 hours, that put completion at sometime tomorrow night. However, he just reported zpool status shows: scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go so it's looking more like 2011 now. That can't be right. How is the pool configured? Both 10 and 12 disk RAIDZ-2. That, plus too much other io must be the problem. I'm thinking 5 x (7-2) would be better, assuming he doesn't want to go RAID-10. Thanks much for all the helpful replies. Mark I look after a very busy x5400 with 500G drives configured as 8 drive raidz2 and these take about 100 hours to resilver. The workload on this box is probably worst case for resivering, it receives a steady stream of snapshots. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
Edward, I recently installed a 7410 cluster, which had added Fiber Channel HBAs. I know the site also has Blade 6000s running VMware, but no idea if they were planning to run fiber to those blades (or even had the option to do so). But perhaps FC would be an option for you? Mark On Nov 12, 2010, at 9:03 AM, Edward Ned Harvey wrote: Since combining ZFS storage backend, via nfs or iscsi, with ESXi heads, I’m in love. But for one thing. The interconnect between the head storage. 1G Ether is so cheap, but not as fast as desired. 10G ether is fast enough, but it’s overkill and why is it so bloody expensive? Why is there nothing in between? Is there something in between? Is there a better option? I mean … sata is cheap, and it’s 3g or 6g, but it’s not suitable for this purpose. But the point remains, there isn’t a fundamental limitation that *requires* 10G to be expensive, or *requires* a leap directly from 1G to 10G. I would very much like to find a solution which is a good fit… to attach ZFS storage to vmware. What are people using, as interconnect, to use ZFS storage on ESX(i)? Any suggestions? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool split how it works?
On Wed, 10 Nov 2010, Darren J Moffat wrote: On 10/11/2010 11:18, sridhar surampudi wrote: I was wondering how zpool split works, or how it is implemented. Or are you really asking about the implementation details? If you want to know how it is implemented then you need to read the source code. Or you can read the blog entry I wrote up after it was put back: http://blogs.sun.com/mmusante/entry/seven_years_of_good_luck ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
Hello, I'm working with someone who replaced a failed 1TB drive (50% utilized), on an X4540 running OS build 134, and I think something must be wrong. Last Tuesday afternoon, zpool status reported: scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go and a week being 168 hours, that put completion at sometime tomorrow night. However, he just reported zpool status shows: scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go so it's looking more like 2011 now. That can't be right. I'm hoping for a suggestion or two on this issue. I'd search the archives, but they don't seem searchable. Or am I wrong about that? Thanks. Mark (subscription pending) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZPOOL_CONFIG_IS_HOLE
You should only see a HOLE in your config if you removed a slog after having added more stripes. Nothing to do with bad sectors. On 14 Oct 2010, at 06:27, Matt Keenan wrote: Hi, Can someone shed some light on what this ZPOOL_CONFIG is exactly. At a guess is it a bad sector of the disk, non writable and thus ZFS marks it as a hole ? cheers Matt ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
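A sketch of how such a hole can arise (device names are hypothetical; log device removal requires pool version 19 or later):

    # zpool create tank mirror c0d0 c0d1
    # zpool add tank log c0d2              (the slog becomes top-level vdev 1)
    # zpool add tank mirror c0d3 c0d4      (a second stripe becomes top-level vdev 2)
    # zpool remove tank c0d2               (vdev 1's slot is kept as a HOLE placeholder,
                                            so later vdev ids don't shift)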
Re: [zfs-discuss] zfs unmount versus umount?
On Thu, 30 Sep 2010, Linder, Doug wrote: Is there any technical difference between using zfs unmount to unmount a ZFS filesystem versus the standard unix umount command? I always use zfs unmount but some of my colleagues still just use umount. Is there any reason to use one over the other? No, they're identical. If you use 'zfs umount' the code automatically maps it to 'unmount'. It also maps 'recv' to 'receive' and '-?' to call into the usage function. Here's the relevant code from main():

    /*
     * The 'umount' command is an alias for 'unmount'
     */
    if (strcmp(cmdname, "umount") == 0)
            cmdname = "unmount";

    /*
     * The 'recv' command is an alias for 'receive'
     */
    if (strcmp(cmdname, "recv") == 0)
            cmdname = "receive";

    /*
     * Special case '-?'
     */
    if (strcmp(cmdname, "-?") == 0)
            usage(B_TRUE);

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs unmount versus umount?
On Thu, 30 Sep 2010, Darren J Moffat wrote: * It can be applied recursively down a ZFS hierarchy True. * It will unshare the filesystems first Actually, because we use the zfs command to do the unmount, we end up doing the unshare on the filesystem first. See the opensolaris code for details: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libzfs/common/libzfs_mount.c#zfs_unmount ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cannot access dataset
On Mon, 20 Sep 2010, Valerio Piancastelli wrote: After a crash i cannot access one of my datasets anymore. ls -v cts brwxrwxrwx+ 2 root root 0, 0 ott 18 2009 cts zfs list sas/mail-cts NAME USED AVAIL REFER MOUNTPOINT sas/mail-cts 149G 250G 149G /sas/mail-cts as you can see, the space is referenced by this dataset, but i cannot access the directory /sas/mail-cts Is the dataset mounted? i.e. what does 'zfs get mounted sas/mail-cts' show? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cannot access dataset
On Mon, 20 Sep 2010, Valerio Piancastelli wrote: Yes, it is mounted:

    root@disk-00:/volumes/store# zfs get mounted sas/mail-cts
    NAME          PROPERTY  VALUE   SOURCE
    sas/mail-cts  mounted   yes     -

OK - so the next question would be where the data is. I assume when you say you cannot access the dataset, it means when you type ls -l /sas/mail-cts it shows up as an empty directory. Is that true? With luck, the data will still be in a snapshot. Given that the dataset has 149G referenced, it could be all there. Does 'zfs list -rt snapshot sas/mail-cts' list any? If so, you can try using the most recent snapshot by looking in /sas/mail-cts/.zfs/snapshot/<snapshot name> and seeing if all your data are there. If it looks good, you can zfs rollback to that snapshot. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
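In command form, the recovery path described above (the snapshot name 20100915 is hypothetical):

    # zfs list -rt snapshot sas/mail-cts
    # ls /sas/mail-cts/.zfs/snapshot/20100915    (check the data is present in the snapshot)
    # zfs rollback sas/mail-cts@20100915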
Re: [zfs-discuss] moving rppol in laptop to spare SSD drive.
Hi Steve, Couple of options. Create a new boot environment on the SSD, and this will copy the data over. Or: zfs send -R rpool@backup | zfs recv altpool I'd use the alt boot environment, rather than the send and receive. Cheers, -Mark. On 19/09/2010, at 5:37 PM, Steve Arkley wrote: Hello folks, I ordered a bunch of 128Gb SSD's the other day, placed 2 in a PC, another in a windoz laptop, and I thought I'd place one in my opensolaris laptop; should be straightforward, or so I thought. The problem I seem to be running into is that the partition the rpool is on is 130Gb, and the SSD once sliced up is only about 120Gb. I pulled the main disk from the laptop and put it in a caddy, put the new ssd in the drive bay and booted from cdrom. I imported the rpool and created an altpool on the ssd drive. zpool list shows both pools: altpool size 119G avail 119G rpool size 130G used 70G I created a snapshot of the rpool and tried to send it to the other disk but it fails with file too large. zfs send -R rpool@backup altpool warning: cannot send 'rpool/bu...@backup': file too large. is there any way to get the data over onto the other drive at all? Thanks Steve. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Mark Farmer | Sales Consultant Phone: +61730317106 | Mobile: +61414999143 Oracle Systems ORACLE Australia | 300 Ann St | Brisbane Oracle is committed to developing practices and products that help protect the environment ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
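A sketch of the send/receive route Mark mentions, assuming a recursive snapshot named @backup is taken first; note that a received copy of a root pool still needs boot blocks and the bootfs property set up before it is bootable, which is part of why the boot-environment route is simpler:

    # zfs snapshot -r rpool@backup
    # zfs send -R rpool@backup | zfs recv -F altpool    (-F overwrites the target pool's contents)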
Re: [zfs-discuss] performance leakage when copy huge data
On Thu, 9 Sep 2010 14:05:51 +, Markus Kovero markus.kov...@nebula.fi wrote: On Sep 9, 2010, at 8:27 AM, Fei Xu twinse...@hotmail.com wrote: This might be the dreaded WD TLER issue. Basically the drive keeps retrying a read operation over and over after a bit error trying to recover from a read error themselves. With ZFS one really needs to disable this and have the drives fail immediately. Check your drives to see if they have this feature, if so think about replacing the drives in the source pool that have long service times and make sure this feature is disabled on the destination pool drives. -Ross It might be due tler-issues, but I'd try to pin greens down to SATA1-mode (use jumper, or force via controller). It might help a bit with these disks, although these are not really suitable disks for any use in any raid configurations due tler issue, which cannot be disabled in later firmware versions. Yours Markus Kovero Just to clarify - do you mean TLER should be off or on? TLER = Time Limited Error Recovery so the drive only takes a max time (eg: 7 seconds) to retrieve data or returns an error. So you say 'cannot be disabled' but I think you mean 'cannot be ENABLED' ? I've been doing a lot of research for a new storage box at work, and from reading a lot of the info available in the Storage forum on hardforum.com, the experts there seem to recommend NOT having TLER enabled when using ZFS as ZFS can be configured for its timeouts, etc, and the main reason to use TLER is when using those drives with hardware RAID cards which will kick a drive out of the array if it takes longer than 10 seconds. Can anyone else here comment if they have had experience with the WD drives and ZFS and if they have TLER enabled or disabled? Cheers, Mark ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
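On drives whose firmware supports it, the error-recovery timeout can often be inspected and set through smartmontools' SCT ERC interface. A sketch, assuming smartctl is installed and works against your controller; the device path is illustrative, values are in tenths of a second, and (as noted above) many later WD Green firmwares simply refuse the command:

    # smartctl -l scterc /dev/rdsk/c0t1d0          (show current read/write recovery timeouts)
    # smartctl -l scterc,70,70 /dev/rdsk/c0t1d0    (limit both to 7 seconds, i.e. TLER-style)
    # smartctl -l scterc,0,0 /dev/rdsk/c0t1d0      (disable the limit: unbounded recovery time)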
Re: [zfs-discuss] onnv_142 - vfs_mountroot: cannot mount root
Did you run installgrub before rebooting? On Tue, 7 Sep 2010, Piotr Jasiukajtis wrote: Hi, After upgrade from snv_138 to snv_142 or snv_145 I'm unable to boot the system. Here is what I get. Any idea why it's not able to import rpool? I saw this issue also on older builds on a different machines. -- Piotr Jasiukajtis | estibi | SCA OS0072 http://estseg.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
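For reference, on x86 the grub stages need to be reinstalled on the boot slice after some upgrades; a sketch, with a hypothetical device name (run it against each half of a mirrored rpool):

    # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0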
Re: [zfs-discuss] new labelfix needed
On Wed, 1 Sep 2010, Benjamin Brumaire wrote: your point have only a rethoric meaning. I'm not sure what you mean by that. I was asking specifically about your situation. You want to run labelfix on /dev/rdsk/c0d1s4 - what happened to that slice that requires a labelfix? Is there something that zfs might be doing to cause the problem? Is there something that zfs could be doing to mitigate the problem? BTW zfsck would be a great improvement to ZFS. What specifically would zfsck do that is not done by scrub? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to rebuild raidz after system reinstall
What does 'zpool import' show? If that's empty, what about 'zpool import -d /dev'? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to rebuild raidz after system reinstall
On Thu, 2 Sep 2010, Dominik Hoffmann wrote: I think, I just destroyed the information on the old raidz members by doing zpool create BackupRAID raidz /dev/disk0s2 /dev/disk1s2 /dev/disk2s2 It should have warned you that two of the disks were already formatted with a zfs pool. Did it not do that? If so, perhaps these aren't the same disks you were using in your pool. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] new labelfix needed
On Mon, 30 Aug 2010, Benjamin Brumaire wrote: As this feature didn't make it into zfs it would be nice to have it again. Better to spend time fixing the problem that requires a 'labelfix' as a workaround, surely. What's causing the need to fix vdev labels? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] pool died during scrub
On Mon, 30 Aug 2010, Jeff Bacon wrote: All of this would be ok... except THOSE ARE THE ONLY DEVICES THAT WERE PART OF THE POOL. How can it be missing a device that didn't exist? The device(s) in question are probably the logs you refer to here: I can't obviously use b134 to import the pool without logs, since that would imply upgrading the pool first, which is hard to do if it's not imported. The stack trace you show is indicative of a memory corruption that may have gotten out to disk. In other words, ZFS wrote data to RAM, the RAM was corrupted, then the checksum was calculated and the result was written out. Do you have a core dump from the panic? Also, what kind of DRAM does this system use? If you're lucky, then there's no corruption and instead it's a stale config that's causing the problem. Try removing /etc/zfs/zpool.cache and then doing a zpool import -a ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
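In command form (setting the cache file aside rather than deleting it outright, in case it is needed again):

    # mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak
    # zpool import -a        (rescan devices; importing rebuilds the cache file)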
Re: [zfs-discuss] zpool status and format/kernel disagree about root disk
On Fri, 27 Aug 2010, Rainer Orth wrote: zpool status thinks rpool is on c1t0d0s3, while format (and the kernel) correctly believe it's c11t0d0(s3) instead. Any suggestions? Try removing the symlinks or using 'devfsadm -C' as suggested here: https://defect.opensolaris.org/bz/show_bug.cgi?id=14999 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] VM's on ZFS - 7210
Hey, thanks for the replies everyone. Sadly most of those options will not work, since we are using a Sun Unified Storage 7210; the only option is to buy the Sun SSDs for it, which is about $15k USD for a pair. We also don't have the ability to shut off the ZIL or any of the other options that one might have under OpenSolaris itself :( It sounds like I do want to change to a RAID10 mirror instead of RAIDz. It sounds like enabling write-cache without the ZIL in place might work, but would lead to corruption should something crash. So the question is: with a proper ZIL SSD from Sun, and a RAID10... would I be able to support all the VM's, or would it still be pushing the limits of a 44 disk pool? Today there are 30 VM's; 25 are Windows 2008 and 5 are CentOS 5. A couple are DB servers that see very light load. The only thing that sees any real load is a build server, which we get a lot of complaints about. I did some testing and posted my results a month ago, using OpenSolaris and 5 disks with my personal Intel SSD, and saw good results, but I don't know how it will scale :( -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] VM's on ZFS - 7210
It does, its on a pair of large APC's. Right now we're using NFS for our ESX Servers. The only iSCSI LUN's I have are mounted inside a couple Windows VM's. I'd have to migrate all our VM's to iSCSI, which I'm willing to do if it would help and not cause other issues. So far the 7210 Appliance has been very stable. I like the zilstat script. I emailed a support tech I am working with on another issue to ask if one of the built in Analytics DTrace scripts will get that data. I found one called L2ARC Eligibility: 3235 true, 66 false. This makes it sound like we would benefit from a READZilla, not quite what I had expected... I'm sure I don't know what I'm looking at anyways :) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] VM's on ZFS - 7210
We are using a 7210, 44 disks I believe, 11 stripes of RAIDz sets. When I installed, I selected the best bang for the buck on the speed vs capacity chart. We run about 30 VM's on it, across 3 ESX 4 servers. Right now it's all running NFS, and it sucks... sooo slow. iSCSI was no better. I am wondering how I can increase the performance, because they want to add more VM's... the good news is most are idle-ish, but even idle VM's create a lot of random chatter to the disks! So, a few options maybe:
1) Change to iSCSI mounts on ESX, and enable write-cache on the LUN's, since the 7210 is on a UPS.
2) Get a Logzilla SSD mirror. (Do SSD's fail? Do I really need a mirror?)
3) Reconfigure the NAS to RAID10 instead of RAIDz.
Obviously all 3 would be ideal, though with an SSD can I keep using NFS for the same performance, since the R_SYNC's would be satisfied with the SSD? I am dreading getting the OK to spend the $$,$$$ on SSD's and then not getting the performance increase we want. How would you weight these? I noticed in testing on a 5 disk OpenSolaris box that changing from a single RAIDz pool to RAID10 netted a larger IOP increase than adding an Intel SSD as a Logzilla. That's not going to scale the same, though, with a 44 disk, 11-stripe raidz RAID set. Some thoughts? Would simply moving to write-cache enabled iSCSI LUN's without an SSD speed things up a lot by itself? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Narrow escape with FAULTED disks
Well I do have a plan. Thanks to the portability of ZFS boot disks, I'll make two new OS disks on another machine with the next Nexcenta release, export the data pool and swap in the new ones. That way, I can at least manage a zfs scrub without killing the performance and get the Intel SSD's I have been testing to work properly. On the other hand, I could just use the spare 7210 Appliance boot disk I have lying about. Mark. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cant't detach spare device from pool
You need to let the resilver complete before you can detach the spare. This is a known problem, CR 6909724. http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6909724 On 18 Aug 2010, at 14:02, Dr. Martin Mundschenk wrote: Hi! I had trouble with my raidz in the way that some of the block devices were not found by the OSOL box the other day, so the spare device was hooked on automatically. After fixing the problem, the missing device came back online, but I am unable to detach the spare device, even though all devices are online and functional.

    m...@iunis:~# zpool status tank
      pool: tank
     state: ONLINE
    status: One or more devices is currently being resilvered. The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 1h5m, 1,76% done, 61h12m to go
    config:

            NAME           STATE     READ WRITE CKSUM
            tank           ONLINE       0     0     0
              raidz1-0     ONLINE       0     0     0
                c9t0d1     ONLINE       0     0     0
                c9t0d3     ONLINE       0     0     0  15K resilvered
                c9t0d0     ONLINE       0     0     0
                spare-3    ONLINE       0     0     0
                  c9t0d2   ONLINE       0     0     0  37,5K resilvered
                  c16t0d0  ONLINE       0     0     0  14,1G resilvered
            cache
              c18t0d0      ONLINE       0     0     0
            spares
              c16t0d0      INUSE     currently in use

    errors: No known data errors
    m...@iunis:~# zpool detach tank c16t0d0
    cannot detach c16t0d0: no valid replicas

How can I solve the problem? Martin ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS pool and filesystem version list, OpenSolaris builds list
I keep the pool version information up-to-date here: http://blogs.sun.com/mmusante/entry/a_zfs_taxonomy On Sun, 15 Aug 2010, Haudy Kazemi wrote: Hello, This is a consolidated list of ZFS pool and filesystem versions, along with the builds and systems they are found in. It is based on multiple online sources. Some of you may find it useful in figuring out where things are at across the spectrum of systems supporting ZFS including FreeBSD and FUSE. At the end of this message there is a list of the builds OpenSolaris releases and some OpenSolaris derivatives are based on. The list is sort-of but not strictly comma delimited, and of course may contain errata. -hk

Solaris Nevada xx = snv_xx = onnv_xx ~= testing builds for Solaris 11
SXCE = Solaris Express Community Edition

ZFS Pool Version, Where found (multiple), Notes about this version
1, Nevada/SXCE 36, Solaris 10 6/06, Initial ZFS on-disk format integrated on 10/31/05. During the next six months of internal use, there were a few on-disk format changes that did not result in a version number change, but resulted in a flag day since earlier versions could not read the newer changes. For '6389368 fat zap should use 16k blocks (with backwards compatibility)' and '6390677 version number checking makes upgrades challenging'
2, Nevada/SXCE 38, Solaris 10 10/06 (build 9), Ditto blocks (replicated metadata) for '6410698 ZFS metadata needs to be more highly replicated (ditto blocks)'
3, Nevada/SXCE 42, Solaris 10 11/06 (build 3), Hot spares and double parity RAID-Z for '6405966 Hot Spare support in ZFS' and '6417978 double parity RAID-Z a.k.a. RAID6' and '6288488 du reports misleading size on RAID-Z'
4, Nevada/SXCE 62, Solaris 10 8/07, zpool history for '6529406 zpool history needs to bump the on-disk version' and '6343741 want to store a command history on disk'
5, Nevada/SXCE 62, Solaris 10 10/08, gzip compression algorithm for '6536606 gzip compression for ZFS'
6, Nevada/SXCE 62, Solaris 10 10/08, FreeBSD 7.0, 7.1, 7.2, bootfs pool property for '4929890 ZFS boot support for the x86 platform' and '6479807 pools need properties'
7, Nevada/SXCE 68, Solaris 10 10/08, Separate intent log devices for '6339640 Make ZIL use NVRAM when available'
8, Nevada/SXCE 69, Solaris 10 10/08, Delegated administration for '6349470 investigate non-root restore/backup'
9, Nevada/SXCE 77, Solaris 10 10/08, refquota and refreservation properties for '6431277 want filesystem-only quotas' and '6483677 need immediate reservation' and '6617183 CIFS Service - PSARC 2006/715'
10, Nevada/SXCE 78, OpenSolaris 2008.05, Solaris 10 5/09 (Solaris 10 10/08 supports ZFS version 10 except for cache devices), Cache devices for '6536054 second tier (external) ARC'
11, Nevada/SXCE 94, OpenSolaris 2008.11, Solaris 10 10/09, Improved scrub/resilver performance for '6343667 scrub/resilver has to start over when a snapshot is taken'
12, Nevada/SXCE 96, OpenSolaris 2008.11, Solaris 10 10/09, added Snapshot properties for '6701797 want user properties on snapshot'
13, Nevada/SXCE 98, OpenSolaris 2008.11, Solaris 10 10/09, FreeBSD 7.3+, FreeBSD 8.0-RELEASE, Linux ZFS-FUSE 0.5.0, added usedby properties for '6730799 want user properties on snapshots' and 'PSARC/2008/518 ZFS space accounting enhancements'
14, Nevada/SXCE 103, OpenSolaris 2009.06, Solaris 10 10/09, FreeBSD 8-STABLE, 8.1-RELEASE, 9-CURRENT, added passthrough-x aclinherit property support for '6765166 Need to provide mechanism to optionally inherit ACE_EXECUTE' and 'PSARC 2008/659 New ZFS passthrough-x ACL inheritance rules'
15, Nevada/SXCE 114, added quota property support for '6501037 want user/group quotas on ZFS' and 'PSARC 2009/204 ZFS user/group quotas space accounting'
16, Nevada/SXCE 116, Linux ZFS-FUSE 0.6.0, added stmf property support for '6736004 zvols need an additional property for comstar support'
17, Nevada/SXCE 120, added triple-parity RAID-Z for '6854612 triple-parity RAID-Z'
18, Nevada/SXCE 121, Linux zfs-0.4.9, added ZFS snapshot holds for '6803121 want user-settable refcounts on snapshots'
19, Nevada/SXCE 125, added ZFS log device removal option for '6574286 removing a slog doesn't work'
20, Nevada/SXCE 128, added zle compression to support dedupe in version 21 for 'PSARC/2009/571 ZFS Deduplication Properties'
21, Nevada/SXCE 128, added deduplication properties for 'PSARC/2009/571 ZFS Deduplication Properties'
22, Nevada/SXCE 128a, Nexenta Core Platform Beta 2, Beta 3, added zfs receive properties for 'PSARC/2009/510 ZFS Received Properties'
23, Nevada 135, Linux ZFS-FUSE 0.6.9, added slim ZIL support for '6595532 ZIL is too talkative'
24, Nevada 137, added support for system attributes for '6716117 ZFS needs native system attribute infrastructure' and '6516171 zpl symlinks should have their own object type'
25, Nevada ??, Nexenta Core Platform RC1
26, Nevada 141, Linux zfs-0.5.0

ZFS Pool Version, OpenSolaris, Solaris 10, Description
1, snv_36, Solaris 10 6/06
Re: [zfs-discuss] Replaced pool device shows up in zpool status
On Mon, 16 Aug 2010, Matthias Appel wrote: Can anybody tell me how to get rid of c1t3d0 and heal my zpool? Can you do a zpool detach performance c1t3d0/o? If that works, then zpool replace performance c1t3d0 c1t0d0 should replace the bad disk with the new hot spare. Once the resilver completes, do a zpool detach performance c1t3d0 to remove the bad disk and promote the hot spare to a full member of the pool. Or, if that doesn't work, try the same thing with c1t3d0 and c1t3d0/o swapped around. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
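In command form, the sequence described above (using the device names from the thread):

    # zpool detach performance c1t3d0/o     (drop the old, failed half of the replacement)
    # zpool replace performance c1t3d0 c1t0d0
    # zpool status performance              (wait for the resilver to complete)
    # zpool detach performance c1t3d0       (remove the bad disk; the hot spare is promoted)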
Re: [zfs-discuss] How do I Import rpool to an alternate location?
On 16 Aug 2010, at 22:30, Robert Hartzell wrote:

    # cd /mnt ; ls
    bertha export var
    # ls bertha
    boot etc

where is the rest of the file systems and data? By default, root filesystems are not mounted. Try doing a zfs mount bertha/ROOT/snv_134 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opensolaris is apparently dead
That's a very good question actually. I would think that COMSTAR would stay because it's used by the Fishworks appliance... however, COMSTAR is a competitive advantage for DIY storage solutions. Maybe they will rip it out of S11 and make it an add-on or something. That would suck. I guess the only real reason you can't yank COMSTAR is because it's now the basis for iSCSI target support. But again, there is nothing saying that target support has to be part of the standard OS offering. Scary to think about. :) benr. That would be the sensible commercial decision, and it would kill off the competition in the storage market using OpenSolaris-based products. I haven't found a Linux that can reliably spin the 100Tb I currently have behind OpenSolaris and ZFS. Luckily b134 doesn't seem to have any major issues, and I'm currently looking into a USB boot/raidz root combination for 1U storage. I ran Red Hat 9 with updated packages for quite a few years. As long as the kernel is stable, and you can work through the hurdles, it can still do the job. Mark. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS development moving behind closed doors
On 8/13/10 8:56 PM -0600 Eric D. Mudama wrote: On Fri, Aug 13 at 19:06, Frank Cusack wrote: Interesting POV, and I agree. Most of the many distributions of OpenSolaris had very little value-add. Nexenta was the most interesting, and why should Oracle enable them to build a business at their expense? These distributions are, in theory, the gateway drug where people can experiment inexpensively to try out new technologies (ZFS, dtrace, crossbow, comstar, etc.) and eventually step up to Oracle's big iron as their business grows. I've never understood how OpenSolaris was supposed to get you to Solaris. OpenSolaris is for enthusiasts and great folks like Nexenta. Solaris lags so far behind that it's not really an upgrade path. Fedora is a great beta-test arena for what eventually becomes a commercial Enterprise offering. OpenSolaris was the Solaris equivalent. Losing the free bleeding-edge testing community will no doubt impact Solaris code quality. It is now even more likely Solaris will revert to its niche on SPARC over the next few years. Mark. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs replace problems please please help
On Tue, 10 Aug 2010, seth keith wrote:

    # zpool status
      pool: brick
     state: UNAVAIL
    status: One or more devices could not be used because the label is missing
            or invalid. There are insufficient replicas for the pool to
            continue functioning.
    action: Destroy and re-create the pool from a backup source.
       see: http://www.sun.com/msg/ZFS-8000-5E
     scrub: none requested
    config:

            NAME          STATE     READ WRITE CKSUM
            brick         UNAVAIL      0     0     0  insufficient replicas
              raidz1      UNAVAIL      0     0     0  insufficient replicas
                c13d0     ONLINE       0     0     0
                c4d0      ONLINE       0     0     0
                c7d0      ONLINE       0     0     0
                c4d1      ONLINE       0     0     0
                replacing UNAVAIL      0     0     0  insufficient replicas
                  c15t0d0 UNAVAIL      0     0     0  cannot open
                  c11t0d0 UNAVAIL      0     0     0  cannot open
                c12d0     FAULTED      0     0     0  corrupted data
                c6d0      ONLINE       0     0     0

What I want is to remove c15t0d0 and c11t0d0 and replace with the original c6d1. Suggestions? Do the labels still exist on c6d1? e.g. what do you get from zdb -l /dev/rdsk/c6d1s0? If the label still exists, and the pool guid is the same as the labels on the other disks, you could try doing a zpool detach brick c15t0d0 (or c11t0d0), then export and try re-importing. ZFS may find c6d1 at that point. There's no way to guarantee that'll work. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs replace problems please please help
On Wed, 11 Aug 2010, Seth Keith wrote: When I do a zdb -l /dev/rdsk/any device I get the same output for all my drives in the pool, but I don't think it looks right: # zdb -l /dev/rdsk/c4d0 What about /dev/rdsk/c4d0s0? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs replace problems please please help
On Wed, 11 Aug 2010, seth keith wrote:

    NAME                      STATE     READ WRITE CKSUM
    brick                     DEGRADED     0     0     0
      raidz1                  DEGRADED     0     0     0
        c13d0                 ONLINE       0     0     0
        c4d0                  ONLINE       0     0     0
        c7d0                  ONLINE       0     0     0
        c4d1                  ONLINE       0     0     0
        14607330800900413650  UNAVAIL      0     0     0  was /dev/dsk/c15t0d0s0
        c11t1d0               ONLINE       0     0     0
        c6d0                  ONLINE       0     0     0

OK, that's good - your missing disk can be replaced with a brand new disk using zpool replace brick 14607330800900413650 disk name. Then wait for the resilver to complete and do a full scrub to be on the safe side. errors: 352808 data errors, use '-v' for a list Is there some way I can take the original zpool label from the first 500GB drive I replaced and use it to fix up the other drives in the pool? No. The files with errors can only be restored from any backups you made. If there is an original disk that's not part of your pool, you might want to try making a backup of it, plug it in, and see if a zpool export/zpool import will find it. But it will only find it if zdb -l shows four valid labels. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
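In command form, with a hypothetical replacement disk c11t2d0 (the GUID is the placeholder shown in the status output above):

    # zpool replace brick 14607330800900413650 c11t2d0
    # zpool status brick        (wait for the resilver to complete)
    # zpool scrub brick         (then verify the whole pool)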
Re: [zfs-discuss] zfs replace problems please please help
On Tue, 10 Aug 2010, seth keith wrote: first off I don't have the exact failure messages here, and I did not take good notes of the failures, so I will do the best I can. Please try and give me advice anyway. I have a 7 drive raidz1 pool with 500G drives, and I wanted to replace them all with 2TB drives. Immediately I ran into trouble. If I tired: zpool offline brick device Were you doing an in-place replace? i.e. pulling out the old disk and putting in the new one? I got a message like: insufficient replicas This means that there was a problem with the pool already. When ZFS opens a pool, it looks at the disks that are part of that pool. For raidz1, if more than one disk is unopenable, then the pool will report that there are no valid replicas, which is probably the error message you saw. If that's the case, then your pool already had one failed drive in, and you were attempting to disable a second drive. Do you have a copy of the output from zpool status brick from before you tried your experiment? I tried to zpool replace brick old device new device and I got something like: new device must be a single disk Unfortunately, this just means that we got back an EINVAL from the kernel, which could mean any one of a number of things, but probably there was an issue with calculating the drive size. I'd try plugging it separately and using 'format' to see how big solaris thinks the drive is. I finally got replace and offline to work by: zpool export brick [reboot] zpool import brick Probably didn't need to reboot there. now zpool offline brick old device zpool replace brick old device new device If you use this form for the replace command, you don't need to offline the old disk first. You only need to offline a disk if you're going to pull it out. And then you can do an in-place replace just by issuing zpool replace brick device-you-swapped This worked. zpool status showed replacing in progress, and then after about 26 hours of resilvering, everything looked fine. The old device was gone, and no errors in the pool. Now I tried to do it again with the next device. I missed the zpool offline part however. Immediately, I started getting disk errors on both the drive I was replacing and the first drive I replaced. Read errors? Write errors? Checksum errors? Sounds like a full scrub would have been a good idea prior to replacing the second disk. I have the two original drives, they are in good shape and should still have all the data on them, can I somehow put my original zpool back. How? Please help! You can try exporting the pool, plugging in the original drives, and then do a recovery on it. See the zpool manpage under zpool import for the recovery options and what the flags mean. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
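For reference, a sketch of the safer sequence described above, with hypothetical device names in the style of this pool (scrub first, then replace one disk at a time):

    # zpool scrub brick               (confirm the pool is fully healthy before touching it)
    # zpool status -v brick
    # zpool replace brick ad4 ad20    (old disk ad4 still attached, new disk ad20 added)

or, for an in-place swap where the old disk has already been pulled and the new one inserted in its slot:

    # zpool replace brick ad4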
Re: [zfs-discuss] How to identify user-created zfs filesystems?
You can use 'zpool history -l syspool' to show the username of the person who created the dataset. The history is in a ring buffer, so if too many pool operations have happened since the dataset was created, the information is lost. On Wed, 4 Aug 2010, Peter Taps wrote: Folks, In my application, I need to present user-created filesystems. For my test, I created a zfs pool called mypool and two file systems called cifs1 and cifs2. However, when I run zfs list, I see a lot more entries:

    # zfs list
    NAME                     USED  AVAIL  REFER  MOUNTPOINT
    mypool                  1.31M  1.95G    33K  /volumes/mypool
    mypool/cifs1            1.12M  1.95G  1.12M  /volumes/mypool/cifs1
    mypool/cifs2              44K  1.95G    44K  /volumes/mypool/cifs2
    syspool                 3.58G  4.23G  35.5K  legacy
    syspool/dump             716M  4.23G   716M  -
    syspool/rootfs-nmu-000  1.85G  4.23G  1.36G  legacy
    syspool/rootfs-nmu-001  53.5K  4.23G  1.15G  legacy
    syspool/swap            1.03G  5.19G  71.4M  -

I just need to present cifs1 and cifs2 to the user. Is there a property on the filesystem that I can use to determine user-created filesystems? Thank you in advance for your help. Regards, Peter -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Regards, markm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Splitting root mirror to prep for re-install
You can also use the zpool split command and save yourself having to do the zfs send|zfs recv step - all the data will be preserved. zpool split rpool preserve does essentially everything up to and including the zpool export preserve commands you listed in your original email. Just don't try to boot off it. On 4 Aug 2010, at 20:58, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Chris Josephes I have a host running svn_133 with a root mirror pool that I'd like to rebuild with a fresh install on new hardware; but I still have data on the pool that I would like to preserve. So, after rebuilding, you don't want to restore the same OS that you're currently running. But there are some files you'd like to save for after you reinstall. Why not just copy them off somewhere, in a tarball or something like that? Given a rpool with disks c7d0s0 and c6d0s0, I think the following process will do what I need: 1. Run these commands # zpool detach rpool c6d0s0 # zpool create preserve c6d0s0 The only reason you currently have the rpool in a slice (s0) is because that's a requirement for booting. If you aren't planning to boot from the device after breaking it off the mirror ... Maybe just use the whole device instead of the slice. zpool create preserve c6d0 # zfs create export/home # zfs send rpool/export/home | zfs receive preserve/home # zfs send (other filesystems) # zpool export preserve These are not right. It should be something more like this: zfs create -o readonly=on preserve/rpool_export_home zfs snapshot rpool/export/h...@fubarsnap zfs send rpool/export/h...@fubarsnap | zfs receive -F preserve/rpool_export_home And finally zpool export preserve 2. Build out new host with svn_134, placing new root pool on c6d0s0 (or whatever it's called on the new SATA controller) Um ... I assume that's just a type-o ... Yes, install fresh. No, don't overwrite the existing preserve disk. For that matter, why break the mirror at all? Just install the OS again, onto a single disk, which implicitly breaks the mirror. Then when it's all done, use zpool import to import the other half of the mirror, which you didn't overwrite. 3. Run zpool import against preserve, copy over data that should be migrated. 4. Rebuild the mirror by destroying the preserve pool and attaching c7d0s0 to the rpool mirror. Am I missing anything? If you blow away the partition table of the 2nd disk (as I suggested above, but now retract) then you'll have to recreate the partition table of the second disk. So you only attach s0 to s0. After attaching, and resilvering, you'll want to installgrub on the 2nd disk, or else it won't be bootable after the first disk fails. See the ZFS Troubleshooting Guide for details. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
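In command form (pool names from the thread; the -R alternate root keeps the split copy from being mounted over the live system):

    # zpool split rpool preserve       (detaches one side of the mirror with all data intact)
    # zpool import -R /a preserve      (import it under an alternate root; don't boot from it)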
[zfs-discuss] snapshot question
I'm trying to understand how snapshots work in terms of how I can use them for recovering and/or duplicating virtual machines, and how I should set up my file system. I want to use OpenSolaris as a storage platform with NFS/ZFS for some development VMs; that is, the VMs use the OpenSolaris box as their NAS for shared access. Should I set up a separate ZFS file system for each VM so I can individually snapshot each one on a regular basis, or does it matter? The goal would be to be able to take an individual VM back to a previous point in time without changing the others. Thanks -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
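One common way to lay this out, as a sketch with made-up pool and dataset names, is one filesystem per VM, since snapshots and rollbacks operate per dataset:

    # zfs create tank/vms
    # zfs create tank/vms/vm1                   (one filesystem per VM, shared over NFS)
    # zfs snapshot tank/vms/vm1@2010-08-04      (snapshots only this VM)
    # zfs rollback tank/vms/vm1@2010-08-04      (rolls back only this VM; others untouched)

When an all-at-once snapshot is wanted instead, zfs snapshot -r tank/vms@now covers every VM in one command.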
Re: [zfs-discuss] root pool expansion
On Wed, 28 Jul 2010, Gary Gendel wrote: Right now I have a machine with a mirrored boot setup. The SAS drives are 43Gs and the root pool is getting full. I do a backup of the pool nightly, so I feel confident that I don't need to mirror the drive and can break the mirror and expand the pool with the detached drive. I understand how to do this on a normal pool, but are there any restrictions on doing this with the root pool? Are there any grub issues? You cannot stripe a root pool. The best you could do in this instance is to create a new pool from the detached mirror. You may want to consider keeping the redundancy of the mirror config so that zfs can automatically repair any corruption it detects. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] VMGuest IOMeter numbers
Hello, first time posting. I've been working with zfs on and off with limited *nix experience for a year or so now, and have read a lot of things by a lot of you, I'm sure. Still tons I don't understand/know, I'm sure. We've been having awful IO latencies on our 7210 running about 40 VM's spread over 3 hosts, no SSD's / intent logs. I am trying to get some, but the price... so I had to work up some sort of PoC to show it would help. It so happens I just purchased 3 X25-M's for my own use, and could spare one for a few weeks (though I hate to think how many cells I burned testing), and we also happen to have a couple of home-built small ZFS servers around to test with. Pretty limited resources: we have a home-built box with 5 250gb 7200rpm sata disks, each connected to the Intel server board's built-in sata ports. I reduced the RAM to 2GB for the tests. The OS is on a sata disk in its own single-disk pool. The X25-M was used as a ZIL log for the SSD tests. I created 5 VM's on a single ESX host (Dell 1950) with a data store connected to the mini-thumper running 2009.06 snv_111b via NFS over a single GB link. Each VM runs Windows 2003 R2 on a 4GB C:\ vmdk. Dynamo runs 1 worker on the local C: vmdk on each guest and reports to my workstation, so the numbers below are totals of the dynamo on all 5 guests. Each test consisted of an 8k transfer with a 67% read, and 70% random pattern. The tests were run for 5 minutes each.

    Queue Depth, IOPS, Avg Latency (ms)

    RAID0 - 5 Disk
     1,  326,  15.3
     2,  453,  22
     4,  516,  38.7
     8,  503,  72.3
    16,  526,  152
    32,  494,  323

    RAID0 - 4 Disk +SSD
     1,  494,  10.1
     2,  579,  17.2
     4,  580,  34.4
     8,  603,  66.3
    16,  598,  133.6
    32,  600,  266

    RAIDz - 5 Disk
     1,  144,  34
     2,  162,  60
     4,  184,  108
     8,  183,  218
    16,  175,  455
    32,  185,  864

    RAIDz - 4 Disk +SSD
     1,  222,  22
     2,  201,  50
     4,  221,  90
     8,  219,  181
    16,  228,  348
    32,  228,  700

    RAID10 - 4 Disk
     1,  159,  31
     2,  206,  48
     4,  236,  84
     8,  194,  205
    16,  243,  328
    32,  219,  728

    RAID10 - 4 Disk +SSD
     1,  270,  18
     2,  332,  30
     4,  363,  54
     8,  320,  124
    16,  325,  245
    32,  333,  479

(wonders how the formatting will turn out) It's interesting that going from a 5 disk RAIDz to a 4 disk mirror (both with no log device) gives a bigger increase than using an X25-M log with a 4 disk RAIDz. The increase in IO's from adding the X25-M to the mirror setup is nice, but smaller than I had expected; the halving of the latencies is even nicer. I am curious how this would scale with a lot more disks; the SSD didn't increase performance as much as I had hoped, but it's still nice to see... I'm thinking that's mostly due to my limit of 4-5 disks. I'm not sure how much difference there is between the X25-M and the Sun SSD's for the 7000 series. From what I've read so far, the X25-E needs to have its write-cache forced off to function properly, whereas the X25-M seems to obey the flush commands? I was also curious if I would have seen a bigger increase with an SLC drive instead of the MLC... searching turns up so much old info. Comments welcome! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] invalid vdev configuration meltdown
On Thu, 15 Jul 2010, Tim Castle wrote: j...@opensolaris:~# zpool import -d /dev ...shows nothing after 20 minutes OK, then one other thing to try is to create a new directory, e.g. /mydev, and create in it symbolic links to only those drives that are part of your pool. Based on your label output, I see:

path='/dev/ad6'
path='/dev/ad4'
path='/dev/ad16'
path='/dev/ad18'
path='/dev/ad8'
path='/dev/ad10'

I'm guessing /dev has many more entries in it, and the zpool import command is hanging in its attempt to open each one of them. So try doing:

# mkdir /mydev
# ln -s /dev/ad6 /mydev/ad6
...
# ln -s /dev/ad10 /mydev/ad10

This way, you can issue zpool import -d /mydev and the import code should *only* see the devices that are part of the pool.
Re: [zfs-discuss] invalid vdev configuration meltdown
What does 'zpool import -d /dev' show? On Wed, 14 Jul 2010, Tim Castle wrote: My raidz1 (ZFSv6) had a power failure, and a disk failure. Now:

j...@opensolaris:~# zpool import
  pool: files
    id: 3459234681059189202
 state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        files          UNAVAIL  insufficient replicas
          raidz1       UNAVAIL  insufficient replicas
            c8d1s8     UNAVAIL  corrupted data
            c9d0p0     ONLINE
            /dev/ad16  OFFLINE
            c9d1s8     UNAVAIL  corrupted data
            /dev/ad8   UNAVAIL  corrupted data
            c8d0p0     ONLINE

j...@opensolaris:~# zpool import files
cannot import 'files': pool may be in use from other system
use '-f' to import anyway
j...@opensolaris:~# zpool import -f files
cannot import 'files': invalid vdev configuration

ad16 is the dead drive. ad8 is fine but disconnected. I can only connect 4 sata drives to OpenSolaris: my pci sata card isn't compatible. I created and used the pool with FreeNAS, which gives me the same error when all 5 drives are connected. So why do c8d1s8 and c9d1s8 show up as slices? c9d0p0, c8d0p0, and ad8 (when connected) show up as partitions. zdb -l returns the same thing for all 5 drives: labels 0 and 1 are fine, 2 and 3 fail to unpack.

j...@opensolaris:~# zdb -l /dev/dsk/c8d1s8
LABEL 0
    version=6
    name='files'
    state=0
    txg=2123835
    pool_guid=3459234681059189202
    hostid=0
    hostname='freenas.local'
    top_guid=18367164273662411813
    guid=7276810192259058351
    vdev_tree
        type='raidz'
        id=0
        guid=18367164273662411813
        nparity=1
        metaslab_array=14
        metaslab_shift=32
        ashift=9
        asize=6001199677440
        children[0]
                type='disk'
                id=0
                guid=7276810192259058351
                path='/dev/ad6'
                devid='ad:STF602MR3GHBZP'
                whole_disk=0
                DTL=1012
        children[1]
                type='disk'
                id=1
                guid=5425645052930513342
                path='/dev/ad4'
                devid='ad:STF602MR3EZ0WP'
                whole_disk=0
                DTL=1011
        children[2]
                type='disk'
                id=2
                guid=4766543340687449042
                path='/dev/ad16'
                devid='ad:GTA000PAG7PGGA'
                whole_disk=0
                DTL=1010
                offline=1
        children[3]
                type='disk'
                id=3
                guid=16172918065436695818
                path='/dev/ad18'
                devid='ad:WD-WCAU42121120'
                whole_disk=0
                DTL=1009
        children[4]
                type='disk'
                id=4
                guid=3693181954889803829
                path='/dev/ad8'
                devid='ad:STF602MR3EYWJP'
                whole_disk=0
                DTL=1008
        children[5]
                type='disk'
                id=5
                guid=5419080715831351987
                path='/dev/ad10'
                devid='ad:STF602MR3ESPYP'
                whole_disk=0
                DTL=1007
LABEL 1
    version=6
    name='files'
    state=0
    txg=2123835
    pool_guid=3459234681059189202
    hostid=0
    hostname='freenas.local'
    top_guid=18367164273662411813
    guid=7276810192259058351
    vdev_tree
        type='raidz'
        id=0
        guid=18367164273662411813
        nparity=1
        metaslab_array=14
        metaslab_shift=32
        ashift=9
        asize=6001199677440
        children[0]
                type='disk'
                id=0
                guid=7276810192259058351
                path='/dev/ad6'
                devid='ad:STF602MR3GHBZP'
                whole_disk=0
                DTL=1012
        children[1]
                type='disk'
                id=1
                guid=5425645052930513342
                path='/dev/ad4'
                devid='ad:STF602MR3EZ0WP'
                whole_disk=0
                DTL=1011
        children[2]
                type='disk'
                id=2
                guid=4766543340687449042
                path='/dev/ad16'
                devid='ad:GTA000PAG7PGGA'
                whole_disk=0
                DTL=1010
                offline=1
        children[3]
                type='disk'
                id=3
                guid=16172918065436695818
                path='/dev/ad18'
                devid='ad:WD-WCAU42121120'
                whole_disk=0
                DTL=1009
        children[4]
                type='disk'
                id=4
[zfs-discuss] ZFS crash
I had an interesting dilemma recently and I'm wondering if anyone here can illuminate why this happened. I have a number of pools, including the root pool, on on-board disks in the server. I also have one pool on a SAN disk, outside the system. Last night the SAN crashed, and shortly thereafter, the ZFS system executed a number of cron jobs, most of which involved running functions on the pool that was on the SAN. This caused a number of problems, most notably that when the SAN eventually came up, those cron jobs finished, and then crashed the system again. Only by running zfs destroy on the newly created zfs file systems that the cron jobs created was the system able to boot up again. As long as those corrupted zfs file systems remained on the SAN disk, not even the rpool would boot up correctly. None of the zfs file systems would mount, and most services were disabled. Once I destroyed the newly created zfs file systems, everything instantly mounted and all services started. Question: why would those zfs file systems prevent ALL pools from mounting, even when they are on different disks and file systems, and prevent all services from starting? I thought ZFS was more resistant to this sort of thing. I will have to edit my scripts and add SAN-checking to make sure it is up before they execute, to prevent this from happening again. Luckily I still had all the raw data that the cron jobs were working with, so I was able to quickly re-create what the cron jobs did originally. Although this happened with Solaris 10, perhaps the discussion could be applicable to OpenSolaris as well (I use both).
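P.S. The SAN-checking wrapper I have in mind is roughly this; the pool name and job path are placeholders:

#!/bin/sh
# Only run the nightly job if the SAN-backed pool is healthy.
if [ "`zpool list -H -o health sanpool 2>/dev/null`" = "ONLINE" ]; then
        /usr/local/bin/nightly-job
else
        logger -p daemon.err "sanpool is not ONLINE; skipping nightly job"
fi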
Re: [zfs-discuss] ZFS fsck?
On Tue, 6 Jul 2010, Roy Sigurd Karlsbakk wrote: Hi all. With several messages in here about troublesome zpools, would there be a good reason to be able to fsck a pool? As in, check the whole thing instead of having to boot into live CDs and whatnot? You can do this with zpool scrub. It visits every allocated block and verifies that everything is correct. It's not the same as fsck in that scrub can detect and repair problems with the pool still online and all datasets mounted, whereas fsck cannot handle mounted filesystems. If you really want to use it on an exported pool, you can use zdb, although it might take some time. Here's an example on a small empty pool:

# zpool create -f mypool raidz c4t1d0s0 c4t2d0s0 c4t3d0s0 c4t4d0s0 c4t5d0s0
# zpool list mypool
NAME     SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
mypool   484M   280K   484M    0%  1.00x  ONLINE  -
# zpool export mypool
# zdb -ebcc mypool

Traversing all blocks to verify checksums and verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:             48
        bp logical:       378368    avg:  7882
        bp physical:       39424    avg:   821    compression:  9.60
        bp allocated:     185344    avg:  3861    compression:  2.04
        bp deduped:            0    ref>1:   0    deduplication: 1.00
        SPA allocated:    185344    used: 0.04%
#
Re: [zfs-discuss] ZFS fsck?
On Tue, 6 Jul 2010, Roy Sigurd Karlsbakk wrote: what I'm saying is that there are several posts in here where the only solution is to boot onto a live cd and then do an import, due to metadata corruption. This should be doable from the installed system. Ah, I understand now. A couple of things worth noting:

- if the root filesystem in a boot pool cannot be mounted, it's problematic to access the tools necessary to repair it. So going to a livecd (or a network boot, for that matter) is the best way forward.
- if the tools available in failsafe mode are insufficient to repair a pool, then booting off a livecd/network is the only way forward.

It is also worth pointing out here that the 134a build has the pool recovery code built in. The -F option to zpool import only became available after build 128 or 129.
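To make that concrete, on a build with the recovery code the sequence is roughly this (pool name hypothetical):

# zpool import -Fn tank    # dry run, where supported: report what rollback would discard
# zpool import -F tank     # roll back to the last consistent txg and import

Recovery works by discarding the last few transaction groups, so expect to lose the final seconds of writes before the failure.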
[zfs-discuss] ZFS on external iSCSI storage
I'm new with ZFS, but I have had good success using it with raw physical disks. One of my systems has access to an iSCSI storage target. The underlying physical array is in a proprietary disk storage device from Promise. So the question is, when building an OpenSolaris host to store its data on an external iSCSI device, is there anything conceptually wrong with creating a raidz pool from a group of raw LUNs carved from the iSCSI device? Thanks for your advice.
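P.S. For reference, the setup I'm planning looks roughly like this; the discovery address and LUN device names below are placeholders:

# iscsiadm add discovery-address 192.168.10.5:3260
# iscsiadm modify discovery --sendtargets enable
# devfsadm -i iscsi                              # create device nodes for the LUNs
# zpool create tank raidz c5t1d0 c5t2d0 c5t3d0

The idea is that even though the Promise box has its own RAID underneath, giving ZFS raidz across separate LUNs lets it repair, not just detect, corruption.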
Re: [zfs-discuss] Zpool import not working
I'm guessing that the virtualbox VM is ignoring write cache flushes. See this for more info: http://forums.virtualbox.org/viewtopic.php?f=8&t=13661

On 12 Jun, 2010, at 5.30, zfsnoob4 wrote: Thanks, that works. But only when I do a proper export first. If I export the pool then I can import it with: zpool import -d / (the test files are located in /). But if I destroy the pool, then I can no longer import it back, even though the files are still there. Is this normal? Thanks for your help.
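As for the destroyed-pool part: destroying a pool only marks its labels as destroyed, so it can usually be brought back. A sketch, with a hypothetical pool name:

# zpool import -D -d /          # list destroyed pools whose devices live under /
# zpool import -D -d / tank     # re-import one of them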
[zfs-discuss] NOTICE: spa_import_rootpool: error 5
IHAC who has an x4500 (x86 box) with a zfs root filesystem. They installed patches today, the latest Solaris 10 x86 recommended patch cluster, and the patching seemed to complete successfully. Then when they tried to reboot the box, the machine would not boot. They get the following error:

NOTICE: spa_import_rootpool: error 5
Cannot mount root on /p...@0,0/pci8086,2...@4/pci111d,8...@0/pci111d,8...@4/pci108e,2...@0/d...@0,0:a /p...@0,0/pci8086,2...@4/pci111d,8...@0/pci111d,8...@4/pci108e,2...@0/d...@1,0:a fstype zfs

panic[cpu0]/thread=fbc28820: vfs_mountroot: cannot mount root

fbc4b190 genunix:vfs_mountroot+323 ()
fbc4b1d0 genunix:main+a9 ()
fbc4b1e0 unix:_start+95 ()

skipping system dump - no dump device configured
rebooting...

The customer states that he backed out the kernel patch 142901-12 and then the x4500 boots successfully. Has anyone seen this? It almost seems like the zfs root pool is not being seen upon reboot. Any help on this would be greatly appreciated.
Re: [zfs-discuss] ZFS and IBM SDD Vpaths
Can you find the devices in /dev/rdsk? I see there is a path in /pseudo at least, but the zpool import command only looks in /dev. One thing you can try is doing this:

# mkdir /tmpdev
# ln -s /pseudo/vpat...@1:1 /tmpdev/vpath1a

And then see if 'zpool import -d /tmpdev' finds the pool.

On 29 May, 2010, at 19.53, morris hooten wrote: I have 6 zfs pools and after rebooting (init 6) the vpath device path names have changed for some unknown reason. But I can't detach, remove and reattach to the new device names. Any help, please!

pjde43m01  -  -  -  -  FAULTED  -
pjde43m02  -  -  -  -  FAULTED  -
pjde43m03  -  -  -  -  FAULTED  -
poas43m01  -  -  -  -  FAULTED  -
poas43m02  -  -  -  -  FAULTED  -
poas43m03  -  -  -  -  FAULTED  -

One pool listed below as an example:

  pool: poas43m01
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME       STATE    READ WRITE CKSUM
        poas43m01  UNAVAIL     0     0     0  insufficient replicas
          vpath4c  UNAVAIL     0     0     0  cannot open

before:
30. vpath1a IBM-2145- cyl 8190 alt 2 hd 64 sec 256 /pseudo/vpat...@1:1
31. vpath2a IBM-2145- cyl 13822 alt 2 hd 64 sec 256 /pseudo/vpat...@2:2
32. vpath3a IBM-2145- cyl 13822 alt 2 hd 64 sec 256 /pseudo/vpat...@3:3
33. vpath4a IBM-2145- cyl 13822 alt 2 hd 64 sec 256 /pseudo/vpat...@4:4
34. vpath5a IBM-2145- cyl 27646 alt 2 hd 64 sec 256 /pseudo/vpat...@5:5
35. vpath6a IBM-2145- cyl 27646 alt 2 hd 64 sec 256 /pseudo/vpat...@6:6
36. vpath7a IBM-2145- cyl 27646 alt 2 hd 64 sec 256 /pseudo/vpat...@7:7

after:
30. vpath1a IBM-2145- cyl 8190 alt 2 hd 64 sec 256 /pseudo/vpat...@1:1
31. vpath8a IBM-2145- cyl 13822 alt 2 hd 64 sec 256 /pseudo/vpat...@8:8
32. vpath9a IBM-2145- cyl 13822 alt 2 hd 64 sec 256 /pseudo/vpat...@9:9
33. vpath10a IBM-2145- cyl 13822 alt 2 hd 64 sec 256 /pseudo/vpat...@10:10
34. vpath11a IBM-2145- cyl 27646 alt 2 hd 64 sec 256 /pseudo/vpat...@11:11
35. vpath12a IBM-2145- cyl 27646 alt 2 hd 64 sec 256 /pseudo/vpat...@12:12
36. vpath13a IBM-2145- cyl 27646 alt 2 hd 64 sec 256 /pseudo/vpat...@13:13

{usbderp...@root} zpool detach poas43m03 vpath2c
cannot open 'poas43m03': pool is unavailable
Re: [zfs-discuss] zpool vdev's
On 28 May, 2010, at 17.21, Vadim Comanescu wrote: In a stripe zpool configuration (no redundancy), is a certain disk regarded as an individual vdev, or do all the disks in the stripe represent a single vdev? In a raidz configuration I'm aware that every single group of raidz disks is regarded as a top-level vdev, but I was wondering how it is in the case I mentioned earlier. Thanks. In a stripe config, each disk is considered a top-level vdev. Regards, markm
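P.S. A quick way to see the difference in practice (device names here are hypothetical):

# zpool create stripepool c1t0d0 c1t1d0 c1t2d0      # three top-level vdevs
# zpool create rzpool raidz c1t0d0 c1t1d0 c1t2d0    # one top-level raidz vdev

In 'zpool status' the stripe's disks sit directly under the pool name, while the raidz disks are indented under a raidz1 vdev entry.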
Re: [zfs-discuss] cannot import pool from another system, device-ids different! please help!
On Mon, 24 May 2010, h wrote: I had 6 disks in a raidz1 pool whose drives I replaced, going from 1TB drives to 2TB drives. I have installed the older 1TB drives in another system and would like to import the old pool to access some files I accidentally deleted from the new pool. Did you use the 'zpool replace' command to do the replace? If so, once the replace completes, the ZFS label on the original disk is overwritten to make it available for new pools. Regards, markm
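P.S. If you're not sure whether the old 1TB disks were actually rewritten by a completed 'zpool replace', checking for a surviving label is quick (device name hypothetical):

# zdb -l /dev/rdsk/c1t0d0s0

If all four labels still unpack and name the old pool, 'zpool import -d' pointed at those disks may find it; if the labels were overwritten, the old pool is gone.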
Re: [zfs-discuss] zfs mount -a kernel panic
On Wed, 19 May 2010, John Andrunas wrote:

ff001f45e830 unix:die+dd ()
ff001f45e940 unix:trap+177b ()
ff001f45e950 unix:cmntrap+e6 ()
ff001f45ea50 zfs:ddt_phys_decref+c ()
ff001f45ea80 zfs:zio_ddt_free+55 ()
ff001f45eab0 zfs:zio_execute+8d ()
ff001f45eb50 genunix:taskq_thread+248 ()
ff001f45eb60 unix:thread_start+8 ()

This shows you're using some recent bits that include dedup. How recent is your build? The stack you show here is similar to that in CR 6915314, which we haven't been able to root-cause yet. Let me know if you get a chance to upload the core as Lori Alt outlined, and I can update our bug tracking system to reflect that. Regards, markm
Re: [zfs-discuss] Very serious performance degradation
On Thu, 20 May 2010, Edward Ned Harvey wrote: Also, since you've got s0 on there, it means you've got some partitions on that drive. You could manually wipe all that out via format, but the above is pretty brainless and reliable. The s0 on the old disk is a bug in the way we're formatting the output. This was fixed in CR 6881631. Regards, markm
Re: [zfs-discuss] zfs mount -a kernel panic
Do you have a coredump? Or a stack trace of the panic? On Wed, 19 May 2010, John Andrunas wrote: Running ZFS on a Nexenta box, I had a mirror get broken and apparently the metadata is corrupt now. If I try and mount vol2 it works, but if I try 'mount -a' or mount vol2/vm2, it instantly kernel panics and reboots. Is it possible to recover from this? I don't care if I lose the file listed below, but the other data in the volume would be really nice to get back. I have scrubbed the volume to no avail. Any other thoughts?

# zpool status -xv vol2
  pool: vol2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE    READ WRITE CKSUM
        vol2        ONLINE      0     0     0
          mirror-0  ONLINE      0     0     0
            c3t3d0  ONLINE      0     0     0
            c3t2d0  ONLINE      0     0     0

errors: Permanent errors have been detected in the following files:

        vol2/v...@snap-daily-1-2010-05-06-:/as5/as5-flat.vmdk

-- John
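P.S. If no dump device is configured on the box yet, setting one up is short; the zvol path below is the usual convention for a ZFS root and is only a suggestion:

# dumpadm -d /dev/zvol/dsk/rpool/dump   # send panic dumps to a dedicated zvol
# dumpadm -y                            # run savecore automatically on reboot
# savecore -L                           # or capture a live dump immediately

The next panic should then leave a crash dump under /var/crash for analysis.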
Re: [zfs-discuss] MPT issues strikes back
Bruno Sousa on Tue, Apr 27, 2010 at 09:16:08AM +0200 wrote: Hi all, Yet another story regarding mpt issues, and to make a long story short: every time a Dell R710 running snv_134 logs the message scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@4/pci1028,1...@0 (mpt0):, the system freezes and only a hard-reset fixes the issue. Is there any sort of parameter to be used to minimize/avoid this issue? We had the same problem on an X4600; it turned out to be a bad SSD and/or a bad connection at the location listed in the error message. Since removing that drive, we have not encountered the issue. You might want to look at http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=7acda35c626180d9cda7bd1df451?bug_id=6894775 too. -Mark

Machine specs: Dell R710, 16 GB memory, 2x Intel Quad-Core E5506, SunOS san01 5.11 snv_134 i86pc i386 i86pc Solaris, Dell Integrated SAS 6/i Controller (mpt0, firmware v0.25.47.0 (IR)) with 2 disks attached without raid. Thanks in advance, Bruno
Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs ec2]
On 23 Apr, 2010, at 7.06, Phillip Oldham wrote: I've created an OpenSolaris 2009.06 x86_64 image with the zpool structure already defined. Starting an instance from this image, without attaching the EBS volume, shows the pool structure exists and that the pool state is UNAVAIL (as expected). Upon attaching the EBS volume to the instance, the status of the pool changes to ONLINE, the mount-point/directory is accessible, and I can write data to the volume. Now, if I terminate the instance, spin up a new one, and connect the same (now unattached) EBS volume to this new instance, the data is no longer there, with the EBS volume showing as blank. Could you share with us the zpool commands you are using? Regards, markm
Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs ec2]
On 23 Apr, 2010, at 7.31, Phillip Oldham wrote: I'm not actually issuing any when starting up the new instance. None are needed; the instance is booted from an image which has the zpool configuration stored within, so it simply starts and sees that the devices aren't available; they become available after I've attached the EBS device. Forgive my ignorance with EC2/EBS, but why doesn't the instance remember that there were EBS volumes attached? Why aren't they automatically attached prior to booting Solaris within the instance? The error output from zpool status that you're seeing matches what I would expect if we are attempting to import the pool at boot and the disks aren't present.
Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs ec2]
On 23 Apr, 2010, at 8.38, Phillip Oldham wrote: The instances are ephemeral; once terminated they cease to exist, as do all their settings. Rebooting an image keeps any EBS volumes attached, but this isn't the case I'm dealing with - it's when the instance terminates unexpectedly. For instance, if a reboot operation doesn't succeed or if there's an issue with the data-centre. OK, I think if this issue can be addressed, it would be by people familiar with how EC2 and EBS interact. The steps I see are:

- start a new instance
- attach the EBS volumes to it
- log into the instance and zpool online the disks

I know the last step can be automated with a script inside the instance, but I'm not sure about the other two steps. Regards, markm
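P.S. With the classic ec2-api-tools, the first two steps would look roughly like this; every ID and the device name here is a hypothetical placeholder:

$ ec2-run-instances ami-xxxxxxxx -k my-keypair
$ ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdf

and then, inside the instance, something like 'zpool online tank c7d0' (or a plain 'zpool import tank' if the pool was never imported at boot) for the last step.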