[zfs-discuss] Should clearing the share property on a clone unshare the origin?
I've noticed on a Solaris 11 system that when I clone a filesystem and change the share property:

  # zfs clone -p -o atime=off filesystem@snapshot clone
  # zfs set -c share=name=old share clone
  # zfs set share=name=new NFS share clone
  # zfs set sharenfs=on clone

the origin filesystem is no longer shared (the clone is successfully shared). The share and sharenfs properties on the origin filesystem are unchanged. I have to run zfs share on the origin filesystem to restore the share.

Feature or a bug?

-- Ian.
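For anyone reproducing this, a quick way to confirm the symptom and apply the workaround mentioned above, sketched with a hypothetical origin dataset tank/fs (`share` with no arguments lists the currently active shares):

  # zfs get sharenfs tank/fs    # the property on the origin still reads "on"
  # share | grep tank/fs        # ...yet the origin no longer appears among the active shares
  # zfs share tank/fs           # workaround: re-share the origin by hand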
Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment
Thank you for the link!

Turns out that, even though I bought the WD20EARS and ST32000542AS expecting a 4096-byte physical blocksize, they report 512. The new drive I bought correctly identifies as 4096-byte blocksize! So... OI doesn't like it merging with the existing pool.

Note: the ST2000VX000-9YW1 reports a physical blocksize of 4096B. The other drives, which actually have 4096B blocks, report 512B physical blocks. This is misleading, but they do it anyway.

On Mon, Sep 24, 2012 at 4:32 PM, Timothy Coalson tsc...@mst.edu wrote:

I'm not sure how to definitively check physical sector size on solaris/illumos, but on linux, hdparm -I (capital i) or smartctl -i will do it. OpenIndiana's smartctl doesn't output this information yet (and its smartctl doesn't work on SATA disks unless attached via a SAS chip). The issue is complicated by having both a logical and a physical sector size, and as far as I am aware, on current disks, logical is always 512, which may be what is being reported in what you ran.

Some quick googling suggests that previously it was not possible to use an existing utility to report the physical sector size on solaris, so someone wrote their own:
http://solaris.kuehnke.de/archives/18-Checking-physical-sector-size-of-disks-on-Solaris.html

So, if you want to make sure of the physical sector size, you could give that program a whirl (it compiled fine for me on oi_151a6, and runs, but it is not easy for me to attach a 4k sector disk to one of my OI machines, so I haven't confirmed its correctness), or temporarily transplant the spare in question to a linux machine (or live system) and use hdparm -I.

Tim

On Mon, Sep 24, 2012 at 2:37 PM, LIC mesh licm...@gmail.com wrote:

Any ideas?

On Mon, Sep 24, 2012 at 10:46 AM, LIC mesh licm...@gmail.com wrote:

That's what I thought also, but since both prtvtoc and fdisk -G see the two disks as the same (and I have not overridden sector size), I am confused.

iostat -xnE:

c16t5000C5002AA08E4Dd0 Soft Errors: 0 Hard Errors: 323 Transport Errors: 489
Vendor: ATA  Product: ST32000542AS  Revision: CC34  Serial No: %FAKESERIAL%
Size: 2000.40GB <2000398934016 bytes>
Media Error: 207 Device Not Ready: 0 No Device: 116 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c16t5000C5005295F727d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST2000VX000-9YW1  Revision: CV13  Serial No: %FAKESERIAL%
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

zpool status:

  pool: rspool
 state: ONLINE
  scan: resilvered 719G in 65h28m with 0 errors on Fri Aug 24 04:21:44 2012
config:

        NAME                        STATE     READ WRITE CKSUM
        rspool                      ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            c16t5000C5002AA08E4Dd0  ONLINE       0     0     0
            c16t5000C5002ABE78F5d0  ONLINE       0     0     0
            c16t5000C5002AC49840d0  ONLINE       0     0     0
            c16t50014EE057B72DD3d0  ONLINE       0     0     0
            c16t50014EE057B69208d0  ONLINE       0     0     0
        cache
          c4t2d0                    ONLINE       0     0     0
        spares
          c16t5000C5005295F727d0    AVAIL

errors: No known data errors

root@nas:~# zpool replace rspool c16t5000C5002AA08E4Dd0 c16t5000C5005295F727d0
cannot replace c16t5000C5002AA08E4Dd0 with c16t5000C5005295F727d0: devices have different sector alignment

On Mon, Sep 24, 2012 at 9:23 AM, Gregg Wonderly gregg...@gmail.com wrote:

What is the error message you are seeing on the replace? This sounds like a slice size/placement problem, but clearly, prtvtoc seems to think that everything is the same. Are you certain that you ran prtvtoc on the correct drive, and not on one of the active disks by mistake?
Gregg Wonderly

As does fdisk -G:

root@nas:~# fdisk -G /dev/rdsk/c16t5000C5002AA08E4Dd0
* Physical geometry for device /dev/rdsk/c16t5000C5002AA08E4Dd0
* PCYL   NCYL   ACYL  BCYL  NHEAD  NSECT  SECSIZ
  60800  60800  0     0     255    252    512
You have new mail in /var/mail/root
root@nas:~# fdisk -G /dev/rdsk/c16t5000C5005295F727d0
* Physical geometry for device /dev/rdsk/c16t5000C5005295F727d0
* PCYL   NCYL   ACYL  BCYL  NHEAD  NSECT  SECSIZ
  60800  60800  0     0     255    252    512

On Mon, Sep 24, 2012 at 9:01 AM, LIC mesh licm...@gmail.com wrote:

Yet another weird thing - prtvtoc shows both drives as having the same sector size, etc:

root@nas:~# prtvtoc /dev/rdsk/c16t5000C5002AA08E4Dd0
* /dev/rdsk/c16t5000C5002AA08E4Dd0 partition map
*
* Dimensions:
*     512 bytes/sector
*     3907029168 sectors
*     3907029101 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First
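For reference, the Linux-side checks Tim mentions look roughly like the following; /dev/sdX is a placeholder and the exact output wording differs between drive models and tool versions:

  # hdparm -I /dev/sdX | grep -i 'sector size'
          Logical  Sector size:                   512 bytes
          Physical Sector size:                  4096 bytes

  # smartctl -i /dev/sdX | grep -i 'sector size'
  Sector Sizes:     512 bytes logical, 4096 bytes physical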
Re: [zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
--- On Mon, 9/24/12, Richard Elling richard.ell...@gmail.com wrote:

>> I'm hoping the answer is yes - I've been looking but do not see it ...
>
> none can hide from dtrace!
>
> # dtrace -qn 'dsl_dataset_stats:entry {
>     this->ds = (dsl_dataset_t *)arg0;
>     printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
>         this->ds->ds_dir->dd_myname,
>         this->ds->ds_phys->ds_compressed_bytes,
>         this->ds->ds_phys->ds_uncompressed_bytes)
> }'
> openindiana-1   compressed size = 3667988992   uncompressed size=3759321088
> [zfs get all rpool/openindiana-1 in another shell]
>
> For reporting, the number is rounded to 2 decimal places.

Ok.

>> So the dedupratio I see for the entire pool is the dedupe ratio for filesystems in this pool that have dedupe enabled ... yes ?

Thank you - appreciated.

>> Doesn't that mean that if I enabled dedupe on more than one filesystem, I can never know how much total, raw space each of those is using ? Because if the dedupe ratio is calculated across all of them, it's not the actual ratio for any one of them ... so even if I do the math, I can't decide what the total raw usage for one of them is ... right ?
>
> Correct. This is by design so that blocks shared amongst different datasets can be deduped -- the common case for things like virtual machine images.

Ok, but what about accounting ? If you have multiple deduped filesystems in a pool, you can *never know* how much space any single one of them is using ? That seems unbelievable...

>> Ok - but from a performance point of view, I am only using ram/cpu resources for the deduping of just the individual filesystems I enabled dedupe on, right ? I hope that turning on dedupe for just one filesystem did not incur ram/cpu costs across the entire pool...
>
> It depends.
>  -- richard

Can you elaborate at all ? Dedupe can have fairly profound performance implications, and I'd like to know if I am paying a huge price just to get a dedupe on one little filesystem ...

Thanks again.
Re: [zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
> I'm hoping the answer is yes - I've been looking but do not see it ...

Well, he is telling you to run the dtrace program as root in one window, and run the "zfs get all" command on a dataset in your pool in another window, to trigger the dataset_stats variable to be filled.

> none can hide from dtrace!
>
> # dtrace -qn 'dsl_dataset_stats:entry {
>     this->ds = (dsl_dataset_t *)arg0;
>     printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
>         this->ds->ds_dir->dd_myname,
>         this->ds->ds_phys->ds_compressed_bytes,
>         this->ds->ds_phys->ds_uncompressed_bytes)
> }'
> openindiana-1   compressed size = 3667988992   uncompressed size=3759321088
> [zfs get all rpool/openindiana-1 in another shell]

HTH -- Volker
--
Volker A. Brandt                    Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                      WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY         Email: v...@bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513       Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"
[zfs-discuss] Cold failover of COMSTAR iSCSI targets on shared storage
Hello all,

With the original old ZFS iSCSI implementation there was a shareiscsi property for the zvols to be shared out, and I believe all configuration pertinent to the iSCSI server was stored in the pool options (I may be wrong, but I'd expect that, given that ZFS-attribute-based configs were designed to atomically import and share pools over various protocols like CIFS and NFS).

With COMSTAR, which is more advanced and performant, all configs seem to be in the OS config files and/or SMF service properties - not in the pool in question.

Does this mean that importing a pool with iSCSI zvols on a fresh host (a LiveCD instance on the same box, or via failover of shared storage to a different host) will not be able to automagically share the iSCSI targets the same way as they were known in the initial OS that created and shared them - not until an admin defines the same LUNs and WWN numbers and such, manually?

Is this a correct understanding (and does the problem exist indeed), or do I (hopefully) miss something?

Thanks,
//Jim Klimov
Re: [zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
--- On Tue, 9/25/12, Volker A. Brandt v...@bb-c.de wrote:

> Well, he is telling you to run the dtrace program as root in one window, and run the "zfs get all" command on a dataset in your pool in another window, to trigger the dataset_stats variable to be filled.
>
>> none can hide from dtrace!
>>
>> # dtrace -qn 'dsl_dataset_stats:entry {
>>     this->ds = (dsl_dataset_t *)arg0;
>>     printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
>>         this->ds->ds_dir->dd_myname,
>>         this->ds->ds_phys->ds_compressed_bytes,
>>         this->ds->ds_phys->ds_uncompressed_bytes)
>> }'
>> openindiana-1   compressed size = 3667988992   uncompressed size=3759321088
>> [zfs get all rpool/openindiana-1 in another shell]

Yes, he showed me that, I did it, it worked, and I thanked him. The reason it's hard to make out the thread in that last response is that his email is in rich text, or HTML of some kind, so there's no formatting, etc.
Re: [zfs-discuss] Interesting question about L2ARC
2012-09-11 16:29, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Dan Swartzendruber
>>
>> My first thought was everything is hitting in ARC, but that is clearly not the case, since it WAS gradually filling up the cache device.
>
> When things become colder in the ARC, they expire to the L2ARC (or simply expire, bypassing the L2ARC). So it's normal to start filling the L2ARC, even if you never hit anything in the L2ARC.

Got me wondering: how many reads of a block from spinning rust suffice for it to ultimately get into L2ARC? Just one, so it gets into a recent-read list of the ARC and then expires into L2ARC when ARC RAM is more needed for something else - and only when that L2ARC fills up does the block expire from these caches completely?

Thanks, and sorry for a lame question ;)
//Jim
Re: [zfs-discuss] Interesting question about L2ARC
On 9/25/2012 3:38 PM, Jim Klimov wrote:

> 2012-09-11 16:29, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:
>>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Dan Swartzendruber
>>>
>>> My first thought was everything is hitting in ARC, but that is clearly not the case, since it WAS gradually filling up the cache device.
>>
>> When things become colder in the ARC, they expire to the L2ARC (or simply expire, bypassing the L2ARC). So it's normal to start filling the L2ARC, even if you never hit anything in the L2ARC.
>
> Got me wondering: how many reads of a block from spinning rust suffice for it to ultimately get into L2ARC? Just one, so it gets into a recent-read list of the ARC and then expires into L2ARC when ARC RAM is more needed for something else - and only when that L2ARC fills up does the block expire from these caches completely?

Good question. I don't remember if I posted my final status, but I put in 2 128GB SSDs and it's hitting them just fine. The working set seems to be right on 110GB.
Re: [zfs-discuss] Cold failover of COMSTAR iSCSI targets on shared storage
On Sep 25, 2012, at 12:30 PM, Jim Klimov jimkli...@cos.ru wrote:

> Hello all,
>
> With the original old ZFS iSCSI implementation there was a shareiscsi property for the zvols to be shared out, and I believe all configuration pertinent to the iSCSI server was stored in the pool options (I may be wrong, but I'd expect that, given that ZFS-attribute-based configs were designed to atomically import and share pools over various protocols like CIFS and NFS).
>
> With COMSTAR, which is more advanced and performant, all configs seem to be in the OS config files and/or SMF service properties - not in the pool in question.
>
> Does this mean that importing a pool with iSCSI zvols on a fresh host (a LiveCD instance on the same box, or via failover of shared storage to a different host) will not be able to automagically share the iSCSI targets the same way as they were known in the initial OS that created and shared them - not until an admin defines the same LUNs and WWN numbers and such, manually?
>
> Is this a correct understanding (and does the problem exist indeed), or do I (hopefully) miss something?

That is pretty much how it works, with one small wrinkle -- the configuration is stored in SMF. So you can either do it the hard way (by hand), use a commercially-available HA solution (eg. RSF-1 from high-availability.com), or use SMF export/import.
 -- richard

--
illumos Day & ZFS Day, Oct 1-2, 2012, San Francisco -- www.zfsday.com
richard.ell...@richardelling.com  +1-760-896-4422
Re: [zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
On Sep 25, 2012, at 11:17 AM, Jason Usher jushe...@yahoo.com wrote:

>>> Ok - but from a performance point of view, I am only using ram/cpu resources for the deduping of just the individual filesystems I enabled dedupe on, right ? I hope that turning on dedupe for just one filesystem did not incur ram/cpu costs across the entire pool...
>>
>> It depends.
>>  -- richard
>
> Can you elaborate at all ? Dedupe can have fairly profound performance implications, and I'd like to know if I am paying a huge price just to get a dedupe on one little filesystem ...

The short answer is: deduplication transforms big I/Os into small I/Os, but does not eliminate I/O. The reason is that the deduplication table has to be updated when you write something that is deduplicated. This implies that storage devices which are inexpensive in $/GB but expensive in $/IOPS might not be the best candidates for deduplication (eg. HDDs).

There is some additional CPU overhead for the sha-256 hash that might or might not be noticeable, depending on your CPU. But perhaps the most important factor is your data -- is it dedupable, and are the space savings worthwhile? There is no simple answer for that, but we generally recommend that you simulate dedup before committing to it.
 -- richard
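For the "simulate dedup" recommendation, the usual tool is zdb's dedup simulation, which walks an existing pool and reports the dedup-table histogram and the ratios it would achieve, without actually enabling dedup. A sketch, with a hypothetical pool name tank and illustrative numbers:

  # zdb -S tank
  Simulated DDT histogram:
  ...                                   (per-refcount block and size histogram elided)
  dedup = 1.22, compress = 1.54, copies = 1.00, dedup * compress / copies = 1.88

A dedup ratio close to 1.0 means the data is not very dedupable and the DDT overhead is probably not worth paying.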
Re: [zfs-discuss] Cold failover of COMSTAR iSCSI targets on shared storage
2012-09-26 0:21, Richard Elling wrote:

>> Does this mean that importing a pool with iSCSI zvols on a fresh host (a LiveCD instance on the same box, or via failover of shared storage to a different host) will not be able to automagically share the iSCSI targets the same way as they were known in the initial OS that created and shared them - not until an admin defines the same LUNs and WWN numbers and such, manually? Is this a correct understanding (and does the problem exist indeed), or do I (hopefully) miss something?
>
> That is pretty much how it works, with one small wrinkle -- the configuration is stored in SMF. So you can either do it the hard way (by hand), use a commercially-available HA solution (eg. RSF-1 from high-availability.com), or use SMF export/import.
>  -- richard

So if I wanted to make a solution where, upon import of the pool with COMSTAR-shared zvols, the new host is able to publish the same resources as the previous holder of the pool media, could I get away with some scripts (on all COMSTAR servers involved) which would:

1) Regularly svccfg export certain SMF service configs to a filesystem dataset on the pool in question.

2) Upon import of the pool, svccfg import the SMF setup, then svcadm refresh and maybe svcadm restart (or svcadm enable) the iSCSI SMF services, and thus share the same zvols with the same settings? (A sketch of such a sequence follows this message.)

Is this a correct understanding of doing shareiscsi for COMSTAR in the poor-man's HA setup? ;)

Apparently, to be transparent for clients, this would also use VRRP or something like that to carry over the iSCSI targets' IP address(es), separate from the general communications addressing of the hosts (the addressing info might also be in the same dataset as the SMF exports).

Q: Which services are the complete list needed to set up the COMSTAR server from scratch?

Thanks,
//Jim
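In outline, the export/import sequence proposed above might look something like the following sketch. It is untested and the details are assumptions: the relevant services are taken to be svc:/system/stmf (which holds the COMSTAR LU/view configuration in its SMF properties) and svc:/network/iscsi/target (the iSCSI target service), and /pool/comstar is a hypothetical dataset used to carry the exported manifests:

  # on the host currently serving the pool - rerun whenever the COMSTAR config changes:
  svccfg export -a stmf          > /pool/comstar/stmf.xml
  svccfg export -a iscsi/target  > /pool/comstar/iscsi-target.xml

  # on the host taking over, after 'zpool import pool':
  svccfg import /pool/comstar/stmf.xml
  svccfg import /pool/comstar/iscsi-target.xml
  svcadm refresh stmf
  svcadm restart stmf
  svcadm enable -r svc:/network/iscsi/target:default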
Re: [zfs-discuss] Interesting question about L2ARC
On 09/25/2012 09:38 PM, Jim Klimov wrote:

> 2012-09-11 16:29, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:
>>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Dan Swartzendruber
>>>
>>> My first thought was everything is hitting in ARC, but that is clearly not the case, since it WAS gradually filling up the cache device.
>>
>> When things become colder in the ARC, they expire to the L2ARC (or simply expire, bypassing the L2ARC). So it's normal to start filling the L2ARC, even if you never hit anything in the L2ARC.
>
> Got me wondering: how many reads of a block from spinning rust suffice for it to ultimately get into L2ARC? Just one, so it gets into a recent-read list of the ARC and then expires into L2ARC when ARC RAM is more needed for something else - and only when that L2ARC fills up does the block expire from these caches completely?
>
> Thanks, and sorry for a lame question ;)

Correct. See https://github.com/illumos/illumos-gate/blob/14d44f2248cc2a54490db7f7caa4da5968f90837/usr/src/uts/common/fs/zfs/arc.c#L3685 for an exact description of the ARC-L2ARC interaction mechanism.

Cheers,
--
Saso
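If you want to watch that fill-and-expire behaviour happen, the ARC kstats expose the L2ARC counters directly; a quick sketch with illustrative values:

  # kstat -p zfs:0:arcstats | egrep 'l2_(size|hits|misses|feeds)'
  zfs:0:arcstats:l2_feeds     233421
  zfs:0:arcstats:l2_hits      1882345
  zfs:0:arcstats:l2_misses    903412
  zfs:0:arcstats:l2_size      118111600640

l2_size growing while l2_hits stays near zero is exactly the "filling but not yet hitting" pattern discussed above.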
Re: [zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
2012-09-24 21:08, Jason Usher wrote:

> Ok, thank you. The problem with this is, the compressratio only goes to two significant digits, which means if I do the math, I'm only getting an approximation. Since we may use these numbers to compute billing, it is important to get it right. Is there any way at all to get the real *exact* number ?

Well, if you take into account snapshots and clones, you can see really small "used" numbers on datasets which reference a lot of data. In fact, for accounting you might be better off with the "referenced" field instead of "used", but note that it is not recursive and you need to account each child dataset's byte references separately.

I am not sure if there is a simple way to get exact byte counts instead of roundings like "422M"...

HTH,
//Jim
Re: [zfs-discuss] Cold failover of COMSTAR iSCSI targets on shared storage
On Sep 25, 2012, at 1:32 PM, Jim Klimov jimkli...@cos.ru wrote:

> 2012-09-26 0:21, Richard Elling wrote:
>>> Does this mean that importing a pool with iSCSI zvols on a fresh host (a LiveCD instance on the same box, or via failover of shared storage to a different host) will not be able to automagically share the iSCSI targets the same way as they were known in the initial OS that created and shared them - not until an admin defines the same LUNs and WWN numbers and such, manually? Is this a correct understanding (and does the problem exist indeed), or do I (hopefully) miss something?
>>
>> That is pretty much how it works, with one small wrinkle -- the configuration is stored in SMF. So you can either do it the hard way (by hand), use a commercially-available HA solution (eg. RSF-1 from high-availability.com), or use SMF export/import.
>>  -- richard
>
> So if I wanted to make a solution where, upon import of the pool with COMSTAR-shared zvols, the new host is able to publish the same resources as the previous holder of the pool media, could I get away with some scripts (on all COMSTAR servers involved) which would:
>
> 1) Regularly svccfg export certain SMF service configs to a filesystem dataset on the pool in question.

This is only needed when you add a new COMSTAR share. You will also need to remove old ones. Fortunately, you have a pool where you can store these :-)

> 2) Upon import of the pool, svccfg import the SMF setup, then svcadm refresh and maybe svcadm restart (or svcadm enable) the iSCSI SMF services, and thus share the same zvols with the same settings?

Import should suffice.

> Is this a correct understanding of doing shareiscsi for COMSTAR in the poor-man's HA setup? ;)

Yes.

> Apparently, to be transparent for clients, this would also use VRRP or something like that to carry over the iSCSI targets' IP address(es), separate from the general communications addressing of the hosts (the addressing info might also be in the same dataset as the SMF exports).

Or just add another IP address. This is how HA systems work.

> Q: Which services are the complete list needed to set up the COMSTAR server from scratch?

Dunno off the top of my head. Network isn't needed (COMSTAR can serve FC), but you can look at the SMF configs for details. I haven't looked at the OHAC agents in a long, long time, but you might find some scripts already built there.
 -- richard
Re: [zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
On Sep 25, 2012, at 1:46 PM, Jim Klimov jimkli...@cos.ru wrote:

> 2012-09-24 21:08, Jason Usher wrote:
>> Ok, thank you. The problem with this is, the compressratio only goes to two significant digits, which means if I do the math, I'm only getting an approximation. Since we may use these numbers to compute billing, it is important to get it right. Is there any way at all to get the real *exact* number ?
>
> Well, if you take into account snapshots and clones, you can see really small "used" numbers on datasets which reference a lot of data. In fact, for accounting you might be better off with the "referenced" field instead of "used", but note that it is not recursive and you need to account each child dataset's byte references separately.
>
> I am not sure if there is a simple way to get exact byte counts instead of roundings like "422M"...

zfs get -p
 -- richard
Re: [zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
2012-09-26 2:52, Richard Elling wrote:

>> I am not sure if there is a simple way to get exact byte counts instead of roundings like "422M"...
>
> zfs get -p
>  -- richard

Thanks to all who corrected me, never too old to learn ;)

# zfs get referenced rpool/export/home
NAME               PROPERTY    VALUE  SOURCE
rpool/export/home  referenced  5.41M  -
# zfs get -p referenced rpool/export/home
NAME               PROPERTY    VALUE    SOURCE
rpool/export/home  referenced  5677056  -

Thanks,
//Jim