Re: [zfs-discuss] How to avoid striping ?
On 18 Oct 2010, at 08:44, Habony, Zsolt <zsolt.hab...@hp.com> wrote:

> Hi,
>
> I have seen a similar question on this list in the archive but haven't seen the answer.
> Can I avoid striping across top level vdevs?
>
> If I use a zpool which is one LUN from the SAN, and when it becomes full I add a new LUN to it.
> But I cannot guarantee that the LUN will not come from the same spindles on the SAN.
>
> Can I force zpool not to stripe the data?

No. The basic principle of the zpool is dynamic striping across vdevs, so that all available spindles contribute to the workload. If you want or need more granular control over what data goes to which disk, you'll need to create multiple pools.

Just create a new pool from the new SAN volume and you will segregate the IO. But then you risk having hot and cold spots in your storage, as the IO won't be striped. If the approach is to fill a vdev completely before adding a new one, this possibility exists anyway, until block rewrite arrives to redistribute existing data across the available vdevs.

Cheers,

Erik
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
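A toy model may help picture the hot and cold spots mentioned above. The sketch below is not the ZFS allocator: it assumes a greedy rule (each block goes to the vdev with the most free space), and the names `allocate`, `lun0` and `lun1` are made up for illustration. The real allocator is more elaborate, but the general bias toward emptier vdevs is what produces the effect.

```python
def allocate(free, blocks):
    """Place `blocks` one at a time on the vdev with the most free space.

    `free` maps a (hypothetical) vdev name to its free block count and is
    updated in place. Returns how many blocks each vdev received.
    """
    placed = {name: 0 for name in free}
    for _ in range(blocks):
        target = max(free, key=free.get)  # emptiest vdev wins; ties go to the first
        if free[target] == 0:
            raise RuntimeError("pool is full")
        free[target] -= 1
        placed[target] += 1
    return placed


# Two LUNs present from the start: writes spread evenly.
print(allocate({"lun0": 100, "lun1": 100}, 100))  # {'lun0': 50, 'lun1': 50}

# One LUN filled to 90%, then a second LUN added: new writes pile onto lun1,
# while the 90 old blocks stay where they were written.
pool = {"lun0": 100}
allocate(pool, 90)
pool["lun1"] = 100
print(allocate(pool, 60))  # {'lun0': 0, 'lun1': 60}
```

In this model the pool that grew late ends up with old data on one LUN and new data on the other, which is the same segregation a second pool would give, just without the explicit control.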
Re: [zfs-discuss] How to avoid striping ?
On 18 Oct 2010, at 12:40, Habony, Zsolt wrote:

>>> Is there a way to avoid it, or can we be sure that the problem does not exist at all?
>>
>> Grow the existing LUN rather than adding another one.
>>
>> The only way to have ZFS not stripe is to not give it devices to stripe over. So stick with simple mirrors ...
>
> (I do not mirror, as the storage gives redundancy behind LUNs.)

Then you lose ZFS's self-healing ability.

Sami
Re: [zfs-discuss] How to avoid striping ?
On 18 Oct 2010, at 17:44, Habony, Zsolt wrote:

> Thank you all for the comments.
>
> You should imagine a datacenter with
> - standards not completely depending on me.
> - SAN for many OSs, one of them is Solaris (and not the major amount)

So you get LUNs from the storage team and there is nothing you can do about it. Just use the LUNs you get as well as you can, then, which means a host-based mirrored zpool.

> - usually level 2 engineers doing filesystem increases.
> - hundreds of physical boxes, dozens of virtuals on one physical
> - ability to move VMs (zones) across physical boxes (by assigning LUNs to other boxes)

You can do that even if the RAID management is done host-based with ZFS.

> That probably explains that I cannot use host based raid management, it is done by storage as standard.

No, it does not. I would still let ZFS do the RAID management on the host side, even if you can't stop the storage team from RAIDing it again on the storage box.

> I cannot assign whole disks to boxes, as I get LUNs standardized for all other OSs, and in a size optimized for small virtual machines.

You should still mirror across two storage boxes.

Sami
Re: [zfs-discuss] How to avoid striping ?
On 18/10/2010 07:44, Habony, Zsolt wrote:
> I have seen a similar question on this list in the archive but haven't seen the answer.
> Can I avoid striping across top level vdevs?
>
> If I use a zpool which is one LUN from the SAN, and when it becomes full I add a new LUN to it.
> But I cannot guarantee that the LUN will not come from the same spindles on the SAN.

That sounds like a problem with your SAN config, if that matters to you.

> Can I force zpool not to stripe the data?

You can't, but why do you care?

-- 
Darren J Moffat
Re: [zfs-discuss] How to avoid striping ?
In many large datacenters, a different storage team handles LUN requests and assignment. We ask for a LUN of a specific size, and we get one.

It may turn out that the first vdev (LUN) is at the beginning of a RAID set on the storage, and the second vdev is at the end of the same RAID set, on the same physical disks. (If not at creation time, then later, when a filled zpool is grown by adding a LUN.)

I worry about head thrashing. Though the memory cache of large storage should ease the problem, I would be happier if I could be sure that the zpool will not be handled as a stripe.

Is there a way to avoid it, or can we be sure that the problem does not exist at all?

-----Original Message-----
From: Darren J Moffat [mailto:darr...@opensolaris.org]
Sent: 18 October 2010 10:19
To: Habony, Zsolt
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] How to avoid striping ?

On 18/10/2010 07:44, Habony, Zsolt wrote:
> I have seen a similar question on this list in the archive but haven't seen the answer.
> Can I avoid striping across top level vdevs?
>
> If I use a zpool which is one LUN from the SAN, and when it becomes full I add a new LUN to it.
> But I cannot guarantee that the LUN will not come from the same spindles on the SAN.

That sounds like a problem with your SAN config, if that matters to you.

> Can I force zpool not to stripe the data?

You can't, but why do you care?

-- 
Darren J Moffat
Re: [zfs-discuss] How to avoid striping ?
> No. The basic principle of the zpool is dynamic striping across vdevs in order to ensure that all available spindles are contributing to the workload. If you want/need more granular control over what data goes to which disk, then you'll need to create multiple pools.
>
> Just create a new pool from the new SAN volume and you will segregate the IO.

That's my understanding, and that's my problem.

You have an application filesystem from one LUN. (vxfs is expensive, ufs/svm is not really able to handle online filesystem increase. Thus we plan to use zfs for application filesystems.)

When it fills up, you increase it by adding a new LUN. You have to make sure that the added LUN is from different physical disks. That might not be obvious with today's large storage arrays with thousands of LUNs.

If I can force concatenation, then I do not have to investigate where the existing parts of the filesystem are.
Re: [zfs-discuss] How to avoid striping ?
On 18/10/2010 09:28, Habony, Zsolt wrote:
> I worry about head thrashing. Though memory cache of large storage should make the problem

Is that really something you should be worried about, with all the other software and hardware between ZFS and the actual drives?

If that is a problem then it isn't ZFS causing it; it will just be using the LUNs that were given to it by the SAN. An access pattern of an application on a completely different filesystem could still mean that you are using both LUNs in that way.

> Is there a way to avoid it, or can we be sure that the problem does not exist at all?

Grow the existing LUN rather than adding another one.

The only way to have ZFS not stripe is to not give it devices to stripe over. So stick with simple mirrors, e.g. this style of configuration:

      pool: builds
     state: ONLINE
      scan: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            builds      ONLINE       0     0     0
              mirror-0  ONLINE       0     0     0
                c7t3d0  ONLINE       0     0     0
                c8t4d0  ONLINE       0     0     0

Where in your configuration c7t3d0/c8t4d0 are your LUNs from the SAN.

Rather than this style:

      pool: builds
     state: ONLINE
      scan: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            builds      ONLINE       0     0     0
              mirror-0  ONLINE       0     0     0
                c7t3d0  ONLINE       0     0     0
                c8t4d0  ONLINE       0     0     0
              mirror-1  ONLINE       0     0     0
                c7t3d0  ONLINE       0     0     0
                c8t4d0  ONLINE       0     0     0

-- 
Darren J Moffat
Re: [zfs-discuss] How to avoid striping ?
On 18/10/2010 10:01, Habony, Zsolt wrote:
> If I can force concatenation, then I do not have to investigate where the existing parts of the filesystem are.

You can't. The code for concatenation rather than striping does not exist, and there are no plans to add it.

Instead of assuming you have a problem, I'd highly recommend you go with the recommendation in my other email, or don't worry about it. Don't assume that you will have a problem with ZFS because of your experience with other systems. Striping isn't bad; it is usually good.

Or fix the root cause of the problem, which in this example case isn't ZFS, on the SAN where the LUNs are getting allocated.

-- 
Darren J Moffat
Re: [zfs-discuss] How to avoid striping ?
On Mon, Oct 18, 2010 at 1:28 AM, Habony, Zsolt <zsolt.hab...@hp.com> wrote:
> Is there a way to avoid it, or can we be sure that the problem does not exist at all?

ZFS will coalesce asynchronous writes, which should help with most of the head thrash on write. Using a log device will convert sync writes to async. For reads, make sure you have enough memory and a cache device.

-B

-- 
Brandon High : bh...@freaks.com
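Why batching helps with head movement can be pictured with a few lines of arithmetic. This is a sketch under assumptions, not ZFS code: it supposes a filesystem that buffers asynchronous writes and then, being copy-on-write, puts them in the next free blocks instead of overwriting in place. The block numbers and the helper names `seek_distance` and `copy_on_write_layout` are hypothetical.

```python
def seek_distance(addresses):
    """Total head travel, in blocks, when visiting addresses in order."""
    return sum(abs(b - a) for a, b in zip(addresses, addresses[1:]))


def copy_on_write_layout(logical_writes, next_free):
    """Give each buffered write the next free physical block, in arrival order."""
    return [next_free + i for i, _ in enumerate(logical_writes)]


logical_writes = [900, 10, 500, 20, 880]   # hypothetical logical block numbers
in_place = logical_writes                   # overwrite-in-place: go where the data lives
batched = copy_on_write_layout(logical_writes, next_free=1000)

print(seek_distance(in_place))  # 2720
print(seek_distance(batched))   # 4
```

The model ignores reads, fragmentation and everything the SAN does behind the LUN, so it only illustrates the direction of the effect, not its size.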
Re: [zfs-discuss] How to avoid striping ?
Hi,

Habony, Zsolt writes:
> You have an application filesystem from one LUN. (vxfs is expensive, ufs/svm is not really able to handle online filesystem increase. Thus we plan to use zfs for application filesystems.)

What do you mean by "not really"?
Use metattach to grow a metadevice or soft partition.
Use growfs to grow UFS on the grown device.

Rainer

-- 
Rainer J. H. Brandt
Brandt & Brandt Computer GmbH
Am Wiesenpfad 6, 53340 Meckenheim
Managing directors: Rainer J. H. Brandt and Volker A. Brandt
Commercial register: Amtsgericht Bonn, HRB 10513
RFC 5322: Each line [...] SHOULD be no more than 78 characters
Re: [zfs-discuss] How to avoid striping ?
On 10/18/10 2:13 AM, Rainer J.H. Brandt wrote:
> Habony, Zsolt writes:
>> You have an application filesystem from one LUN. (vxfs is expensive, ufs/svm is not really able to handle online filesystem increase. Thus we plan to use zfs for application filesystems.)
>
> What do you mean by "not really"?
> Use metattach to grow a metadevice or soft partition.
> Use growfs to grow UFS on the grown device.

He is probably referring to the fact that growfs locks the filesystem.

-- 
Carson Gaspar
Re: [zfs-discuss] How to avoid striping ?
>> You have an application filesystem from one LUN. (vxfs is expensive, ufs/svm is not really able to handle online filesystem increase. Thus we plan to use zfs for application filesystems.)
>
> What do you mean by "not really"?
> ...
> Use growfs to grow UFS on the grown device.

I know it's off-topic, but this statement:

    growfs will ``write-lock'' (see lockfs(1M)) a mounted file system when expanding.

has always made me uncomfortable with this online expansion. I cannot guarantee how a specific application will behave during the expansion.
Re: [zfs-discuss] How to avoid striping ?
>> Is there a way to avoid it, or can we be sure that the problem does not exist at all?
>
> Grow the existing LUN rather than adding another one.
>
> The only way to have ZFS not stripe is to not give it devices to stripe over. So stick with simple mirrors ...

(I do not mirror, as the storage gives redundancy behind LUNs.)

Online LUN expansion seems promising, and it answers my question. Thank you for that.

Zsolt
Re: [zfs-discuss] How to avoid striping ?
>>> You have an application filesystem from one LUN. (vxfs is expensive, ufs/svm is not really able to handle online filesystem increase. Thus we plan to use zfs for application filesystems.)
>>
>> What do you mean by "not really"?
>> ...
>> Use growfs to grow UFS on the grown device.
>
> I know it's off-topic, but this statement:
>
>     growfs will ``write-lock'' (see lockfs(1M)) a mounted file system when expanding.
>
> has always made me uncomfortable with this online expansion. I cannot guarantee how a specific application will behave during the expansion.

    -w    Write-lock (wlock) the specified file-system. wlock
          suspends writes that would modify the file system.
          Access times are not kept while a file system is
          write-locked.

All the applications trying to write will suspend. What would be the risk of that?

Casper
Re: [zfs-discuss] How to avoid striping ?
On 10/18/2010 4:28 AM, Habony, Zsolt wrote:
> I worry about head thrashing.

Why?

If your SAN group gives you a LUN that is at the opposite end of the array, I would think that was because they had already assigned the space in the middle to other customers (other groups like yours, or other hosts of yours).

If so, don't you think that all those other hosts and customers will be reading and writing from that array all the time anyway? I mean, if the heads are going to 'thrash', then they'll be doing so even before you request your second LUN, right?

Adding your second LUN to the mix isn't going to seriously change the workload on the disks in the array.

> Though the memory cache of large storage should ease the problem, I would be happier if I could be sure that the zpool will not be handled as a stripe.
>
> Is there a way to avoid it, or can we be sure that the problem does not exist at all?

As I think the logic above suggests, if the problem exists, it exists even when you only have 1 LUN.

-Kyle
Re: [zfs-discuss] How to avoid striping ?
On 10/18/2010 5:40 AM, Habony, Zsolt wrote:
> (I do not mirror, as the storage gives redundancy behind LUNs.)

By not enabling redundancy (mirror or RAIDZ[123]) at the ZFS level, you are opening yourself up to corruption problems that the underlying SAN storage can't protect you from. The SAN array won't even notice the problem. ZFS will notice the problem, and (if you don't give it redundancy to work with) it won't be able to repair it for you.

You'd be better off getting unprotected LUNs from the array, and letting ZFS handle the redundancy.

-Kyle

> Online LUN expansion seems promising, and it answers my question. Thank you for that.
>
> Zsolt
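The difference between noticing corruption and repairing it can be shown with a small model. This is an illustrative sketch, not how ZFS is implemented: it assumes a checksum kept separately from the data and one or more copies of each block, and the function names `checksum` and `read_block` are invented here. SHA-256 is used only because it is in the Python standard library.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum kept apart from the block it describes."""
    return hashlib.sha256(data).hexdigest()


def read_block(copies, expected):
    """Read a block stored as one or more copies.

    Returns (data, status). Bad copies are overwritten in place from a good
    one when a good one exists; with no good copy the damage is only detected.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        return None, "unrecoverable"
    status = "ok"
    for i, copy in enumerate(copies):
        if checksum(copy) != expected:
            copies[i] = good          # self-heal the damaged side
            status = "repaired"
    return good, status


payload = b"application data"
expected = checksum(payload)
damaged = b"application dat\x00"      # silent corruption below the filesystem

single_lun = [damaged]
print(read_block(single_lun, expected))   # (None, 'unrecoverable')

mirror = [damaged, payload]
print(read_block(mirror, expected))       # (b'application data', 'repaired')
print(mirror[0] == payload)               # True
```

With a single copy the checksum still catches the bad block, which is the "ZFS will notice" half of the point above; only the mirrored case has anything to repair from.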
Re: [zfs-discuss] How to avoid striping ?
On Mon, Oct 18, 2010 at 3:28 AM, Habony, Zsolt <zsolt.hab...@hp.com> wrote:

> In many large datacenters, a different storage team handles LUN requests and assignment. We ask for a LUN of a specific size, and we get one.
>
> It may turn out that the first vdev (LUN) is at the beginning of a RAID set on the storage, and the second vdev is at the end of the same RAID set, on the same physical disks. (If not at creation time, then later, when a filled zpool is grown by adding a LUN.)
>
> I worry about head thrashing. Though the memory cache of large storage should ease the problem, I would be happier if I could be sure that the zpool will not be handled as a stripe.
>
> Is there a way to avoid it, or can we be sure that the problem does not exist at all?
>
> -----Original Message-----
> From: Darren J Moffat [mailto:darr...@opensolaris.org]
> Sent: 18 October 2010 10:19
> To: Habony, Zsolt
> Cc: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] How to avoid striping ?
>
> On 18/10/2010 07:44, Habony, Zsolt wrote:
> > I have seen a similar question on this list in the archive but haven't seen the answer.
> > Can I avoid striping across top level vdevs?
> >
> > If I use a zpool which is one LUN from the SAN, and when it becomes full I add a new LUN to it.
> > But I cannot guarantee that the LUN will not come from the same spindles on the SAN.
>
> That sounds like a problem with your SAN config, if that matters to you.
>
> > Can I force zpool not to stripe the data?
>
> You can't, but why do you care?
>
> --
> Darren J Moffat

It shouldn't matter if LUNs are on the same backend disk. Unless the manufacturer of the array is brain dead, their wide-striping algorithm should handle it without breaking a sweat. If the pool of disks can't service the number of IOPS, the storage team should be moving LUNs around; that's what they get paid to do.

Your *issue* shouldn't be an issue at all unless the backend disk is junk. I've never seen an issue with Hitachi's HDP or NetApp's aggregates.

--Tim
Re: [zfs-discuss] How to avoid striping ?
On 2010-Oct-18 17:45:34 +0800, casper@sun.com wrote:
>     Write-lock (wlock) the specified file-system. wlock
>     suspends writes that would modify the file system.
>     Access times are not kept while a file system is
>     write-locked.
>
> All the applications trying to write will suspend. What would be the risk of that?

At least some versions of Oracle RDBMS have timeouts around I/O and will abort if I/O operations don't complete within a short period.

-- 
Peter Jeremy