Re: [zfs-discuss] partitioned cache devices
Andrew Werchowiecki wrote:
> Hi all, I'm having some trouble with adding cache drives to a zpool -- anyone got any ideas?
>
>   muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
>   Password:
>   cannot open '/dev/dsk/c25t10d1p2': I/O error
>   muslimwookie@Pyzee:~$
>
> I have two SSDs in the system. I've created an 8GB partition on each drive for use as a mirrored write cache, and the remainder of each drive is partitioned for use as the read-only cache. However, when attempting to add it I get the error above.

Create one 100% Solaris partition and then use format to create two slices.

-- Ian.
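For reference, the workflow Ian is describing looks roughly like this on Solaris/illumos. The device name comes from Andrew's output, but the slice numbers and sizes below are only an example, not taken from the thread:

  # 1. Put a single whole-disk Solaris fdisk partition on the SSD:
  fdisk -B /dev/rdsk/c25t10d1p0

  # 2. Run format(1M), select c25t10d1, enter the "partition" menu, and lay
  #    out two slices -- e.g. s0 = 8GB for the log, s6 = the remainder for
  #    the cache -- then label the disk.
  format

  # 3. Add the slices (not the fdisk p* partitions) to the pool:
  zpool add aggr0 log c25t10d1s0
  zpool add aggr0 cache c25t10d1s6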
Re: [zfs-discuss] partitioned cache devices
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki

>   muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
>   Password:
>   cannot open '/dev/dsk/c25t10d1p2': I/O error
>   muslimwookie@Pyzee:~$
>
> I have two SSDs in the system. I've created an 8GB partition on each drive for use as a mirrored write cache, and the remainder of each drive is partitioned for use as the read-only cache. However, when attempting to add it I get the error above.

Sounds like you're running into confusion about how to partition the drive. If you create fdisk partitions, they will be accessible as p0, p1, p2, but I think p0 unconditionally refers to the whole drive, so the first partition is p1 and the second is p2. If instead you create one big Solaris fdisk partition and then slice it via format's partition menu -- where s2 conventionally covers the whole disk and people usually use s0, s1, and s6 for the actual slices -- then they will be accessible as s0, s1, s6.

Generally speaking, it's inadvisable to split the slog/cache devices anyway. If you're splitting the device, evidently you're focusing on the wasted space: buying an expensive 128G device when you couldn't possibly ever use more than 4G or 8G for the slog. But that's not what you should be focusing on. You should be focusing on speed (that's why you bought it in the first place). The slog is write-only, and the cache is a mixture of reads and writes, hopefully doing more reads than writes. Regardless of your actual success with the cache device, the cache device will be busy most of the time, competing against the slog.

You have a mirror, you say. You should probably drop the mirror for both the cache and the log: use one whole device for the cache, and one whole device for the log. The only risk you'll run is this: since a slog is write-only (except during mount, typically at boot), it's possible to have a failure mode where you think you're writing to the log, but the first time you go back and read it, you discover an error and find the device has gone bad. In other words, without ever doing any reads, you might not notice when/if the device goes bad.

Fortunately, there's an easy workaround. You could periodically (say, once a month) script the removal of your log device, create a junk pool on it, write a bunch of data to it, scrub it (thus verifying it was written correctly), and, in the absence of any scrub errors, destroy the junk pool and re-add the device as a slog to the main pool. I've never heard of anyone actually being that paranoid, and I've never heard of anyone actually experiencing the aforementioned undetected device failure mode, so this is all mostly theoretical. Mirroring the slog device really isn't necessary in the modern age.
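A rough sketch of that monthly check (pool and device names here are invented; this only illustrates the procedure described above, it is not a script from the thread):

  #!/bin/sh
  # Periodically exercise a slog SSD with real reads to catch silent failures.
  POOL=tank          # main pool (hypothetical name)
  SLOG=c25t10d0      # whole-disk slog device (hypothetical name)

  zpool remove $POOL $SLOG                  # detach the log device
  zpool create junkpool $SLOG               # throwaway pool on the same SSD
  dd if=/dev/urandom of=/junkpool/testfile bs=1024k count=2048
  zpool scrub junkpool
  while zpool status junkpool | grep -q 'scrub in progress'; do
      sleep 10                              # wait for the scrub to complete
  done
  zpool status -x junkpool                  # reports errors, if any
  zpool destroy junkpool
  zpool add $POOL log $SLOG                 # put the device back as the slog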
Re: [zfs-discuss] Sun X4200 Question...
Thanks for the info. I am planning the install this weekend, between Formula One and other hardware upgrades... fingers crossed it works!

On 14 Mar 2013 09:19, Heiko L. h.lehm...@hs-lausitz.de wrote:
> > support for VT, but nothing for AMD... The Opterons don't have VT, so I won't be using Xen, but the Zones may be useful...
>
> We have used Xen/PV on the X4200 for many years without problems.
>
>   dom0: X4200 + openindiana + xvm
>   guests (PV): openindiana, linux/fedora, linux/debian (vmlinuz-2.6.32.28-xenU-32, vmlinuz-2.6.18-xenU64)
>
> regards Heiko
[zfs-discuss] Petabyte pool?
Greetings,

Has anyone out there built a 1-petabyte pool? I've been asked to look into this, and was told low performance is fine; the workload is likely to be write-once, read-occasionally, archive storage of gene sequencing data. Probably a single 10Gbit NIC is sufficient for connectivity.

We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis, using 4TB nearline SAS drives, giving over 100TB usable space (raidz3). Back-of-the-envelope might suggest stacking up eight to ten of those, depending on whether you want a raw marketing petabyte or a proper power-of-two usable petabyte.

I get a little nervous at the thought of hooking all that up to a single server, and am a little vague on how much RAM would be advisable, other than "as much as will fit" (:-). Then again, I've been waiting for something like pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a single global namespace, without any sign of that happening anytime soon.

So, has anyone done this? Or come close to it? Thoughts, even if you haven't done it yourself?

Thanks and regards,
Marion
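For what it's worth, the back-of-the-envelope arithmetic behind those numbers works out roughly as follows (my own working; it assumes each 45-slot chassis is carved into 9-disk raidz3 vdevs, which Marion doesn't actually specify):

  # 45 slots as 5 x 9-disk raidz3 vdevs => 5 * (9 - 3) = 30 data disks/chassis
  # usable per chassis ~ 30 * 4 TB = 120 TB (about 109 TiB before overhead)
  echo $(( 5 * (9 - 3) * 4 ))      # => 120
  # "marketing" petabyte:   1000 TB / 120 TB per chassis  => ~9 chassis
  # power-of-two petabyte:  1 PiB ~ 1126 TB / 120 TB      => ~10 chassis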
Re: [zfs-discuss] Petabyte pool?
On Fri, Mar 15, 2013 at 06:09:34PM -0700, Marion Hakanson wrote:
> Has anyone out there built a 1-petabyte pool? [...] So, has anyone done this? Or come close to it? Thoughts, even if you haven't done it yourself?

We've come close:

  admin@mes-str-imgnx-p1:~$ zpool list
  NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
  datapool   978T   298T   680T  30%  1.00x  ONLINE  -
  syspool    278G   104G   174G  37%  1.00x  ONLINE  -

Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual-pathed to a couple of LSI SAS switches. Using Nexenta, but no reason you couldn't do this with $whatever.

We did triple parity, and our vdev membership is set up such that we can lose up to three JBODs and still be functional (one vdev member disk per JBOD). This is with 3TB NL-SAS drives.

Ray
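To illustrate the "one vdev member disk per JBOD" layout (device names below are invented; Ray didn't post his actual vdev list), a raidz3 pool built that way looks something like this, with each vdev drawing one disk from each enclosure:

  # Hypothetical: 10-wide raidz3 vdevs, one member per JBOD, so up to three
  # enclosures can fail and every vdev is still readable (raidz3 tolerates
  # three missing members).
  zpool create datapool \
    raidz3 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c7t0d0 c8t0d0 c9t0d0 c10t0d0 \
    raidz3 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 c8t1d0 c9t1d0 c10t1d0
  # ...and so on for the remaining rows of disks.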
Re: [zfs-discuss] [zfs] Petabyte pool?
Well, off the top of my head:

  2 x storage heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPUs
  8 x 60-bay JBODs with 60 x 4TB SAS drives
  RAIDZ2 stripe over the 8 x JBODs

That should fit within 1 rack comfortably and provide 1 PB of storage.

Regards,

Kristoffer Sheather
Cloud Central
Scale Your Data Center In The Cloud
Phone: 1300 144 007 | Mobile: +61 414 573 130 | Email: k...@cloudcentral.com.au
Skype: kristoffer.sheather | Twitter: http://twitter.com/kristofferjon
Re: [zfs-discuss] [zfs] Petabyte pool?
Actually, you could use 3TB drives and, with a 6/8 RAIDZ2 stripe, achieve 1080 TB usable.

You'll also need 8-16 SAS ports available on each storage head to provide redundant multi-pathed SAS connectivity to the JBODs; I'd recommend LSI 9207-8E's for those, and Intel X520-DA2's for the 10G NICs.
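The arithmetic behind the 1080 TB figure (my own working; it assumes the raidz2 stripes are 8 disks wide, i.e. 6 data + 2 parity, which is what "6/8 RAIDZ2" suggests):

  # 8 JBODs x 60 bays = 480 drives; grouped into 8-wide raidz2 => 60 vdevs
  # usable = vdevs * data disks per vdev * drive size
  echo $(( (8 * 60 / 8) * 6 * 3 ))   # => 1080 (TB, before filesystem overhead)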
Re: [zfs-discuss] Petabyte pool?
On Fri, Mar 15, 2013 at 7:09 PM, Marion Hakanson hakan...@ohsu.edu wrote:
> Has anyone out there built a 1-petabyte pool?

I'm not advising against your building/configuring a system yourself, but I suggest taking a look at the Petarack:

  http://www.aberdeeninc.com/abcatg/petarack.htm

It shows it's been done with ZFS :-).

Jan
Re: [zfs-discuss] Petabyte pool?
rvandol...@esri.com said:
> We've come close:
>
>   admin@mes-str-imgnx-p1:~$ zpool list
>   NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>   datapool   978T   298T   680T  30%  1.00x  ONLINE  -
>   syspool    278G   104G   174G  37%  1.00x  ONLINE  -
>
> Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to a couple of LSI SAS switches.

Thanks Ray,

We've been looking at those too (we've had good luck with our MD1200's). How many HBA's in the R720?

Thanks and regards,
Marion
Re: [zfs-discuss] Petabyte pool?
On Fri, Mar 15, 2013 at 06:31:11PM -0700, Marion Hakanson wrote:
> We've been looking at those too (we've had good luck with our MD1200's). How many HBA's in the R720?

We have qty 2 LSI SAS 9201-16e HBA's (Dell resold [1]).

Ray

[1] http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l=en&s=hied&cs=65&sku=a4614101
Re: [zfs-discuss] Petabyte pool?
Ray said:
> Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to a couple of LSI SAS switches.

Marion said:
> How many HBA's in the R720?

Ray said:
> We have qty 2 LSI SAS 9201-16e HBA's (Dell resold [1]).

Sounds similar in approach to the Aberdeen product another sender referred to, with a SAS-switch layout:

  http://www.aberdeeninc.com/images/1-up-petarack2.jpg

One concern I had is this: I compared our SuperMicro JBOD with 40x 4TB drives in it, connected via a dual-port LSI SAS 9200-8e HBA, to the same pool layout on a 40-slot server with 40x SATA drives in it. The server uses no SAS expanders, instead using SAS-to-SATA octopus cables to connect the drives directly to three internal SAS HBA's (2x 9201-16i's, 1x 9211-8i). What I found was that the internal pool was significantly faster for both sequential and random I/O than the pool on the external JBOD.

My conclusion was that I would not want to exceed ~48 drives on a single 8-port SAS HBA, so I thought that running the I/O of all your hundreds of drives through only two HBA's would be a bottleneck. LSI's specs say 4800MBytes/sec for an 8-port SAS HBA, but 4000MBytes/sec for that card in an x8 PCIe-2.0 slot. Sure, the newer 9207-8e is rated at 8000MBytes/sec in an x8 PCIe-3.0 slot, but it still has only the same 8 SAS ports going at 4800MBytes/sec.

Yes, I know the disks probably can't go that fast. But in my tests above, the internal 40-disk pool measures 2000MBytes/sec sequential reads and writes, while the external 40-disk JBOD measures at 1500 to 1700 MBytes/sec. Not a lot slower, but significantly slower, so I do think the number of HBA's makes a difference.

At the moment, I'm leaning toward piling six, eight, or ten HBA's into a server, preferably one with dual IOH's (thus two PCIe busses), and connecting dual-path JBOD's in that manner. I hadn't looked into SAS switches much, but they do look more reliable than daisy-chaining a bunch of JBOD's together; I just haven't seen how to get more bandwidth through them to a single host.

Regards,
Marion
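Worked out explicitly, the bandwidth numbers Marion cites line up like this (my own arithmetic, using the usual nominal per-port and per-lane figures rather than anything quoted from LSI's datasheets in this thread):

  # SAS-2: 8 ports x 600 MB/s per port          => 4800 MB/s at the SAS side
  echo $(( 8 * 600 ))
  # PCIe 2.0 x8: 8 lanes x ~500 MB/s per lane   => ~4000 MB/s at the host side
  echo $(( 8 * 500 ))
  # PCIe 3.0 x8: 8 lanes x ~1000 MB/s per lane  => ~8000 MB/s, so a 9207-8e's
  # limit moves back to its eight SAS ports (still ~4800 MB/s)
  echo $(( 8 * 1000 ))
  # Measured: ~2000 MB/s direct-attached vs 1500-1700 MB/s behind one HBA and
  # an expander -- roughly a 15-25% penalty for the 40-drive JBOD in his tests.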
Re: [zfs-discuss] Petabyte pool?
On Mar 15, 2013, at 6:09 PM, Marion Hakanson hakan...@ohsu.edu wrote:
> Greetings, Has anyone out there built a 1-petabyte pool?

Yes, I've done quite a few.

> I've been asked to look into this, and was told low performance is fine, workload is likely to be write-once, read-occasionally, archive storage of gene sequencing data. Probably a single 10Gbit NIC for connectivity is sufficient. We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis, using 4TB nearline SAS drives, giving over 100TB usable space (raidz3). Back-of-the-envelope might suggest stacking up eight to ten of those, depending if you want a raw marketing petabyte, or a proper power-of-two usable petabyte.

Yes. NB, for the PHB, using N^2 is found to be less effective than N^10.

> I get a little nervous at the thought of hooking all that up to a single server, and am a little vague on how much RAM would be advisable, other than as much as will fit (:-). Then again, I've been waiting for something like pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a single global namespace, without any sign of that happening anytime soon.

NFS v4 or DFS (or even a clever sysadmin + automount) offers a single namespace without needing the complexity of NFSv4.1, Lustre, GlusterFS, etc.

> So, has anyone done this? Or come close to it? Thoughts, even if you haven't done it yourself?

Don't forget about backups :-)

-- richard

--
richard.ell...@richardelling.com
+1-760-896-4422
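To make the "clever sysadmin + automount" option concrete (server names and paths below are invented for illustration), an indirect automounter map is enough to stitch exports from several NFS servers into one directory tree:

  # /etc/auto_master entry (hypothetical):
  #   /genomes   auto_genomes
  #
  # /etc/auto_genomes -- each key can point at a different NFS server:
  run001    seqstore1:/export/run001
  run002    seqstore2:/export/run002
  archive   seqstore3:/export/archive

Clients then see /genomes/run001, /genomes/run002, etc. as a single namespace, even though the data lives on three machines.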