[zfs-discuss] Recommendation for home NAS external JBOD
Hi,

my oi151-based home NAS is approaching a frightening drive-space level. Right now the data volume is a 4*1TB RAID-Z1, with 3.5" local disks individually connected to an 8-port LSI 6Gbit controller.

So I could either exchange the disks one by one with autoexpand, use 2-4 TB disks, and be happy. This was my original approach. However, I am totally unclear about the 512-byte vs. 4 KB sector issue. Which SATA disk could I use that is big enough and still uses 512-byte sectors? I know about the discussion around upgrading from a 512-byte-based pool to a 4 KB pool, but I fail to see a conclusion. Will the autoexpand mechanism upgrade ashift? Which disks do not lie about their sector size? Is the performance impact significant?

So I started to think about option 2: an external JBOD chassis (4-8 disks) and eSATA. But I would either need a JBOD with 4-8 eSATA connectors (which I have yet to find) or a JBOD with a good expander. I see several cheap SATA-to-eSATA JBOD chassis making use of port multipliers. Is this referring to an expander backplane, and will it work with oi, LSI and mpt or mpt_sas? I am aware that this is not the most performant solution, but this is a home NAS storing tons of pictures and videos only. And I could use the internal disks for backup purposes.

Any suggestions for components are greatly appreciated. And before you ask: currently I have 3 TB net. 6 TB net would be the minimum target; 9 TB sounds nicer. So if you have 512-byte HD recommendations with 2/3 TB each, or a good JBOD suggestion, please let me know!

Kind regards,
JP

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
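For anyone tallying the upgrade options, the net-capacity arithmetic behind these numbers can be sketched in shell. The function name is my own for illustration; raidz1 yields roughly (n - 1) disks of usable space, ignoring metadata and padding overhead:

```shell
#!/bin/sh
# Approximate net capacity of a raidz1 vdev: (n - 1) * disk_size.
# This ignores ZFS metadata overhead, so real usable space is a bit less.
raidz1_net() {
    n_disks=$1
    tb_per_disk=$2
    echo $(( (n_disks - 1) * tb_per_disk ))
}

echo "current 4x1TB raidz1: $(raidz1_net 4 1) TB net"   # the existing pool
echo "4x3TB raidz1:         $(raidz1_net 4 3) TB net"   # hits the 9 TB target
echo "4x2TB raidz1:         $(raidz1_net 4 2) TB net"   # hits the 6 TB minimum
```

So replacing the four 1TB disks with 2TB models meets the 6 TB minimum, and 3TB models reach the 9 TB figure.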
Re: [zfs-discuss] Recommendation for home NAS external JBOD
> So I can either exchange the disks one by one with autoexpand, use 2-4 TB
> disks and be happy. This was my original approach. However I am totally
> unclear about the 512b vs 4Kb issue. What sata disk could I use that is
> big enough and still uses 512b? I know about the discussion about the
> upgrade from a 512b based pool to a 4 KB pool but I fail to see a
> conclusion. Will the autoexpand mechanism upgrade ashift? And what disks
> do not lie? Is the performance impact significant?

Replacing devices will not change the ashift; it is set permanently when a vdev is created, and zpool will refuse to replace a device in an ashift=9 vdev with a device on which it would use ashift=12. Large Western Digital disks tend to report 4K sectors, and hence cannot be used to replace your current disks, while Hitachi and Seagate offer 512-emulated disks, which should allow you to replace your current disks without needing to copy the contents of the pool to a new one.

If you don't have serious performance requirements, you may not notice the impact of emulated 512-byte sectors (especially since ZFS buffers async writes into transaction groups). I did some rudimentary testing on a large pool of Hitachi 3TB 512-emulated disks with ashift=9 vs ashift=12 with bonnie, and it didn't seem to matter a whole lot (though it's possible the relevant tests were large writes, which have little penalty, and character-at-a-time writes, which were bottlenecked by the CPU since the test was single-threaded, so it didn't test the worst case). The worst case for 512-emulated sectors on ZFS is probably small (4 KB or so) synchronous writes (and if those mattered to you, you would probably have a separate log device, in which case the data-disk write penalty may not matter).

> So I started to think about option 2. That would be using an external
> JBOD chassis (4-8 disks) and eSATA. But I would either need a JBOD with
> 4-8 eSATA connectors (which I am yet to find) or use a JBOD with a good
> expander.
> I see several cheap sata to esata jbod chassis making use of port
> multiplier. Is this referring to a expander backplane and will work
> with oi, LSI and mpt or mpt_sas?

I'm wondering, based on the comment about routing 4 eSATA cables, what kind of options your NAS case has. If your LSI controller has SFF-8087 connectors (or possibly even if it doesn't), you might be able to use an adapter to the SFF-8088 external 4-lane SAS connector, which may increase your options.

It seems that support for SATA port multipliers is not mandatory in a controller, so you will want to check with LSI before trying it (I would hope they support it on SAS controllers, since I think it is a vastly simplified version of SAS expanders).

Tim
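Since the ashift is fixed per vdev, it is worth checking it before buying replacement drives; `zdb -C <pool>` prints it for each vdev. A minimal sketch, parsing a hypothetical excerpt of that output (the pool layout and numbers below are illustrative, not from the poster's system; on a live box you would pipe real `zdb -C` output through the same awk):

```shell
#!/bin/sh
# Hypothetical excerpt of `zdb -C tank` output; real output has many
# more fields, but the ashift line has this shape.
sample="
    vdev_tree:
        type: 'raidz'
        ashift: 9
        asize: 3985729650688
"

# Extract the ashift value: 9 means 512-byte sectors, 12 means 4 KB.
ashift=$(printf '%s\n' "$sample" | awk '/ashift/ { print $2 }')
echo "vdev ashift: $ashift"
```

An ashift of 9 here means only drives that present 512-byte sectors (native or emulated) can be swapped in via `zpool replace`.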
Re: [zfs-discuss] Recommendation for home NAS external JBOD
2012-06-17 19:11, Koopmann, Jan-Peter wrote:
> Hi, my oi151 based home NAS is approaching a frightening drive space
> level. Right now the data volume is a 4*1TB Raid-Z1, 3.5" local disks
> individually connected to an 8 port LSI 6Gbit controller. [...]
> However I am totally unclear about the 512b vs 4Kb issue. What sata
> disk could I use that is big enough and still uses 512b? [...] And
> what disks do not lie? Is the performance impact significant?

AFAIK the Hitachi Desk/Ultra-Star models (5K3000, 7K3000) should be 512b native, maybe the only ones at this size. The larger 4TB Hitachi models are 4KB native with 512e emulation, according to the datasheets on their site.

HTH,
//Jim Klimov
Re: [zfs-discuss] Recommendation for home NAS external JBOD
On Sun, Jun 17, 2012 at 03:19:18PM -0500, Timothy Coalson wrote:
> Replacing devices will not change the ashift, it is set permanently
> when a vdev is created, and zpool will refuse to replace a device in
> an ashift=9 vdev with a device that it would use ashift=12 on.

Yep.

> [..] while hitachi and seagate offer 512 emulated disks
> I did some rudimentary testing on a large pool of hitachi 3TB 512
> emulated disks with ashift=9 vs ashift=12 with bonnie, and it didn't
> seem to matter a whole lot

Hitachi are native 512-byte sectors. At least, the 5K3000 and 7K3000 are, in the 2TB and 3TB sizes; I haven't noticed whether they have a newer model which is 4K native. How long that continues to remain the case, and how long these models remain available (e.g. for replacements), is entirely another matter. The replacement concern applies even to under-warranty cases; I know someone who recently had a 4K-only drive supplied as a warranty replacement for a 512-native drive (not, in this case, from Hitachi).

As for performance, at least in my experience with WD disks emulating 512-byte sectors, you *will* notice the difference, with heavy metadata updates being the most obvious impact. The conclusion is that unless your environment is well controlled, the time has probably come where new general-purpose pools should be made with ashift=12, to allow future flexibility.

> I'm wondering, based on the comment about routing 4 eSATA cables, what
> kind of options your NAS case has [...] It seems that support for SATA
> port multiplier is not mandatory in a controller, so you will want to
> check with LSI before trying it (I would hope they support it on SAS
> controllers, since I think it is a vastly simplified version of SAS
> expanders).
SATA port multipliers and SAS expanders are not related in any sense of common driver support; they are similar only in general concept. Do not conflate them.

-- Dan.
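For reference, on illumos-derived systems whose sd driver supports the override, one way to get ashift=12 pools out of drives that report 512-byte logical sectors is to tell sd the drive's physical block size in `sd.conf`. This is a sketch only: the vendor/product string below is illustrative and must match your drive's actual INQUIRY data (vendor padded to 8 characters), and you should confirm your build supports the `physical-block-size` property before relying on it.

```
# /kernel/drv/sd.conf -- illustrative entry, VID/PID must match your drive
sd-config-list =
    "ATA     Hitachi HDS72303", "physical-block-size:4096";
```

After editing, reload the driver configuration (e.g. `update_drv -vf sd`, or reboot); vdevs created on the drive afterwards should then be ashift=12. Existing vdevs are unaffected, since ashift is fixed at vdev creation.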
Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/13/2012 03:43 PM, Roch wrote:
> Sašo Kiselkov writes:
>> On 06/12/2012 05:37 PM, Roch Bourbonnais wrote:
>>> So the xcalls are a necessary part of memory reclaiming: when one
>>> needs to tear down the TLB entry mapping the physical memory (which
>>> can from here on be repurposed), the xcalls are just part of this.
>>> They should not cause trouble, but they do. They consume a CPU for
>>> some time, and that in turn can cause infrequent latency bubbles on
>>> the network. A certain root cause of these latency bubbles is that
>>> network threads are bound by default, and if the xcall storm ends
>>> up on the CPU that the network thread is bound to, it will wait for
>>> the storm to pass.
>> I understand, but the xcall storm only eats up a single core out of
>> a total of 32, plus it's not a single specific one, it tends to
>> change, so what are the odds of hitting the same core as the one on
>> which the mac thread is running?
> That's easy :-) : 1/32 each time it needs to run. So depending on how
> often it runs (which depends on how much churn there is in the ARC)
> and how often you see the latency bubbles, that may or may not be it.
>
> What is zio_taskq_batch_pct on your system? That is another stormy
> bit of code which causes bubbles. Setting it down to 50 (versus an
> older default of 100) should help if it's not done already.
>
> -r

So I tried all of the suggestions above (mac unbinding, zio_taskq tuning) and none helped. I'm beginning to suspect it has something to do with the networking cards. When I try to snoop filtered traffic from one interface into a file (snoop -o /tmp/dump -rd vlan935 host a.b.c.d), my multicast reception throughput plummets to about 1/3 of the original.
I'm running a link aggregation of 4 on-board Broadcom NICs:

# dladm show-aggr -x
LINK    PORT   SPEED   DUPLEX  STATE  ADDRESS            PORTSTATE
aggr0   --     1000Mb  full    up     d0:67:e5:fc:bd:38  --
        bnx1   1000Mb  full    up     d0:67:e5:fc:bd:38  attached
        bnx2   1000Mb  full    up     d0:67:e5:fc:bd:3a  attached
        bnx3   1000Mb  full    up     d0:67:e5:fc:bd:3c  attached
        bnx0   1000Mb  full    up     d0:67:e5:fc:bd:36  attached

# dladm show-vlan
LINK      VID   OVER    FLAGS
vlan49    49    aggr0   -
vlan934   934   aggr0   -
vlan935   935   aggr0   -

Normally I'm getting around 46MB/s on vlan935; however, once I run any snoop command which puts the network interfaces into promiscuous mode, my throughput plummets to around 20MB/s. During that, I can see context switches skyrocket on 4 CPU cores, each around 75% busy. Now I understand that snoop has some probe effect, but this is definitely too large; I've never seen this kind of bad behavior on any of my other Solaris systems (with similar load). Are there any tunings I can make to my network to track down the issue?

My module for bnx is:

# modinfo | grep bnx
169 f80a7000 63ba0 197 1 bnx (Broadcom NXII GbE 6.0.1)

Regards,
--
Saso
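To see whether an xcall storm is landing on the CPUs the bound network threads run on, the `xcal` column of `mpstat` is the place to look. A minimal sketch that picks the busiest CPU out of a hypothetical `mpstat 1` sample (the numbers below are made up for illustration; on a live system you would pipe real mpstat output through the same awk):

```shell
#!/bin/sh
# Hypothetical one-interval mpstat sample (columns abridged to the
# standard layout; xcal is the 4th field).
sample="CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    12   300  100  400    5   10   20    0  1500    5   3   0  92
  1    0   0 98231   250   80  350    4    8   15    0  1200    2  45   0  53
  2    0   0    40   280   90  380    6    9   18    0  1400    4   4   0  92"

# Skip the header, track the row with the largest xcal count.
busiest=$(printf '%s\n' "$sample" | \
    awk 'NR > 1 && $4 > max { max = $4; cpu = $1 } END { print cpu }')
echo "CPU with most xcalls: $busiest"
```

Comparing that CPU against where the mac/NIC threads are bound (e.g. via `pbind`/`mdb`) would show whether the storm and the network threads actually collide.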
Re: [zfs-discuss] Recommendation for home NAS external JBOD
Hi Tim,

thanks to you and the others for answering.

> The worst case for 512 emulated sectors on zfs is probably small (4KB
> or so) synchronous writes (which if they mattered to you, you would
> probably have a separate log device, in which case the data disk write
> penalty may not matter).

Good to know. This really opens up the possibility of buying 3 or 4TB Hitachi drives. At least the 4TB Hitachi drives are 4K (512b emulated) drives, according to the latest news.

> I'm wondering, based on the comment about routing 4 eSATA cables, what
> kind of options your NAS case has, if your LSI controller has SFF-8087
> connectors (or possibly even if it doesn't),

It has actually.

> you might be able to use an adapter to the SFF-8088 external 4 lane
> SAS connector, which may increase your options.

So what you are saying is that something like this will do the trick?

http://www.pc-pitstop.com/sata_enclosures/scsat44xb.asp

If I interpret this correctly, I get an SFF-8087 to SFF-8088 bracket, connect the 4-port LSI SFF-8087 to that bracket, then get a cable for this JBOD and throw in 4 drives? This would leave me with four additional HDDs without any SAS expander hassle. I had not come across these JBODs; thanks a million for the hint.

Do we agree that for a home NAS box a Hitachi Deskstar (not explicitly being a server SATA drive) will suffice, despite potential TLER problems? I was thinking about Hitachi Deskstar 5K3000 drives. The 4TB ones seemingly came out but are rather expensive in comparison.

Kind regards,
JP
Re: [zfs-discuss] Recommendation for home NAS external JBOD
On 6/17/12 3:21 PM, Koopmann, Jan-Peter wrote:
> Hi Tim,
>> you might be able to use an adapter to the SFF-8088 external 4 lane
>> SAS connector, which may increase your options.
> So what you are saying is that something like this will do the trick?
> http://www.pc-pitstop.com/sata_enclosures/scsat44xb.asp
> If I interpret this correctly I get a SFF-8087 to SFF-8088 bracket,
> connect the 4 port LSI SFF-8087 to that bracket, then get a cable for
> this JBOD and throw in 4 drives? This would leave me with four
> additional HDDs without any SAS expander hassle. I had not come across
> these JBODs. Thanks a million for the hint.

I have 2 Sans Digital TR8X JBOD enclosures, and they work very well. They also make a 4-bay TR4X.

http://www.sansdigital.com/towerraid/tr4xb.html
http://www.sansdigital.com/towerraid/tr8xb.html

They cost a bit more than the one you linked to, but the drives are hot-swap. They also make similar cases with port multipliers, RAID, etc., but I've only used the JBOD.

-- Carson
Re: [zfs-discuss] Recommendation for home NAS external JBOD
> The worst case for 512 emulated sectors on zfs is probably small (4KB
> or so) synchronous writes (which if they mattered to you, you would
> probably have a separate log device, in which case the data disk write
> penalty may not matter).
>
> Good to know. This really opens up the possibility of buying 3 or 4TB
> Hitachi drives. At least the 4TB Hitachi drives are 4k (512b emulated)
> drives according to the latest news.

It appears from the specs listed on the Hitachi site that the drives I have may actually be 512-native, in which case my testing was moot. This does explain some other things I saw when testing the drives in question, so I will assume they are 512-native and that my testing was meaningless. If you copy folders containing thousands of small files frequently, the performance impact may be relevant if you go for the 512-emulated drives.

> So what you are saying is that something like this will do the trick?
> http://www.pc-pitstop.com/sata_enclosures/scsat44xb.asp
> If I interpret this correctly I get a SFF-8087 to SFF-8088 bracket,
> connect the 4 port LSI SFF-8087 to that bracket, then get a cable for
> this JBOD and throw in 4 drives? This would leave me with four
> additional HDDs without any SAS expander hassle. I had not come across
> these JBODs. Thanks a million for the hint.

No problem, and yes, I think that should work. One thing to keep in mind, though, is that if the internals of the enclosure simply split the multilane SAS cable into 4 connectors without an expander, and you use SATA drives, the controller will use SATA mode, which as I understand it runs at a lower signalling voltage and won't work over long cables, so get a short cable (1 meter, shorter if you can find one). It looks like all of the enclosures mentioned so far use this method, though it would be good to know if Carson populated his with SATA drives.

> Do we agree that for a home NAS box a Hitachi Deskstar (not explicitly
> being a server SATA drive) will suffice despite potential TLER
> problems?
> I was thinking about Hitachi Deskstar 5k3000 drives. The 4TB seemingly
> came out but are rather expensive in comparison…

I'm not sure what ZFS's timeout for dropping an unresponsive disk is, or what it does when the disk responds again, so I don't know whether TLER would help. I have not had any serious problems with my pool of Hitachi 3TB 5400rpm drives; two different drives had a checksum error, once each, but stayed online in the pool.

Tim
Re: [zfs-discuss] Recommendation for home NAS external JBOD
On 6/17/12 6:36 PM, Timothy Coalson wrote:
> No problem, and yes, I think that should work. One thing to keep in
> mind, though, is that if the internals of the enclosure simply split
> the multilane SAS cable into 4 connectors without an expander, and you
> use SATA drives, the controller will use SATA mode, which as I
> understand it runs at a lower signalling voltage, and won't work over
> long cables, so get a short cable (1 meter, shorter if you can find
> one). It looks like all of the ones mentioned so far use this method,
> though it would be good to know if Carson populated his with SATA
> drives.

SATA drives, using 1m cables from an LSI SAS9201-16e.

-- Carson