[zfs-discuss] Recommendation for home NAS external JBOD

2012-06-17 Thread Koopmann, Jan-Peter
Hi,

my oi151-based home NAS is approaching a frightening drive space level. Right
now the data volume is a 4*1TB RAID-Z1 of 3.5" local disks, individually
connected to an 8-port LSI 6Gbit controller.

So I can either exchange the disks one by one with autoexpand, use 2-4TB disks
and be happy. This was my original approach. However, I am totally unclear about
the 512B vs. 4KB sector issue. Which SATA disk could I use that is big enough
and still uses 512B sectors? I know about the discussion on upgrading from a
512B-based pool to a 4KB pool, but I fail to see a conclusion. Will the
autoexpand mechanism upgrade the ashift? And which disks do not lie about their
sector size? Is the performance impact significant?
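
For reference, the one-by-one exchange I have in mind would look roughly
like this (pool and device names made up, and I would wait for each
resilver to finish before pulling the next disk):

# zpool set autoexpand=on tank
# zpool replace tank c2t0d0 c2t4d0
# zpool status tank     (wait for the resilver to complete, then repeat)
# zpool list tank       (capacity should grow once the last disk is swapped)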

So I started to think about option 2: using an external JBOD chassis (4-8
disks) and eSATA. But I would either need a JBOD with 4-8 eSATA connectors
(which I have yet to find) or a JBOD with a good expander. I see several cheap
SATA-to-eSATA JBOD chassis making use of port multipliers. Does this refer to
an expander backplane, and will it work with oi, LSI and mpt or mpt_sas? I am
aware that this is not the most performant solution, but this is a home NAS
storing tons of pictures and videos only. And I could use the internal disks
for backup purposes.

Any suggestions for components are greatly appreciated.

And before you ask: currently I have 3TB net. 6TB net would be the minimum
target; 9TB sounds nicer. So if you have 512B HD recommendations with 2-3TB
each, or a good JBOD suggestion, please let me know!


Kind regards,
   JP



Re: [zfs-discuss] Recommendation for home NAS external JBOD

2012-06-17 Thread Timothy Coalson
 So I can either exchange the disks one by one with autoexpand, use 2-4TB
 disks and be happy. This was my original approach. However, I am totally
 unclear about the 512B vs. 4KB sector issue. Which SATA disk could I use that
 is big enough and still uses 512B sectors? I know about the discussion on
 upgrading from a 512B-based pool to a 4KB pool, but I fail to see a
 conclusion. Will the autoexpand mechanism upgrade the ashift? And which disks
 do not lie about their sector size? Is the performance impact significant?

Replacing devices will not change the ashift; it is set permanently
when a vdev is created, and zpool will refuse to replace a device in
an ashift=9 vdev with a device that it would use ashift=12 on.  Large
Western Digital disks tend to say they have 4K sectors, and hence
cannot be used to replace your current disks, while Hitachi and
Seagate offer 512-emulated disks, which should allow you to replace
your current disks without needing to copy the contents of the pool to
a new one.  If you don't have serious performance requirements, you
may not notice the impact of emulated 512-byte sectors (especially since
ZFS buffers async writes into transaction groups).  I did some
rudimentary testing on a large pool of Hitachi 3TB 512-emulated disks
with ashift=9 vs. ashift=12 using bonnie, and it didn't seem to matter a
whole lot (though possibly relevant is that the tests were large writes,
which have little penalty, and character-at-a-time writes, which were
bottlenecked by the CPU since the test was single-threaded, so the
worst case wasn't tested).  The worst case for 512-emulated sectors on
ZFS is probably small (4KB or so) synchronous writes (which, if they
mattered to you, you would probably have a separate log device for, in
which case the data disk write penalty may not matter).
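
If you want to verify what you currently have before buying anything, zdb
reports the ashift per vdev, and a mismatched replace should fail up front.
A rough sketch (pool/device names hypothetical, and the exact error text
may differ between releases):

# zdb -C tank | grep ashift
            ashift: 9
# zpool replace tank c2t0d0 c2t4d0
cannot replace c2t0d0 with c2t4d0: devices have different sector alignment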

 So I started to think about option 2: using an external JBOD chassis (4-8
 disks) and eSATA. But I would either need a JBOD with 4-8 eSATA connectors
 (which I have yet to find) or a JBOD with a good expander. I see several
 cheap SATA-to-eSATA JBOD chassis making use of port multipliers. Does this
 refer to an expander backplane, and will it work with oi, LSI and mpt or
 mpt_sas?

I'm wondering, based on the comment about routing 4 eSATA cables, what
kind of options your NAS case has. If your LSI controller has SFF-8087
connectors (or possibly even if it doesn't), you might be able to use
an adapter to the SFF-8088 external 4-lane SAS connector, which may
increase your options.  It seems that support for SATA port multipliers
is not mandatory in a controller, so you will want to check with LSI
before trying it (I would hope they support it on SAS controllers,
since I think it is a vastly simplified version of SAS expanders).

Tim


Re: [zfs-discuss] Recommendation for home NAS external JBOD

2012-06-17 Thread Jim Klimov

On 2012-06-17 19:11, Koopmann, Jan-Peter wrote:

Hi,

my oi151-based home NAS is approaching a frightening drive space
level. Right now the data volume is a 4*1TB RAID-Z1 of 3.5" local disks,
individually connected to an 8-port LSI 6Gbit controller.

So I can either exchange the disks one by one with autoexpand, use 2-4TB
disks and be happy. This was my original approach. However, I am totally
unclear about the 512B vs. 4KB sector issue. Which SATA disk could I use
that is big enough and still uses 512B sectors? I know about the
discussion on upgrading from a 512B-based pool to a 4KB pool, but I fail
to see a conclusion. Will the autoexpand mechanism upgrade the ashift?
And which disks do not lie about their sector size? Is the performance
impact significant?


AFAIK the Hitachi Deskstar/Ultrastar (5K3000, 7K3000) should be 512B
native, maybe the only ones at this size. The larger 4TB Hitachi models
are 4KB native with 512e emulation, according to the datasheets on their site.

HTH,
//Jim Klimov


Re: [zfs-discuss] Recommendation for home NAS external JBOD

2012-06-17 Thread Daniel Carosone
On Sun, Jun 17, 2012 at 03:19:18PM -0500, Timothy Coalson wrote:
 Replacing devices will not change the ashift; it is set permanently
 when a vdev is created, and zpool will refuse to replace a device in
 an ashift=9 vdev with a device that it would use ashift=12 on.

Yep.

 [..] while Hitachi and Seagate offer 512-emulated disks

 I did some rudimentary testing on a large pool of Hitachi 3TB 512-emulated
 disks with ashift=9 vs. ashift=12 using bonnie, and it didn't seem to matter
 a whole lot

Hitachi are native 512-byte sectors.  At least, the 5K3000 and 7K3000
are, in the 2TB and 3TB sizes. I haven't noticed whether they have a
newer model which is 4K native.

How long that continues to remain the case, and how long these models
continue to remain available (e.g. for replacements), is another matter
entirely.  The availability concern applies even to under-warranty cases;
I know someone who recently had a 4K-only drive supplied as a warranty
replacement for a 512-native drive (not, in this case, from Hitachi).

As for performance, at least in my experience with WD disks
emulating 512-byte sectors, you *will* notice the difference; heavy
metadata updates are the most obvious impact.

The conclusion is that unless your environment is well controlled, the
time has probably come where new general-purpose pools should be made
with ashift=12, to allow future flexibility.
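
As far as I know there is no zpool create option on illumos to force the
ashift directly; the usual workaround is to tell the sd driver that the
drive has 4K physical sectors via /kernel/drv/sd.conf before creating the
pool, if your build supports the physical-block-size override. A sketch
with a made-up vendor/product string (the vendor field is padded to 8
characters):

sd-config-list = "ATA     Hitachi HDS5C303", "physical-block-size:4096";

After updating sd.conf (and a reboot, or update_drv -vf sd), a vdev newly
created on that drive should come up with ashift=12.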

 I'm wondering, based on the comment about routing 4 eSATA cables, what
 kind of options your NAS case has. If your LSI controller has SFF-8087
 connectors (or possibly even if it doesn't), you might be able to use
 an adapter to the SFF-8088 external 4-lane SAS connector, which may
 increase your options.  It seems that support for SATA port multipliers
 is not mandatory in a controller, so you will want to check with LSI
 before trying it (I would hope they support it on SAS controllers,
 since I think it is a vastly simplified version of SAS expanders).

SATA port-multipliers and SAS expanders are not related in any sense
of common driver support; they're similar only in general concept. 

Do not conflate them.

--
Dan.




Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

2012-06-17 Thread Sašo Kiselkov
On 06/13/2012 03:43 PM, Roch wrote:

 Sašo Kiselkov writes:
   On 06/12/2012 05:37 PM, Roch Bourbonnais wrote:

 So the xcalls are a necessary part of memory reclaiming, when one needs
 to tear down the TLB entries mapping the physical memory (which can from
 here on be repurposed). So the xcalls are just part of this. They should
 not cause trouble, but they do: they consume a CPU for some time.

 That in turn can cause infrequent latency bubbles on the network. A
 certain root cause of these latency bubbles is that network threads are
 bound by default, and if the xcall storm ends up on the CPU that a
 network thread is bound to, it will wait for the storm to pass.
   
   I understand, but the xcall storm only eats up a single core out of a
   total of 32; plus it's not a single specific one, it tends to change, so
   what are the odds of hitting the same core as the one on which the mac
   thread is running?
   
 
 That's easy :-) : 1/32 each time it needs to run. So depending on how often
 it runs (which depends on how much churn there is in the ARC) and how often
 you see the latency bubbles, that may or may not be it.

 What is zio_taskq_batch_pct on your system? That is another stormy bit of
 code which causes bubbles. Setting it down to 50 (versus the older default
 of 100) should help if that's not done already.
 
 -r
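
For concreteness, my understanding is that this tuning goes in /etc/system
and takes effect after a reboot (hedging on the exact module prefix here,
since I'm going from memory):

set zfs:zio_taskq_batch_pct = 50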

So I tried all of the suggestions above (mac unbinding, zio_taskq tuning)
and none helped. I'm beginning to suspect it has something to do with the
networking cards. When I try to snoop filtered traffic from one interface
into a file (snoop -o /tmp/dump -rd vlan935 host a.b.c.d), my multicast
reception throughput plummets to about 1/3 of the original.

I'm running a link-aggregation of 4 on-board Broadcom NICs:

# dladm show-aggr -x
LINK     PORT   SPEED   DUPLEX  STATE  ADDRESS            PORTSTATE
aggr0    --     1000Mb  full    up     d0:67:e5:fc:bd:38  --
         bnx1   1000Mb  full    up     d0:67:e5:fc:bd:38  attached
         bnx2   1000Mb  full    up     d0:67:e5:fc:bd:3a  attached
         bnx3   1000Mb  full    up     d0:67:e5:fc:bd:3c  attached
         bnx0   1000Mb  full    up     d0:67:e5:fc:bd:36  attached

# dladm show-vlan
LINK     VID  OVER   FLAGS
vlan49   49   aggr0  -----
vlan934  934  aggr0  -----
vlan935  935  aggr0  -----

Normally, I'm getting around 46MB/s on vlan935; however, once I run any
snoop command which puts the network interfaces into promiscuous mode, my
throughput plummets to around 20MB/s. During that time I can see context
switches skyrocket on 4 CPU cores, with each around 75% busy. Now, I
understand that snoop has some probe effect, but this is definitely too
large. I've never seen this kind of bad behavior before on any of my
other Solaris systems (with similar load).

Are there any tunings I can make to my network to track down the issue?
My module for bnx is:

# modinfo | grep bnx
169  f80a7000  63ba0  197  1  bnx (Broadcom NXII GbE 6.0.1)
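
In case it helps, I've been watching where the xcalls land with a quick
DTrace one-liner (sysinfo provider's xcalls probe, aggregated by CPU and
dumped every 10 seconds):

# dtrace -n 'sysinfo:::xcalls { @[cpu] = count(); } tick-10s { printa(@); trunc(@); }'

and checking the current taskq setting with mdb (assuming the symbol name
is unchanged on my build):

# echo zio_taskq_batch_pct/D | mdb -k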

Regards,
--
Saso


Re: [zfs-discuss] Recommendation for home NAS external JBOD

2012-06-17 Thread Koopmann, Jan-Peter
Hi Tim,

thanks to you and the others for answering.

 worst case).  The worst case for 512-emulated sectors on ZFS is
 probably small (4KB or so) synchronous writes (which, if they mattered
 to you, you would probably have a separate log device for, in which case
 the data disk write penalty may not matter).

Good to know. This really opens up the possibility of buying 3 or 4TB
Hitachi drives. At least the 4TB Hitachi drives are 4K (512B-emulated)
drives, according to the latest news.


 I'm wondering, based on the comment about routing 4 eSATA cables, what
 kind of options your NAS case has. If your LSI controller has SFF-8087
 connectors (or possibly even if it doesn't),

It has, actually.



 you might be able to use
 an adapter to the SFF-8088 external 4-lane SAS connector, which may
 increase your options.

So what you are saying is that something like this will do the trick?

http://www.pc-pitstop.com/sata_enclosures/scsat44xb.asp

If I interpret this correctly, I get an SFF-8087-to-SFF-8088 bracket, connect
the 4-port LSI SFF-8087 to that bracket, then get a cable for this JBOD and
throw in 4 drives? This would leave me with four additional HDDs without any
SAS expander hassle. I had not come across these JBODs. Thanks a million for
the hint.

Do we agree that for a home NAS box a Hitachi Deskstar (not explicitly being
a server SATA drive) will suffice despite potential TLER problems? I was
thinking about Hitachi Deskstar 5K3000 drives. The 4TB models seemingly just
came out but are rather expensive in comparison…


Kind regards,
   JP






Re: [zfs-discuss] Recommendation for home NAS external JBOD

2012-06-17 Thread Carson Gaspar

On 6/17/12 3:21 PM, Koopmann, Jan-Peter wrote:

Hi Tim,



you might be able to use
an adapter to the SFF-8088 external 4-lane SAS connector, which may
increase your options.


So what you are saying is that something like this will do the trick?

http://www.pc-pitstop.com/sata_enclosures/scsat44xb.asp

If I interpret this correctly, I get an SFF-8087-to-SFF-8088 bracket,
connect the 4-port LSI SFF-8087 to that bracket, then get a cable for
this JBOD and throw in 4 drives? This would leave me with four
additional HDDs without any SAS expander hassle. I had not come across
these JBODs. Thanks a million for the hint.


I have 2 Sans Digital TR8X JBOD enclosures, and they work very well. 
They also make a 4-bay TR4X.


http://www.sansdigital.com/towerraid/tr4xb.html
http://www.sansdigital.com/towerraid/tr8xb.html

They cost a bit more than the one you linked to, but the drives are
hot-swappable. They also make similar cases with port multipliers, RAID,
etc., but I've only used the JBOD.



--
Carson



Re: [zfs-discuss] Recommendation for home NAS external JBOD

2012-06-17 Thread Timothy Coalson
 worst case).  The worst case for 512-emulated sectors on ZFS is
 probably small (4KB or so) synchronous writes (which, if they mattered
 to you, you would probably have a separate log device for, in which case
 the data disk write penalty may not matter).


 Good to know. This really opens up the possibility of buying 3 or 4TB
 Hitachi drives. At least the 4TB Hitachi drives are 4K (512B-emulated)
 drives, according to the latest news.

It appears from the specs listed on the Hitachi site that the drives I
have may actually be 512-native, in which case my testing was moot.
This does explain some other things I saw when testing the drives in
question, so I will assume they are 512-native, and that my testing
was meaningless.  If you frequently copy folders containing thousands
of small files, the performance impact may be relevant if you go for
the 512-emulated drives.
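
A crude way to gauge that on your own hardware before committing to drives
is to time creating a few thousand small files and syncing; a sketch
(paths hypothetical; ksh used since the arithmetic won't work in the old
Bourne shell):

# mkdir -p /tank/test
# time ksh -c 'integer i=0; while (( i < 10000 )); do : > /tank/test/f$i; (( i += 1 )); done; sync'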

 So what you are saying is that something like this will do the trick?

 http://www.pc-pitstop.com/sata_enclosures/scsat44xb.asp

 If I interpret this correctly, I get an SFF-8087-to-SFF-8088 bracket, connect
 the 4-port LSI SFF-8087 to that bracket, then get a cable for this JBOD and
 throw in 4 drives? This would leave me with four additional HDDs without any
 SAS expander hassle. I had not come across these JBODs. Thanks a million for
 the hint.

No problem, and yes, I think that should work.  One thing to keep in
mind, though: if the internals of the enclosure simply split the
multilane SAS cable into 4 connectors without an expander, and you use
SATA drives, the controller will run those links in SATA mode, which as
I understand it uses a lower signalling voltage and won't work over
long cables. So get a short cable (1 meter, or shorter if you can find
one).  It looks like all of the enclosures mentioned so far use this
method, though it would be good to know whether Carson populated his
with SATA drives.

 Do we agree that for a home NAS box a Hitachi Deskstar (not explicitly being
 a server SATA drive) will suffice despite potential TLER problems? I was
 thinking about Hitachi Deskstar 5K3000 drives. The 4TB models seemingly just
 came out but are rather expensive in comparison…

I'm not sure what ZFS's timeout for dropping an unresponsive disk is,
or what it does when the disk responds again, so I don't know if TLER
would help.  I have not had any serious problems with my pool of
Hitachi 3TB 5400rpm drives.  Two different drives had a checksum error,
once each, but stayed online in the pool.
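
For what it's worth, those errors just showed up as a nonzero CKSUM count
in zpool status, and once a scrub came back clean, clearing the counters
was a one-liner (pool name hypothetical):

# zpool status -v tank
# zpool scrub tank
# zpool clear tank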

Tim


Re: [zfs-discuss] Recommendation for home NAS external JBOD

2012-06-17 Thread Carson Gaspar

On 6/17/12 6:36 PM, Timothy Coalson wrote:


No problem, and yes, I think that should work.  One thing to keep in
mind, though: if the internals of the enclosure simply split the
multilane SAS cable into 4 connectors without an expander, and you use
SATA drives, the controller will run those links in SATA mode, which as
I understand it uses a lower signalling voltage and won't work over
long cables. So get a short cable (1 meter, or shorter if you can find
one).  It looks like all of the enclosures mentioned so far use this
method, though it would be good to know whether Carson populated his
with SATA drives.


SATA drives, using 1m cables from an LSI SAS9201-16e.

--
Carson
