Re: [zfs-discuss] partitioned cache devices

2013-03-15 Thread Ian Collins

Andrew Werchowiecki wrote:


Hi all,

I'm having some trouble adding cache drives to a zpool; has anyone 
got any ideas?


muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2

Password:

cannot open '/dev/dsk/c25t10d1p2': I/O error

muslimwookie@Pyzee:~$

I have two SSDs in the system.  I've created an 8 GB partition on each 
drive for use as a mirrored write cache, and the remainder of each drive 
is partitioned for use as the read-only cache.  However, when attempting 
to add it I get the error above.




Create one 100% Solaris partition and then use format to create two slices.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partitioned cache devices

2013-03-15 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki
 
 muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
 Password:
 cannot open '/dev/dsk/c25t10d1p2': I/O error
 muslimwookie@Pyzee:~$
 
 I have two SSDs in the system.  I've created an 8 GB partition on each drive
 for use as a mirrored write cache, and the remainder of each drive is
 partitioned for use as the read-only cache.  However, when attempting to add
 it I get the error above.

Sounds like you're probably running into confusion about how to partition the 
drive.  If you create fdisk partitions, they will be accessible as p0, p1, p2, 
but I think p0 unconditionally refers to the whole drive, so the first 
partition is p1 and the second is p2.

If you instead create one big Solaris fdisk partition and then divide it into 
slices with format's partition command, the slices will be accessible as s0, 
s1, s6, and so on.  By convention s2 is the slice that encompasses the whole 
partition, and people usually put data on slices such as s0, s1, and s6.
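For example (device names here are hypothetical, and this assumes you've 
already used format to create an 8 GB s0 and a larger s1 on each SSD), the 
additions would look roughly like:

  # mirrored 8 GB slog from slice 0 of each SSD
  sudo zpool add aggr0 log mirror c25t10d1s0 c25t11d1s0
  # remaining space on each SSD as (unmirrored) cache
  sudo zpool add aggr0 cache c25t10d1s1 c25t11d1s1

Note the sN slice names rather than the pN fdisk partition names.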

Generally speaking, it's inadvisable to split the slog/cache devices anyway, 
for the following reason:

If you're splitting the device, evidently you're focusing on the wasted space: 
you bought an expensive 128 GB device of which the slog could never possibly 
use more than 4 GB or 8 GB.  But that's not what you should be focusing on.  
You should be focusing on speed (that's why you bought it in the first place).  
The slog is write-only, and the cache is a mixture of reads and writes, 
hopefully doing more reads than writes.  But regardless of your actual success 
with the cache device, it will be busy most of the time, competing against 
the slog.

You say you have a mirror.  You should probably drop the mirrored-and-split 
cache & log arrangement: use one whole device for the cache and the other 
whole device for the log.  The only risk you'll run is this:

Since a slog is write-only (except during mount, typically at boot), it's 
possible to have a failure mode where you think you're writing to the log, but 
the first time you go back and read, you discover an error and find that the 
device has gone bad.  In other words, without ever doing any reads, you might 
not notice when or if the device goes bad.  Fortunately, there's an easy 
workaround.  You could periodically (say, once a month) script the removal of 
your log device, create a junk pool on it, write a bunch of data to it, scrub 
it (thus verifying it was written correctly), and, in the absence of any scrub 
errors, destroy the junk pool and re-add the device as a slog to the main pool.
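A rough sketch of that monthly exercise (pool and device names here are made 
up):

  #!/bin/sh
  # pull the SSD out of its slog role in the main pool
  zpool remove tank c2t5d0
  # build a throwaway pool on it and write a pile of data
  zpool create junkpool c2t5d0
  dd if=/dev/urandom of=/junkpool/testfile bs=1024k count=4096
  # a scrub re-reads everything and verifies checksums; wait for it
  # to finish, then check "zpool status junkpool" for errors
  zpool scrub junkpool
  # if the scrub came back clean, tear down and restore the slog
  zpool destroy junkpool
  zpool add tank log c2t5d0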

I've never heard of anyone actually being that paranoid, and I've never heard 
of anyone actually experiencing the aforementioned possible undetected device 
failure mode.  So this is all mostly theoretical.

Mirroring the slog device really isn't necessary in the modern age.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun X4200 Question...

2013-03-15 Thread Tiernan OToole
Thanks for the info. I am planning the install this weekend, between
Formula One and other hardware upgrades... fingers crossed it works!
On 14 Mar 2013 09:19, Heiko L. h.lehm...@hs-lausitz.de wrote:


  support for VT, but nothing for AMD... The Opterons don't have VT, so I
  won't be using Xen, but the Zones may be useful...

 We have used Xen/PV on the X4200 for many years without problems.
 dom0: X4200 + openindiana + xvm
 guests (PV): openindiana, linux/fedora, linux/debian
 (vmlinuz-2.6.32.28-xenU-32, vmlinuz-2.6.18-xenU64)


 regards Heiko

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Petabyte pool?

2013-03-15 Thread Marion Hakanson
Greetings,

Has anyone out there built a 1-petabyte pool?  I've been asked to look
into this, and was told low performance is fine, workload is likely
to be write-once, read-occasionally, archive storage of gene sequencing
data.  Probably a single 10Gbit NIC for connectivity is sufficient.

We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis,
using 4TB nearline SAS drives, giving over 100TB usable space (raidz3).
Back-of-the-envelope math suggests stacking up eight to ten of those,
depending on whether you want a raw marketing petabyte or a proper
power-of-two usable petabyte.
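(Rough numbers to make that explicit, assuming something like three 15-disk
raidz3 vdevs per chassis: 45 slots minus 9 parity disks leaves 36 data disks,
and 36 x 4 TB is 144 TB, roughly 131 TiB, per chassis before ZFS overhead.  A
marketing petabyte is 10^15 bytes, about 909 TiB, so seven or eight chassis
cover it; a power-of-two petabyte is 2^50 bytes, 1024 TiB, so figure eight at
a minimum and nine or ten once you allow for overhead and free space.)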

I get a little nervous at the thought of hooking all that up to a single
server, and am a little vague on how much RAM would be advisable, other
than as much as will fit (:-).  Then again, I've been waiting for
something like pNFS/NFSv4.1 to be usable for gluing together multiple
NFS servers into a single global namespace, without any sign of that
happening anytime soon.

So, has anyone done this?  Or come close to it?  Thoughts, even if you
haven't done it yourself?

Thanks and regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Ray Van Dolson
On Fri, Mar 15, 2013 at 06:09:34PM -0700, Marion Hakanson wrote:
 Greetings,
 
 Has anyone out there built a 1-petabyte pool?  I've been asked to look
 into this, and was told low performance is fine, workload is likely
 to be write-once, read-occasionally, archive storage of gene sequencing
 data.  Probably a single 10Gbit NIC for connectivity is sufficient.
 
 We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis,
 using 4TB nearline SAS drives, giving over 100TB usable space (raidz3).
 Back-of-the-envelope might suggest stacking up eight to ten of those,
 depending if you want a raw marketing petabyte, or a proper power-of-two
 usable petabyte.
 
 I get a little nervous at the thought of hooking all that up to a single
 server, and am a little vague on how much RAM would be advisable, other
 than as much as will fit (:-).  Then again, I've been waiting for
 something like pNFS/NFSv4.1 to be usable for gluing together multiple
 NFS servers into a single global namespace, without any sign of that
 happening anytime soon.
 
 So, has anyone done this?  Or come close to it?  Thoughts, even if you
 haven't done it yourself?
 
 Thanks and regards,
 
 Marion

We've come close:

admin@mes-str-imgnx-p1:~$ zpool list
NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
datapool   978T   298T   680T  30%  1.00x  ONLINE  -
syspool    278G   104G   174G  37%  1.00x  ONLINE  -

Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual
pathed to a couple of LSI SAS switches.

Using Nexenta but no reason you couldn't do this w/ $whatever.

We did triple parity and our vdev membership is set up such that we can
lose up to three JBODs and still be functional (one vdev member disk
per JBOD).
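A rough sketch of that membership rule, with hypothetical device aliases
where jXdY means disk Y in JBOD X: each raidz3 vdev takes exactly one disk
from each of eleven enclosures, so losing any three enclosures costs each
vdev at most three members:

  zpool create datapool \
    raidz3 j1d0 j2d0 j3d0 j4d0 j5d0 j6d0 j7d0 j8d0 j9d0 j10d0 j11d0 \
    raidz3 j1d1 j2d1 j3d1 j4d1 j5d1 j6d1 j7d1 j8d1 j9d1 j10d1 j11d1

and so on for the remaining rows of disks.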

This is with 3TB NL-SAS drives.

Ray
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs] Petabyte pool?

2013-03-15 Thread Kristoffer Sheather @ CloudCentral
Well, off the top of my head:

2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's
8 x 60-Bay JBOD's with 60 x 4TB SAS drives
RAIDZ2 stripe over the 8 x JBOD's

That should fit within one rack comfortably and provide 1 PB of storage.

Regards,

Kristoffer Sheather
Cloud Central
Scale Your Data Center In The Cloud 
Phone: 1300 144 007 | Mobile: +61 414 573 130 | Email: 
k...@cloudcentral.com.au
LinkedIn:   | Skype: kristoffer.sheather | Twitter: 
http://twitter.com/kristofferjon 



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs] Petabyte pool?

2013-03-15 Thread Kristoffer Sheather @ CloudCentral
Actually, you could use 3 TB drives and, with an 8-disk RAIDZ2 stripe 
(6 data + 2 parity per vdev), achieve 1080 TB usable.
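(Presumably that works out as 8 JBODs x 60 drives = 480 drives, arranged as
60 RAIDZ2 vdevs of 8 drives each, i.e. 6 data + 2 parity per vdev: 60 x 6 x
3 TB = 1080 TB usable, before ZFS overhead.)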

You'll also need 8-16 SAS ports available on each storage head to provide 
redundant multi-pathed SAS connectivity to the JBODs; I'd recommend LSI 
9207-8E HBAs for those and Intel X520-DA2s for the 10G NICs.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Jan Owoc
On Fri, Mar 15, 2013 at 7:09 PM, Marion Hakanson hakan...@ohsu.edu wrote:
 Has anyone out there built a 1-petabyte pool?

I'm not advising against building/configuring a system yourself,
but I suggest taking a look at the Petarack:
http://www.aberdeeninc.com/abcatg/petarack.htm

It shows it's been done with ZFS :-).

Jan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Marion Hakanson
rvandol...@esri.com said:
 We've come close:
 
 admin@mes-str-imgnx-p1:~$ zpool list
 NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
 datapool   978T   298T   680T  30%  1.00x  ONLINE  -
 syspool    278G   104G   174G  37%  1.00x  ONLINE  -
 
 Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to
 a couple of LSI SAS switches. 

Thanks Ray,

We've been looking at those too (we've had good luck with our MD1200's).

How many HBA's in the R720?

Thanks and regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Ray Van Dolson
On Fri, Mar 15, 2013 at 06:31:11PM -0700, Marion Hakanson wrote:
 rvandol...@esri.com said:
  We've come close:
  
  admin@mes-str-imgnx-p1:~$ zpool list
  NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
  datapool   978T   298T   680T  30%  1.00x  ONLINE  -
  syspool    278G   104G   174G  37%  1.00x  ONLINE  -
  
  Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed
  to a couple of LSI SAS switches.
 
 Thanks Ray,
 
 We've been looking at those too (we've had good luck with our MD1200's).
 
 How many HBA's in the R720?
 
 Thanks and regards,
 
 Marion

We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]).

Ray

[1] 
http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l=en&s=hied&cs=65&sku=a4614101
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Marion Hakanson
Ray said:
 Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed
 to a couple of LSI SAS switches. 
 
Marion said:
 How many HBA's in the R720?

Ray said:
 We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]).

Sounds similar in approach to the Aberdeen product another sender referred to,
with SAS switch layout:
  http://www.aberdeeninc.com/images/1-up-petarack2.jpg

One concern: I compared our SuperMicro JBOD with 40x 4TB drives in it,
connected via a dual-port LSI SAS 9200-8e HBA, against the same pool layout
on a 40-slot server with 40x SATA drives in it.  The server uses no SAS
expanders; instead, SAS-to-SATA octopus cables connect the drives
directly to three internal SAS HBA's (2x 9201-16i's, 1x 9211-8i).

What I found was that the internal pool was significantly faster for both
sequential and random I/O than the pool on the external JBOD.

My conclusion was that I would not want to exceed ~48 drives on a single
8-port SAS HBA.  So I thought that running the I/O of all your hundreds
of drives through only two HBA's would be a bottleneck.

LSI's specs say 4800MBytes/sec for an 8-port SAS HBA, but 4000MBytes/sec
for that card in an x8 PCIe-2.0 slot.  Sure, the newer 9207-8e is rated
at 8000MBytes/sec in an x8 PCIe-3.0 slot, but it still has only the same
8 SAS ports going at 4800MBytes/sec.
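(For what it's worth, those figures line up with the link math: 8 SAS-2
lanes at 6 Gbit/s, after 8b/10b encoding, is about 600 MBytes/sec per lane,
or 4800 MBytes/sec across the card, while a PCIe 2.0 x8 slot tops out around
4000 MBytes/sec and a PCIe 3.0 x8 slot around 8000 MBytes/sec.  With the
older card, the PCIe slot is the narrower pipe.)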

Yes, I know the disks probably can't go that fast.  But in my tests
above, the internal 40-disk pool measures 2000MBytes/sec sequential
reads and writes, while the external 40-disk JBOD measures at 1500
to 1700 MBytes/sec.  Not a lot slower, but significantly slower, so
I do think the number of HBA's makes a difference.

At the moment, I'm leaning toward piling six, eight, or ten HBA's into
a server, preferably one with dual IOH's (thus two PCIe busses), and
connecting dual-path JBOD's in that manner.

I hadn't looked into SAS switches much, but they do look more reliable
than daisy-chaining a bunch of JBOD's together.  I just haven't seen
how to get more bandwidth through them to a single host.

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Richard Elling
On Mar 15, 2013, at 6:09 PM, Marion Hakanson hakan...@ohsu.edu wrote:

 Greetings,
 
 Has anyone out there built a 1-petabyte pool?

Yes, I've done quite a few.

  I've been asked to look
 into this, and was told low performance is fine, workload is likely
 to be write-once, read-occasionally, archive storage of gene sequencing
 data.  Probably a single 10Gbit NIC for connectivity is sufficient.
 
 We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis,
 using 4TB nearline SAS drives, giving over 100TB usable space (raidz3).
 Back-of-the-envelope might suggest stacking up eight to ten of those,
 depending if you want a raw marketing petabyte, or a proper power-of-two
 usable petabyte.

Yes. NB, for the PHB, using N^2 is found 2B less effective than N^10.

 I get a little nervous at the thought of hooking all that up to a single
 server, and am a little vague on how much RAM would be advisable, other
 than as much as will fit (:-).  Then again, I've been waiting for
 something like pNFS/NFSv4.1 to be usable for gluing together multiple
 NFS servers into a single global namespace, without any sign of that
 happening anytime soon.

NFS v4 or DFS (or even a clever sysadmin + automount) offers a single namespace
without needing the complexity of NFSv4.1, Lustre, GlusterFS, etc.
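A minimal sketch of the automount flavor of that (server names, map names,
and export paths are made up): give each NFS server its own export and
stitch them together under one parent directory with an indirect map:

  # /etc/auto_master entry
  /data   auto_data

  # /etc/auto_data -- each key mounts from a different NFS server
  seq01   nfs01:/export/seq01
  seq02   nfs02:/export/seq02
  seq03   nfs03:/export/seq03

Clients then see a single /data namespace even though the storage is spread
across several boxes.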

 
 So, has anyone done this?  Or come close to it?  Thoughts, even if you
 haven't done it yourself?

Don't forget about backups :-)
 -- richard


--

richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss