Re: [zfs-discuss] NFS performance?

2010-07-26 Thread Garrett D'Amore
On Sun, 2010-07-25 at 21:39 -0500, Mike Gerdts wrote:
 On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore garr...@nexenta.com wrote:
  On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
 
  I think there may be very good reason to use iSCSI, if you're limited
  to gigabit but need to be able to handle higher throughput for a
  single client. I may be wrong, but I believe iSCSI to/from a single
  initiator can take advantage of multiple links in an active-active
  multipath scenario whereas NFS is only going to be able to take
  advantage of 1 link (at least until pNFS).
 
  There are other ways to get multiple paths.  First off, there is IP
  multipathing, which offers some of this at the IP layer.  There is also
  802.3ad link aggregation (trunking).  So you can still get high
  performance beyond a single link with NFS.  (It works with iSCSI too,
  btw.)
 
 With both IPMP and link aggregation, each TCP session will go over the
 same wire.  There is no guarantee that load will be evenly balanced
 between links when there are multiple TCP sessions.  As such, any
 scalability you get using these configurations will be dependent on
 having a complex enough workload, wise configuration choices, and
 a bit of luck.
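
(For what it's worth, the L4 hash policy has to be asked for explicitly
when the aggregation is created.  Roughly -- an untested sketch, with
made-up link names:

  # hash on L4 headers (src/dst ports) across two aggregated links
  dladm create-aggr -P L4 -l e1000g0 -l e1000g1 aggr0
  dladm show-aggr aggr0

Even then, any single TCP connection still rides one link; the policy
only spreads *different* connections.)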

If you're really that concerned, you could use UDP instead of TCP.  But
that may have other detrimental performance impacts, I'm not sure how
bad they would be in a data center with generally lossless ethernet
links.

Btw, I am not certain that the multiple initiator support (mpxio) is
necessarily any better as far as guaranteed performance/balancing.  (It
may be; I've not looked closely enough at it.)

I should look more closely at NFS as well -- if multiple applications on
the same client are accessing the same filesystem, do they use a single
common TCP session, or can they each have separate instances open?
Again, I'm not sure.

 
 Note that with Sun Trunking there was an option to load balance using
 a round robin hashing algorithm.  When pushing high network loads this
 may cause performance problems with reassembly.

Yes.  Reassembly is Evil for TCP performance.

Btw, the iSCSI balancing act that was described does seem a bit
contrived -- a single initiator and a COMSTAR server, both client *and
server* with multiple ethernet links instead of a single 10GbE link.

I'm not saying it doesn't happen, but I think it happens infrequently
enough that it's reasonable that this scenario wasn't one that popped
immediately into my head. :-)

- Garrett




[zfs-discuss] ZFS Volume Issue

2010-07-26 Thread Ketan
I have a ZFS volume exported to one of my LDoms, but now the LDom does not see
the data and complains about a missing device. Is there any way I can mount the
volume or see what is in it, or check whether the volume got corrupted or there
is some other issue?


Re: [zfs-discuss] zfs fails to import zpool

2010-07-26 Thread Jorge Montes IV
OK, I played around with the physical configuration and placed the drives on the
original controller, and zdb -l is now able to unpack LABEL 0,1,2,3 for all
drives in the pool.  I also changed the hostname in OpenSolaris to
freenas.local, as that is what was listed in the zdb -l output (although I doubt
this matters).

The new setup looks like this:
 
@freenas:~/dskp0s# zpool import
  pool: Raidz
id: 14119036174566039103
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72
config:

        Raidz         FAULTED  corrupted data
          raidz1-0    FAULTED  corrupted data
            c5t0d0p0  ONLINE
            c5t1d0p0  ONLINE
            c5t2d0s2  ONLINE
            c5t3d0p0  ONLINE
            c5t4d0p0  ONLINE
            c5t5d0p0  ONLINE

@freenas:~/dskp0s# ls -l /dev/dsk/c5*
lrwxrwxrwx   1 root root  62 Jul 23 08:05 /dev/dsk/c5t0d0p0 ->
../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@0,0:q
lrwxrwxrwx   1 root root  62 Jul 23 08:05 /dev/dsk/c5t1d0p0 ->
../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@1,0:q
lrwxrwxrwx   1 root root  62 Jul 23 08:05 /dev/dsk/c5t2d0s2 ->
../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@2,0:c
lrwxrwxrwx   1 root root  62 Jul 23 08:05 /dev/dsk/c5t3d0p0 ->
../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@3,0:q
lrwxrwxrwx   1 root root  62 Jul 23 08:05 /dev/dsk/c5t4d0p0 ->
../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@4,0:q
lrwxrwxrwx   1 root root  62 Jul 23 08:05 /dev/dsk/c5t5d0p0 ->
../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@5,0:q

The output of the above command was edited to show only the devices listed in 
the pool.

I then made symlinks to the devices directly, as follows, in a directory called /dskp0s:

@freenas:~/dskp0s# ls -l
total 17
lrwxrwxrwx   1 root root  57 Jul 23 08:40 aacdu0 ->
/devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@0,0:q
lrwxrwxrwx   1 root root  57 Jul 23 08:40 aacdu1 ->
/devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@1,0:q
lrwxrwxrwx   1 root root  57 Jul 23 08:40 aacdu2 ->
/devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@2,0:q
lrwxrwxrwx   1 root root  57 Jul 23 08:40 aacdu3 ->
/devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@3,0:q
lrwxrwxrwx   1 root root  57 Jul 23 08:41 aacdu4 ->
/devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@4,0:q
lrwxrwxrwx   1 root root  57 Jul 23 08:41 aacdu5 ->
/devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@5,0:q
-rw-r--r--   1 root root1992 Jul 26 07:30 zpool.cache

Note: the aacdu2 symlink points to d...@2,0:q instead of d...@2,0:c because in
FreeNAS the disks should be identical (maybe this is part of the problem?).  zdb
-l completes with either symlink.

Then I ran these commands from the /dskp0s directory:

@freenas:~/dskp0s# zpool import Raidz
cannot import 'Raidz': pool may be in use from other system
use '-f' to import anyway

@freenas:~/dskp0s# zpool import -d . Raidz
cannot import 'Raidz': pool may be in use from other system
use '-f' to import anyway

@freenas:~/dskp0s# zpool import -f Raidz
cannot import 'Raidz': one or more devices is currently unavailable
Destroy and re-create the pool from
a backup source.

@freenas:~/dskp0s# zpool import -d . -f Raidz
cannot import 'Raidz': one or more devices is currently unavailable
Destroy and re-create the pool from
a backup source.

@freenas:~/dskp0s# zpool import -F Raidz
cannot import 'Raidz': pool may be in use from other system
use '-f' to import anyway

@freenas:~/dskp0s# zpool import -d . -F Raidz
cannot import 'Raidz': pool may be in use from other system
use '-f' to import anyway

@freenas:~/dskp0s# zdb -l aacdu0

LABEL 0

    version: 6
    name: 'Raidz'
    state: 0
    txg: 11730350
    pool_guid: 14119036174566039103
    hostid: 0
    hostname: 'freenas.local'
    top_guid: 16879648846521942561
    guid: 6543046729241888600
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 16879648846521942561
        nparity: 1
        metaslab_array: 14
        metaslab_shift: 32
        ashift: 9
        asize: 6000992059392
        children[0]:
            type: 'disk'
            id: 0
            guid: 6543046729241888600
            path: '/dev/aacdu0'
            whole_disk: 0
        children[1]:
            type: 'disk'
            id: 1
            guid: 14313209149820231630
            path: '/dev/aacdu1'
            whole_disk: 0
        children[2]:
            type: 'disk'
            id: 2
            guid: 5383435113781649515
            path: '/dev/aacdu2'

Re: [zfs-discuss] NFS performance?

2010-07-26 Thread Mike Gerdts
On Mon, Jul 26, 2010 at 1:27 AM, Garrett D'Amore garr...@nexenta.com wrote:
 On Sun, 2010-07-25 at 21:39 -0500, Mike Gerdts wrote:
 On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore garr...@nexenta.com wrote:
  On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
 
  I think there may be very good reason to use iSCSI, if you're limited
  to gigabit but need to be able to handle higher throughput for a
  single client. I may be wrong, but I believe iSCSI to/from a single
  initiator can take advantage of multiple links in an active-active
  multipath scenario whereas NFS is only going to be able to take
  advantage of 1 link (at least until pNFS).
 
  There are other ways to get multiple paths.  First off, there is IP
  multipathing, which offers some of this at the IP layer.  There is also
  802.3ad link aggregation (trunking).  So you can still get high
  performance beyond a single link with NFS.  (It works with iSCSI too,
  btw.)

 With both IPMP and link aggregation, each TCP session will go over the
 same wire.  There is no guarantee that load will be evenly balanced
 between links when there are multiple TCP sessions.  As such, any
 scalability you get using these configurations will be dependent on
 having a complex enough workload, wise configuration choices, and
 a bit of luck.

 If you're really that concerned, you could use UDP instead of TCP.  But
 that may have other detrimental performance impacts, I'm not sure how
 bad they would be in a data center with generally lossless ethernet
 links.

Heh.  My horror story with reassembly was actually with connectionless
transports (LLT, then UDP).  Oracle RAC's cache fusion sends 8 KB
blocks via UDP by default, or LLT when used in the Veritas + Oracle
RAC certified configuration from 5+ years ago.  The use of Sun
trunking with round robin hashing and the lack of jumbo frames
made every cache fusion block turn into 6 LLT or UDP packets that had
to be reassembled on the other end.  This was on a 15K domain with the
NICs spread across IO boards.  I assume that interrupts for a NIC are
handled by a CPU on the closest system board (Solaris 8, FWIW).  If
that assumption is true then there would also be a flurry of
inter-system board chatter to put the block back together.  In any
case, performance was horrible until we got rid of round robin and
enabled jumbo frames.
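
(On current builds the MTU is just a link property -- an untested
sketch with a made-up link name; the switch ports have to allow jumbo
frames too, and the link may need to be unplumbed before the change
takes:

  dladm set-linkprop -p mtu=9000 e1000g0
  dladm show-linkprop -p mtu e1000g0

Back then it meant driver.conf edits and a reboot, if I remember right.)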

 Btw, I am not certain that the multiple initiator support (mpxio) is
 necessarily any better as far as guaranteed performance/balancing.  (It
 may be; I've not looked closely enough at it.)

I haven't paid close attention to how mpxio works.  The Veritas
analog, vxdmp, does a very good job of balancing traffic down multiple
paths, even when only a single LUN is accessed.  The exact mode that
dmp will use is dependent on the capabilities of the array it is
talking to - many arrays work in an active/passive mode.  As such, I
would expect that with vxdmp or mpxio the balancing with iSCSI would
be at least partially dependent on what the array said to do.

 I should look more closely at NFS as well -- if multiple applications on
 the same client are accessing the same filesystem, do they use a single
 common TCP session, or can they each have separate instances open?
 Again, I'm not sure.

It's worse than that.  A quick experiment with two different
automounted home directories from the same NFS server suggests that
both home directories share one TCP session to the NFS server.

The latest version of Oracle's RDBMS supports a userland NFS client
option.  It would be very interesting to see if this does a separate
session per data file, possibly allowing for better load spreading.

 Note that with Sun Trunking there was an option to load balance using
 a round robin hashing algorithm.  When pushing high network loads this
 may cause performance problems with reassembly.

 Yes.  Reassembly is Evil for TCP performance.

 Btw, the iSCSI balancing act that was described does seem a bit
 contrived -- a single initiator and a COMSTAR server, both client *and
 server* with multiple ethernet links instead of a single 10GbE link.

 I'm not saying it doesn't happen, but I think it happens infrequently
 enough that it's reasonable that this scenario wasn't one that popped
 immediately into my head. :-)

It depends on whether the people that control the network gear are the
same ones that control servers.  My experience suggests that if there
is a disconnect, it seems rather likely that each group's
standardization efforts, procurement cycles, and capacity plans will
work against any attempt to have an optimal configuration.

Also, it is rather common to have multiple 1 Gb links to servers going
to disparate switches so as to provide resilience in the face of
switch failures.  This is not unlike (at a block diagram level) the
architecture that you see in pretty much every SAN.  In such a
configuration, it is reasonable for people to expect that load
balancing will occur.

-- 
Mike Gerdts

Re: [zfs-discuss] Severe ZFS corruption, help needed.

2010-07-26 Thread Orvar Korvar
Have you posted on the FreeBSD forums?


[zfs-discuss] Mirrored raidz

2010-07-26 Thread Dav Banks
This may have been covered somewhere but I couldn't find it.

Is it possible to mirror two raidz vdevs? Like a RAID50 basically.


Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Serge Fonville
Hi,

 Is it possible to mirror two raidz vdevs? Like a RAID50 basically.

RAID 50 is striped, not mirrored. The ZFS equivalent is a single pool
with multiple raidz vdevs, basically:

zpool create tank raidz c0t0d0 c0t0d1 c0t0d2 raidz c1t0d0 c1t0d1 c1t0d2

Other than that, I believe it is not possible to create a mirrored
pool from raidz vdevs.

Regards,

Serge Fonville

-- 
http://www.sergefonville.nl

Convince Google!!
They need to support Adsense over SSL
https://www.google.com/adsense/support/bin/answer.py?hl=enanswer=10528
http://www.google.com/support/forum/p/AdSense/thread?tid=1884bc9310d9f923hl=en


Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Bob Friesenhahn

On Mon, 26 Jul 2010, Dav Banks wrote:


This may have been covered somewhere but I couldn't find it.

Is it possible to mirror two raidz vdevs? Like a RAID50 basically.


This configuration is not directly supported by zfs.  It should be
possible to do, though, if you are really serious about it.  You can
create two zfs zvols (volumes), ideally in two different raidz-based zfs
pools, and then create a new zfs pool that mirrors those two devices.
The end result would be three zfs pools.  It is probably not a wise idea
to use this layered approach.
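
Roughly, as an untested sketch -- disk names and sizes are made up, and
again, this is not something I'd recommend:

  # two independent raidz pools
  zpool create pool1 raidz c1t0d0 c1t1d0 c1t2d0
  zpool create pool2 raidz c2t0d0 c2t1d0 c2t2d0

  # one zvol in each, then a third pool that mirrors the two zvols
  zfs create -V 1T pool1/vol
  zfs create -V 1T pool2/vol
  zpool create mpool mirror /dev/zvol/dsk/pool1/vol /dev/zvol/dsk/pool2/vol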


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Dav Banks
Ah. Thanks! I should have said RAID51 - a mirror of RAID5 elements.

Thanks for the info. Bummer that it can't be done.


Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Saxon, Will

 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org 
 [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Dav Banks
 Sent: Monday, July 26, 2010 2:02 PM
 To: zfs-discuss@opensolaris.org
 Subject: [zfs-discuss] Mirrored raidz
 
 This may have been covered somewhere but I couldn't find it.
 
 Is it possible to mirror two raidz vdevs? Like a RAID50 basically.

RAID50 is not a mirror of RAID5s, but a stripe set of RAID5s. RAID50 is analogous 
to multiple raidz vdevs in a single zpool. 

Mirrored RAID5s are not directly possible, as ZFS does not permit nested vdevs 
(i.e. a mirror vdev composed of raidz vdevs). 

I think you can make 2 separate zpools composed of single raidz vdevs, make 
zvols in those, then create a 3rd zpool with a mirror vdev of the zvols. 

-Will


Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread David Magda
On Mon, July 26, 2010 14:17, Dav Banks wrote:
 Ah. Thanks! I should have said RAID51 - a mirror of RAID5 elements.

 Thanks for the info. Bummer that it can't be done.

Out of curiosity, any particular reason why you want to do this?




Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Cindy Swearingen
A small follow-up is that creating pools from components of other pools 
can cause system deadlocks.


This approach is not recommended.

Thanks,

Cindy

On 07/26/10 12:19, Saxon, Will wrote:

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Dav Banks

Sent: Monday, July 26, 2010 2:02 PM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Mirrored raidz

This may have been covered somewhere but I couldn't find it.

Is it possible to mirror two raidz vdevs? Like a RAID50 basically.


RAID50 is not a mirror of RAID5s, but a stripe set of RAID5s. RAID50 is analogous to multiple raidz vdevs in a single zpool. 

Mirrored RAID5s are not directly possible, as ZFS does not permit nested vdevs (i.e. a mirror vdev composed of raidz vdevs). 

I think you can make 2 separate zpools composed of single raidz vdevs, make zvols in those, then create a 3rd zpool with a mirror vdev of the zvols. 


-Will


Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Dav Banks
I wanted to test it as a backup solution. Maybe that's crazy in itself but I 
want to try it.

Basically, once a week detach the 'backup' pool from the mirror, replace the 
drives, add the new raidz to the mirror and let it resilver and sit for a week.


Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Alex Blewitt
On 26 Jul 2010, at 19:51, Dav Banks davba...@virginia.edu wrote:

 I wanted to test it as a backup solution. Maybe that's crazy in itself but I 
 want to try it.
 
 Basically, once a week detach the 'backup' pool from the mirror, replace the 
 drives, add the new raidz to the mirror and let it resilver and sit for a 
 week.

Why not do it the other way around? Create a pool which consists of mirrored
pairs (or triples) of drives. You don't need raidz to make the pool bigger;
ZFS will stripe across all the mirrored pairs appropriately. If you want to
keep more copies of your data, set copies=2 and zfs will try to schedule writes
across different mirrored pairs.

Alex


Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Cindy Swearingen

You might look at the zpool split feature, where you can
split off the disks from a mirrored pool to create an identical
pool, described here:

http://hub.opensolaris.org/bin/view/Community+Group+zfs/docs

ZFS Admin Guide, p. 87
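
As a rough sketch (pool names are made up; zpool split requires a
recent build and a pool made of mirrored vdevs):

  # split one side of each mirror off into a new, exported pool
  zpool split tank tankbackup

  # import it here, or move those disks and import on another system
  zpool import tankbackup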

Thanks,

Cindy

On 07/26/10 12:51, Dav Banks wrote:

I wanted to test it as a backup solution. Maybe that's crazy in itself but I 
want to try it.

Basically, once a week detach the 'backup' pool from the mirror, replace the 
drives, add the new raidz to the mirror and let it resilver and sit for a week.



Re: [zfs-discuss] NFS performance?

2010-07-26 Thread Miles Nordin
 mg == Mike Gerdts mger...@gmail.com writes:
 sw == Saxon, Will will.sa...@sage.com writes:

sw I think there may be very good reason to use iSCSI, if you're
sw limited to gigabit but need to be able to handle higher
sw throughput for a single client.

 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6817942

look at it now before it gets pulled back inside the wall. :(

I think this bug was posted on zfs-discuss earlier.  Please see the
comments, because he is not using laggs: even with a single 10Gbit/s
NIC, you cannot use the link well unless you take advantage of the
multiple MSIs and the L4 pre-classification built into the NIC.  You need
multiple TCP connections between client and server so that each will fire
a different MSI.  He got about 3x the performance using 8 connections.

It sounds like NFS is already fixed for this, but requires manual
tuning of clnt_max_conns and the number of reader and writer threads.
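
(If I read it right, those are the usual /etc/system knobs on the
client -- an untested sketch, and the values are only illustrative:

  * allow more than one TCP connection per client/server pair
  set rpcmod:clnt_max_conns = 8
  * more async read/write threads per NFS mount
  set nfs:nfs3_max_threads = 32
  set nfs:nfs4_max_threads = 32

followed by a reboot.)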

mg it is rather common to have multiple 1 Gb links to
mg servers going to disparate switches so as to provide
mg resilience in the face of switch failures.  This is not unlike
mg (at a block diagram level) the architecture that you see in
    mg pretty much every SAN.  In such a configuration, it is
mg reasonable for people to expect that load balancing will
mg occur.

nope.  spanning tree removes all loops, which means between any two
points there will be only one enabled path.  An L2-switched network
will look into L4 headers for splitting traffic across an aggregated
link (as long as it's been deliberately configured to do that---by
default probably only looks to L2), but it won't do any multipath
within the mesh.

Even with an L3 routing protocol it usually won't do multipath unless
the costs of the paths match exactly, so you'd want to build the
topology to achieve this and then do all switching at layer 3 by
making sure no VLAN is larger than a switch.

There's actually a Cisco feature to make no VLAN larger than a *port*,
which I use a little bit.  It's meant for CATV networks, I think, or
DSL networks aggregated by IP instead of ATM like maybe some European
ones.  The idea is not to put edge ports into VLANs any more but
instead say 'ip unnumbered loopbackN', and then some black magic built
into their DHCP forwarder adds /32 routes by watching the
DHCP replies.  If you don't use DHCP you can add static /32 routes
yourself, and it will work.  It does not help with IPv6, and you
can only use it on VLAN-tagged edge ports (what? arbitrary!), but it's
neat that it's there at all.

 http://www.cisco.com/en/US/docs/ios/12_3t/12_3t4/feature/guide/gtunvlan.html

The best thing IMHO would be to use this feature on the edge ports,
just as I said, but you will have to teach the servers to VLAN-tag
their packets.  not such a bad idea, but weird.

You could also use it one hop up from the edge switches, but I think
it might have problems in general removing the routes when you unplug
a server, and using it one hop up could make them worse.  I only use
it with static routes so far, so no mobility for me: I have to keep
each server plugged into its assigned port, and reconfigure switches
if I move it.  Once you have ``no vlan larger than 1 switch,'' if you
actually need a vlan-like thing that spans multiple switches, the new
word for it is 'vrf'.

so, yeah, it means the server people will have to take over the job of
the networking people.  The good news is that networking people don't
like spanning tree very much because it's always going wrong, so
AFAICT most of them who are paying attention are already moving in
this direction.




Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Ross Walker
On Jul 26, 2010, at 2:51 PM, Dav Banks davba...@virginia.edu wrote:

 I wanted to test it as a backup solution. Maybe that's crazy in itself but I 
 want to try it.
 
 Basically, once a week detach the 'backup' pool from the mirror, replace the 
 drives, add the new raidz to the mirror and let it resilver and sit for a 
 week.

If that's the case, why not create a second pool called 'backup' and
'zfs send' to it periodically?
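
Something along these lines, as a sketch -- pool and snapshot names are
made up:

  # first pass: replicate everything under tank into the backup pool
  zfs snapshot -r tank@2010-07-26
  zfs send -R tank@2010-07-26 | zfs receive -Fd backup

  # each week after that: send only the changes since the last snapshot
  zfs snapshot -r tank@2010-08-02
  zfs send -R -I tank@2010-07-26 tank@2010-08-02 | zfs receive -Fd backup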

-Ross



Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread David Magda
On Mon, July 26, 2010 14:51, Dav Banks wrote:
 I wanted to test it as a backup solution. Maybe that's crazy in itself but
 I want to try it.

 Basically, once a week detach the 'backup' pool from the mirror, replace
 the drives, add the new raidz to the mirror and let it resilver and sit
 for a week.

While a neat solution, I think you'd be better off using the incremental
send/recv functionality for backups. Having an online backup really
isn't a true backup IMHO. It's too easy to fat-finger something, and then
you're hosed because the change was replicated in real time to both sides
of the mirror (though this is mitigated a bit if you automatically take
regular snapshots).

Mirroring is (IMHO) for uptime and insurance against hardware failure.
Backups are /independent/ copies of data that are insurance against
something happening to your primary copy.

You could do the same thing with a separate pool and send/recv, without
taking the hit on write IOps from the second half of the mirror: basically
async replication instead of synchronous.




Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Eric D. Mudama

On Mon, Jul 26 at 11:51, Dav Banks wrote:

I wanted to test it as a backup solution. Maybe that's crazy in
itself but I want to try it.

Basically, once a week detach the 'backup' pool from the mirror,
replace the drives, add the new raidz to the mirror and let it
resilver and sit for a week.


Since you're already spending the disk drives that get detached, it
seems safer to me to just 'zfs send' to a minimal backup system and
remove the extra drives from your primary server.  There is less
overhead, and a scrub can validate your backup copy at whatever
frequency you choose.

You don't even need the same pool layout on the backup machine.
Primary can be a stripe of mirrors, while your backup can be a wide
raidz2 setup.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org



Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Maurice Volaski

 It should be possible to do
though if you are really serious about it.  You can create two zfs
zvols (volumes) which are hopefully in two different raidz-based zfs
pools, and then create a new zfs pool using those two devices.  The
end result would be three zfs pools.  It is probably not a wise idea
to use this layered approach.

A small follow-up is that creating pools from components of other pools
can cause system deadlocks.


One can make the zvols iSCSI targets and then attach them to the 
local initiator. This works and, indeed, it's a way to mirror storage 
across a network.
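
Roughly, with COMSTAR -- an untested sketch; the zvol name is made up
and the GUID is a placeholder for whatever sbdadm prints:

  # target side: back a logical unit with the zvol and expose it
  zfs create -V 100G tank/mirrorleg
  sbdadm create-lu /dev/zvol/rdsk/tank/mirrorleg
  stmfadm add-view 600144f0...      # GUID from sbdadm create-lu output
  itadm create-target

  # (local) initiator side: discover the target and build device nodes
  iscsiadm add discovery-address 127.0.0.1
  iscsiadm modify discovery --sendtargets enable
  devfsadm -i iscsi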

--

Maurice Volaski, maurice.vola...@einstein.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


Re: [zfs-discuss] NFS performance?

2010-07-26 Thread Mike Gerdts
On Mon, Jul 26, 2010 at 2:56 PM, Miles Nordin car...@ivy.net wrote:
 mg == Mike Gerdts mger...@gmail.com writes:
    mg it is rather common to have multiple 1 Gb links to
    mg servers going to disparate switches so as to provide
    mg resilience in the face of switch failures.  This is not unlike
    mg (at a block diagram level) the architecture that you see in
    mg pretty much every SAN.  In such a configuration, it is
    mg reasonable for people to expect that load balancing will
    mg occur.

 nope.  spanning tree removes all loops, which means between any two
 points there will be only one enabled path.  An L2-switched network
 will look into L4 headers for splitting traffic across an aggregated
 link (as long as it's been deliberately configured to do that---by
 default probably only looks to L2), but it won't do any multipath
 within the mesh.

I was speaking more of IPMP, which is at layer 3.

 Even with an L3 routing protocol it usually won't do multipath unless
 the costs of the paths match exactly, so you'd want to build the
 topology to achieve this and then do all switching at layer 3 by
 making sure no VLAN is larger than a switch.

By default, IPMP does outbound load spreading.  Inbound load spreading
is not practical with a single (non-test) IP address.  If you have
multiple virtual IPs you can spread them across all of the NICs in
the IPMP group and get some degree of inbound spreading as well.  This
is the default behavior of the OpenSolaris IPMP implementation, last I
looked.  I've not seen any examples (although I can't say I've looked
real hard either) of a Solaris 10 IPMP configuration set up with
multiple IPs to encourage inbound load spreading.
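
For the archives, the multiple-address variant is just something like
this (untested sketch; interface names and addresses are made up):

  ifconfig e1000g0 plumb 192.168.10.11 netmask 255.255.255.0 group prod0 up
  ifconfig e1000g1 plumb 192.168.10.12 netmask 255.255.255.0 group prod0 up

With two data addresses in the group, inbound traffic can land on either
NIC, provided clients are spread across the two addresses (via DNS
round robin, mount options, or similar).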


 There's actually a cisco feature to make no VLAN larger than a *port*,
 which I use a little bit.  It's meant for CATV networks I think, or
 DSL networks aggregated by IP instead of ATM like maybe some European
 ones?  but the idea is not to put edge ports into vlans any more but
 instead say 'ip unnumbered loopbackN', and then some black magic they
 have built into their DHCP forwarder adds /32 routes by watching the
 DHCP replies.  If you don't use DHCP you can add static /32 routes
 yourself, and it will work.  It does not help with IPv6, and also you
 can only use it on vlan-tagged edge ports (what? arbitrary!) but
 neat that it's there at all.

  http://www.cisco.com/en/US/docs/ios/12_3t/12_3t4/feature/guide/gtunvlan.html

Interesting... however this seems to limit you to fewer than 4096 edge
ports per VTP domain, as the VID field in the 802.1q header is only 12
bits.  It is also unclear how this works when you have one physical host
with many guests.  And then there is the whole issue that I don't
really see how this helps with resilience in the face of a switch
failure.  Cool technology, but I'm not certain that it addresses what
I was talking about.


 The best thing IMHO would be to use this feature on the edge ports,
 just as I said, but you will have to teach the servers to VLAN-tag
 their packets.  not such a bad idea, but weird.

 You could also use it one hop up from the edge switches, but I think
 it might have problems in general removing the routes when you unplug
 a server, and using it one hop up could make them worse.  I only use
 it with static routes so far, so no mobility for me: I have to keep
 each server plugged into its assigned port, and reconfigure switches
 if I move it.  Once you have ``no vlan larger than 1 switch,'' if you
 actually need a vlan-like thing that spans multiple switches, the new
 word for it is 'vrf'.

There was some other Cisco dark magic that our network guys were
touting a while ago that would make each edge switch look like a blade
in a 6500 series.  This would then allow them to do link aggregation
across edge switches.  At least two of organizational changes,
personnel changes, and roadmap changes happened, so I've not seen
this in action.


 so, yeah, it means the server people will have to take over the job of
 the networking people.  The good news is that networking people don't
 like spanning tree very much because it's always going wrong, so
 AFAICT most of them who are paying attention are already moving in
 this direction.






-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] ZFS where to go!

2010-07-26 Thread James
I might be mistaken, but it looks like 3ware does have a driver, several in 
fact:

http://www.3ware.com/support/downloadpageprod.asp?pcode=9path=Escalade9500SSeriesprodname=3ware%209500S%20Series

Any comment on this?  I'm thinking about picking up a server with this card, 
and it would be cool if it worked.


Re: [zfs-discuss] FreeBSD 8.1 out, has zfs vserion 14 and can boot from zfs

2010-07-26 Thread Peter Jeremy
On 2010-Jul-26 20:32:41 +0800, Eugen Leitl eu...@leitl.org wrote:
FreeBSD 8.1 features version 14 of the ZFS subsystem, the addition of the ZFS
Loader (zfsloader), allowing users to boot from ZFS,

Only on i386 or amd64 systems at present, but you can boot RAIDZ1 and
RAIDZ2 as well as mirrored roots.

Note that ZFS v15 has been integrated into the development branches
(-current and 8-stable) and will be in FreeBSD 8.2 (or you can run it
now by compiling FreeBSD yourself - unlike OpenSolaris, the full build
process is documented and everything necessary is on the release DVDs
or can be downloaded).

See http://www.freebsd.org/releases/8.1R/announce.html

-- 
Peter Jeremy




Re: [zfs-discuss] Severe ZFS corruption, help needed.

2010-07-26 Thread Voloymyr Kostyrko
Nope, I mailed the freebsd-fs mailing list.


Re: [zfs-discuss] Mirrored raidz

2010-07-26 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Ross Walker
 
 If that's the case why not create a second pool called 'backup' and
 'zfs send' periodically to the backup pool?

+1

This is what I do.
