Re: [zfs-discuss] NFS performance?
On Sun, 2010-07-25 at 21:39 -0500, Mike Gerdts wrote:
> On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore garr...@nexenta.com wrote:
>> On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
>>> I think there may be very good reason to use iSCSI, if you're limited to gigabit but need to be able to handle higher throughput for a single client. I may be wrong, but I believe iSCSI to/from a single initiator can take advantage of multiple links in an active-active multipath scenario, whereas NFS is only going to be able to take advantage of 1 link (at least until pNFS).
>>
>> There are other ways to get multiple paths. First off, there is IP multipathing, which offers some of this at the IP layer. There is also 802.3ad link aggregation (trunking). So you can still get high performance beyond a single link with NFS. (It works with iSCSI too, btw.)
>
> With both IPMP and link aggregation, each TCP session will go over the same wire. There is no guarantee that load will be evenly balanced between links when there are multiple TCP sessions. As such, any scalability you get using these configurations will be dependent on having a complex enough workload, wise configuration choices, and a bit of luck.

If you're really that concerned, you could use UDP instead of TCP. But that may have other detrimental performance impacts; I'm not sure how bad they would be in a data center with generally lossless ethernet links.

Btw, I am not certain that the multiple initiator support (mpxio) is necessarily any better as far as guaranteed performance/balancing. (It may be; I've not looked closely enough at it.) I should look more closely at NFS as well -- if multiple applications on the same client are accessing the same filesystem, do they use a single common TCP session, or can they each have separate instances open? Again, I'm not sure.

> Note that with Sun Trunking there was an option to load balance using a round robin hashing algorithm. When pushing high network loads this may cause performance problems with reassembly.

Yes. Reassembly is Evil for TCP performance.

Btw, the iSCSI balancing act that was described does seem a bit contrived -- a single initiator and a COMSTAR server, both client *and server* with multiple ethernet links instead of a single 10GbE link. I'm not saying it doesn't happen, but I think it happens infrequently enough that it's reasonable that this scenario wasn't one that popped immediately into my head. :-)

- Garrett
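For reference, a minimal sketch of the 802.3ad aggregation Garrett mentions, using the newer OpenSolaris dladm syntax (interface names and the address are hypothetical; older releases use the -d device form instead of -l):

  # combine two gigabit NICs into one 802.3ad aggregation, hashing flows on L3+L4 headers
  dladm create-aggr -L active -P L3,L4 -l e1000g0 -l e1000g1 aggr0
  ifconfig aggr0 plumb 192.168.1.10 netmask 255.255.255.0 up
  # show the aggregation and its hashing policy
  dladm show-aggr

Because the policy hashes each flow to one link, a single TCP session still rides one wire, which is exactly the limitation Mike describes.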
[zfs-discuss] ZFS Volume Issue
I have a ZFS volume exported to one of my LDoms, but now the LDom does not see the data and complains about a missing device. Is there any way I can mount it or see what is in the volume, or check whether the volume got corrupted or there is some other issue?
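A reasonable first step is to check the volume from the control domain before debugging inside the guest; a hedged sketch, assuming a pool named tank and a guest domain named myldom (both hypothetical names):

  # does the volume still exist, and is the pool healthy?
  zfs list -t volume
  zpool status -v tank
  # is the zvol still bound to the guest's virtual disk service?
  ldm list-services
  ldm list-bindings myldom

If the pool is healthy and the binding is intact, the problem is more likely the label or partition table the guest wrote inside the zvol than corruption of the volume itself.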
Re: [zfs-discuss] zfs fails to import zpool
OK, I played around with the physical configuration, placed the drives back on the original controller, and zdb -l is now able to unpack LABEL 0,1,2,3 for all drives in the pool. I also changed the hostname in OpenSolaris to freenas.local, as that is what was listed in the zdb -l output (although I doubt this matters). The new setup looks like this:

@freenas:~/dskp0s# zpool import
  pool: Raidz
    id: 14119036174566039103
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72
config:

        Raidz         FAULTED  corrupted data
          raidz1-0    FAULTED  corrupted data
            c5t0d0p0  ONLINE
            c5t1d0p0  ONLINE
            c5t2d0s2  ONLINE
            c5t3d0p0  ONLINE
            c5t4d0p0  ONLINE
            c5t5d0p0  ONLINE

@freenas:~/dskp0s# ls -l /dev/dsk/c5*
lrwxrwxrwx 1 root root 62 Jul 23 08:05 /dev/dsk/c5t0d0p0 -> ../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@0,0:q
lrwxrwxrwx 1 root root 62 Jul 23 08:05 /dev/dsk/c5t1d0p0 -> ../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@1,0:q
lrwxrwxrwx 1 root root 62 Jul 23 08:05 /dev/dsk/c5t2d0s2 -> ../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@2,0:c
lrwxrwxrwx 1 root root 62 Jul 23 08:05 /dev/dsk/c5t3d0p0 -> ../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@3,0:q
lrwxrwxrwx 1 root root 62 Jul 23 08:05 /dev/dsk/c5t4d0p0 -> ../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@4,0:q
lrwxrwxrwx 1 root root 62 Jul 23 08:05 /dev/dsk/c5t5d0p0 -> ../../devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@5,0:q

The output of the above command was edited to show only the devices listed in the pool. I then made symlinks directly to the devices, as follows, in a directory called /dskp0s:

@freenas:~/dskp0s# ls -l
total 17
lrwxrwxrwx 1 root root 57 Jul 23 08:40 aacdu0 -> /devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@0,0:q
lrwxrwxrwx 1 root root 57 Jul 23 08:40 aacdu1 -> /devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@1,0:q
lrwxrwxrwx 1 root root 57 Jul 23 08:40 aacdu2 -> /devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@2,0:q
lrwxrwxrwx 1 root root 57 Jul 23 08:40 aacdu3 -> /devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@3,0:q
lrwxrwxrwx 1 root root 57 Jul 23 08:41 aacdu4 -> /devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@4,0:q
lrwxrwxrwx 1 root root 57 Jul 23 08:41 aacdu5 -> /devices/p...@0,0/pci8086,2...@1e/pci9005,2...@2/d...@5,0:q
-rw-r--r-- 1 root root 1992 Jul 26 07:30 zpool.cache

Note: the aacdu2 symlink was linked to d...@2,0:q instead of d...@2,0:c because in FreeNAS the disks should be identical (maybe this is part of the problem?). zdb -l completes with either symlink.

Then I ran these commands from the /dskp0s directory:

@freenas:~/dskp0s# zpool import Raidz
cannot import 'Raidz': pool may be in use from other system
use '-f' to import anyway
@freenas:~/dskp0s# zpool import -d . Raidz
cannot import 'Raidz': pool may be in use from other system
use '-f' to import anyway
@freenas:~/dskp0s# zpool import -f Raidz
cannot import 'Raidz': one or more devices is currently unavailable
        Destroy and re-create the pool from a backup source.
@freenas:~/dskp0s# zpool import -d . -f Raidz
cannot import 'Raidz': one or more devices is currently unavailable
        Destroy and re-create the pool from a backup source.
@freenas:~/dskp0s# zpool import -F Raidz
cannot import 'Raidz': pool may be in use from other system
use '-f' to import anyway
@freenas:~/dskp0s# zpool import -d . -F Raidz
cannot import 'Raidz': pool may be in use from other system
use '-f' to import anyway
@freenas:~/dskp0s# zdb -l aacdu0
LABEL 0
    version: 6
    name: 'Raidz'
    state: 0
    txg: 11730350
    pool_guid: 14119036174566039103
    hostid: 0
    hostname: 'freenas.local'
    top_guid: 16879648846521942561
    guid: 6543046729241888600
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 16879648846521942561
        nparity: 1
        metaslab_array: 14
        metaslab_shift: 32
        ashift: 9
        asize: 6000992059392
        children[0]:
            type: 'disk'
            id: 0
            guid: 6543046729241888600
            path: '/dev/aacdu0'
            whole_disk: 0
        children[1]:
            type: 'disk'
            id: 1
            guid: 14313209149820231630
            path: '/dev/aacdu1'
            whole_disk: 0
        children[2]:
            type: 'disk'
            id: 2
            guid: 5383435113781649515
            path: '/dev/aacdu2'
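Since all four labels now unpack, one quick sanity check is whether every disk recorded the same final txg; a stale member shows an older txg and can hold the whole import back. A small loop over the symlinks above (assuming the /dskp0s layout from this message):

  cd /dskp0s
  for d in aacdu0 aacdu1 aacdu2 aacdu3 aacdu4 aacdu5; do
      echo "== $d =="
      zdb -l $d | egrep 'txg|guid'
  done

If one device's txg lags the others, that device is the first suspect for the "corrupted data" status.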
Re: [zfs-discuss] NFS performance?
On Mon, Jul 26, 2010 at 1:27 AM, Garrett D'Amore garr...@nexenta.com wrote:
> On Sun, 2010-07-25 at 21:39 -0500, Mike Gerdts wrote:
>> On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore garr...@nexenta.com wrote:
>>> On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
>>>> I think there may be very good reason to use iSCSI, if you're limited to gigabit but need to be able to handle higher throughput for a single client. I may be wrong, but I believe iSCSI to/from a single initiator can take advantage of multiple links in an active-active multipath scenario, whereas NFS is only going to be able to take advantage of 1 link (at least until pNFS).
>>>
>>> There are other ways to get multiple paths. First off, there is IP multipathing, which offers some of this at the IP layer. There is also 802.3ad link aggregation (trunking). So you can still get high performance beyond a single link with NFS. (It works with iSCSI too, btw.)
>>
>> With both IPMP and link aggregation, each TCP session will go over the same wire. There is no guarantee that load will be evenly balanced between links when there are multiple TCP sessions. As such, any scalability you get using these configurations will be dependent on having a complex enough workload, wise configuration choices, and a bit of luck.
>
> If you're really that concerned, you could use UDP instead of TCP. But that may have other detrimental performance impacts; I'm not sure how bad they would be in a data center with generally lossless ethernet links.

Heh. My horror story with reassembly was actually with connectionless transports (LLT, then UDP). Oracle RAC's cache fusion sends 8 KB blocks via UDP by default, or LLT when used in the Veritas + Oracle RAC certified configuration from 5+ years ago. The use of Sun trunking with round robin hashing and the lack of jumbo packets made every cache fusion block turn into 6 LLT or UDP packets that had to be reassembled on the other end. This was on a 15K domain with the NICs spread across I/O boards. I assume that interrupts for a NIC are handled by a CPU on the closest system board (Solaris 8, FWIW). If that assumption is true, then there would also be a flurry of inter-system-board chatter to put the block back together. In any case, performance was horrible until we got rid of round robin and enabled jumbo frames.

> Btw, I am not certain that the multiple initiator support (mpxio) is necessarily any better as far as guaranteed performance/balancing. (It may be; I've not looked closely enough at it.)

I haven't paid close attention to how mpxio works. The Veritas analog, vxdmp, does a very good job of balancing traffic down multiple paths, even when only a single LUN is accessed. The exact mode that dmp will use is dependent on the capabilities of the array it is talking to - many arrays work in an active/passive mode. As such, I would expect that with vxdmp or mpxio the balancing with iSCSI would be at least partially dependent on what the array said to do.

> I should look more closely at NFS as well -- if multiple applications on the same client are accessing the same filesystem, do they use a single common TCP session, or can they each have separate instances open? Again, I'm not sure.

It's worse than that. A quick experiment with two different automounted home directories from the same NFS server suggests that both home directories share one TCP session to the NFS server. The latest version of Oracle's RDBMS supports a userland NFS client option. It would be very interesting to see if this does a separate session per data file, possibly allowing for better load spreading.

>> Note that with Sun Trunking there was an option to load balance using a round robin hashing algorithm. When pushing high network loads this may cause performance problems with reassembly.
>
> Yes. Reassembly is Evil for TCP performance.
>
> Btw, the iSCSI balancing act that was described does seem a bit contrived -- a single initiator and a COMSTAR server, both client *and server* with multiple ethernet links instead of a single 10GbE link. I'm not saying it doesn't happen, but I think it happens infrequently enough that it's reasonable that this scenario wasn't one that popped immediately into my head. :-)

It depends on whether the people that control the network gear are the same ones that control servers. My experience suggests that if there is a disconnect, it seems rather likely that each group's standardization efforts, procurement cycles, and capacity plans will work against any attempt to have an optimal configuration.

Also, it is rather common to have multiple 1 Gb links to servers going to disparate switches so as to provide resilience in the face of switch failures. This is not unlike (at a block diagram level) the architecture that you see in pretty much every SAN. In such a configuration, it is reasonable for people to expect that load balancing will occur.

-- 
Mike Gerdts
Re: [zfs-discuss] Severe ZFS corruption, help needed.
Have you posted on the FreeBSD forums?
[zfs-discuss] Mirrored raidz
This may have been covered somewhere but I couldn't find it. Is it possible to mirror two raidz vdevs? Like a RAID50 basically.
Re: [zfs-discuss] Mirrored raidz
Hi,

> Is it possible to mirror two raidz vdevs? Like a RAID50 basically.

RAID 50 is striped... basically:

zpool create tank raidz c0t0d0 c0t0d1 c0t0d2 raidz c1t0d0 c1t0d1 c1t0d2

Other than that, I believe it is not possible to create a mirrored pool from raidz vdevs.

Regards,

Serge Fonville

-- 
http://www.sergefonville.nl
Re: [zfs-discuss] Mirrored raidz
On Mon, 26 Jul 2010, Dav Banks wrote:

> This may have been covered somewhere but I couldn't find it. Is it possible to mirror two raidz vdevs? Like a RAID50 basically.

This config is not supported by zfs. It should be possible to do though if you are really serious about it. You can create two zfs zvols (volumes) which are hopefully in two different raidz-based zfs pools, and then create a new zfs pool using those two devices. The end result would be three zfs pools. It is probably not a wise idea to use this layered approach.

Bob

-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Mirrored raidz
Ah. Thanks! I should have said RAID51 - a mirror of RAID5 elements. Thanks for the info. Bummer that it can't be done.
Re: [zfs-discuss] Mirrored raidz
> -----Original Message-----
> From: zfs-discuss-boun...@opensolaris.org On Behalf Of Dav Banks
> Sent: Monday, July 26, 2010 2:02 PM
> To: zfs-discuss@opensolaris.org
> Subject: [zfs-discuss] Mirrored raidz
>
> This may have been covered somewhere but I couldn't find it. Is it possible to mirror two raidz vdevs? Like a RAID50 basically.

RAID50 is not a mirror of RAID5s, but a stripe set of RAID5s. RAID50 is analogous to multiple raidz vdevs in a single zpool. Mirrored RAID5s are not directly possible, as ZFS does not permit nested vdevs (i.e. a mirror vdev composed of raidz vdevs). I think you can make 2 separate zpools composed of single raidz vdevs, make zvols in those, then create a 3rd zpool with a mirror vdev of the zvols (see the sketch below).

-Will
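For concreteness, a minimal sketch of that layered approach (device names and sizes are hypothetical; note the deadlock warning later in this thread):

  # two ordinary raidz pools
  zpool create pool1 raidz c0t0d0 c0t1d0 c0t2d0
  zpool create pool2 raidz c1t0d0 c1t1d0 c1t2d0
  # one zvol in each pool
  zfs create -V 500g pool1/half
  zfs create -V 500g pool2/half
  # a third pool that mirrors the two zvols
  zpool create tank mirror /dev/zvol/dsk/pool1/half /dev/zvol/dsk/pool2/half

The end result is the three pools described above: writes to tank land on both raidz pools, at the cost of stacking one ZFS transaction pipeline on top of another.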
Re: [zfs-discuss] Mirrored raidz
On Mon, July 26, 2010 14:17, Dav Banks wrote:

> Ah. Thanks! I should have said RAID51 - a mirror of RAID5 elements. Thanks for the info. Bummer that it can't be done.

Out of curiosity, any particular reason why you want to do this?
Re: [zfs-discuss] Mirrored raidz
A small follow-up is that creating pools from components of other pools can cause system deadlocks. This approach is not recommended.

Thanks,

Cindy

On 07/26/10 12:19, Saxon, Will wrote:
> I think you can make 2 separate zpools composed of single raidz vdevs, make zvols in those, then create a 3rd zpool with a mirror vdev of the zvols.
Re: [zfs-discuss] Mirrored raidz
I wanted to test it as a backup solution. Maybe that's crazy in itself but I want to try it. Basically, once a week detach the 'backup' pool from the mirror, replace the drives, add the new raidz to the mirror and let it resilver and sit for a week.
Re: [zfs-discuss] Mirrored raidz
On 26 Jul 2010, at 19:51, Dav Banks davba...@virginia.edu wrote:

> I wanted to test it as a backup solution. Maybe that's crazy in itself but I want to try it. Basically, once a week detach the 'backup' pool from the mirror, replace the drives, add the new raidz to the mirror and let it resilver and sit for a week.

Why not do it the other way around? Create a pool which consists of mirrored pairs (or triples) of drives. You don't need raidz to make the pool bigger, and ZFS will use the disks in the pool appropriately. If you want to have more copies of data, set copies=2 and zfs will try to schedule writes across different mirrored pairs.

Alex
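A short sketch of that suggestion (device names are examples):

  # a stripe of mirrored pairs
  zpool create tank mirror c0t0d0 c1t0d0 mirror c0t1d0 c1t1d0
  # keep two copies of every block, spread across vdevs where possible
  zfs set copies=2 tank

Note that copies=2 protects against localized damage, not against losing a whole vdev; the mirroring itself is what covers drive failure.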
Re: [zfs-discuss] Mirrored raidz
You might look at the zpool split feature, where you can split off the disks from a mirrored pool to create an identical pool, described here:

http://hub.opensolaris.org/bin/view/Community+Group+zfs/docs (ZFS Admin Guide, p. 87)

Thanks,

Cindy

On 07/26/10 12:51, Dav Banks wrote:
> I wanted to test it as a backup solution. Maybe that's crazy in itself but I want to try it. Basically, once a week detach the 'backup' pool from the mirror, replace the drives, add the new raidz to the mirror and let it resilver and sit for a week.
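On builds recent enough to have zpool split (it integrated around build 131), the weekly cycle could look like this hedged sketch, assuming 'tank' is a pool of mirrored pairs:

  # break one side of each mirror off into a new pool named 'backup'
  zpool split tank backup
  # the new pool is left exported; import it to verify, then pull the disks
  zpool import backup
  zpool export backup

After swapping in fresh drives, zpool attach each one back to its mirror and let the resilver run.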
Re: [zfs-discuss] NFS performance?
mg == Mike Gerdts mger...@gmail.com writes:
sw == Saxon, Will will.sa...@sage.com writes:

    sw I think there may be very good reason to use iSCSI, if you're
    sw limited to gigabit but need to be able to handle higher
    sw throughput for a single client.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6817942

look at it now before it gets pulled back inside the wall. :(

I think this bug was posted on zfs-discuss earlier. Please see the comments, because he is not using lagg's: even with a single 10Gbit/s NIC, you cannot use the link well unless you take advantage of the multiple MSI's and L4 preclassification built into the NIC. You need multiple TCP circuits between client and server so that each will fire a different MSI. He got about 3x performance using 8 connections. It sounds like NFS is already fixed for this, but requires manual tuning of clnt_max_conns and the number of reader and writer threads.

    mg it is rather common to have multiple 1 Gb links to
    mg servers going to disparate switches so as to provide
    mg resilience in the face of switch failures. This is not unlike
    mg (at a block diagram level) the architecture that you see in
    mg pretty much every SAN. In such a configuration, it is
    mg reasonable for people to expect that load balancing will
    mg occur.

nope. spanning tree removes all loops, which means between any two points there will be only one enabled path. An L2-switched network will look into L4 headers for splitting traffic across an aggregated link (as long as it's been deliberately configured to do that---by default it probably only looks at L2), but it won't do any multipath within the mesh.

Even with an L3 routing protocol it usually won't do multipath unless the costs of the paths match exactly, so you'd want to build the topology to achieve this and then do all switching at layer 3 by making sure no VLAN is larger than a switch.

There's actually a cisco feature to make no VLAN larger than a *port*, which I use a little bit. It's meant for CATV networks I think, or DSL networks aggregated by IP instead of ATM like maybe some European ones? but the idea is not to put edge ports into vlans any more but instead say 'ip unnumbered loopbackN', and then some black magic they have built into their DHCP forwarder adds /32 routes by watching the DHCP replies. If you don't use DHCP you can add static /32 routes yourself, and it will work. It does not help with IPv6, and also you can only use it on vlan-tagged edge ports (what? arbitrary!) but neat that it's there at all.

http://www.cisco.com/en/US/docs/ios/12_3t/12_3t4/feature/guide/gtunvlan.html

The best thing IMHO would be to use this feature on the edge ports, just as I said, but you will have to teach the servers to VLAN-tag their packets. not such a bad idea, but weird. You could also use it one hop up from the edge switches, but I think it might have problems in general removing the routes when you unplug a server, and using it one hop up could make them worse. I only use it with static routes so far, so no mobility for me: I have to keep each server plugged into its assigned port, and reconfigure switches if I move it.

Once you have ``no vlan larger than 1 switch,'' if you actually need a vlan-like thing that spans multiple switches, the new word for it is 'vrf'.

so, yeah, it means the server people will have to take over the job of the networking people. The good news is that networking people don't like spanning tree very much because it's always going wrong, so AFAICT most of them who are paying attention are already moving in this direction.
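The NFS client tuning Miles mentions lives in /etc/system; a hedged example (the values are illustrative, a reboot is required, and the right numbers depend on the NIC and workload):

  * allow the NFS/RPC client to open more than one TCP connection per server
  set rpcmod:clnt_max_conns = 8
  * more async reader threads per NFSv3 mount
  set nfs:nfs3_max_threads = 32

With several connections, each can hash to a different MSI/receive ring on a 10GbE NIC, which is the effect described in bug 6817942.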
Re: [zfs-discuss] Mirrored raidz
On Jul 26, 2010, at 2:51 PM, Dav Banks davba...@virginia.edu wrote:

> I wanted to test it as a backup solution. Maybe that's crazy in itself but I want to try it. Basically, once a week detach the 'backup' pool from the mirror, replace the drives, add the new raidz to the mirror and let it resilver and sit for a week.

If that's the case why not create a second pool called 'backup' and 'zfs send' periodically to the backup pool?

-Ross
Re: [zfs-discuss] Mirrored raidz
On Mon, July 26, 2010 14:51, Dav Banks wrote:

> I wanted to test it as a backup solution. Maybe that's crazy in itself but I want to try it. Basically, once a week detach the 'backup' pool from the mirror, replace the drives, add the new raidz to the mirror and let it resilver and sit for a week.

While a neat solution, I think you'd be better off using incremental send/recv functionality for backups. Having an online backup really isn't a true backup, IMHO. It's too easy to fat-finger something, and then you're hosed, as the change was replicated in real time to both sides of the mirror (though this is mitigated a bit if you automatically take regular snapshots). Mirroring is (IMHO) for uptime and insurance against hardware failure. Backups are /independent/ copies of data that are insurance against something happening to your primary copy.

You could do the same thing with a separate pool and send/recv, without taking the hit on write IOps from the second half of the mirror: basically async replication instead of synchronous.
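A hedged sketch of that weekly cycle with snapshots and incremental streams (pool and snapshot names are examples):

  # week 1: replicate everything to the backup pool
  zfs snapshot -r tank@week1
  zfs send -R tank@week1 | zfs receive -Fd backup
  # week 2 and onward: send only what changed since the last snapshot
  zfs snapshot -r tank@week2
  zfs send -R -i week1 tank@week2 | zfs receive -Fd backup

Because each stream lands as a snapshot on the backup pool, a fat-fingered delete on the primary can be recovered from any retained backup snapshot.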
Re: [zfs-discuss] Mirrored raidz
On Mon, Jul 26 at 11:51, Dav Banks wrote:

> I wanted to test it as a backup solution. Maybe that's crazy in itself but I want to try it. Basically, once a week detach the 'backup' pool from the mirror, replace the drives, add the new raidz to the mirror and let it resilver and sit for a week.

Since you're already spending the disk drives for this that get detached, it seems safer to me to just 'zfs send' to a minimal backup system, and remove the extra drives from your primary server. Less overhead, and the scrub can validate your backup copy at whatever frequency you choose. You don't even need the same pool layout on the backup machine. Primary can be a stripe of mirrors, while your backup can be a wide raidz2 setup.

--eric

-- 
Eric D. Mudama
edmud...@mail.bounceswoosh.org
Re: [zfs-discuss] Mirrored raidz
> It should be possible to do though if you are really serious about it. You can create two zfs zvols (volumes) which are hopefully in two different raidz-based zfs pools, and then create a new zfs pool using those two devices. The end result would be three zfs pools. It is probably not a wise idea to use this layered approach.

> A small follow-up is that creating pools from components of other pools can cause system deadlocks.

One can make the zvols iSCSI targets and then attach them to the local initiator. This works and, indeed, it's a way to mirror storage across a network.

-- 
Maurice Volaski, maurice.vola...@einstein.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
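A hedged sketch of that network-mirror idea using the legacy shareiscsi property (addresses and names are hypothetical; newer builds would use COMSTAR's sbdadm/stmfadm instead):

  # on each of the two storage hosts: export a zvol as an iSCSI target
  zfs create -V 500g pool1/half
  zfs set shareiscsi=on pool1/half
  # on the host that assembles the mirror: discover both targets
  iscsiadm add discovery-address 192.168.1.11
  iscsiadm add discovery-address 192.168.1.12
  iscsiadm modify discovery --sendtargets enable
  devfsadm -i iscsi
  # 'format' now lists one new disk per target; mirror them
  # (the long cXtGUIDd0 names below are placeholders)
  zpool create tank mirror c2t600144F0XXXXd0 c3t600144F0XXXXd0

Going through the iSCSI stack may also avoid the same-kernel layering behind the deadlock warning above, though the remote half's latency now rides the network.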
Re: [zfs-discuss] NFS performance?
On Mon, Jul 26, 2010 at 2:56 PM, Miles Nordin car...@ivy.net wrote:
> mg == Mike Gerdts mger...@gmail.com writes:
>
>     mg it is rather common to have multiple 1 Gb links to
>     mg servers going to disparate switches so as to provide
>     mg resilience in the face of switch failures. This is not unlike
>     mg (at a block diagram level) the architecture that you see in
>     mg pretty much every SAN. In such a configuration, it is
>     mg reasonable for people to expect that load balancing will
>     mg occur.
>
> nope. spanning tree removes all loops, which means between any two points there will be only one enabled path. An L2-switched network will look into L4 headers for splitting traffic across an aggregated link (as long as it's been deliberately configured to do that---by default it probably only looks at L2), but it won't do any multipath within the mesh.

I was speaking more of IPMP, which is at layer 3.

> Even with an L3 routing protocol it usually won't do multipath unless the costs of the paths match exactly, so you'd want to build the topology to achieve this and then do all switching at layer 3 by making sure no VLAN is larger than a switch.

By default, IPMP does outbound load spreading. Inbound load spreading is not practical with a single (non-test) IP address. If you have multiple virtual IPs you can spread them across all of the NICs in the IPMP group and get some degree of inbound spreading as well. This is the default behavior of the OpenSolaris IPMP implementation, last I looked. I've not seen any examples (although I can't say I've looked real hard either) of the Solaris 10 IPMP configuration set up with multiple IPs to encourage inbound load spreading as well.

> There's actually a cisco feature to make no VLAN larger than a *port*, which I use a little bit. [...] If you don't use DHCP you can add static /32 routes yourself, and it will work.
>
> http://www.cisco.com/en/US/docs/ios/12_3t/12_3t4/feature/guide/gtunvlan.html

Interesting... however this seems to limit you to 4096 edge ports per VTP domain, as the VID field in the 802.1q header is only 12 bits. It is also unclear how this works when you have one physical host with many guests. And then there is the whole thing that I don't really see how this helps with resilience in the face of a switch failure. Cool technology, but I'm not certain that it addresses what I was talking about.

> The best thing IMHO would be to use this feature on the edge ports, just as I said, but you will have to teach the servers to VLAN-tag their packets. [...] Once you have ``no vlan larger than 1 switch,'' if you actually need a vlan-like thing that spans multiple switches, the new word for it is 'vrf'.

There was some other Cisco dark magic that our network guys were touting a while ago that would make each edge switch look like a blade in a 6500 series. This would then allow them to do link aggregation across edge switches. At least two of organizational changes, personnel changes, and roadmap changes happened, so I've not seen this in action.

> so, yeah, it means the server people will have to take over the job of the networking people. The good news is that networking people don't like spanning tree very much because it's always going wrong, so AFAICT most of them who are paying attention are already moving in this direction.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
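For completeness, a minimal sketch of an IPMP group with extra virtual addresses for inbound spreading, in the classic ifconfig style (interface names and addresses are hypothetical):

  # two NICs in one IPMP group, each with its own data address
  ifconfig e1000g0 plumb 192.168.1.10 netmask 255.255.255.0 group ipmp0 up
  ifconfig e1000g1 plumb 192.168.1.11 netmask 255.255.255.0 group ipmp0 up
  # additional virtual IPs give clients more inbound targets to spread across
  ifconfig e1000g0 addif 192.168.1.12 netmask 255.255.255.0 up

Outbound traffic is spread across the group automatically; inbound spreading depends on clients being pointed at the different addresses (e.g., via round-robin DNS).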
Re: [zfs-discuss] ZFS where to go!
I might be mistaken, but it looks like 3ware does have a driver, several in fact:

http://www.3ware.com/support/downloadpageprod.asp?pcode=9&path=Escalade9500SSeries&prodname=3ware%209500S%20Series

Any comment on this? I'm thinking about picking up a server with this card, and it would be cool if it worked.
Re: [zfs-discuss] FreeBSD 8.1 out, has zfs version 14 and can boot from zfs
On 2010-Jul-26 20:32:41 +0800, Eugen Leitl eu...@leitl.org wrote:

> FreeBSD 8.1 features version 14 of the ZFS subsystem, the addition of the ZFS Loader (zfsloader), allowing users to boot from ZFS,

Only on i386 or amd64 systems at present, but you can boot RAIDZ1 and RAIDZ2 as well as mirrored roots. Note that ZFS v15 has been integrated into the development branches (-current and 8-stable) and will be in FreeBSD 8.2 (or you can run it now by compiling FreeBSD yourself - unlike OpenSolaris, the full build process is documented and everything necessary is on the release DVDs or can be downloaded).

See http://www.freebsd.org/releases/8.1R/announce.html

-- 
Peter Jeremy
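On a ZFS-root FreeBSD 8.1 system, the boot wiring is only a few config lines; a hedged example assuming a root dataset named tank/root (names are illustrative):

  # /boot/loader.conf
  zfs_load="YES"
  vfs.root.mountfrom="zfs:tank/root"

  # /etc/rc.conf
  zfs_enable="YES"

The zfsloader reads /boot from the pool itself once the gptzfsboot (or zfsboot) boot blocks are installed on each disk.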
Re: [zfs-discuss] Severe ZFS corruption, help needed.
Nope, I mailed the freebsd-fs mailing list.
Re: [zfs-discuss] Mirrored raidz
> From: zfs-discuss-boun...@opensolaris.org On Behalf Of Ross Walker
>
> If that's the case why not create a second pool called 'backup' and 'zfs send' periodically to the backup pool?

+1 This is what I do.