Torrey McMahon wrote:
> AVS?
>   
Jim Dunham will probably shoot me, or worse, but I recommend thinking 
twice about using AVS for ZFS replication. Basically, you only have a 
few options:

 1) Using a battery-backed hardware RAID controller, which leads to 
poor ZFS performance in many cases,
 2) Building three-way mirrors to avoid complete data loss in several 
disaster scenarios, since ZFS lacks recovery mechanisms like `fsck` - 
which makes AVS/ZFS based solutions quite expensive,
 3) Additionally using another form of backup, e.g. tapes.

For instance, one scenario that made me think: imagine you have an 
X4500. 48 internal disks, 500 GB each. That leaves you with a ZFS pool 
on 40 disks (you need 1 for the system, plus 3x RAID 10 for the bitmap 
volumes, otherwise your performance will be very bad, plus 2x HSP). 
Using 40 disks means a total of 40 separate replication sets, one per 
disk - see the sketch below.
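For illustration, enabling a single one of those sets looks roughly 
like this. Host and device names are invented, and you should verify 
the exact sndradm(1M) syntax against your AVS version:

   # Enable one SNDR set: primary disk -> secondary disk, async over IP.
   # The bitmap slices must live on the dedicated bitmap volumes
   # mentioned above, or performance suffers badly.
   sndradm -n -e thumper-a /dev/rdsk/c1t0d0s0 /dev/rdsk/c7t0d0s3 \
                 thumper-b /dev/rdsk/c1t0d0s0 /dev/rdsk/c7t0d0s3 \
                 ip async

Repeat that 39 more times. Now imagine the following scenarios: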

a) A disk in the primary fails. What happens? An HSP jumps in and 500 GB 
get rebuilt. These 500 GB are then synced over a single 1 GBit/s 
crossover cable. This takes quite a while and is 100% unnecessary - and 
it will get much worse in the future, because disk capacities keep 
rocketing skywards while throughput doesn't improve at anywhere near 
the same rate. During this time your service is running without 
redundancy, and we're not talking about a few minutes. Now try to 
imagine what happens if another disk fails during this rebuild, this 
time in the secondary ...
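A quick back-of-the-envelope calculation, assuming theoretical wire 
speed and zero protocol overhead:

   500 GB * 8 bit/byte = 4,000 Gbit
   4,000 Gbit / 1 Gbit/s = 4,000 s, i.e. roughly 67 minutes

In practice, with SNDR overhead and 39 other replications sharing the 
same link, expect hours rather than minutes.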

b) A disk in the secondary fails. What happens now? No HSP jumps in 
on the secondary, because the zpool isn't imported there and ZFS doesn't 
know about the failure. Instead, you end up with 39 active replications 
instead of 40; the one replicating to the failed drive goes inactive. 
But ... oh damn, the zpool isn't mounted on the secondary 
host, so ZFS never reports the drive failure to our server monitoring. 
That can be fun. The only way I found, after a minute of thinking, to 
become aware of the problem is to ask sndradm for the health status - 
which leads to a false alarm on Host A, because the failed disk is 
actually in Host B, and operators are usually not bright enough to 
change the disk in Host B after being notified about a problem on Host 
A. But even if everything works: what happens if the primary fails 
before an administrator has replaced the disk, brought the missing 
replication back up and completely resynced the replacement? "Hello, 
kernel panic" and "Goodbye, 12 TB of data".

c) You *must* force every single `zpool import <zpool>` on the secondary 
host. Always. Because you usually need your secondary host after your 
primary has crashed, you won't get the chance to export the zpool on the 
primary first - and if you do get that chance, you don't need AVS at 
all. Bring some Kleenex to wipe the sweat off your forehead when you 
have to switch to your secondary host, because a single mistake (like 
forgetting to put the secondary host into logging mode manually before 
you try to import the zpool) will lead to complete data loss. I bet you 
won't even trust your own failover scripts.
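For reference, the failover sequence I mean looks roughly like this. 
The pool name is invented, and whether an unqualified `sndradm -l` hits 
all configured sets should be double-checked on your setup; treat this 
as a sketch, not a tested script:

   #!/bin/sh
   # On the secondary host, after the primary has died.
   # Step 1: put the SNDR sets into logging mode FIRST. Forgetting
   # this step is exactly the mistake that costs you the whole pool.
   sndradm -n -l
   # Step 2: the pool was never exported cleanly on the primary,
   # so the import must always be forced.
   zpool import -f tank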

You *can* use AVS and ZFS together. I use them myself. But I made sure 
I knew what I was doing. Most people probably don't.

Btw: I have to admit that I haven't tried the newest Nevada builds 
during my tests. It's possible that AVS and ZFS work better together 
than they did under Solaris 10 11/06 and AVS 4.0. But there's a reason I 
haven't tried: Sun Cluster 3.2 instantly crashes on Thumpers with 
SATA-related kernel panics, and OpenHA Cluster isn't available yet.

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963 
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger, 
Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss 
Aufsichtsratsvorsitzender: Michael Scheeren
