Hi,

I've been struggling for several weeks now to get stable ZFS replication 
using Solaris 10 11/06 (with current patches) and AVS 4.0. We tried it on 
VMware first and ended up with kernel panics en masse (yes, we read Jim 
Dunham's blog articles :-). Now we're trying it on the real thing, two 
X4500 servers. Well, I have no trouble reproducing our kernel panics 
there either ... but I think I've learned some important things along the 
way. One problem still remains, though.

I have a zpool on host A. Replication to host B works fine.
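
For the record, the sets were enabled in the usual way, one SNDR set per 
slice, roughly like this (host names, devices and bitmap slices are only 
placeholders here, not my real configuration):

  # on both hosts, repeated for every slice that belongs to the pool
  sndradm -e hostA /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 \
             hostB /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 \
             ip async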

* "zpool export tank" on the primary - works.
* "sndradm -d" on both servers - works (paranoia mode)
* "zpool import <id>" on the secondary - works.

So far, so good. I change the contents of the file system, add some 
files, delete some others ... no problems. The secondary is now in 
production use, and everything is fine.

Okay, let's imagine I switched to the secondary host because I had a 
problem with the primary. Now it's repaired, and I want my redundancy back.

* "sndradm -E -f ...." on both hosts - works.
* "sndradm -u -r" on the primary for refreshing the primary - works. 
`nicstat` shows me a bit of traffic.
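
In other words, roughly this sequence (placeholder path for the volume-set 
file; in reality it lists one set per slice):

  # on both hosts: re-enable the sets without scheduling a full sync
  sndradm -E -f /etc/sndr_sets.cf

  # on the primary: pull the changes back from the secondary
  sndradm -u -r         # reverse update sync, secondary -> primary
  dsstat -m sndr        # watch the sync progress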

Good, let's switch back to the primary. Current status: the zpool is 
imported on the secondary and NOT imported on the primary.

* "zpool export tank" on the secondary - *kernel panic*

Sadly, the machine dies fast, so I don't get to see the kernel panic with 
`dmesg`. And when I later disable the replication again and import the 
zpool on the primary, I can see that the update sync didn't take place: 
the file system changes I made on the secondary weren't replicated. 
Exporting the zpool on the secondary works *after* the system has rebooted.
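
Next time I'll try to pull the panic string out of the crash dump after 
the box comes back up; if I understand the tools correctly, something like 
this should work (assuming savecore is enabled and writes to the default 
/var/crash/`hostname` directory):

  dumpadm                   # confirm dump device and savecore directory
  cd /var/crash/`hostname`
  mdb unix.0 vmcore.0       # open the most recent saved dump
  # inside mdb:
  #   ::status   - prints the panic string
  #   ::stack    - stack trace of the panicking thread
  #   ::msgbuf   - last kernel messages before the panic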

I use slices for the zpool, not whole LUNs, because I think many of my 
problems were caused by exclusive locking, but that doesn't help with 
this one.

Questions:

a) I don't understand why the kernel panics at this point. The zpool 
isn't imported on both systems at once, the zpool itself seems to be fine 
after a reboot ... and switching the primary and secondary roles just for 
resyncing seems to force a full sync, which isn't an option.

b) I'll try a "sndradm -m -r" next time ... but I'm not sure I like that 
thought. I would accept it if I had replaced the primary host with 
another server, but having to do a 24 TB full sync just because the 
replication itself had been disabled for a few minutes would be hard to 
swallow. Or did I do something wrong?
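
Just to be precise about what I mean (as far as I understand the sndradm 
man page; both resyncs are issued on the primary):

  sndradm -u -r     # update resync: copies only the blocks marked dirty
                    # in the bitmap volumes
  sndradm -m -r     # full reverse sync: copies every block back from the
                    # secondary
  sndradm -P        # brief status of the configured sets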

c) What performance can I expect from an X4500 with a 40-disk zpool when 
using slices, compared to whole LUNs? Any experiences?

And another thing: I did some experiments with zvols, because I wanted to 
make disaster recovery and the AVS configuration itself easier to handle - 
there won't be a full sync after replacing a disk, because AVS doesn't 
"see" that a hot spare is being used, and hot spares won't be replicated 
to the secondary host either, although the original drive on the 
secondary never failed. I used the zvol with UFS, and this kind of 
"hardware RAID controller emulation by ZFS" works pretty well, only the 
performance fell off a cliff. Sunsolve told me that this is a flushing 
problem and that there's a workaround in Nevada build 53 and higher. Has 
anybody done a comparison? Can you share some experiences? I only have a 
few days left, and I don't want to waste time installing Nevada for 
nothing ...
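
For reference, the zvol setup I tested was basically this (sizes and 
names are just examples):

  # carve a zvol out of the pool and put UFS on top of it
  zfs create -V 500g tank/vol01
  newfs /dev/zvol/rdsk/tank/vol01
  mount /dev/zvol/dsk/tank/vol01 /export/data

  # AVS then replicates the zvol device instead of the individual slices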

Thanks,

  Ralf

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963 
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger, 
Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss 
Aufsichtsratsvorsitzender: Michael Scheeren
