Hi, I've been struggling for several weeks now to get stable ZFS replication using Solaris 10 11/06 (with current patches) and AVS 4.0. We tried it on VMware first and ended up with kernel panics en masse (yes, we read Jim Dunham's blog articles :-). Now we're trying on the real thing, two X4500 servers. Well, I have no trouble reproducing our kernel panics there either ... but I think I also learned some important things. One problem remains, though.
I have a zpool on host A. Replication to host B works fine:

* "zpool export tank" on the primary - works.
* "sndradm -d" on both servers - works (paranoia mode).
* "zpool import <id>" on the secondary - works.

So far, so good. I change the contents of the file system, add some files, delete some others ... no problems. The secondary is in production use now, everything is fine.

Okay, let's imagine I switched to the secondary host because I had a problem with the primary. Now it's repaired and I want my redundancy back:

* "sndradm -E -f ...." on both hosts - works.
* "sndradm -u -r" on the primary to refresh the primary - works. `nicstat` shows me a bit of traffic.

Good, let's switch back to the primary. Current status: the zpool is imported on the secondary and NOT imported on the primary.

* "zpool export tank" on the secondary - *kernel panic*

Sadly, the machine dies so fast that I don't get to see the panic message with `dmesg`. And when I disable the replication again later and import the zpool on the primary, I can see that the update sync didn't take place: the file system changes I made on the secondary weren't replicated. Exporting the zpool on the secondary works *after* the system has rebooted.

I use slices for the zpool, not whole LUNs, because I think many of my earlier problems were caused by exclusive locking, but that doesn't help with this one.

Questions:

a) I don't understand why the kernel panics at all. The zpool isn't imported on both systems at the same time, the zpool itself seems to be fine after a reboot ... and swapping the primary and secondary roles just for resyncing seems to force a full sync, which isn't an option.

b) I'll try "sndradm -m -r" next time ... but I'm not sure I like that thought. I would accept a full sync if I replaced the primary host with a different server, but having to do a 24 TB full sync just because the replication had been disabled for a few minutes would be hard to swallow. Or did I do something wrong?
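For what it's worth, since the box dies too fast for `dmesg`, the panic should still be recoverable from the crash dump after the reboot. Roughly like this (a sketch; the dump numbers and the crash directory depend on your `dumpadm` configuration):

```shell
# Check that crash dumps are enabled and where savecore puts them
# (the directory shown is the Solaris default for this hostname).
dumpadm -d swap -s /var/crash/`uname -n`

# After the machine comes back up, extract the dump manually if
# savecore didn't already run at boot:
savecore /var/crash/`uname -n`

# Then inspect the panic string and the panicking thread's stack
# (replace the 0 suffix with the actual dump number):
echo '::status'    | mdb -k unix.0 vmcore.0
echo '::panicinfo' | mdb -k unix.0 vmcore.0
echo '$C'          | mdb -k unix.0 vmcore.0
```

The `$C` stack trace in particular should show whether the panic comes out of the sndr/sv modules or out of ZFS itself.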
c) What performance can I expect from an X4500 with a 40-disk zpool when using slices, compared to whole LUNs? Any experiences?

And another thing: I did some experiments with zvols, because I wanted to make disaster handling and the AVS configuration itself easier - there won't be a full sync after replacing a disk, because AVS doesn't "see" that a hot spare is being used, and hot spares won't be replicated to the secondary host either, even though the original drive on the secondary never failed. I used the zvol with UFS, and this kind of "hardware RAID controller emulation by ZFS" works pretty well, but the performance fell off a cliff. Sunsolve told me that this is a cache-flushing problem and that there's a workaround in Nevada build 53 and higher. Has somebody done a comparison, can you share some experiences? I only have a few days left and I don't want to waste time installing Nevada for nothing ...

Thanks,

Ralf

--
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe
Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger, Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
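For reference, the zvol-backed UFS setup I'm experimenting with looks roughly like this (pool name, volume name, size, and mountpoint are just examples):

```shell
# Carve a fixed-size zvol out of the pool; ZFS handles redundancy
# and hot-sparing underneath it, so AVS only ever sees one stable
# block device instead of 40+ individual disks.
zfs create -V 100g tank/avsvol

# Put UFS on the zvol's raw device and mount the block device.
newfs /dev/zvol/rdsk/tank/avsvol
mount /dev/zvol/dsk/tank/avsvol /export/data
```

The AVS set is then configured against /dev/zvol/rdsk/tank/avsvol, which is what keeps disk replacements invisible to the replication.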