Hello,

I'd like a sanity check from people more knowledgeable than myself.
I'm managing backups on a production system. Previously I was using another volume manager and filesystem on Solaris, and I've just switched to using ZFS.


My model is -
Production Server A
Test Server B
Mirrored storage arrays (HDS TruCopy if it matters)
Backup software (TSM)

Production server A sees the live volumes.
Test Server B sees the TruCopy mirrors of the live volumes. (It's attached to the secondary storage array; the production server sees the primary array.)

1. Production server A shuts down zone C and exports the zpools for zone C.
2. Production server A splits the mirror to the secondary storage array, leaving the mirror writable.
3. Production server A re-imports the pools for zone C and boots zone C.
4. Test server B imports the ZFS pools using -R /backup.
5. The backup software backs up the mounted mirror volumes on test server B.
6. Later in the day, after the backups finish, a script exports the ZFS pools on test server B and re-establishes the TruCopy mirror between the storage arrays.
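For reference, the whole cycle boils down to something like this in script form. (This is just a sketch: the zone name, pool names, and TruCopy device-group name are made-up placeholders, and the pairsplit/pairresync flags are illustrative rather than my exact CCI invocations.)

```shell
#!/bin/sh
# Sketch of the nightly split-mirror backup cycle. All names are
# placeholders; the pairsplit/pairresync options are illustrative only.

ZONE=zoneC
POOLS="zoneC-root zoneC-data"
DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to actually run the commands

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Run on production server A: quiesce the zone, split the TruCopy pair,
# then bring production back up on the primary array.
split_and_resume() {
    run zoneadm -z "$ZONE" halt
    for p in $POOLS; do run zpool export "$p"; done
    run pairsplit -g "${ZONE}_dg" -rw     # leave the secondary writable
    for p in $POOLS; do run zpool import "$p"; done
    run zoneadm -z "$ZONE" boot
}

# Run on test server B: import the split mirrors under an alternate
# root so their mountpoints land under /backup instead of colliding
# with the live paths.
import_mirrors() {
    for p in $POOLS; do run zpool import -R /backup "$p"; done
}

# Run on test server B after TSM finishes: export the pools, then
# re-establish the mirror between the arrays.
export_and_resync() {
    for p in $POOLS; do run zpool export "$p"; done
    run pairresync -g "${ZONE}_dg"
}
```

With DRY_RUN left at 1 the functions just print the commands they would issue, which is handy for walking through the sequence before wiring it into cron.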

So... I had this working fine with one zone on server A for a couple of months. This week I added six more zones, each with two ZFS pools. The first night went okay. Last night, the test server B kernel panicked well after the mirrored zpools were imported, just as the TSM backup started reading all the ZFS pools to push everything to the enterprise backup environment.

Here's the kernel panic message -
Jul  6 03:04:55 riggs panic[cpu22]/thread=2a10e81bca0:
Jul  6 03:04:55 riggs unix: [ID 403854 kern.notice] assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 759
Jul  6 03:04:55 riggs unix: [ID 100000 kern.notice]
Jul  6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b4f0 genunix:assfail+74 (7af0f8c0, 7af0f910, 2f7, 190d000, 12a1800, 0)
Jul  6 03:04:55 riggs genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000000000001 0000000000000001 00000300f20fdf81
Jul  6 03:04:55 riggs %l4-7: 00000000012a1800 0000000000000000 0000000001959400 0000000000000000
Jul  6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b5a0 zfs:dmu_write+54 (300cbfd5c40, ad, a70, 20, 300b8c02800, 300f83414d0)
Jul  6 03:04:55 riggs genunix: [ID 179002 kern.notice] %l0-3: 0000000000000038 0000000000000007 000000000194bd40 000000000194bc00
Jul  6 03:04:55 riggs %l4-7: 0000000000000001 0000030071bcb701 0000000000003006 0000000000003000
Jul  6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b670 zfs:space_map_sync+278 (3009babd130, b, 3009babcfe0, 20, 4, 58)
Jul  6 03:04:55 riggs genunix: [ID 179002 kern.notice] %l0-3: 0000000000000020 00000300b8c02800 00000300b8c02820 00000300b8c02858
Jul  6 03:04:55 riggs %l4-7: 00007fffffffffff 0000000000007fff 00000000000022d9 0000000000000020
Jul  6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b760 zfs:metaslab_sync+2b0 (3009babcfc0, 1db7, 300f83414d0, 3009babd408, 300c9724000, 6003e24acc0)
Jul  6 03:04:55 riggs genunix: [ID 179002 kern.notice] %l0-3: 00000300cbfd5c40 000003009babcff8 000003009babd130 000003009babd2d0
Jul  6 03:04:55 riggs %l4-7: 000003009babcfe0 0000000000000000 000003009babd268 000000000000001a
Jul  6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b820 zfs:vdev_sync+b8 (6003e24acc0, 1db7, 1db6, 3009babcfc0, 6003e24b000, 17)
Jul  6 03:04:55 riggs genunix: [ID 179002 kern.notice] %l0-3: 0000000000000090 0000000000000012 000006003e24acc0 00000300c9724000
Jul  6 03:04:55 riggs %l4-7: 0000000000000000 0000000000000000 0000000000000000 00000009041ea000
Jul  6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b8d0 zfs:spa_sync+484 (300c9724000, 1db7, 3005fec09a8, 300c9724428, 1, 300cbfd5c40)
Jul  6 03:04:55 riggs genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 00000300c9724280 0000030087c3e940 00000300c6aae700
Jul  6 03:04:55 riggs %l4-7: 0000030080073520 00000300c9724378 00000300c9724300 00000300c9724330
Jul  6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b9a0 zfs:txg_sync_thread+1b8 (30087c3e940, 183f9f0, 707a3130, 0, 2a10e81ba70, 0)
Jul  6 03:04:55 riggs genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000030087c3eb0e 0000030087c3eb08 0000030087c3eb0c
Jul  6 03:04:55 riggs %l4-7: 000000001230fa07 0000030087c3eac8 0000030087c3ead0 0000000000001db7
Jul  6 03:04:55 riggs unix: [ID 100000 kern.notice]

So, I guess my question is: is what I'm doing sane? Or is there something inherent in ZFS that I'm missing that's going to cause this kernel panic to repeat? Best I can guess, it got upset when the pools were being read. I'm wondering if exporting the pools later in the day, before re-syncing the SAN volumes to mirrors, is causing weirdness (because re-establishing the mirror makes the mirrored LUNs visible on test server B read-only until the next split). I wouldn't think so, because the pools are exported before the LUNs go read-only, but I could be wrong.

Anyway, am I off my rocker?  This should work with ZFS, right?

Thanks!
Brian

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S            608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
