Re: [zfs-discuss] zpool does not like iSCSI ?
From Oracle Support we got the following info:

Bug ID: 6992124 "reboot of Sol10 u9 host makes zpool FAULTED when zpool uses iSCSI LUNs"

This is a duplicate of:

Bug ID: 6907687 "zfs pool is not automatically fixed when disks are brought back online or after boot"

An IDR patch already exists, but no official patch yet.

- Andreas

-- This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool does not like iSCSI ?
After a while I was able to track down the problem.

During the boot process, the service filesystem/local gets enabled long before iscsi/initiator. The start method of filesystem/local mounts ufs, swap and all the other stuff from /etc/vfstab. Some recent patch added a "zfs mount -a" to the filesystem/local start method. Of course zfs mount -a will not find our iSCSI zpools and should do nothing at all. But in our case it finds and imports a zpool "data" located on a locally attached disk. I suspect that the zpool management information stored within zpool "data" contains a pointer to the (at that time) inaccessible zpool "iscsi1", which gets marked as FAULTED. Therefore zpool list shows that missing zpool as FAULTED, since all devices of iscsi1 are inaccessible.

Some time later in the boot process, iscsi/initiator gets enabled. After all iscsi targets have come online, one would expect the zpool to change state, because all of its devices are online now. But that does not happen.

As a workaround I've written a service manifest and start method. The service gets fired up after iscsi/initiator. The start method scans the output of zpool list for faulted pools with a size of "-". For each faulted zpool it does a "zpool export pool-name" followed by a re-import with "zpool import pool-name". After that, the previously faulted iscsi zpool will be online again (unless you have other problems). But be aware, there is a race condition: you have to wait some time (at least 10 secs) between export and re-import of a zpool. Without a wait in between, the fault condition will not get cleared.

Again: you may encounter this case only if...
... you're running some recent kernel patch level (in our case 142909-17)
... *and* you have placed zpools on both iscsi and non-iscsi devices.

- Andreas
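The start method described above might look roughly like the following. This is a minimal sketch under stated assumptions: the `zpool list -H` column layout and the exact parsing are my guesses, not the poster's actual script.

```shell
#!/usr/bin/bash
# Hypothetical start method for an SMF service that depends on
# svc:/network/iscsi/initiator. Illustrative only.

# Print the names of pools that show up as FAULTED with a size of "-"
# in `zpool list -H` output, i.e. pools whose devices were absent at boot.
faulted_pools () {
    # $1: captured output of `zpool list -H` (one pool per line,
    # second column is the pool size)
    echo "$1" | awk '$2 == "-" && /FAULTED/ { print $1 }'
}

# Export and re-import each faulted pool. The sleep matters: without a
# delay of roughly 10 seconds the fault condition is not cleared.
reimport_faulted () {
    for pool in $(faulted_pools "$(zpool list -H)"); do
        zpool export "$pool"
        sleep 10
        zpool import "$pool"
    done
}
```

The parsing is kept in its own function so it can be exercised without real pools; `reimport_faulted` is only called from the service's start method.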
Re: [zfs-discuss] Race condition yields to kernel panic (u3, u4) or hanging zfs commands (u5)
Hello Matt,

you wrote about the panic in u3/u4:

> These stack traces look like 6569719 (fixed in s10u5).

Then I suppose it's also fixed by 127127-11, because that patch mentions 6569719. According to my zfs-hardness-test script this is true. Instead of crashing with a panic, with 127127-11 these servers now show hanging zfs commands, just like Update 5. Please try my test script on a test server, or see below.

> For update 5, you could start with the kernel stack of the hung commands. (use ::pgrep and ::findstack) We might also need the sync thread's stack (something like ::walk spa | ::print spa_t spa_dsl_pool->dp_txg.tx_sync_thread | ::findstack)

Okay, I'll give it a try.

$ uname -a
SunOS qacult10 5.10 Generic_137111-08 sun4u sparc SUNW,Ultra-5_10
$ head -1 /etc/release
 Solaris 10 5/08 s10s_u5wos_10 SPARC
$ ps -ef | grep zfs
    root 23795 23466   0 11:02:45 pts/1    0:00 ssh localhost zfs receive hardness-test/received
    root 23782 23779   0 11:02:45 ?        0:01 zfs receive hardness-test/received
    root 23807 23804   0 11:02:52 ?        0:00 zfs receive hardness-test/received
    root 23466 23145   0 11:00:35 pts/1    0:00 /usr/bin/bash ./zfs-hardness-test.sh
    root 23793 23466   0 11:02:45 pts/1    0:00 /usr/bin/bash ./zfs-hardness-test.sh
    root 23804 23797   0 11:02:52 ?        0:00 sh -c zfs receive hardness-test/received
    root 23779     1   0 11:02:45 ?        0:00 sh -c zfs receive hardness-test/received

It seems that a receiving process (pid 23782), which was already killed, has not yet finished. After killing and aborting the data transmission, the script retries the send-receive pipe (with the same arguments) with pid 23807 on the receiving end. There must be a deadlock/race condition.
$ mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs pcipsy ip hook neti sctp arp usba fcp fctl zfs random nfs audiosup md lofs logindmux sd ptm fcip crypto ipc ]
> ::pgrep zfs$
S    PID   PPID   PGID    SID    UID      FLAGS         ADDR NAME
R  23782  23779  23779  23779      0 0x4a004000 03000171cc90 zfs
R  23807  23804  23804  23804      0 0x4a004000 030001728058 zfs
> ::pgrep zfs$ | ::walk thread | ::findstack -v
stack pointer for thread 3d24480: 2a1007fc8c1
[ 02a1007fc8c1 cv_wait+0x38() ]
  02a1007fc971 delay+0x90(1, 183f000, 17cdef7, 17cdef8, 1, 18c0578)
  02a1007fca21 dnode_special_close+0x20(300221e0a58, 7, 1, 300221e0c68, 7, 300221e0a58)
  02a1007fcad1 dmu_objset_evict+0xb8(30003a8dc40, 300027cf500, 7b652000, 70407538, 7b652000, 70407400)
  02a1007fcb91 dsl_dataset_evict+0x34(30003a8dc40, 30003a8dc40, 0, 300027cf500, 3000418c2c0, 30022366200)
  02a1007fcc41 dbuf_evict_user+0x48(7b6140b0, 30022366200, 30003a8dc48, 0, 0, 30022355e20)
  02a1007fccf1 dbuf_rele+0x8c(30022355e78, 30022355e20, 70400400, 3, 3, 3)
  02a1007fcda1 dmu_recvbackup+0x94c(300017c7400, 300017c7d80, 300017c7c28, 300017c7416, 16, 1)
  02a1007fcf71 zfs_ioc_recvbackup+0x74(300017c7000, 0, 30004320150, 0, 0, 300017c7400)
  02a1007fd031 zfsdev_ioctl+0x15c(70401400, 57, ffbfee20, 1d, 74, ef0)
  02a1007fd0e1 fop_ioctl+0x20(30001d7a0c0, 5a1d, ffbfee20, 13, 300027da0c0, 12247f8)
  02a1007fd191 ioctl+0x184(3, 300043216f8, ffbfee20, 0, 1ec08, 5a1d)
  02a1007fd2e1 syscall_trap32+0xcc(3, 5a1d, ffbfee20, 0, 1ec08, ff34774c)
stack pointer for thread 30003d12e00: 2a1009dca41
[ 02a1009dca41 turnstile_block+0x600() ]
  02a1009dcaf1 mutex_vector_enter+0x3f0(0, 0, 30022355e78, 3d24480, 3d24480, 0)
  02a1009dcba1 dbuf_read+0x6c(30022355e20, 0, 1, 1, 0, 300220f1cf8)
  02a1009dcc61 dmu_bonus_hold+0xec(0, 15, 30022355e20, 2a1009dd5d8, 8, 0)
  02a1009dcd21 dsl_dataset_open_obj+0x2c(3000418c2c0, 15, 0, 9, 300043ebe88, 2a1009dd6a8)
  02a1009dcde1 dsl_dataset_open_spa+0x140(0, 7b64d000, 3000418c488, 300043ebe88, 2a1009dd768, 9)
  02a1009dceb1 dmu_objset_open+0x20(30003ca9000, 5, 9, 2a1009dd828, 1, 300043ebe88)
  02a1009dcf71 zfs_ioc_objset_stats+0x18(30003ca9000, 0, 0, 0, 70401400, 39)
  02a1009dd031 zfsdev_ioctl+0x15c(70401400, 39, ffbfc468, 13, 4c, ef0)
  02a1009dd0e1 fop_ioctl+0x20(30001d7a0c0, 5a13, ffbfc468, 13, 300027da010, 12247f8)
  02a1009dd191 ioctl+0x184(3, 300043208f8, ffbfc468, 0, 1010101, 5a13)
  02a1009dd2e1 syscall_trap32+0xcc(3, 5a13, ffbfc468, 0, 1010101, 7cb88)
> ::walk spa | ::print spa_t
{
    spa_name = 0x30022613108 "hardness-test"
    spa_avl = {
        avl_child = [ 0, 0 ]
        avl_pcb = 0x1
    }
    spa_config = 0x3002244abd0
    spa_config_syncing = 0
    spa_config_txg = 0x4
    spa_config_cache_lock = {
        _opaque = [ 0 ]
    }
    spa_sync_pass = 0x1
    spa_state = 0
    spa_inject_ref = 0
    spa_traverse_wanted = 0
    spa_sync_on = 0x1
    spa_load_state = 0 (SPA_LOAD_NONE)
    spa_zio_issue_taskq = [ 0x300225e5528, 0x300225e56d8, 0x300225e5888, 0x300225e5a38, 0x300225e5be8, 0x300225e5d98 ]
Re: [zfs-discuss] Race condition yields to kernel panic (u3, u4) or hanging zfs commands (u5)
> Could you provide the panic message and stack trace, plus the stack traces of when it's hung? --matt

Hello Matt,

here is the info and stack trace of a server running Update 3:

$ uname -a
SunOS qacpp03 5.10 Generic_127111-05 sun4us sparc FJSV,GPUSC-M
$ head -1 /etc/release
 Solaris 10 11/06 s10s_u3wos_10 SPARC
$ grep panic: /var/adm/messages.0
Nov 4 16:04:42 qacpp03 savecore: [ID 570001 auth.error] reboot after panic: dangling dbufs (dn=600ac879410, dbuf=600475682c0)
$ mdb -k unix.4 vmcore.4
Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcipsy ssd fcp fctl md emlxs ip hook neti sctp arp usba nca random zfs nfs crypto sppp ipc logindmux ptm cpc fcip lofs ]
> $C
02a103108811 vpanic(7b2e1df0, 600ac879410, 600475682c0, 600ac879638, 60047568350, 90)
02a1031088c1 dnode_evict_dbufs+0x19c(600ac879410, 1, 7b2e1c00, 6000b0be500, 1, 2a103109170)
02a103108a31 dmu_objset_evict_dbufs+0xd8(17, 0, 0, 7b2db400, 6000b0be380, 6000b0be380)
02a103108ae1 dmu_objset_evict+0xb4(600054f9480, 6000b0be380, 7b2e1800, 7095b4f0, 7b2e1800, 7095b400)
02a103108ba1 dsl_dataset_evict+0x34(600054f9480, 600054f9480, deadbeef, 6000b0be380, 60076b3a700, 6009b489200)
02a103108c51 dbuf_evict_user+0x48(7b2a3f00, 6009b489200, 600054f9488, 0, 0, 6009a99b5c0)
02a103108d01 dbuf_rele+0x8c(6009a99b618, 6009a99b5c0, 70954400, 3, 3, 3)
02a103108db1 dmu_recvbackup+0x94c(60046a32a80, 60046a33400, 60046a332a8, 60046a32a96, 16, 1)
02a103108f81 zfs_ioc_recvbackup+0x74(60046a32680, 0, 6003b454130, 0, 0, 60046a32a80)
02a103109041 zfsdev_ioctl+0x15c(70955400, 57, ffbfee28, 1d, 74, ef0)
02a1031090f1 fop_ioctl+0x20(6000a020b80, 5a1d, ffbfee28, 13, 6003b28c020, 121e728)
02a1031091a1 ioctl+0x184(3, 6003b159680, ffbfee28, 0, 1ec08, 5a1d)
02a1031092f1 syscall_trap32+0xcc(3, 5a1d, ffbfee28, 0, 1ec08, ff347734)

--

Panic of a server running Update 4:

$ uname -a
SunOS vs104is1 5.10 Generic_127112-02 i86pc i386 i86pc
$ head -1 /etc/release
 Solaris 10 8/07 s10x_u4wos_12b X86
$ grep panic: /var/adm/messages.0
Nov 4 14:37:15 vs104is1 savecore: [ID 570001 auth.error] reboot after panic: dangling dbufs (dn=d5662518, dbuf=ded05420)
$ mdb -k unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp ufs ip hook neti sctp arp usba fctl nca lofs zfs random nfs sppp ptm ]
> $C
e5f73b68 vpanic(f9cdf0e4, d5662518, ded05420)
e5f73c08 dnode_evict_dbufs+0x143(d5662518, 0)
e5f73c28 dmu_objset_evict_dbufs+0xe3(e5f73c40, 0)
e5f73c4c dmu_objset_evict+0x72(eab53e40, d56f8940)
e5f73c68 dsl_dataset_evict+0x2f(f16fc240, eab53e40)
e5f73c7c dbuf_evict_user+0x29(f16fc240)
e5f73c9c dbuf_rele+0x7a(f16fc240, f9cd9519)
e5f73cbc dsl_dataset_close+0x57(eab53e40, 3, f9cd9519)
e5f73e0c dmu_recvbackup+0x88f(dec54400, dec54d80, dec54c28, 0, d6d48180, 186)
e5f73e58 zfs_ioc_recvbackup+0x85(dec54000)
e5f73e78 zfsdev_ioctl+0xf8()
e5f73e98 cdev_ioctl+0x2b(2d4, 5a1d, 804680c, 13, ee43fb60, e5f73f78)
e5f73ebc spec_ioctl+0x62(d689d900, 5a1d, 804680c, 13, ee43fb60, e5f73f78)
e5f73eec fop_ioctl+0x24(d689d900, 5a1d, 804680c, 13, ee43fb60, e5f73f78)
e5f73f84 ioctl+0x199()
e5f73fac sys_sysenter+0x101()

---

Hope this helps. Unfortunately I have no idea which kind of stack trace you need from the server running Update 5. A stack trace of the running kernel is useless, since it will not show any useful information: it's the zfs commands (zfs receive and zfs list) in user space which are hanging. What kind of information do you need for analysis, and how can I provide it? Or did I miss something?

- Andreas
Re: [zfs-discuss] Possible ZFS panic on Solaris 10 Update 6
Maybe this is the same as what I've described in http://www.opensolaris.org/jive/thread.jspa?threadID=81613&tstart=0 . I've written a quick & dirty shell script to reproduce a race condition which forces Update 3 and Update 4 to panic and leads Update 5 to hanging zfs commands.

- Andreas
[zfs-discuss] Race condition yields to kernel panic (u3, u4) or hanging zfs commands (u5)
Hello,

occasionally we get some Solaris 10 servers to panic in zfs code while doing "zfs send -i pool@snapA pool@snapB | ssh remote zfs receive poolname". The race condition(s) get triggered by a broken data transmission or by killing the sending zfs or ssh command. The panic or the hanging zfs commands occur on the receiving end.

I've tried to reproduce the conditions with a small (quick & dirty) script. You'll find it as an attached file. After starting the enclosed script with "./zfs-hardness-test.sh /var/tmp" it does the following:

- Creates a zpool on files in /var/tmp/...
- Copies some files to that zpool and creates snapshot A.
- Copies more files into the zpool and creates another snapshot B.
- zfs send snapA > fileA
- zfs send -i snapA snapB > fileAB
- Destroys the zpool and creates a new/empty zpool.
- zfs receive of snapA from fileA via ssh localhost.

At this point the script enters an endless loop of...

- zfs receive of incremental snapAB from fileAB via ssh localhost
- zfs destroy snapB (if it exists)
- zfs list
- restart loop

The key point is that the receive of the incremental snapAB will break at some random point, except for the first iteration, where a full receive is done. To break things, the script sends a SIGINT to the zfs and ssh commands. This simulates a broken data transmission. I suppose that on a busy (or slow) server you'll have a better chance of triggering the problem. On all tested systems, sparc and x86, Solaris will panic or hang in zfs receive / zfs list.

Is this one (or two?) known bug(s)? Are there patches available?
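The break-the-transmission loop might be sketched roughly like this. This is a guess at the mechanics: the pool, snapshot and file names are illustrative, and the real script attached to the post differs in its details.

```shell
#!/usr/bin/bash
# Sketch of the kill-the-receive loop described above; illustrative only.
set -m    # enable job control so background jobs get default SIGINT handling

# Run a command in the background and send it SIGINT after $1 seconds,
# simulating a broken data transmission.
interrupt_after () {
    delay=$1; shift
    "$@" &
    pid=$!
    sleep "$delay"
    kill -INT "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    return 0
}

# Endless loop: start an incremental receive, break it at a random
# moment, clean up, and check whether zfs commands still respond.
stress_loop () {
    while :; do
        interrupt_after $((RANDOM % 5)) \
            sh -c 'ssh localhost zfs receive hardness-test/received < /var/tmp/fileAB'
        zfs destroy hardness-test/received@snapB 2>/dev/null || :
        zfs list    # on u5, this is where commands start to hang
    done
}
```

The `interrupt_after` helper is generic, so the interrupt timing can be tested with a harmless long-running command instead of a real zfs receive.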
This is the output of uname -a; head -1 /etc/release on some of the systems tested:

SunOS vs104is1 5.10 Generic_127112-02 i86pc i386 i86pc
 Solaris 10 8/07 s10x_u4wos_12b X86
SunOS qachpi10 5.10 Generic_137112-02 i86pc i386 i86pc
 Solaris 10 5/08 s10x_u5wos_10 X86
SunOS qacult10 5.10 Generic_137111-08 sun4u sparc SUNW,Ultra-5_10
 Solaris 10 5/08 s10s_u5wos_10 SPARC
SunOS qacpp03 5.10 Generic_127111-05 sun4us sparc FJSV,GPUSC-M
 Solaris 10 11/06 s10s_u3wos_10 SPARC
SunOS qacult31 5.10 Generic_127127-11 sun4u sparc SUNW,Ultra-30
 Solaris 10 5/08 s10s_u5wos_10 SPARC

Disclaimer: DO NOT USE THE ATTACHED SCRIPT ON A PRODUCTION SERVER. USE IT AT YOUR OWN RISK. USE IT ON A TEST SERVER ONLY. IT'S LIKELY THAT YOU WILL DAMAGE YOUR SERVER OR YOUR DATA BY RUNNING THIS SCRIPT.

- Andreas

zfs-hardness-test.sh
Description: Binary data
[zfs-discuss] clones bound too tightly to its origin
[I apologise for reposting this... but no one replied to my post from Dec 4th.]

Hello all,

while experimenting with zfs send and zfs receive mixed with cloning on the receiver side, I found the following...

On server A there is a zpool with snapshots created on a regular basis via cron. Server B gets updated by a zfs-send-ssh-zfs-receive command pipe. Both servers are running Solaris 10 Update 4 (08/07).

Sometimes I want to do some testing on server B without corrupting data on server A. To do so, I create a clone of the filesystem. Up to here everything is ok. As long as the mounted clone filesystem is NOT busy, any further zfs-send-ssh-zfs-receive will work properly, updating my pool on B. But there are some long running test jobs on server B which keep the clone's filesystem busy, or just a single login shell with its cwd within the clone's filesystem, which makes the filesystem busy from umount's point of view. Meanwhile another zfs-send-ssh-zfs-receive command gets launched to copy a new snapshot from A to B. If the receiving pool of a zfs receive command has busy clones, the receive command will fail: for some unknown reason the receive command tries to umount my cloned filesystem and fails with "Device busy".

The question is: why? Since the clone is (or should be) independent of its origin, zfs receive should not umount cloned data of older snapshots.

If you want to reproduce this, below (and attached) you find a simple test script. The script will bump out at the last zfs receive command. If you comment out the line "cd /mnt/copy", the script will run as expected.

Disclaimer: Before running my script, make sure you do not have zpools named "copy" or "origin". Use this script only on a test machine! Use it at your own risk.
Here is the script:

#!/usr/bin/bash
# cleanup before test
cd /
set -ex
for pool in origin copy; do
    zpool destroy $pool || :
    rm -f /var/tmp/zpool.$pool
    [ -d /mnt/$pool ] && rmdir /mnt/$pool
    mkfile -nv 64m /var/tmp/zpool.$pool
    zpool create -m none $pool /var/tmp/zpool.$pool
done
zfs create -o mountpoint=/mnt/origin origin/test
update () {
    # create/update a log file
    date >> /mnt/origin/log
}
snapnum=0
make_snap () {
    snapnum=$(($snapnum+1))
    zfs snapshot origin/test@$snapnum
}
update
make_snap
update
make_snap
update
make_snap
update
zfs send origin/test@1 | zfs receive -v -d copy
zfs clone copy/test@1 copy/clone
zfs send -i origin/test@1 origin/test@2 | zfs receive -v -d copy
zfs set mountpoint=/mnt/copy copy/clone
zfs list -r origin copy
cd /mnt/copy    # make filesystem busy
zfs send -i origin/test@2 origin/test@3 | zfs receive -v -d copy
ls -l /mnt/{origin,copy}/log
exit

Cleanup with "zpool destroy copy; zpool destroy origin; rm /var/tmp/zpool.*" after running tests.

- Andreas

test-zfs-clone.sh
Description: Bourne shell script
[zfs-discuss] clones bound too tightly to its origin
Hello all,

while experimenting with zfs send and zfs receive mixed with cloning on the receiver side, I found the following...

On server A there is a zpool with snapshots created on a regular basis via cron. Server B gets updated by a zfs-send-ssh-zfs-receive command pipe. Sometimes I want to do some testing on server B without corrupting data on server A. To do so, I create a clone of the filesystem. Up to here everything is ok. As long as the clone filesystem is NOT busy, any further zfs-send-ssh-zfs-receive will work properly, updating my pool on B without disturbing my clone. But there are some long running test jobs on server B which keep the clone's filesystem busy. Meanwhile another zfs-send-ssh-zfs-receive command gets launched to copy a new snapshot from A to B. If the receiving pool of a zfs receive command has busy clones, the receive command will fail: for some unknown reason the receive command tries to umount my cloned filesystem and fails with "Device busy".

The question is: why? Since the clone is (or should be) independent of its origin, zfs receive should not umount cloned data of older snapshots.

If you want to reproduce this, I've attached a simple test script. The script will bump out at the last zfs receive command. If you comment out the "cd /mnt/copy", the script will run as expected.

Disclaimer: Before running my script, make sure you do not have zpools named "copy" or "origin". Use this script only on a test machine! Cleanup with "zpool destroy copy; zpool destroy origin; rm /var/tmp/zpool.*" after running tests.
Re: [zfs-discuss] clones bound too tightly to its origin
I forgot to mention: we are running Solaris 10 Update 4 (08/07)...

- Andreas
Re: [zfs-discuss] clones bound too tightly to its origin
It seems my script got lost while editing/posting the message. I'll try attaching it again...

- Andreas

test-zfs-clone.sh
Description: Bourne shell script
[zfs-discuss] Re: ZFS Storage Pools Recommendations for Productive Environments
> which one is the most performant: copies=2 or zfs-mirror?

What type of copies are you talking about? Mirrored data in the underlying storage subsystem, or a (new) feature in zfs?

- Andreas
[zfs-discuss] Re: ZFS not utilizing all disks
What does zpool status database say?
[zfs-discuss] ZFS Storage Pools Recommendations for Productive Environments
Hello,

the Solaris Internals wiki contains many interesting things about zfs, but I have no clue about the reasons for this entry. In the section "ZFS Storage Pools Recommendations - Storage Pools" you can read:

"For all production environments, set up a redundant ZFS storage pool, such as a raidz, raidz2, or a mirrored configuration, regardless of the RAID level implemented on the underlying storage device."

(see http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pools)

In our environment we use EMC based storage subsystems which are protected by RAID1 (mirrored disks). What's the reason for building an upper-level zfs mirror on top of these already mirrored disks? Or do I misinterpret the Solaris Internals wiki?