Re: [zfs-discuss] kernel panic during zfs import [UPDATE]
Hello everybody, just to let you know what happened in the meantime: I was able to open a Service Request at Oracle. The issue is a known bug (Bug 6742788: assertion panic at zfs:zap_deref_leaf). The bug has been fixed (according to Oracle support) since build 164, but there is no fix for Solaris 11 available so far (will it be fixed in S11U7?). There is a workaround available that works (partly), but my system crashed again when trying to rebuild the offending zfs within the affected zpool. At the moment I'm waiting for a so-called interim diagnostic relief patch.

cu Carsten

-- Max Planck Institut fuer marine Mikrobiologie - Network Administration - Celsiustr. 1 D-28359 Bremen Tel.: +49 421 2028568 Fax.: +49 421 2028565 PGP public key: http://www.mpi-bremen.de/Carsten_John.html

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic during zfs import [UPDATE]
On 17/04/2012 16:40, Carsten John wrote: Hello everybody, just to let you know what happened in the meantime: I was able to open a Service Request at Oracle. The issue is a known bug (Bug 6742788: assertion panic at zfs:zap_deref_leaf). The bug has been fixed (according to Oracle support) since build 164, but there is no fix for Solaris 11 available so far (will it be fixed in S11U7?). There is a workaround available that works (partly), but my system crashed again when trying to rebuild the offending zfs within the affected zpool. At the moment I'm waiting for a so-called interim diagnostic relief patch.

So, are you on S11? Can I see 'pkg info entire'? This bug is fixed in the S11 FCS release, as that is build 175b and the fix went in with build 164. So if you have Solaris 11, that CR is fixed. In Solaris 10 it is fixed in 147440-14/147441-14 (SPARC/x86).

Enda

cu Carsten

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
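For anyone wanting to answer Enda's question on their own machine, the Solaris 11 build can be read from the 'entire' incorporation. A rough sketch (the exact field layout of the output varies between S11 releases, so treat the egrep as illustrative):

$ pkg info entire
$ pkg info entire | egrep 'FMRI|Version|Branch'

The 0.175.x branch in that output corresponds to the S11 FCS build (175b) that Enda refers to.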
Re: [zfs-discuss] kernel panic during zfs import [UPDATE]
Hi Carsten, on 17.04.12 17:40, Carsten John wrote: Hello everybody, just to let you know what happened in the meantime: I was able to open a Service Request at Oracle. The issue is a known bug (Bug 6742788: assertion panic at zfs:zap_deref_leaf). The bug has been fixed (according to Oracle support) since build 164, but there is no fix for Solaris 11 available so far (will it be fixed in S11U7?). There is a workaround available that works (partly), but my system crashed again when trying to rebuild the offending zfs within the affected zpool. At the moment I'm waiting for a so-called interim diagnostic relief patch. cu Carsten

Afaik, bug 6742788 is fixed in S11 FCS (the release build), but you might be hitting this bug instead: 7098658. That bug, according to MOS, is still unresolved. My solution is to mount the affected zfs filesystem read-only when importing the zpool and set it back to rw afterwards.

Cheers, budy

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
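A minimal sketch of the read-only-then-rw workaround budy describes, assuming a pool named san_pool and an affected dataset san_pool/home/someuser (both names are placeholders taken from elsewhere in the thread), and assuming your zpool import supports -N (import without mounting):

# zpool import -N san_pool
# zfs set readonly=on san_pool/home/someuser     (keep the suspect dataset read-only across the mount)
# zfs mount -a
# zfs set readonly=off san_pool/home/someuser    (flip back to rw once the pool is up)

If -N is not available, importing the whole pool with -o readonly=on (as Carsten does elsewhere in the thread) is the blunter alternative, but then nothing can be switched to rw without re-importing.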
Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
In message 4f735451.2020...@oracle.com, Deepak Honnalli writes: Thanks for your reply. I would love to take a look at the core file. If there is a way this can somehow be transferred to the internal cores server, I can work on the bug. I am not sure about the modalities of transferring the core file though. I will ask around and see if I can help you here. How to Upload Data to Oracle Such as Explorer and Core Files [ID 1020199.1] John groenv...@acm.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
On 30.03.12 21:45, John D Groenveld wrote: In message 4f735451.2020...@oracle.com, Deepak Honnalli writes: Thanks for your reply. I would love to take a look at the core file. If there is a way this can somehow be transferred to the internal cores server, I can work on the bug. I am not sure about the modalities of transferring the core file though. I will ask around and see if I can help you here. How to Upload Data to Oracle Such as Explorer and Core Files [ID 1020199.1] John groenv...@acm.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

https://supportfiles.sun.com is the place to send those files to.

Cheers, budy

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
-Original message- To: zfs-discuss@opensolaris.org; From: John D Groenveld jdg...@elvis.arl.psu.edu Sent: Fri 30-03-2012 21:47 Subject:Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this] In message 4f735451.2020...@oracle.com, Deepak Honnalli writes: Thanks for your reply. I would love to take a look at the core file. If there is a way this can somehow be transferred to the internal cores server, I can work on the bug. I am not sure about the modalities of transferring the core file though. I will ask around and see if I can help you here. How to Upload Data to Oracle Such as Explorer and Core Files [ID 1020199.1] John groenv...@acm.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Hi John, in the meantime I managed to open a service request at Oracle. There is a webportal https://supportfiles.sun.com. There you can upload the files... cu Carsten -- Max Planck Institut fuer marine Mikrobiologie - Network Administration - Celsiustr. 1 D-28359 Bremen Tel.: +49 421 2028568 Fax.: +49 421 2028565 PGP public key:http://www.mpi-bremen.de/Carsten_John.html ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic during zfs import
Hi Carsten, this was supposed to be fixed in build 164 of Nevada (6742788). If you are still seeing this issue in S11, I think you should raise a bug with the relevant details. As Paul has suggested, this could also be due to an incomplete snapshot. I have seen interrupted zfs recv's causing weird bugs.

Thanks, Deepak.

On 03/27/12 12:44 PM, Carsten John wrote: Hello everybody, I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic during the import of a zpool (some 30 TB) containing ~500 zfs filesystems after reboot. This causes a reboot loop, until booted single user and removed /etc/zfs/zpool.cache.

From /var/adm/messages:

savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=ff002f9cec50 addr=20 occurred in module zfs due to a NULL pointer dereference
savecore: [ID 882351 auth.error] Saving compressed system crash dump in /var/crash/vmdump.2

This is what mdb tells:

mdb unix.2 vmcore.2
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd ip hook neti arp usba uhci sockfs qlc fctl s1394 kssl lofs random fcp idm sata fcip cpc crypto ufs logindmux ptm sppp ]
$c
zap_leaf_lookup_closest+0x45(ff0700ca2a98, 0, 0, ff002f9cedb0)
fzap_cursor_retrieve+0xcd(ff0700ca2a98, ff002f9ceed0, ff002f9cef10)
zap_cursor_retrieve+0x195(ff002f9ceed0, ff002f9cef10)
zfs_purgedir+0x4d(ff0721d32c20)
zfs_rmnode+0x57(ff0721d32c20)
zfs_zinactive+0xb4(ff0721d32c20)
zfs_inactive+0x1a3(ff0721d3a700, ff07149dc1a0, 0)
fop_inactive+0xb1(ff0721d3a700, ff07149dc1a0, 0)
vn_rele+0x58(ff0721d3a700)
zfs_unlinked_drain+0xa7(ff07022dab40)
zfsvfs_setup+0xf1(ff07022dab40, 1)
zfs_domount+0x152(ff07223e3c70, ff0717830080)
zfs_mount+0x4e3(ff07223e3c70, ff07223e5900, ff002f9cfe20, ff07149dc1a0)
fsop_mount+0x22(ff07223e3c70, ff07223e5900, ff002f9cfe20, ff07149dc1a0)
domount+0xd2f(0, ff002f9cfe20, ff07223e5900, ff07149dc1a0, ff002f9cfe18)
mount+0xc0(ff0713612c78, ff002f9cfe98)
syscall_ap+0x92()
_sys_sysenter_post_swapgs+0x149()

I can import the pool readonly. The server is a mirror for our primary file server and is synced via zfs send/receive. I saw a similar effect some time ago on an opensolaris box (build 111b). That time my final solution was to copy over the read-only mounted stuff to a newly created pool. As this is the second time this failure occurs (on different machines), I'm really concerned about the overall reliability. Any suggestions?

thx Carsten

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
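For anyone wanting to repeat this inspection on their own crash dump, a rough sketch of getting from the compressed vmdump file that savecore writes to the same backtrace (the dump number 2 and the /var/crash path are simply the values from Carsten's output):

# cd /var/crash
# savecore -f vmdump.2 .      (expand the compressed dump into unix.2 / vmcore.2)
# mdb unix.2 vmcore.2
> ::status                    (panic string and dump summary)
> ::msgbuf                    (console messages leading up to the panic)
> $c                          (stack backtrace of the panicking thread)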
Re: [zfs-discuss] kernel panic during zfs import
-Original message- To: ZFS Discussions zfs-discuss@opensolaris.org; From: Paul Kraus p...@kraus-haus.org Sent: Tue 27-03-2012 15:05 Subject:Re: [zfs-discuss] kernel panic during zfs import On Tue, Mar 27, 2012 at 3:14 AM, Carsten John cj...@mpi-bremen.de wrote: Hallo everybody, I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic during the import of a zpool (some 30TB) containing ~500 zfs filesystems after reboot. This causes a reboot loop, until booted single user and removed /etc/zfs/zpool.cache. From /var/adm/messages: savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=ff002f9cec50 addr=20 occurred in module zfs due to a NULL pointer dereference savecore: [ID 882351 auth.error] Saving compressed system crash dump in /var/crash/vmdump.2 I ran into a very similar problem with Solaris 10U9 and the replica (zfs send | zfs recv destination) of a zpool of about 25 TB of data. The problem was an incomplete snapshot (the zfs send | zfs recv had been interrupted). On boot the system was trying to import the zpool and as part of that it was trying to destroy the offending (incomplete) snapshot. This was zpool version 22 and destruction of snapshots is handled as a single TXG. The problem was that the operation was running the system out of RAM (32 GB worth). There is a fix for this and it is in zpool 26 (or newer), but any snapshots created while the zpool is at a version prior to 26 will have the problem on-disk. We have support with Oracle and were able to get a loaner system with 128 GB RAM to clean up the zpool (it took about 75 GB RAM to do so). If you are at zpool 26 or later this is not your problem. If you are at zpool 26, then test for an incomplete snapshot by importing the pool read only, then `zdb -d zpool | grep '%'` as the incomplete snapshot will have a '%' instead of a '@' as the dataset / snapshot separator. You can also run the zdb against the _un_imported_ zpool using the -e option to zdb. See the following Oracle Bugs for more information. CR# 6876953 CR# 6910767 CR# 7082249 CR#7082249 has been marked as a duplicate of CR# 6948890 P.S. I have a suspect that the incomplete snapshot was also corrupt in some strange way, but could never make a solid determination of that. We think what caused the zfs send | zfs recv to be interrupted was hitting an e1000g Ethernet device driver bug. -- {1-2-3-4-5-6-7-} Paul Kraus - Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ ) - Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ ) - Technical Advisor, Troy Civic Theatre Company - Technical Advisor, RPI Players ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Hi, this scenario seems to fit. The machine that was sending the snapshot is on OpenSolaris Build 111b (which is running zpool version 14). I rebooted the receiving machine due to a hanging zfs receive that couldn't be killed. zdb -d -e pool does not give any useful information: zdb -d -e san_pool Dataset san_pool [ZPL], ID 18, cr_txg 1, 36.0K, 11 objects When importing the pool readonly, I get an error about two datasets: zpool import -o readonly=on san_pool cannot set property for 'san_pool/home/someuser': dataset is read-only cannot set property for 'san_pool/home/someotheruser': dataset is read-only As this is a mirror machine, I still have the option to destroy the pool and copy over the stuff via send/receive from the primary. 
But nobody knows how long this will work until I'm hit again. If an interrupted send/receive can screw up a 30 TB target pool, then send/receive isn't an option for replicating data at all; furthermore, it should be flagged as "don't use it if your target pool might contain any valuable data". I will reproduce the crash once more and try to file a bug report for S11 as recommended by Deepak (not so easy these days...).

thanks Carsten

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
-Original message- To: zfs-discuss@opensolaris.org; From: Deepak Honnalli deepak.honna...@oracle.com Sent: Wed 28-03-2012 09:12 Subject: Re: [zfs-discuss] kernel panic during zfs import

Hi Carsten, This was supposed to be fixed in build 164 of Nevada (6742788). If you are still seeing this issue in S11, I think you should raise a bug with the relevant details. As Paul has suggested, this could also be due to an incomplete snapshot. I have seen interrupted zfs recv's causing weird bugs. Thanks, Deepak.

Hi Deepak,

I just spent about an hour (or two) trying to file a bug report regarding the issue, without success. Seems to me that I'm too stupid to use this MyOracleSupport portal. So, as I'm getting paid for keeping systems running and not for clicking through Flash-overloaded support portals searching for CSIs, I'm giving the relevant information to the list now. Perhaps someone at Oracle reading the list is able to file a bug report, or can contact me off list.

Background:

Machine A
- Sun X4270
- OpenSolaris Build 111b
- zpool version 14
- primary file server
- sending snapshots via zfs send
- directly attached Sun J4400 SAS JBODs with a total of 40 TB of storage

Machine B
- Sun X4270
- Solaris 11
- zpool version 33
- mirror server
- receiving snapshots via zfs receive
- FC attached Storagetek FLX280 storage

Incident:

After a zfs send/receive run, machine B had a hanging zfs receive process. To get rid of the process, I rebooted the machine. During reboot the kernel panics, resulting in a reboot loop. To bring up the system, I rebooted single user, removed /etc/zfs/zpool.cache and rebooted again.

The damaged pool can be imported readonly, giving a warning:

$zpool import -o readonly=on san_pool
cannot set property for 'san_pool/home/someuser': dataset is read-only
cannot set property for 'san_pool/home/someotheruser': dataset is read-only

The ZFS debugger zdb does not give any additional information:

$zdb -d -e san_pool
Dataset san_pool [ZPL], ID 18, cr_txg 1, 36.0K, 11 objects

The issue can be reproduced by trying to import the pool r/w, resulting in a kernel panic. 
The fmdump utility gives the following information for the relevant UUID:

$fmdump -Vp -u 91da1503-74c5-67c2-b7c1-d4e245e4d968
TIME UUID SUNW-MSG-ID
Mar 28 2012 12:54:26.563203000 91da1503-74c5-67c2-b7c1-d4e245e4d968 SUNOS-8000-KL
TIME CLASS ENA
Mar 28 12:54:24.2698 ireport.os.sunos.panic.dump_available 0x
Mar 28 12:54:05.9826 ireport.os.sunos.panic.dump_pending_on_device 0x

nvlist version: 0
version = 0x0
class = list.suspect
uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968
code = SUNOS-8000-KL
diag-time = 1332932066 541092
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
__case_state = 0x1
topo-uuid = 3b4117e0-0ac7-cde5-b434-b9735176d591
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968
resource = sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968
savecore-succcess = 1
dump-dir = /var/crash
dump-files = vmdump.0
os-instance-uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968
panicstr = BAD TRAP: type=e (#pf Page fault) rp=ff002f6dcc50 addr=20 occurred in module zfs due to a NULL pointer dereference
panicstack = unix:die+d8 () | unix:trap+152b () | unix:cmntrap+e6 () | zfs:zap_leaf_lookup_closest+45 () | zfs:fzap_cursor_retrieve+cd () | zfs:zap_cursor_retrieve+195 () | zfs:zfs_purgedir+4d () | zfs:zfs_rmnode+57 () | zfs:zfs_zinactive+b4 () | zfs:zfs_inactive+1a3 () | genunix:fop_inactive+b1 () | genunix:vn_rele+58 () | zfs:zfs_unlinked_drain+a7 () | zfs:zfsvfs_setup+f1 () | zfs:zfs_domount+152 () | zfs:zfs_mount+4e3 () | genunix:fsop_mount+22 () | genunix:domount+d2f () | genunix:mount+c0 () | genunix:syscall_ap+92 () | unix:brand_sys_sysenter+1cf () |
crashtime = 1332931339
panic-time = March 28, 2012 12:42:19 PM CEST CEST
(end fault-list[0])
fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x4f72ede2 0x2191cbb8

The 'first view' debugger output looks like:

mdb unix.0 vmcore.0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd ip hook neti arp usba uhci sockfs qlc fctl s1394 kssl lofs random idm sppp crypto sata fcip cpc fcp ufs logindmux ptm ]
$c
zap_leaf_lookup_closest+0x45(ff0728eac588, 0
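For readers hitting the same reboot loop, a rough outline of the recovery steps Carsten describes above; booting details differ per system and boot loader, and san_pool stands in for your own pool name, so treat this as a sketch rather than a tested recipe:

(boot into single user mode, e.g. from the installation media or with the -s kernel option)
# mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad     (so the damaged pool is not auto-imported at boot)
# reboot
(after the reboot the pool is no longer touched automatically)
# zpool import -o readonly=on san_pool                 (the read-only import still works, as shown above)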
Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
In message zarafa.4f7307dd.297a.5713b0445a582...@zarafa.mpi-bremen.de, Carsten John writes: I just spent about an hour (or two) trying to file a bug report regarding the issue without success. Seems to me that I'm too stupid to use this MyOracleSupport portal. So, as I'm getting paid for keeping systems running and not for clicking through Flash-overloaded support portals searching for CSIs, I'm giving the relevant information to the list now.

If the Flash interface is broken, try the non-Flash MOS site: URL:http://SupportHTML.Oracle.COM/

John groenv...@acm.org

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
Hi Carsten, Thanks for your reply. I would love to take a look at the core file. If there is a way this can somehow be transferred to the internal cores server, I can work on the bug. I am not sure about the modalities of transferring the core file though. I will ask around and see if I can help you here. Thanks, Deepak. On Wednesday 28 March 2012 06:15 PM, Carsten John wrote: -Original message- To: zfs-discuss@opensolaris.org; From: Deepak Honnallideepak.honna...@oracle.com Sent: Wed 28-03-2012 09:12 Subject:Re: [zfs-discuss] kernel panic during zfs import Hi Carsten, This was supposed to be fixed in build 164 of Nevada (6742788). If you are still seeing this issue in S11, I think you should raise a bug with relevant details. As Paul has suggested, this could also be due to incomplete snapshot. I have seen interrupted zfs recv's causing weired bugs. Thanks, Deepak. Hi Deepak, I just spent about an hour (or two) trying to file a bug report regarding the issue without success. Seems to me, that I'm too stupid to use this MyOracleSupport portal. So, as I'm getting paid for keeping systems running and not clicking through flash overloaded support portals searching for CSIs, I'm giving the relevant information to the list now. Perhaps, someone at Oracle, reading the list, is able to file a bug report, or contact me off list. Background: Machine A - Sun X4270 - Opensolaris Build 111b - zpool version 14 - primary file server - sending snapshots via zfs send - direct attached Sun J4400 SAS JBODs with totally 40 TB storage Machine B - Sun X4270 - Solaris 11 - zpool version 33 - mirror server - receiving snapshots via zfs receive - FC attached Storagetek FLX280 storage Incident: After a zfs send/receive run machine B had a hanging zfs receive process. To get rid of the process, I rebooted the machine. During reboot the kernel panics, resulting in a reboot loop. To bring up the system, I rebooted single user, removed /etc/zfs/zpool.cache and rebooted again. The damaged pool can imported readonly, giving a warning: $zpool import -o readonly=on san_pool cannot set property for 'san_pool/home/someuser': dataset is read-only cannot set property for 'san_pool/home/someotheruser': dataset is read-only The ZFS debugger zdb does not give any additional information: $zdb -d -e san_pool Dataset san_pool [ZPL], ID 18, cr_txg 1, 36.0K, 11 objects The issue can reproduced by trying to import the pool r/w, resulting in a kernel panic. 
The fmdump utility gives the following information for the relevant UUID: $fmdump -Vp -u 91da1503-74c5-67c2-b7c1-d4e245e4d968 TIME UUID SUNW-MSG-ID Mar 28 2012 12:54:26.563203000 91da1503-74c5-67c2-b7c1-d4e245e4d968 SUNOS-8000-KL TIME CLASS ENA Mar 28 12:54:24.2698 ireport.os.sunos.panic.dump_available 0x Mar 28 12:54:05.9826 ireport.os.sunos.panic.dump_pending_on_device 0x nvlist version: 0 version = 0x0 class = list.suspect uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968 code = SUNOS-8000-KL diag-time = 1332932066 541092 de = fmd:///module/software-diagnosis fault-list-sz = 0x1 __case_state = 0x1 topo-uuid = 3b4117e0-0ac7-cde5-b434-b9735176d591 fault-list = (array of embedded nvlists) (start fault-list[0]) nvlist version: 0 version = 0x0 class = defect.sunos.kernel.panic certainty = 0x64 asru = sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968 resource = sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968 savecore-succcess = 1 dump-dir = /var/crash dump-files = vmdump.0 os-instance-uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968 panicstr = BAD TRAP: type=e (#pf Page fault) rp=ff002f6dcc50 addr=20 occurred in module zfs due to a NULL pointer dereference panicstack = unix:die+d8 () | unix:trap+152b () | unix:cmntrap+e6 () | zfs:zap_leaf_lookup_closest+45 () | zfs:fzap_cursor_retrieve+cd () | zfs:zap_cursor_retrieve+195 () | zfs:zfs_purgedir+4d () | zfs:zfs_rmnode+57 () | zfs:zfs_zinactive+b4 () | zfs:zfs_inactive+1a3 () | genunix:fop_inactive+b1 () | genunix:vn_rele+58 () | zfs:zfs_unlinked_drain+a7 () | zfs:zfsvfs_setup+f1 () | zfs:zfs_domount+152 () | zfs:zfs_mount+4e3 () | genunix:fsop_mount+22 () | genunix:domount+d2f () | genunix:mount+c0 () | genunix:syscall_ap+92 () | unix:brand_sys_sysenter+1cf () | crashtime = 1332931339 panic-time = March 28, 2012 12:42:19 PM CEST CEST (end fault-list[0]) fault-status = 0x1
Re: [zfs-discuss] kernel panic during zfs import
On 2012-03-27 11:14, Carsten John wrote: I saw a similar effect some time ago on an opensolaris box (build 111b). That time my final solution was to copy over the read-only mounted stuff to a newly created pool. As it is the second time this failure occurs (on different machines) I'm really concerned about overall reliability. Any suggestions?

A couple of months ago I reported a similar issue (though with a different stack trace and code path). I tracked it to code in the freeing of deduped blocks, where a valid code path could return a NULL pointer but further routines used the pointer as if it were always valid - thus a NULL dereference when the pool was imported RW and tried to release blocks marked for deletion. Adding a check for non-NULLness in my private rebuild of oi_151a has fixed the issue. I wouldn't be surprised to see similar sloppiness in other parts of the code; not checking input values in routines seems like an arrogant mistake waiting to fire (and it did for us). I am not sure how to make a webrev and ultimately a signed-off contribution upstream, but I posted my patch and research on the list and in the illumos bug tracker.

I am not sure how you can fix an S11 system, though. If it is at zpool v28 or older, you can try to import it into an OpenIndiana installation, perhaps rebuilt with similarly patched code that checks for NULLs, and fix your pool there (and then reuse it in S11 if you must). The source is there on http://src.illumos.org and your stack trace should tell you in which functions you should start looking...

Good luck, //Jim

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic during zfs import
On Tue, Mar 27, 2012 at 3:14 AM, Carsten John cj...@mpi-bremen.de wrote: Hello everybody, I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic during the import of a zpool (some 30TB) containing ~500 zfs filesystems after reboot. This causes a reboot loop, until booted single user and removed /etc/zfs/zpool.cache. From /var/adm/messages: savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=ff002f9cec50 addr=20 occurred in module zfs due to a NULL pointer dereference savecore: [ID 882351 auth.error] Saving compressed system crash dump in /var/crash/vmdump.2

I ran into a very similar problem with Solaris 10U9 and the replica (zfs send | zfs recv destination) of a zpool of about 25 TB of data. The problem was an incomplete snapshot (the zfs send | zfs recv had been interrupted). On boot the system was trying to import the zpool, and as part of that it was trying to destroy the offending (incomplete) snapshot. This was zpool version 22, and destruction of snapshots is handled as a single TXG. The problem was that the operation was running the system out of RAM (32 GB worth). There is a fix for this and it is in zpool 26 (or newer), but any snapshots created while the zpool is at a version prior to 26 will still have the problem on disk. We have support with Oracle and were able to get a loaner system with 128 GB RAM to clean up the zpool (it took about 75 GB of RAM to do so).

If you are at zpool 26 or later this is not your problem. If you are at a zpool version below 26, then test for an incomplete snapshot by importing the pool read only, then `zdb -d zpool | grep '%'`, as the incomplete snapshot will have a '%' instead of a '@' as the dataset / snapshot separator. You can also run zdb against the _un_imported_ zpool using the -e option to zdb.

See the following Oracle Bugs for more information: CR# 6876953, CR# 6910767, CR# 7082249. CR# 7082249 has been marked as a duplicate of CR# 6948890.

P.S. I suspect that the incomplete snapshot was also corrupt in some strange way, but could never make a solid determination of that. We think what caused the zfs send | zfs recv to be interrupted was hitting an e1000g Ethernet device driver bug.

-- {1-2-3-4-5-6-7-} Paul Kraus - Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ ) - Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ ) - Technical Advisor, Troy Civic Theatre Company - Technical Advisor, RPI Players

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
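A compact version of the check Paul describes, using the pool name san_pool from Carsten's setup as a stand-in for your own pool name:

$ zpool import -o readonly=on san_pool
$ zdb -d san_pool | grep '%'        (any dataset printed with a '%' separator is an incomplete receive)

or, against the exported / not-yet-imported pool:

$ zdb -e -d san_pool | grep '%'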
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
It seems that obtaining an Oracle support contract or a contract renewal is equally frustrating. I don't have any axe to grind with Oracle. I'm new to the Solaris thing and wanted to see if it was for me. If I was using this box to make money then sure I wouldn't have any problem paying for support. I don't expect handouts and I don't mind paying. I trusted ZFS because I heard it's for enterprise use and now I have 200G of data offline and not a peep from Oracle. Looking on the net I found another guy who had the same exact failure. To my way of thinking somebody needs to standup and get this fixed for us and make sure it doesn't happen to anybody else. If that happens I have no grudge against Oracle or Solaris. If it doesn't that's a pretty sour experience for someone to go through and it will definitely make me look at this whole thing in another light. I still believe somebody over there will do the right thing. I don't believe Oracle needs to hold people's data hostage to make money. I am sure they have enough good products and services to make money honestly. Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
On Fri, Aug 19, 2011 at 4:43 AM, Stu Whitefish swhitef...@yahoo.com wrote: It seems that obtaining an Oracle support contract or a contract renewal is equally frustrating. I don't have any axe to grind with Oracle. I'm new to the Solaris thing and wanted to see if it was for me. If I was using this box to make money then sure I wouldn't have any problem paying for support. I don't expect handouts and I don't mind paying. I trusted ZFS because I heard it's for enterprise use and now I have 200G of data offline and not a peep from Oracle. Looking on the net I found another guy who had the same exact failure. To my way of thinking somebody needs to standup and get this fixed for us and make sure it doesn't happen to anybody else. If that happens I have no grudge against Oracle or Solaris. If it doesn't that's a pretty sour experience for someone to go through and it will definitely make me look at this whole thing in another light. I still believe somebody over there will do the right thing. I don't believe Oracle needs to hold people's data hostage to make money. I am sure they have enough good products and services to make money honestly. Jim You digitally signed a license agreement stating the following: *No Technical Support* Our technical support organization will not provide technical support, phone support, or updates to you for the Programs licensed under this agreement. To turn around and keep repeating that they're holding your data hostage is disingenuous at best. Nobody is holding your data hostage. You voluntarily put it on an operating system that explicitly states doesn't offer support from the parent company. Nobody from Oracle is going to show up with a patch for you on this mailing list because none of the Oracle employees want to lose their job and subsequently be subjected to a lawsuit. If that's what you're planning on waiting for, I'd suggest you take a new approach. Sorry to be a downer, but that's reality. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
In message 1313687977.77375.yahoomail...@web121903.mail.ne1.yahoo.com, Stu Wh itefish writes: Nope, not a clue how to do that and I have installed Windows on this box inste ad of Solaris since I can't get my data back from ZFS. I have my two drives the pool is on disconnected so if this ever gets resolved I can reinstall Solaris and start learning again. I believe you can configure VirtualBox for Windows to pass thru the disk with your unimportable rpool to guest OSs. Can OpenIndiana or FreeBSD guest import the pool? Does Solaris 11X crash at the same place when run from within VirtualBox? John groenv...@acm.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
You're probably hitting bug 7056738 - http://wesunsolve.net/bugid/id/7056738 Looks like it's not fixed yet @ oracle anyway... Were you using crypto on your datasets ? Regards, Thomas On Tue, 16 Aug 2011 09:33:34 -0700 (PDT) Stu Whitefish swhitef...@yahoo.com wrote: - Original Message - From: Alexander Lesle gro...@tierarzt-mueller.de To: zfs-discuss@opensolaris.org Cc: Sent: Monday, August 15, 2011 8:37:42 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! Hello Stu Whitefish and List, On August, 15 2011, 21:17 Stu Whitefish wrote in [1]: 7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a kernel panic, even when booted from different OS versions Right. I have tried OpenIndiana 151 and Solaris 11 Express (latest from Oracle) several times each as well as 2 new installs of Update 8. When I understand you right is your primary interest to recover your data on tank pool. Have you check the way to boot from a Live-DVD, mount your safe place and copy the data on a other machine? Hi Alexander, Yes of course...the problem is no version of Solaris can import the pool. Please refer to the first message in the thread. Thanks, Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Gouverneur Thomas t...@ians.be ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
Have you already extracted the core file of the kernel crash ? (and btw activated dump device for such dumping happen at next reboot...) Have you also tried applying the latest kernel/zfs patches and try importing the pool afterwards ? Thomas On 08/18/2011 06:40 PM, Stu Whitefish wrote: Hi Thomas, Thanks for that link. That's very similar but not identical. There's a different line number in zfs_ioctl.c, mine and Preston's fail on line 1815. It could be because of a difference in levels in that module of course, but the traceback is not identical either. Ours show brand_sysenter and the one you linked to shows brand_sys_syscall. I don't know what all that means but it is different. Anyway at least two of us have identical failures. I was not using crypto, just a plain jane mirror on 2 drives. Possibly I had compression on a few file systems but everything else was allowed to default. Here are our screenshots in case anybody doesn't want to go through the thread. http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/ http://prestonconnors.com/zvol_get_stats.jpg I hope somebody can help with this. It's not a good feeling having so much data gone. Thanks for your help. Oracle, are you listening? Jim - Original Message - From: Thomas Gouverneurt...@ians.be To: zfs-discuss@opensolaris.org Cc: Stu Whitefishswhitef...@yahoo.com Sent: Thursday, August 18, 2011 1:57:29 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! You're probably hitting bug 7056738 - http://wesunsolve.net/bugid/id/7056738 Looks like it's not fixed yet @ oracle anyway... Were you using crypto on your datasets ? Regards, Thomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
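For readers who, like Jim, have not set this up before, a minimal sketch of what Thomas is asking about; the zvol path shown is the usual default on a ZFS-root system and may differ on yours:

# dumpadm                                  (show the current dump device and savecore directory)
# dumpadm -d /dev/zvol/dsk/rpool/dump      (point crash dumps at the rpool dump zvol)
# dumpadm -y                               (make savecore run automatically on reboot)

After the next panic, the compressed dump should appear in the savecore directory (typically under /var/crash) as vmdump.N, which is the file Oracle support asks to have uploaded.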
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
From: Thomas Gouverneur t...@ians.be To: zfs-discuss@opensolaris.org Cc: Sent: Thursday, August 18, 2011 5:11:16 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! Have you already extracted the core file of the kernel crash ? Nope, not a clue how to do that and I have installed Windows on this box instead of Solaris since I can't get my data back from ZFS. I have my two drives the pool is on disconnected so if this ever gets resolved I can reinstall Solaris and start learning again. (and btw activated dump device for such dumping happen at next reboot...) This was a development box for me to see how I get along with Solaris. I'm afraid I don't have any experience in Solaris to understand your question. Have you also tried applying the latest kernel/zfs patches and try importing the pool afterwards ? Wish I had them and knew what to do with them if I had them. Somebody on OTN noted this is supposed to be fixed by 142910 but I didn't hear back yet whether it fixes an pool ZFS won't import, or it only stops it from happening in the first place. Don't have a service contract as I say this box was my first try with Solaris and it is a homebrew system not on Oracle's support list. I am sure if there is a patch for this or a way to get my 200G of data back some kind soul at Oracle will certainly help me since I lost my data and getting it back isn't a matter of convenience. What an opportunity to generate some old fashioned goodwill! :-) Jim Thomas On 08/18/2011 06:40 PM, Stu Whitefish wrote: Hi Thomas, Thanks for that link. That's very similar but not identical. There's a different line number in zfs_ioctl.c, mine and Preston's fail on line 1815. It could be because of a difference in levels in that module of course, but the traceback is not identical either. Ours show brand_sysenter and the one you linked to shows brand_sys_syscall. I don't know what all that means but it is different. Anyway at least two of us have identical failures. I was not using crypto, just a plain jane mirror on 2 drives. Possibly I had compression on a few file systems but everything else was allowed to default. Here are our screenshots in case anybody doesn't want to go through the thread. http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/ http://prestonconnors.com/zvol_get_stats.jpg I hope somebody can help with this. It's not a good feeling having so much data gone. Thanks for your help. Oracle, are you listening? Jim - Original Message - From: Thomas Gouverneurt...@ians.be To: zfs-discuss@opensolaris.org Cc: Stu Whitefishswhitef...@yahoo.com Sent: Thursday, August 18, 2011 1:57:29 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! You're probably hitting bug 7056738 - http://wesunsolve.net/bugid/id/7056738 Looks like it's not fixed yet @ oracle anyway... Were you using crypto on your datasets ? Regards, Thomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
On Fri, 19 Aug 2011, Edho Arief wrote: Asking Oracle for help without support contract would be like shouting in vacuum space... It seems that obtaining an Oracle support contract or a contract renewal is equally frustrating. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
- Original Message - From: Alexander Lesle gro...@tierarzt-mueller.de To: zfs-discuss@opensolaris.org Cc: Sent: Monday, August 15, 2011 8:37:42 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! Hello Stu Whitefish and List, On August, 15 2011, 21:17 Stu Whitefish wrote in [1]: 7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a kernel panic, even when booted from different OS versions Right. I have tried OpenIndiana 151 and Solaris 11 Express (latest from Oracle) several times each as well as 2 new installs of Update 8. When I understand you right is your primary interest to recover your data on tank pool. Have you check the way to boot from a Live-DVD, mount your safe place and copy the data on a other machine? Hi Alexander, Yes of course...the problem is no version of Solaris can import the pool. Please refer to the first message in the thread. Thanks, Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
- Original Message - From: John D Groenveld jdg...@elvis.arl.psu.edu To: zfs-discuss@opensolaris.org zfs-discuss@opensolaris.org Cc: Sent: Monday, August 15, 2011 6:12:37 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! In message 1313431448.5331.yahoomail...@web121911.mail.ne1.yahoo.com, Stu Whi tefish writes: I'm sorry, I don't understand this suggestion. The pool that won't import is a mirror on two drives. Disconnect all but the two mirrored drives that you must import and try to import from a S11X LiveUSB. Hi John, Thanks for the suggestion, but it fails the same way. It panics and reboots too fast for me to capture the messages but they're the same as what I posted in the opening post of this thread. This is a snap of zpool import before I tried importing it. Everything looks normal except it's odd the controller numbers keep changing. http://imageshack.us/photo/my-images/705/sol11expresslive.jpg/ Thanks, Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish swhitef...@yahoo.com wrote: # zpool import -f tank http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/ I encourage you to open a support case and ask for an escalation on CR 7056738. -- Mike Gerdts Hi Mike, Unfortunately I don't have a support contract. I've been trying to set up a development system on Solaris and learn it. Until this happened, I was pretty happy with it. Even so, I don't have supported hardware so I couldn't buy a contract until I bought another machine and I really have enough machines so I cannot justify the expense right now. And I refuse to believe Oracle would hold people hostage in a situation like this, but I do believe they could generate a lot of goodwill by fixing this for me and whoever else it happened to and telling us what level of Solaris 10 this is fixed at so this doesn't continue happening. It's a pretty serious failure and I'm not the only one who it happened to. It's incredible but in all the years I have been using computers I don't ever recall losing data due to a filesystem or OS issue. That includes DOS, Windows, Linux, etc. I cannot believe ZFS on Intel is so fragile that people lose hundreds of gigs of data and that's just the way it is. There must be a way to recover this data and some advice on preventing it from happening again. Thanks, Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
may be try the following 1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris then choose single user mode(6)) 2)when ask to mount rpool just say no 3)mkdir /tmp/mnt1 /tmp/mnt2 4)zpool import -f -R /tmp/mnt1 tank 5)zpool import -f -R /tmp/mnt2 rpool On 8/15/2011 9:12 AM, Stu Whitefish wrote: On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish swhitef...@yahoo.com wrote: # zpool import -f tank http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/ I encourage you to open a support case and ask for an escalation on CR 7056738. -- Mike Gerdts Hi Mike, Unfortunately I don't have a support contract. I've been trying to set up a development system on Solaris and learn it. Until this happened, I was pretty happy with it. Even so, I don't have supported hardware so I couldn't buy a contract until I bought another machine and I really have enough machines so I cannot justify the expense right now. And I refuse to believe Oracle would hold people hostage in a situation like this, but I do believe they could generate a lot of goodwill by fixing this for me and whoever else it happened to and telling us what level of Solaris 10 this is fixed at so this doesn't continue happening. It's a pretty serious failure and I'm not the only one who it happened to. It's incredible but in all the years I have been using computers I don't ever recall losing data due to a filesystem or OS issue. That includes DOS, Windows, Linux, etc. I cannot believe ZFS on Intel is so fragile that people lose hundreds of gigs of data and that's just the way it is. There must be a way to recover this data and some advice on preventing it from happening again. Thanks, Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss attachment: laotsao.vcf___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
Hi. Thanks I have tried this on update 8 and Sol 11 Express. The import always results in a kernel panic as shown in the picture. I did not try an alternate mountpoint though. Would it make that much difference? - Original Message - From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. laot...@gmail.com To: zfs-discuss@opensolaris.org Cc: Sent: Monday, August 15, 2011 3:06:20 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! may be try the following 1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris then choose single user mode(6)) 2)when ask to mount rpool just say no 3)mkdir /tmp/mnt1 /tmp/mnt2 4)zpool import -f -R /tmp/mnt1 tank 5)zpool import -f -R /tmp/mnt2 rpool On 8/15/2011 9:12 AM, Stu Whitefish wrote: On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish swhitef...@yahoo.com wrote: # zpool import -f tank http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/ I encourage you to open a support case and ask for an escalation on CR 7056738. -- Mike Gerdts Hi Mike, Unfortunately I don't have a support contract. I've been trying to set up a development system on Solaris and learn it. Until this happened, I was pretty happy with it. Even so, I don't have supported hardware so I couldn't buy a contract until I bought another machine and I really have enough machines so I cannot justify the expense right now. And I refuse to believe Oracle would hold people hostage in a situation like this, but I do believe they could generate a lot of goodwill by fixing this for me and whoever else it happened to and telling us what level of Solaris 10 this is fixed at so this doesn't continue happening. It's a pretty serious failure and I'm not the only one who it happened to. It's incredible but in all the years I have been using computers I don't ever recall losing data due to a filesystem or OS issue. That includes DOS, Windows, Linux, etc. I cannot believe ZFS on Intel is so fragile that people lose hundreds of gigs of data and that's just the way it is. There must be a way to recover this data and some advice on preventing it from happening again. Thanks, Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
On 8/15/2011 11:25 AM, Stu Whitefish wrote: Hi. Thanks I have tried this on update 8 and Sol 11 Express. The import always results in a kernel panic as shown in the picture. I did not try an alternate mountpoint though. Would it make that much difference? try it - Original Message - From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.laot...@gmail.com To: zfs-discuss@opensolaris.org Cc: Sent: Monday, August 15, 2011 3:06:20 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! may be try the following 1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris then choose single user mode(6)) 2)when ask to mount rpool just say no 3)mkdir /tmp/mnt1 /tmp/mnt2 4)zpool import -f -R /tmp/mnt1 tank 5)zpool import -f -R /tmp/mnt2 rpool On 8/15/2011 9:12 AM, Stu Whitefish wrote: On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish swhitef...@yahoo.com wrote: # zpool import -f tank http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/ I encourage you to open a support case and ask for an escalation on CR 7056738. -- Mike Gerdts Hi Mike, Unfortunately I don't have a support contract. I've been trying to set up a development system on Solaris and learn it. Until this happened, I was pretty happy with it. Even so, I don't have supported hardware so I couldn't buy a contract until I bought another machine and I really have enough machines so I cannot justify the expense right now. And I refuse to believe Oracle would hold people hostage in a situation like this, but I do believe they could generate a lot of goodwill by fixing this for me and whoever else it happened to and telling us what level of Solaris 10 this is fixed at so this doesn't continue happening. It's a pretty serious failure and I'm not the only one who it happened to. It's incredible but in all the years I have been using computers I don't ever recall losing data due to a filesystem or OS issue. That includes DOS, Windows, Linux, etc. I cannot believe ZFS on Intel is so fragile that people lose hundreds of gigs of data and that's just the way it is. There must be a way to recover this data and some advice on preventing it from happening again. Thanks, Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss attachment: laotsao.vcf___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
Unfortunately this panics the same exact way. Thanks for the suggestion though. - Original Message - From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. laot...@gmail.com To: zfs-discuss@opensolaris.org Cc: Sent: Monday, August 15, 2011 3:06:20 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! may be try the following 1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris then choose single user mode(6)) 2)when ask to mount rpool just say no 3)mkdir /tmp/mnt1 /tmp/mnt2 4)zpool import -f -R /tmp/mnt1 tank 5)zpool import -f -R /tmp/mnt2 rpool ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
IIRC if you use two hdd, you can import the zpool. Can you try the import -R with only the two hdd attached at a time?

Sent from my iPad
Hung-Sheng Tsao (LaoTsao) Ph.D

On Aug 15, 2011, at 13:42, Stu Whitefish swhitef...@yahoo.com wrote: Unfortunately this panics the same exact way. Thanks for the suggestion though. - Original Message - From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. laot...@gmail.com To: zfs-discuss@opensolaris.org Cc: Sent: Monday, August 15, 2011 3:06:20 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! may be try the following 1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris then choose single user mode(6)) 2)when ask to mount rpool just say no 3)mkdir /tmp/mnt1 /tmp/mnt2 4)zpool import -f -R /tmp/mnt1 tank 5)zpool import -f -R /tmp/mnt2 rpool

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
I am catching up here and wanted to see if I correctly understand the chain of events... 1. Install system to pair of mirrored disks (c0t2d0s0 c0t3d0s0), system works fine 2. add two more disks (c0t0d0s0 c0t1d0s0), create zpool tank, test and determine these disks are fine 3. copy data to save to rpool (c0t2d0s0 c0t3d0s0) 3. install OS to c0t0d0s0, c0t1d0s0 4. reboot, system still boots from old rpool (c0t2d0s0 c0t3d0s0) 5. change boot device and boot from new OS (c0t0d0s0 c0t1d0s0) 6. cannot import old rpool (c0t2d0s0 c0t3d0s0) with your data At this point could you still boot from the old rpool (c0t2d0s0 c0t3d0s0) ? something happens and 7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a kernel panic, even when booted from different OS versions Have you been using the same hardware for all of this ? -- {1-2-3-4-5-6-7-} Paul Kraus - Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ ) - Sound Designer: Frankenstein, A New Musical (http://www.facebook.com/event.php?eid=123170297765140) - Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ ) - Technical Advisor, RPI Players ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
I'm sorry, I don't understand this suggestion. The pool that won't import is a mirror on two drives. - Original Message - From: LaoTsao laot...@gmail.com To: Stu Whitefish swhitef...@yahoo.com Cc: zfs-discuss@opensolaris.org zfs-discuss@opensolaris.org Sent: Monday, August 15, 2011 5:50:08 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! iirc if you use two hdd, you can import the zpool can you try to import -R with only two hdd at time Sent from my iPad Hung-Sheng Tsao ( LaoTsao) Ph.D On Aug 15, 2011, at 13:42, Stu Whitefish swhitef...@yahoo.com wrote: Unfortunately this panics the same exact way. Thanks for the suggestion though. - Original Message - From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. laot...@gmail.com To: zfs-discuss@opensolaris.org Cc: Sent: Monday, August 15, 2011 3:06:20 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! may be try the following 1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris then choose single user mode(6)) 2)when ask to mount rpool just say no 3)mkdir /tmp/mnt1 /tmp/mnt2 4)zpool import -f -R /tmp/mnt1 tank 5)zpool import -f -R /tmp/mnt2 rpool ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
In message 1313431448.5331.yahoomail...@web121911.mail.ne1.yahoo.com, Stu Whi tefish writes: I'm sorry, I don't understand this suggestion. The pool that won't import is a mirror on two drives. Disconnect all but the two mirrored drives that you must import and try to import from a S11X LiveUSB. John groenv...@acm.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
Hi Paul, 1. Install system to pair of mirrored disks (c0t2d0s0 c0t3d0s0), system works fine I don't remember at this point which disks were which, but I believe it was 0 and 1 because during the first install there were only 2 drives in the box because I had only 2 drives. 2. add two more disks (c0t0d0s0 c0t1d0s0), create zpool tank, test and determine these disks are fine Again, probably was on disks 2 and 3 but in principle, correct. 3. copy data to save to rpool (c0t2d0s0 c0t3d0s0) I did this in a few steps that probably don't make sense because I had only 2 500G drives at the beginning when I did my install. Later I got two 320G and realized I should have the root pool on the smaller drives. But in the interim, I installed the new pair of 320G and moved a bunch of data onto that pool. After the initial installation when update 8 first came out, what happened next was something like: 1. I created tank mirror on the 2 320G drives and moved data from another system on to the tank. After I verified it was good I rebooted the box and checked again and everything was healthy, all pools were imported and mounted correctly. 2. Then I realized I should install on the 320s and use the 500s for storage so I copied everything I had just put on the 320s (tank) onto the 500s (root). I rebooted again and verified the data on root was good, then I deleted it from tank. 3. I installed a new install on the 320s (formerly tank) 4. I rebooted and it used my old root on the 500s as root, which surprised me but makes sense now because it was created as rpool during the very first install. 5. I rebooted in single user mode and tried to import the new install. It imported fine. 6. I don't know what happened next but I believe after that I rebooted again to see why Solaris didn't choose the new install, the tank pool could not be imported and I got the panic shown in the screenshot. 3. install OS to c0t0d0s0, c0t1d0s0 4. reboot, system still boots from old rpool (c0t2d0s0 c0t3d0s0) Correct. At some point I read you can change the name of the pool so I imported rpool as tank and that much worked. At this point both pools were still good, and now the install was correctly called rpool and my tank was called tank. 5. change boot device and boot from new OS (c0t0d0s0 c0t1d0s0) That was the surprising thing. I had already changed my BIOS to boot from the new pool, but that didn't stop Solaris from using the old install as the root pool, I guess because of the name. I thought originally as long as I specified the correct boot device I wouldn't have any problem, but even taking the old rpool out of the boot sequence and specifying only the newly installed pool as boot devices wasn't enough. 6. cannot import old rpool (c0t2d0s0 c0t3d0s0) with your data At this point could you still boot from the old rpool (c0t2d0s0 c0t3d0s0) ? Yes, I could use the newly installed pool to boot from, or import it from shell in several versions of Solaris/Sol 11, etc. Of course now I cannot, since I have installed so many times over that pool trying to get the other pool imported. something happens and 7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a kernel panic, even when booted from different OS versions Right. I have tried OpenIndiana 151 and Solaris 11 Express (latest from Oracle) several times each as well as 2 new installs of Update 8. Have you been using the same hardware for all of this ? Yes, I have. 
Thanks for the help, Jim Thanks ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
Given that I can boot to single user mode and elect not to import or mount any pools, and that later I can issue an import against only the pool I need, I don't understand how this can help. Still, given that nothing else seems to help I will try this and get back to you tomorrow. Thanks, Jim - Original Message - From: John D Groenveld jdg...@elvis.arl.psu.edu To: zfs-discuss@opensolaris.org zfs-discuss@opensolaris.org Cc: Sent: Monday, August 15, 2011 6:12:37 PM Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! In message 1313431448.5331.yahoomail...@web121911.mail.ne1.yahoo.com, Stu Whitefish writes: I'm sorry, I don't understand this suggestion. The pool that won't import is a mirror on two drives. Disconnect all but the two mirrored drives that you must import and try to import from an S11X LiveUSB. John groenv...@acm.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
Hello Stu Whitefish and List, On August, 15 2011, 21:17 Stu Whitefish wrote in [1]: 7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a kernel panic, even when booted from different OS versions Right. I have tried OpenIndiana 151 and Solaris 11 Express (latest from Oracle) several times each as well as 2 new installs of Update 8. When I understand you right is your primary interest to recover your data on tank pool. Have you check the way to boot from a Live-DVD, mount your safe place and copy the data on a other machine? -- Best Regards Alexander August, 15 2011 [1] mid:1313435871.14520.yahoomail...@web121919.mail.ne1.yahoo.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
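A rough sketch of what Alexander is describing, assuming a live environment new enough to support read-only imports; the mount point and target host below are made up, not taken from the thread:

# zpool import -f -R /a -o readonly=on tank
# rsync -a /a/tank/ backuphost:/rescue/tank/

A read-only import avoids writing anything further to the damaged pool while the data is copied off.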
[zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! assertion failed: zvol_get_stats(os, nv) == 0
System: snv_151a 64 bit on Intel. Error: panic[cpu0] assertion failed: zvol_get_stats(os, nv) == 0, file: ../../common/fs/zfs/zfs_ioctl.c, line: 1815 Failure first seen on Solaris 10, update 8 History: I recently received two 320G drives and realized from reading this list it would have been better if I would have done the install on the small drives but I didn't have them at the time. I added the two 320G drives and created tank mirror. I moved some data from other sources to the tank and then decided to go ahead and do a new install. In preparation for that I moved all the data I wanted to save onto the rpool mirror and then installed Solaris 10 update 8 again on the 320G drives. When my system rebooted after the installation, I saw for some reason it used my tank pool as root. I realize now since it was originally a root pool and had boot blocks this didn't help. Anyway I shut down, changed the boot order and then booted into my system. It paniced when trying to access the tank and instantly rebooted. I had to go through this several times until I caught a glimpse of one of the first messages: assertion failed: zvol_get_stats(os, nv) Here is what my system looks like when I boot into failsafe mode. # zpool import pool: rpool id: 16453600103421700325 state: ONLINE action: The pool can be imported using its name or numeric identifier. config: rpool ONLINE mirror ONLINE c0t2d0s0 ONLINE c0t3d0s0 ONLINE pool: tank id: 12861119534757646169 state: ONLINE action: The pool can be imported using its name or numeric identifier. config: tank ONLINE mirror ONLINE c0t0d0s0 ONLINE c0t1d0s0 ONLINE # zpool import tank cannot import 'tank': pool may be in use from other system use '-f' to import anyway I installed Solaris 11 Express USB via Hiroshi-san's Windows tool. Unfortunately it also panics trying to import the pool although zpool import shows the pool online with no errors just like in the above doc. http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/ and here is an eerily identical photo capture made by somebody with a similar/identical error. http://prestonconnors.com/zvol_get_stats.jpg At first I thought it was a copy of my screenshot but I see his terminal is white and mine is black. Looks like the problem has been around since 2009 although my problem is with a newly created mirror pool that had plenty of space available (200G in use out of about 500G) and no snapshots were taken. Similar discussion with discouraging lack of follow up: http://opensolaris.org/jive/message.jspa?messageID=376366 Looks like the defect, it's closed and I see no resolution. https://defect.opensolaris.org/bz/show_bug.cgi?id=5682 I have about 200G of data on the tank pool, about 100G or so I don't have anywhere else. I created this pool specifically to make a safe place to store data that I had accumulated over several years and didn't have organized yet. I can't believe such a serious bug has been around for two years and hasn't been fixed. Can somebody please help me get this data back? Thank you. Jim I joined the forums but I didn't see my post on zfs-discuss mailing list which seems alot more active than the forum. Sorry if this is a duplicate for people on the mailing list. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
I am opening a new thread since I found somebody else reported a similar failure in May and I didn't see a resolution; hopefully this post will be easier to find for people with similar problems. Original thread was http://opensolaris.org/jive/thread.jspa?threadID=140861

System: snv_151a 64 bit on Intel.
Error: panic[cpu0] assertion failed: zvol_get_stats(os, nv) == 0, file: ../../common/fs/zfs/zfs_ioctl.c, line: 1815
Failure first seen on Solaris 10, update 8

History: I recently received two 320G drives and realized from reading this list it would have been better if I had done the install on the small drives, but I didn't have them at the time. I added the two 320G drives and created the tank mirror. I moved some data from other sources to the tank and then decided to go ahead and do a new install. In preparation for that I moved all the data I wanted to save onto the rpool mirror and then installed Solaris 10 update 8 again on the 320G drives. When my system rebooted after the installation, I saw for some reason it used my tank pool as root. I realize now since it was originally a root pool and had boot blocks this didn't help. Anyway I shut down, changed the boot order and then booted into my system. It panicked when trying to access the tank and instantly rebooted. I had to go through this several times until I caught a glimpse of one of the first messages: assertion failed: zvol_get_stats(os, nv)

Here is what my system looks like when I boot into failsafe mode.

# zpool import
  pool: rpool
    id: 16453600103421700325
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
        rpool         ONLINE
          mirror      ONLINE
            c0t2d0s0  ONLINE
            c0t3d0s0  ONLINE

  pool: tank
    id: 12861119534757646169
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
        tank          ONLINE
          mirror      ONLINE
            c0t0d0s0  ONLINE
            c0t1d0s0  ONLINE

# zpool import tank
cannot import 'tank': pool may be in use from other system
use '-f' to import anyway

Here is a photo of my screen (hah hah, old-fashioned screen shot) when Sol 11 starts; now that I tried importing my pool it fails constantly.

# zpool import -f tank
http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/

I installed Solaris 11 Express USB via Hiroshi-san's Windows tool. Unfortunately it also panics trying to import the pool although zpool import shows the pool online with no errors just like in the above doc. And here is an eerily identical photo capture made by somebody with a similar/identical error. http://prestonconnors.com/zvol_get_stats.jpg At first I thought it was a copy of my screenshot but I see his terminal is white and mine is black.

Looks like the problem has been around since 2009, although my problem is with a newly created mirror pool that had plenty of space available (200G in use out of about 500G) and no snapshots were taken. Similar discussion with a discouraging lack of follow-up: http://opensolaris.org/jive/message.jspa?messageID=376366 Looks like the defect; it's closed and I see no resolution. https://defect.opensolaris.org/bz/show_bug.cgi?id=5682

I have about 200G of data on the tank pool, about 100G or so I don't have anywhere else. I created this pool specifically to make a safe place to store data that I had accumulated over several years and didn't have organized yet. I can't believe such a serious bug has been around for two years and hasn't been fixed. Can somebody please help me get this data back? Thank you.
Jim -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish swhitef...@yahoo.com wrote: # zpool import -f tank http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/ I encourage you to open a support case and ask for an escalation on CR 7056738. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic on USB disk power loss
On Jan 15, 2011, at 10:33 AM, Reginald Beardsley wrote: I was copying a filesystem using zfs send | zfs receive and inadvertently unplugged the power to the USB disk that was the destination. Much to my horror this caused the system to panic. I recovered fine on rebooting, but it *really* unnerved me. I don't find anything about this online. I would expect it would trash the copy operation, but the panic seemed a bit extreme. It's an Ultra 20 running Solaris 10 Generic_137112-02 I've got a copy of U8 I'm planning to install as the U9 license seems to prohibit my using it. Suggestions? I'd like to understand what happened and why the system went down. Long, long ago the default failure mode for failed writes was panic. This was changed several years ago with the introduction of the failmode property. Since ZFS is ported to Solaris 10, perhaps the failmode property is not available until you upgrade? To see: zpool get all poolname If there is no failmode property, then upgrade. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
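For reference, checking and changing the property Richard mentions looks like this on releases that have it; "tank" is just an example pool name:

# zpool get failmode tank
NAME  PROPERTY  VALUE     SOURCE
tank  failmode  wait      default
# zpool set failmode=continue tank

failmode accepts wait, continue or panic; wait is the default on builds that support the property.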
Re: [zfs-discuss] kernel panic on USB disk power loss
--- On Wed, 1/19/11, Richard Elling richard.ell...@gmail.com wrote: From: Richard Elling richard.ell...@gmail.com Subject: Re: [zfs-discuss] kernel panic on USB disk power loss To: Reginald Beardsley pulask...@yahoo.com Cc: zfs-discuss@opensolaris.org Date: Wednesday, January 19, 2011, 8:59 AM On Jan 15, 2011, at 10:33 AM, Reginald Beardsley wrote: I was copying a filesystem using zfs send | zfs receive and inadvertently unplugged the power to the USB disk that was the destination. Much to my horror this caused the system to panic. I recovered fine on rebooting, but it *really* unnerved me. I don't find anything about this online. I would expect it would trash the copy operation, but the panic seemed a bit extreme. It's an Ultra 20 running Solaris 10 Generic_137112-02 I've got a copy of U8 I'm planning to install as the U9 license seems to prohibit my using it. Suggestions? I'd like to understand what happened and why the system went down. Long, long ago the default failure mode for failed writes was panic. This was changed for several years ago with the introduction of the failmode property. Since ZFS is ported to Solaris 10, perhaps the failmode property is not available until you upgrade? To see: zpool get all poolname If there is no failmode property, then upgrade. -- richard Thanks. That probably explains it. The last update on the system was before ZFS root was available. I'm in the long delayed process of upgrading to U8 and taking my main network offline w/ just a minimal system connected to the Internet. The browsers are just too vulnerable. Eventually I'll migrate to OpenIndiana for all my Solaris instances. But for now mirrored ZFS on U8 will have to do since U9 is So Larry's. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] kernel panic on USB disk power loss
I was copying a filesystem using zfs send | zfs receive and inadvertently unplugged the power to the USB disk that was the destination. Much to my horror this caused the system to panic. I recovered fine on rebooting, but it *really* unnerved me. I don't find anything about this online. I would expect it would trash the copy operation, but the panic seemed a bit extreme. It's an Ultra 20 running Solaris 10 Generic_137112-02 I've got a copy of U8 I'm planning to install as the U9 license seems to prohibit my using it. Suggestions? I'd like to understand what happened and why the system went down. Thanks, Reg -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Kernel panic after upgrading from snv_138 to snv_140
Hi, my machine is an HP ProLiant ML350 G5 with 2 quad-core Xeons, 32GB RAM and an HP SmartArray E200i RAID controller with 3x160 and 3x500GB SATA discs connected to it. Two of the 160GB discs make up the mirrored root pool (rpool), the third serves as a temporary data pool called tank, and the three 500G discs form a RAIDZ1 pool called daten. So far I had successfully upgraded from OpenSolaris b134 to b138 by manually building ONNV. Recently I built b140, installed it, but unfortunately booting results in a kernel panic:

...
NOTICE: zfs_parse_bootfs: error 22
Cannot mount root on rpool/187 fstype zfs
panic[cpu0]/thread=fbc2f660: vfs_mountroot: cannot mount root
fbc71ba0 genunix:vfs_mountroot+32e ()
fbc71bd0 genunix:main+136 ()
fbc71be0 unix:_locore_start+92 ()
panic: entering debugger (no dump device, continue to reboot)
Welcome to kmdb
Loaded modules: [ scsi_vhci mac uppc sd unix zfs krtld genunix specfs pcplusmp cpu.generic ]
[0]

Before the above attempt with b140, I tried to upgrade to OpenIndiana, but had much the same problem; OI doesn't boot either. See http://openindiana.org/pipermail/openindiana-discuss/2010-September/000504.html Any ideas what is causing this kernel panic? Regards Thorsten -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
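One thing worth checking in a zfs_parse_bootfs panic, though it is not confirmed to be the cause here, is whether the pool's bootfs property still points at a valid boot environment. From a live CD it would look roughly like this; the BE name below is hypothetical:

# zpool import -f -R /a rpool
# zpool get bootfs rpool
# zfs list -r rpool/ROOT            # find the boot environment datasets that actually exist
# zpool set bootfs=rpool/ROOT/b140 rpool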
Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?
Brilliant. I set those parameters via /etc/system, rebooted, and the pool imported with just the -f switch. I had seen this as an option earlier, although not that thread, but was not sure it applied to my case. Scrub is running now. Thank you very much! -Scott

On 9/23/10 7:07 PM, David Blasingame Oracle david.blasing...@oracle.com wrote: Have you tried setting zfs_recover and aok in /etc/system or setting them with mdb? Read how to set via /etc/system http://opensolaris.org/jive/thread.jspa?threadID=114906 mdb debugger http://www.listware.net/201009/opensolaris-zfs/46706-re-zfs-discuss-how-to-set-zfszfsrecover1-and-aok1-in-grub-at-startup.html After you get the variables set and the system booted, try importing, then running a scrub. Dave

On 09/23/10 19:48, Scott Meilicke wrote: I posted this on the www.nexentastor.org forums, but no answer so far, so I apologize if you are seeing this twice. I am also engaged with nexenta support, but was hoping to get some additional insights here. I am running nexenta 3.0.3 community edition, based on 134. The box crashed yesterday, and goes into a reboot loop (kernel panic) when trying to import my data pool, screenshot attached. What I have tried thus far: Boot off of DVD, both 3.0.3 and 3.0.4 beta 8. 'zpool import -f data01' causes the panic in both cases. Boot off of 3.0.4 beta 8, ran zpool import -fF data01. That gives me a message like Pool data01 returned to its state as of ..., and then panics. The import -fF does seem to import the pool, but then it immediately panics. So after booting off of DVD, I can boot from my hard disks, and the system will not import the pool because it was last imported from another system. I have moved /etc/zfs/zpool.cache out of the way, but no luck after a reboot and import. zpool import shows all of my disks are OK, and the pool itself is online. Is it time to start working with zdb? Any suggestions? This box is hosting development VMs, so I have some people twiddling their thumbs at the moment. Thanks everyone, -Scott ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?
I just realized that the email I sent to David and the list did not make the list (at least as jive can see it), so here is what I sent on the 23rd: Brilliant. I set those parameters via /etc/system, rebooted, and the pool imported with just the –f switch. I had seen this as an option earlier, although not that thread, but was not sure it applied to my case. Scrub is running now. Thank you very much! -Scott Update: The scrub finished with zero errors. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?
Have you tried setting zfs_recover and aok in /etc/system or setting them with mdb? Read how to set via /etc/system http://opensolaris.org/jive/thread.jspa?threadID=114906 mdb debugger http://www.listware.net/201009/opensolaris-zfs/46706-re-zfs-discuss-how-to-set-zfszfsrecover1-and-aok1-in-grub-at-startup.html After you get the variables set and the system booted, try importing, then running a scrub. Dave

On 09/23/10 19:48, Scott Meilicke wrote: I posted this on the www.nexentastor.org forums, but no answer so far, so I apologize if you are seeing this twice. I am also engaged with nexenta support, but was hoping to get some additional insights here. I am running nexenta 3.0.3 community edition, based on 134. The box crashed yesterday, and goes into a reboot loop (kernel panic) when trying to import my data pool, screenshot attached. What I have tried thus far: Boot off of DVD, both 3.0.3 and 3.0.4 beta 8. 'zpool import -f data01' causes the panic in both cases. Boot off of 3.0.4 beta 8, ran zpool import -fF data01. That gives me a message like Pool data01 returned to its state as of ..., and then panics. The import -fF does seem to import the pool, but then it immediately panics. So after booting off of DVD, I can boot from my hard disks, and the system will not import the pool because it was last imported from another system. I have moved /etc/zfs/zpool.cache out of the way, but no luck after a reboot and import. zpool import shows all of my disks are OK, and the pool itself is online. Is it time to start working with zdb? Any suggestions? This box is hosting development VMs, so I have some people twiddling their thumbs at the moment. Thanks everyone, -Scott ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
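The settings Dave refers to, spelled out for reference; treat them strictly as a recovery aid and remove them once the pool is repaired. Via /etc/system:

set zfs:zfs_recover=1
set aok=1

or on a running system with mdb, assuming the zfs module is already loaded:

# echo "aok/W 1" | mdb -kw
# echo "zfs_recover/W 1" | mdb -kw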
[zfs-discuss] Kernel panic on import / interrupted zfs destroy
I have a box running snv_134 that had a little boo-boo. The problem first started a couple of weeks ago with some corruption on two filesystems in an 11-disk 10TB raidz2 set. I ran a couple of scrubs that revealed a handful of corrupt files on my 2 de-duplicated zfs filesystems. No biggie. I thought that my problems had something to do with de-duplication in 134, so I went about the process of creating new filesystems and copying over the good files to another box. Every time I touched the bad files I got a filesystem error 5. When trying to delete them manually, I got kernel panics - which eventually turned into reboot loops. I tried installing nexenta on another disk to see if that would allow me to get past the reboot loop - which it did. I finished moving the good files over (using rsync, which skipped over the error 5 files, unlike cp or mv), and destroyed one of the two filesystems. Unfortunately, this caused a kernel panic in the middle of the destroy operation, which then became another panic / reboot loop. I was able to get in with milestone=none and delete the zfs cache, but now I have a new problem: any attempt to import the pool results in a panic. I have tried from my snv_134 install, from the live cd, and from nexenta. I have tried various zdb incantations (with aok=1 and zfs:zfs_recover=1), to no avail - these error out after a few minutes. I have even tried another controller. I have zdb -e -bcsvL running now from 134 (without aok=1), which has been running for several hours. Can zdb recover from this kind of situation (with a half-destroyed filesystem that panics the kernel on import)? What is the impact of the above zdb operation without aok=1? Is there any likelihood of a recovery of non-affected filesystems? Any suggestions? Regards, Matthew Ellison ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
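For anyone stuck in a similar import-panic loop, the milestone=none plus cache-file step Matthew mentions looks roughly like this on an x86/GRUB system (paths are the defaults, not taken from his post). Append -m milestone=none to the kernel$ line in GRUB, then at the resulting shell:

# mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad    # nothing auto-imports on the next boot
# reboot

After that the damaged pool is only touched when you explicitly run zpool import, which makes it possible to boot the box at all while debugging.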
Re: [zfs-discuss] Kernel Panic on zpool clean
On Jul 9, 2010, at 4:27 AM, George wrote: I think it is quite likely to be possible to get readonly access to your data, but this requires modified ZFS binaries. What is your pool version? What build do you have installed on your system disk or available as LiveCD? For the record - using ZFS readonly import code backported to build 134 and slightly modified to account for specific corruptions of this case we've been able to import pool in readonly mode and George is now backing up his data. As soon as that completes I hope to have a chance to have another look into it to see what else we can learn from this case. regards victor [Prompted by an off-list e-mail from Victor asking if I was still having problems] Thanks for your reply, and apologies for not having replied here sooner - I was going to try something myself (which I'll explain shortly) but have been hampered by a flakey cdrom drive - something I won't have chance to sort until the weekend. In answer to your question the installed system is running 2009.06 (b111b) and the LiveCD I've been using is b134. The problem with the Installed system crashing when I tried to run zpool clean I believe is being caused by http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794136 which makes me think that the same command run from a later version should work fine. I haven't had any success doing this though and I believe the reason is that several of the ZFS commands won't work if the hostid of the machine to last access the pool is different from the current system (and the pool is exported/faulted), as happens when using a LiveCD. Where I was getting errors about storage2 does not exist I found it was writing errors to the syslog that the pool could not be loaded as it was last accessed by another system. I tried to get round this using the Dtrace hostid changing script I mentioned in one of my earlier messages but this seemed not to be able to fool system processes. I also tried exporting the pool from the Installed system to see if that would help but unfortunately it didn't. After having exported the pool zfs import run on the Installed system reported The pool can be imported despite missing or damaged devices. however when trying to import it (with or without -f) it refused to import it as one or more devices is currently unavailable. When booting the LiveCD after having exported the pool it still gave errors about having been last accessed by another system. I couldn't spot any method of modifying the LiveCD image to have a particular hostid so my plan therefore has been to try installing b134 onto the system, setting the hostid under /etc and seeing if things then behaved in a more straightforward fashion, which I haven't managed yet due to the cdrom problems. I also mentioned in one of my earlier e-mails that I was confused that the Installed system mentioned an unreadable intent log but the LiveCD said the problem was corrupted metadata. This seems to be caused by the functions print_import_config and print_statement_config having slightly different case statements and not a difference in the pool itself. Hopefully I'll be able to complete the reinstall soon and see if that fixes things or there's a deeper problem. Thanks again for your help, George -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
I think it is quite likely to be possible to get readonly access to your data, but this requires modified ZFS binaries. What is your pool version? What build do you have installed on your system disk or available as LiveCD? [Prompted by an off-list e-mail from Victor asking if I was still having problems] Thanks for your reply, and apologies for not having replied here sooner - I was going to try something myself (which I'll explain shortly) but have been hampered by a flakey cdrom drive - something I won't have chance to sort until the weekend. In answer to your question the installed system is running 2009.06 (b111b) and the LiveCD I've been using is b134. The problem with the Installed system crashing when I tried to run zpool clean I believe is being caused by http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794136 which makes me think that the same command run from a later version should work fine. I haven't had any success doing this though and I believe the reason is that several of the ZFS commands won't work if the hostid of the machine to last access the pool is different from the current system (and the pool is exported/faulted), as happens when using a LiveCD. Where I was getting errors about storage2 does not exist I found it was writing errors to the syslog that the pool could not be loaded as it was last accessed by another system. I tried to get round this using the Dtrace hostid changing script I mentioned in one of my earlier messages but this seemed not to be able to fool system processes. I also tried exporting the pool from the Installed system to see if that would help but unfortunately it didn't. After having exported the pool zfs import run on the Installed system reported The pool can be imported despite missing or damaged devices. however when trying to import it (with or without -f) it refused to import it as one or more devices is currently unavailable. When booting the LiveCD after having exported the pool it still gave errors about having been last accessed by another system. I couldn't spot any method of modifying the LiveCD image to have a particular hostid so my plan therefore has been to try installing b134 onto the system, setting the hostid under /etc and seeing if things then behaved in a more straightforward fashion, which I haven't managed yet due to the cdrom problems. I also mentioned in one of my earlier e-mails that I was confused that the Installed system mentioned an unreadable intent log but the LiveCD said the problem was corrupted metadata. This seems to be caused by the functions print_import_config and print_statement_config having slightly different case statements and not a difference in the pool itself. Hopefully I'll be able to complete the reinstall soon and see if that fixes things or there's a deeper problem. Thanks again for your help, George -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
On Jul 3, 2010, at 1:20 PM, George wrote: Because of that I'm thinking that I should try to change the hostid when booted from the CD to be the same as the previously installed system to see if that helps - unless that's likely to confuse it at all...? I've now tried changing the hostid using the code from http://forums.sun.com/thread.jspa?threadID=5075254 NB: you need to leave this running in a separate terminal. This changes the start of zpool import to pool: storage2 id: 14701046672203578408 state: FAULTED status: The pool metadata is corrupted. action: The pool cannot be imported due to damaged devices or data. The pool may be active on another system, but can be imported using the '-f' flag. see: http://www.sun.com/msg/ZFS-8000-72 but otherwise nothing is changed with respect to trying to import or clear the pool. The pool is 8TB and the machine has 4GB but as far as I can see via top the commands aren't failing due to a lack of memory. I'm a bit stumped now. The only thing else I can think to try is inserting c9t4d4 (the new drive) and removing c6t4d0 (which should be fine). The problem with this though is that it relies on c7t4d0 (which is faulty) and so it assumes that the errors can be cleared, the replace stopped and the drives swapped back before further errors happen. I think it is quite likely to be possible to get readonly access to your data, but this requires modified ZFS binaries. What is your pool version? What build do you have installed on your system disk or available as LiveCD? regards victor ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
I think it is quite likely to be possible to get readonly access to your data, but this requires modified ZFS binaries. What is your pool version? What build do you have installed on your system disk or available as LiveCD? Sorry, but does this mean that if ZFS can't write to the drives, access to the pool won't be possible? If so, that's rather scary... Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
On Jun 28, 2010, at 11:27 PM, George wrote: Again this core dumps when I try to do zpool clear storage2 Does anyone have any suggestions what would be the best course of action now? Do you have any crash dumps saved? The first one is the most interesting one... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
Because of that I'm thinking that I should try to change the hostid when booted from the CD to be the same as the previously installed system to see if that helps - unless that's likely to confuse it at all...? I've now tried changing the hostid using the code from http://forums.sun.com/thread.jspa?threadID=5075254 NB: you need to leave this running in a separate terminal. This changes the start of zpool import to pool: storage2 id: 14701046672203578408 state: FAULTED status: The pool metadata is corrupted. action: The pool cannot be imported due to damaged devices or data. The pool may be active on another system, but can be imported using the '-f' flag. see: http://www.sun.com/msg/ZFS-8000-72 but otherwise nothing is changed with respect to trying to import or clear the pool. The pool is 8TB and the machine has 4GB but as far as I can see via top the commands aren't failing due to a lack of memory. I'm a bit stumped now. The only thing else I can think to try is inserting c9t4d4 (the new drive) and removing c6t4d0 (which should be fine). The problem with this though is that it relies on c7t4d0 (which is faulty) and so it assumes that the errors can be cleared, the replace stopped and the drives swapped back before further errors happen. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
I think I'll try booting from a b134 Live CD and see that will let me fix things. Sadly it appears not - at least not straight away. Running zpool import now gives pool: storage2 id: 14701046672203578408 state: FAULTED status: The pool was last accessed by another system. action: The pool cannot be imported due to damaged devices or data. The pool may be active on another system, but can be imported using the '-f' flag. see: http://www.sun.com/msg/ZFS-8000-EY config: storage2 FAULTED corrupted data raidz1-0 FAULTED corrupted data c6t4d2 ONLINE c6t4d3 ONLINE c7t4d2 ONLINE c7t4d3 ONLINE raidz1-1 FAULTED corrupted data c7t4d0 ONLINE replacing-1 UNAVAIL insufficient replicas c6t4d0 FAULTED corrupted data c9t4d4 UNAVAIL cannot open c7t4d1 ONLINE c6t4d1 ONLINE If I do zpool import -f storage2 it complains about devices being faulted and suggests destroying the pool. If I do zpool clean storage2 or zpool clean storage2 c9t4d4 these say that storage2 does not exist. If I do zpool import -nF storage2 this says that the pool was last run on another system and prompts for -f. if I do zpool import -fnF storage2 this appears to quit silently. I don't really understand why the installed system is very specific about the problem being with the intent log (and suggesting it just needs clearing) but booting from the b134 CD doesn't pick up on that, unless it's being masked by the hostid mismatch error. Because of that I'm thinking that I should try to change the hostid when booted from the CD to be the same as the previously installed system to see if that helps - unless that's likely to confuse it at all...? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
I suggest you to try running 'zdb -bcsv storage2' and show the result. r...@crypt:/tmp# zdb -bcsv storage2 zdb: can't open storage2: No such device or address then I tried r...@crypt:/tmp# zdb -ebcsv storage2 zdb: can't open storage2: File exists George -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
On Jun 30, 2010, at 10:48 AM, George wrote: I suggest you to try running 'zdb -bcsv storage2' and show the result. r...@crypt:/tmp# zdb -bcsv storage2 zdb: can't open storage2: No such device or address then I tried r...@crypt:/tmp# zdb -ebcsv storage2 zdb: can't open storage2: File exists Please try zdb -U /dev/null -ebcsv storage2 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
Please try zdb -U /dev/null -ebcsv storage2 r...@crypt:~# zdb -U /dev/null -ebcsv storage2 zdb: can't open storage2: No such device or address If I try r...@crypt:~# zdb -C storage2 Then it prints what appears to be a valid configuration but then the same error message about being unable to find the device (output attached). George -- This message posted from opensolaris.orgr...@crypt:~# zdb -C storage2 version=14 name='storage2' state=0 txg=1807366 pool_guid=14701046672203578408 hostid=8522651 hostname='crypt' vdev_tree type='root' id=0 guid=14701046672203578408 children[0] type='raidz' id=0 guid=15861342641545291969 nparity=1 metaslab_array=14 metaslab_shift=35 ashift=9 asize=3999672565760 is_log=0 children[0] type='disk' id=0 guid=14390766171745861103 path='/dev/dsk/c9t4d2s0' devid='id1,s...@n600d0230006c8a5f0c3fd863ea736d00/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,2:a' whole_disk=1 DTL=301 children[1] type='disk' id=1 guid=14806610527738068493 path='/dev/dsk/c9t4d3s0' devid='id1,s...@n600d0230006c8a5f0c3fd8514ed8d900/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,3:a' whole_disk=1 DTL=300 children[2] type='disk' id=2 guid=4272121319363331595 path='/dev/dsk/c10t4d2s0' devid='id1,s...@n600d0230006c8a5f0c3fd84312aa6d00/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,2:a' whole_disk=1 DTL=299 children[3] type='disk' id=3 guid=16286569401176941639 path='/dev/dsk/c10t4d4s0' devid='id1,s...@n600d0230006c8a5f0c3fd8415c62ae00/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,4:a' whole_disk=1 DTL=296 children[1] type='raidz' id=1 guid=12601468074885676119 nparity=1 metaslab_array=172 metaslab_shift=35 ashift=9 asize=3999672565760 is_log=0 children[0] type='disk' id=0 guid=7040280703157905854 path='/dev/dsk/c10t4d0s0' devid='id1,s...@n600d0230006c8a5f0c3fd83eda0a4a00/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,0:a' whole_disk=1 DTL=305 children[1] type='replacing' id=1 guid=16928413524184799719 whole_disk=0 children[0] type='disk' id=0 guid=9102173991259789741 path='/dev/dsk/c9t4d0s0' devid='id1,s...@n600d0230006c8a5f0c3fd86eee69a300/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,0:a' whole_disk=1 DTL=304 children[1] type='disk' id=1 guid=16888611779137638814 path='/dev/dsk/c9t4d4s0' devid='id1,s...@n600d0230006c8a5f0c3fd8612edc7d00/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,4:a' whole_disk=1 DTL=321 children[2] type='disk' id=2 guid=4025009484028197162 path='/dev/dsk/c10t4d1s0' devid='id1,s...@n600d0230006c8a5f0c3fd8609d147700/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,1:a' whole_disk=1 DTL=303 children[3]
Re: [zfs-discuss] Kernel Panic on zpool clean
Aha: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794136 I think I'll try booting from a b134 Live CD and see if that will let me fix things. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
Another related question - I have a second enclosure with blank disks which I would like to use to take a copy of the existing zpool as a precaution before attempting any fixes. The disks in this enclosure are larger than those in the one with the problem. What would be the best way to do this? If I were to clone the disks 1:1, would the difference in size cause any problems? I also had the idea that I might be able to dd the original disks into files on a ZFS filesystem on the second enclosure and mount the files, but the few results I've turned up on the subject seem to say this is a bad idea. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
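Importing from dd images is in fact possible, since zpool import -d can scan plain files in a directory; a rough sketch with made-up paths, assuming the second enclosure has room for full-size images of every member disk:

# zfs create backup/images
# dd if=/dev/dsk/c9t4d2s0 of=/backup/images/c9t4d2s0.img bs=1024k
  (repeat for each disk in the pool)
# zpool import -d /backup/images storage2

The target disks being larger than the originals is not a problem here, because the images are just files; ZFS only looks at the labels and data inside them.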
Re: [zfs-discuss] Kernel Panic on zpool clean
On Jun 29, 2010, at 1:30 AM, George wrote: I've attached the output of those commands. The machine is a v20z if that makes any difference. Stack trace is similar to one bug that I do not recall right now, and it indicates that there's likely a corruption in ZFS metadata. I suggest you to try running 'zdb -bcsv storage2' and show the result. victor ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool status -v (build 143)
I ran 'zpool scrub' and will report what happens once it's finished. (It will take pretty long.) The scrub finished successfully (with no errors) and 'zpool status -v' doesn't crash the kernel any more. Andrej smime.p7s Description: S/MIME Cryptographic Signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Kernel Panic on zpool clean
Hi, I have a machine running 2009.06 with 8 SATA drives in SCSI connected enclosure. I had a drive fail and accidentally replaced the wrong one, which unsurprisingly caused the rebuild to fail. The status of the zpool then ended up as: pool: storage2 state: FAULTED status: An intent log record could not be read. Waiting for adminstrator intervention to fix the faulted pool. action: Either restore the affected device(s) and run 'zpool online', or ignore the intent log records by running 'zpool clear'. see: http://www.sun.com/msg/ZFS-8000-K4 scrub: none requested config: NAME STATE READ WRITE CKSUM storage2 FAULTED 0 0 1 bad intent log raidz1 ONLINE 0 0 0 c9t4d2 ONLINE 0 0 0 c9t4d3 ONLINE 0 0 0 c10t4d2ONLINE 0 0 0 c10t4d4ONLINE 0 0 0 raidz1 DEGRADED 0 0 6 c10t4d0UNAVAIL 0 0 0 cannot open replacing ONLINE 0 0 0 c9t4d0 ONLINE 0 0 0 c10t4d3 ONLINE 0 0 0 c10t4d1ONLINE 0 0 0 c9t4d1 ONLINE 0 0 0 running zpool clear storage2 caused the machine to dump and reboot. I've tried removing the spare and putting back the faulty drive to give: pool: storage2 state: FAULTED status: An intent log record could not be read. Waiting for adminstrator intervention to fix the faulted pool. action: Either restore the affected device(s) and run 'zpool online', or ignore the intent log records by running 'zpool clear'. see: http://www.sun.com/msg/ZFS-8000-K4 scrub: none requested config: NAME STATE READ WRITE CKSUM storage2 FAULTED 0 0 1 bad intent log raidz1 ONLINE 0 0 0 c9t4d2 ONLINE 0 0 0 c9t4d3 ONLINE 0 0 0 c10t4d2ONLINE 0 0 0 c10t4d4ONLINE 0 0 0 raidz1 DEGRADED 0 0 6 c10t4d0FAULTED 0 0 0 corrupted data replacing DEGRADED 0 0 0 c9t4d0 ONLINE 0 0 0 c9t4d4 UNAVAIL 0 0 0 cannot open c10t4d1ONLINE 0 0 0 c9t4d1 ONLINE 0 0 0 Again this core dumps when I try to do zpool clear storage2 Does anyone have any suggestions what would be the best course of action now? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
On Jun 28, 2010, at 11:27 PM, George wrote: I've tried removing the spare and putting back the faulty drive to give: pool: storage2 state: FAULTED status: An intent log record could not be read. Waiting for adminstrator intervention to fix the faulted pool. action: Either restore the affected device(s) and run 'zpool online', or ignore the intent log records by running 'zpool clear'. see: http://www.sun.com/msg/ZFS-8000-K4 scrub: none requested config: NAME STATE READ WRITE CKSUM storage2 FAULTED 0 0 1 bad intent log raidz1 ONLINE 0 0 0 c9t4d2 ONLINE 0 0 0 c9t4d3 ONLINE 0 0 0 c10t4d2ONLINE 0 0 0 c10t4d4ONLINE 0 0 0 raidz1 DEGRADED 0 0 6 c10t4d0FAULTED 0 0 0 corrupted data replacing DEGRADED 0 0 0 c9t4d0 ONLINE 0 0 0 c9t4d4 UNAVAIL 0 0 0 cannot open c10t4d1ONLINE 0 0 0 c9t4d1 ONLINE 0 0 0 Again this core dumps when I try to do zpool clear storage2 Does anyone have any suggestions what would be the best course of action now? I think first we need to understand why it does not like 'zpool clear', as that may provide better understanding of what is wrong. For that you need to create directory for saving crashdumps e.g. like this mkdir -p /var/crash/`uname -n` then run savecore and see if it would save a crash dump into that directory. If crashdump is there, then you need to perform some basic investigation: cd /var/crash/`uname -n` mdb dump number ::status ::stack ::spa -c ::spa -v ::spa -ve $q for a start. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic on zpool clean
I've attached the output of those commands. The machine is a v20z if that makes any difference. Thanks, George -- This message posted from opensolaris.orgmdb: logging to debug.txt ::status debugging crash dump vmcore.0 (64-bit) from crypt operating system: 5.11 snv_111b (i86pc) panic message: BAD TRAP: type=e (#pf Page fault) rp=ff00084fc660 addr=0 occurred in module unix due to a NULL pointer dereference dump content: kernel pages only ::stack mutex_enter+0xb() metaslab_free+0x12e(ff01c9fb3800, ff01cce64668, 1b9528, 0) zio_dva_free+0x26(ff01cce64608) zio_execute+0xa0(ff01cce64608) zio_nowait+0x5a(ff01cce64608) arc_free+0x197(ff01cf0c80c0, ff01c9fb3800, 1b9528, ff01d389bcf0, 0, 0) dsl_free+0x30(ff01cf0c80c0, ff01d389bcc0, 1b9528, ff01d389bcf0, 0, 0 ) dsl_dataset_block_kill+0x293(0, ff01d389bcf0, ff01cf0c80c0, ff01d18cfd80) dmu_objset_sync+0xc4(ff01cffe0080, ff01cf0c80c0, ff01d18cfd80) dsl_pool_sync+0x1ee(ff01d389bcc0, 1b9528) spa_sync+0x32a(ff01c9fb3800, 1b9528) txg_sync_thread+0x265(ff01d389bcc0) thread_start+8() ::spa -c ADDR STATE NAME ff01c8df3000ACTIVE rpool version=000e name='rpool' state= txg=056a6ad1 pool_guid=53825ef3c58abc97 hostid=00820b9b hostname='crypt' vdev_tree type='root' id= guid=53825ef3c58abc97 children[0] type='mirror' id= guid=e9b8daed37492cfe whole_disk= metaslab_array=0017 metaslab_shift=001d ashift=0009 asize=001114e0 is_log= children[0] type='disk' id= guid=ad7e5022f804365a path='/dev/dsk/c8t0d0s0' devid='id1,s...@sseagate_st373307lc__3hz76yyd743809wm/a' phys_path='/p...@0,0/pci1022,7...@a/pci17c2,1...@4/s...@0,0:a' whole_disk= DTL=0052 children[1] type='disk' id=0001 guid=2f7a03c75a4931ac path='/dev/dsk/c8t1d0s0' devid='id1,s...@sseagate_st373307lc__3hz80bdp743793pa/a' phys_path='/p...@0,0/pci1022,7...@a/pci17c2,1...@4/s...@1,0:a' whole_disk= DTL=0050 ff01c9fb3800ACTIVE storage2 version=000e name='storage2' state= txg=001b9406 pool_guid=cc049c0f1321fc28 hostid=00820b9b hostname='crypt' vdev_tree type='root' id= guid=cc049c0f1321fc28 children[0] type='raidz' id= guid=dc1ecf18721028c1 nparity=0001 metaslab_array=000e metaslab_shift=0023 ashift=0009 asize=03a33f10 is_log= children[0] type='disk' id= guid=c7b64596709ebdef path='/dev/dsk/c9t4d2s0' devid='id1,s...@n600d0230006c8a5f0c3fd863ea736d00/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,2:a' whole_disk=0001 DTL=012d children[1] type='disk' id=0001 guid=cd7ba5d38162fe0d path='/dev/dsk/c9t4d3s0' devid='id1,s...@n600d0230006c8a5f0c3fd8514ed8d900/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,3:a' whole_disk=0001 DTL=012c children[2] type='disk' id=0002 guid=3b499fb48e06460b path='/dev/dsk/c10t4d2s0' devid='id1,s...@n600d0230006c8a5f0c3fd84312aa6d00/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,2:a' whole_disk=0001 DTL=012b children[3] type='disk' id=0003 guid=e205849496e5e447 path='/dev/dsk/c10t4d4s0' devid='id1,s...@n600d0230006c8a5f0c3fd8415c62ae00/a' phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,4:a' whole_disk=0001 DTL=0128 children[1] type='raidz'
[zfs-discuss] Kernel panic on zpool status -v (build 143)
Hello, I got a zfs panic on build 143 (installed with onu) in the following unusual situation: 1) 'zpool scrub' found a corrupted snapshot on which two BEs were based. 2) I removed the first dependency with 'zfs promote'. 3) I removed the second dependency with 'zfs -pv send ... | zfs -v receive ...' 4) 'zfs destroy' said dataset busy when called on the old snapshot. So I rebooted. 5) After the reboot, the corrupted snapshot could be successfully destroyed. 6) One dataset and two other snapshots created on the way (in (3)) were removed. 7) Now 'zpool status -v' *crashed* the kernel. 8) After a reboot, 'zpool status -v' caused a crash again. I ran 'zpool scrub' and will report what happens once it's finished. (It will take pretty long.) An mdb session output is attached to this message. I can provide the full crash dump if you wish. (As for the ::stack at the end, I'm not sure if it's meaningful. This is (unfortunately) not a debugging kernel, so the first 6 arguments should not be stored on the stack.) Andrej ::status debugging crash dump vmcore.5 (64-bit) from helium operating system: 5.11 osnet143 (i86pc) panic message: assertion failed: 0 == dmu_bonus_hold(os, object, dl, dl-dl_dbuf) (0x0 == 0x16), file: ../../common/fs/zfs/dsl_deadlist.c, line: 80 dump content: kernel pages only ::msgbuf ! tail -21 panic[cpu4]/thread=ff02d59540a0: assertion failed: 0 == dmu_bonus_hold(os, object, dl, dl-dl_dbuf) (0x0 == 0x16), file: ../../common/fs/zfs/dsl_deadlist.c, line: 80 ff00106a0a50 genunix:assfail3+c1 () ff00106a0ad0 zfs:dsl_deadlist_open+ef () ff00106a0b80 zfs:dsl_dataset_get_ref+14c () ff00106a0bc0 zfs:dsl_dataset_hold_obj+2d () ff00106a0c20 zfs:dsl_dsobj_to_dsname+73 () ff00106a0c40 zfs:zfs_ioc_dsobj_to_dsname+23 () ff00106a0cc0 zfs:zfsdev_ioctl+176 () ff00106a0d00 genunix:cdev_ioctl+45 () ff00106a0d40 specfs:spec_ioctl+5a () ff00106a0dc0 genunix:fop_ioctl+7b () ff00106a0ec0 genunix:ioctl+18e () ff00106a0f10 unix:brand_sys_sysenter+1c9 () syncing file systems... done dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel NOTICE: ahci0: ahci_tran_reset_dport port 0 reset port ff02d59540a0::whatis ff02d59540a0 is allocated as a thread structure ff02d59540a0::print kthread_t t_procp | ::print proc_t p_user.u_psargs p_user.u_psargs = [ zpool status -v rpool ] ::stack vpanic() assfail3+0xc1(f7a2dff0, 0, f7a2e050, 16, f7a2e028, 50) dsl_deadlist_open+0xef(ff02f43dd7f0, ff02cff74080, 0) dsl_dataset_get_ref+0x14c(ff02d2ebacc0, 1b, f7a2865c, ff00106a0bd8) dsl_dataset_hold_obj+0x2d(ff02d2ebacc0, 1b, f7a2865c, ff00106a0bd8) dsl_dsobj_to_dsname+0x73(ff02f5f44000, 1b, ff02f5f44400) zfs_ioc_dsobj_to_dsname+0x23(ff02f5f44000) zfsdev_ioctl+0x176(b6, 5a25, 8042130, 13, ff02dae06460, ff00106a0de4) cdev_ioctl+0x45(b6, 5a25, 8042130, 13, ff02dae06460, ff00106a0de4) spec_ioctl+0x5a(ff02d5fd7900, 5a25, 8042130, 13, ff02dae06460, ff00106a0de4) fop_ioctl+0x7b(ff02d5fd7900, 5a25, 8042130, 13, ff02dae06460, ff00106a0de4) ioctl+0x18e(3, 5a25, 8042130) _sys_sysenter_post_swapgs+0x149() smime.p7s Description: S/MIME Cryptographic Signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
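The dependency-removal sequence in steps 2 and 3 above, written out with hypothetical dataset names for anyone who needs to do the same; this simply mirrors what Andrej describes, not a recommended recovery procedure:

# zfs promote rpool/ROOT/be-new                                  # the promoted clone now owns the shared snapshot
# zfs send -pv rpool/data@old | zfs receive -v rpool/data-copy   # re-create the other dependent dataset without the clone link
# zfs destroy rpool/data@old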
[zfs-discuss] Kernel panic - directed here from networking
Hi, I have been having problems with reboots. It usually happens when I am either sending or receiving data on the server, whether over CIFS, HTTP or NNTP. So it could be a networking problem, but the networking forum directed me here or to CIFS, and since it also happens when I'm not using CIFS (though the service is still running) it's probably not CIFS. I have checked for faulty RAM: memtest86+ (4.0) ran through multiple times without a problem. The previous thread is http://opensolaris.org/jive/thread.jspa?threadID=116843 I have had 2 reboots today, within 10 minutes of each other. The previous 2 crashes produced the following:

r...@nas:/var/crash/NAS# echo '$c' | mdb -k 11
page_create_va+0x314(fbc30210, ff016060d000, 2, 53, ff00048c25d0, ff016060d000)
segkmem_page_create+0x8d(ff016060d000, 2, 4, fbc30210)
segkmem_xalloc+0xc0(ff0146e1f000, 0, 2, 4, 0, fb880cb8)
segkmem_alloc_vn+0xcd(ff0146e1f000, 2, 4, fbc30210)
segkmem_alloc+0x24(ff0146e1f000, 2, 4)
vmem_xalloc+0x546(ff0146e2, 2, 1000, 0, 0, 0)
vmem_alloc+0x161(ff0146e2, 2, 4)
kmem_slab_create+0x81(ff014890f858, 4)
kmem_slab_alloc+0x5b(ff014890f858, 4)
kmem_cache_alloc+0x130(ff014890f858, 4)
zio_buf_alloc+0x2c(2)
vdev_queue_io_to_issue+0x42f(ff014c9985a8, 23)
vdev_queue_io_done+0x61(ff014d1180a8)
zio_vdev_io_done+0x62(ff014d1180a8)
zio_execute+0xa0(ff014d1180a8)
taskq_thread+0x1b7(ff014c716688)
thread_start+8()

r...@nas:/var/crash/NAS# echo '$c' | mdb -k 12
fsflush_do_pages+0x1e4()
fsflush+0x3a6()
thread_start+8()

Any help on finding out the problem would be great. Thanks -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
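Beyond '$c', a few more mdb dcmds are usually worth running against the saved dumps when chasing a panic like this; the dump number 11 is taken from the post, and mdb is run from the crash directory so it picks up unix.11 and vmcore.11:

# cd /var/crash/NAS
# echo '::status' | mdb 11
# echo '::msgbuf' | mdb 11
# echo '::panicinfo' | mdb 11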
Re: [zfs-discuss] Kernel panic on zfs import (hardware failure)
Hey, On Sat, Oct 31, 2009 at 5:03 PM, Victor Latushkin victor.latush...@sun.com wrote: Donald Murray, P.Eng. wrote: Hi, I've got an OpenSolaris 2009.06 box that will reliably panic whenever I try to import one of my pools. What's the best practice for recovering (before I resort to nuking the pool and restoring from backup)? Could you please post panic stack backtrace? There are two pools on the system: rpool and tank. The rpool seems to be fine, since I can boot from a 2009.06 CD and 'zpool import -f rpool'; I can also 'zfs scrub rpool', and it doesn't find any errors. Hooray! Except I don't care about rpool. :-( If I boot from hard disk, the system begins importing zfs pools; once it's imported everything I usually have enough time to log in before it panics. If I boot from CD and 'zfs import -f tank', it panics. I've just started a 'zdb -e tank' which I found on the intertubes here: http://opensolaris.org/jive/thread.jspa?threadID=49020. Zdb seems to be ... doing something. Not sure _what_ it's doing, but it can't be making things worse for me right? Yes, zdb only reads, so it cannot make thing worse. I'm going to try adding the following to /etc/system, as mentioned here: http://opensolaris.org/jive/thread.jspa?threadID=114906 set zfs:zfs_recover=1 set aok=1 Please do not rush with these settings. Let's look at the stack backtrace first. Regards, Victor I think I've found the cause of my problem. I disconnected one side of each mirror, rebooted, and imported. The system didn't panic! So one of the disconnected drives (or cables, or controllers...) was the culprit. I've since narrowed it down to a single 500GB drive. When that drive is connected, a zpool import panics the system. When that drive is disconnected, the pool imports fine. r...@weyl:~# zpool status tank pool: tank state: DEGRADED status: One or more devices could not be used because the label is missing or invalid. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Replace the device using 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-4J scrub: resilver completed after 0h8m with 0 errors on Sun Nov 1 22:11:15 2009 config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror DEGRADED 0 0 0 7508645614192559694 FAULTED 0 0 0 was /dev/dsk/c7t0d0s0 c6t1d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c5t1d0 ONLINE 0 0 6 21.2G resilvered c7t0d0 ONLINE 0 0 0 errors: No known data errors r...@weyl:~# The first thing that's jumping out at me: why does the first mirror think the missing disk was c7t0d0? I have an old zpool status from before the problem began, and that disk used to be c6t0d0. r...@weyl:~# zpool status tank pool: tank state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM tankONLINE 0 0 0 mirrorONLINE 0 0 0 c6t0d0 ONLINE 0 0 0 c6t1d0 ONLINE 0 0 0 mirrorONLINE 0 0 0 c5t1d0 ONLINE 0 0 0 c7t0d0 ONLINE 0 0 0 errors: No known data errors r...@weyl:~# Victor has been very helpful, living up to his reputation. Thanks Victor! If we determine a root cause, I'll update the list. 
Things I've learned along the way:
- pools import automatically based on cached information in /etc/zfs/zpool.cache; if you move zpool.cache elsewhere, none of the pools will import upon rebooting;
- import problematic pools via 'zpool import -f -R /a poolname'; this doesn't update the cachefile, and mounts the pool on /a;
- adding the following to /etc/system didn't prevent a hardware-induced panic: set zfs:zfs_recover=1 set aok=1
- crash dumps are typically saved in /var/crash/$( uname -n )
- beadm is your friend (see the sketch below);
- redundancy is your friend (okay, I already knew that);
- if you have a zfs problem, you want Victor Latushkin to be your friend;
Cheers! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
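Expanding on "beadm is your friend": a minimal sketch of keeping a fallback boot environment before attempting risky recovery work; the BE name is made up:

# beadm list
# beadm create pre-recovery
# beadm activate pre-recovery    # only if you actually want it to be the default at the next boot

Creating the BE is cheap (it is a clone), and it gives you something known-good to boot if a recovery attempt leaves the current BE unbootable.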
[zfs-discuss] Kernel panic on zfs import
Hi, I've got an OpenSolaris 2009.06 box that will reliably panic whenever I try to import one of my pools. What's the best practice for recovering (before I resort to nuking the pool and restoring from backup)? There are two pools on the system: rpool and tank. The rpool seems to be fine, since I can boot from a 2009.06 CD and 'zpool import -f rpool'; I can also 'zpool scrub rpool', and it doesn't find any errors. Hooray! Except I don't care about rpool. :-( If I boot from hard disk, the system begins importing zfs pools; once it's imported everything, I usually have enough time to log in before it panics. If I boot from CD and 'zpool import -f tank', it panics. I've just started a 'zdb -e tank' which I found on the intertubes here: http://opensolaris.org/jive/thread.jspa?threadID=49020. Zdb seems to be ... doing something. Not sure _what_ it's doing, but it can't be making things worse for me, right? I'm going to try adding the following to /etc/system, as mentioned here: http://opensolaris.org/jive/thread.jspa?threadID=114906 set zfs:zfs_recover=1 set aok=1 Suggestions? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zfs import
Donald Murray, P.Eng. wrote: Hi, I've got an OpenSolaris 2009.06 box that will reliably panic whenever I try to import one of my pools. What's the best practice for recovering (before I resort to nuking the pool and restoring from backup)? Could you please post the panic stack backtrace? There are two pools on the system: rpool and tank. The rpool seems to be fine, since I can boot from a 2009.06 CD and 'zpool import -f rpool'; I can also 'zpool scrub rpool', and it doesn't find any errors. Hooray! Except I don't care about rpool. :-( If I boot from hard disk, the system begins importing zfs pools; once it's imported everything, I usually have enough time to log in before it panics. If I boot from CD and 'zpool import -f tank', it panics. I've just started a 'zdb -e tank' which I found on the intertubes here: http://opensolaris.org/jive/thread.jspa?threadID=49020. Zdb seems to be ... doing something. Not sure _what_ it's doing, but it can't be making things worse for me, right? Yes, zdb only reads, so it cannot make things worse. I'm going to try adding the following to /etc/system, as mentioned here: http://opensolaris.org/jive/thread.jspa?threadID=114906 set zfs:zfs_recover=1 set aok=1 Please do not rush with these settings. Let's look at the stack backtrace first. Regards, Victor ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic on zpool import
We have the same problem as of today. The pool was to be renamed with zpool export; after an import it didn't come back online. An import -f results in a kernel panic. zpool status -v reports a degraded drive also. I'll also try to supply some traces and logs. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic on zpool import
Marc Althoff wrote: We have the same problem as of today. The pool was to be renamed with zpool export; after an import it didn't come back online. An import -f results in a kernel panic. zpool status -v reports a degraded drive also. I'll also try to supply some traces and logs. Please provide at least the stack trace from the console or /var/adm/messages for a start, and please try to make sure that the crash dump from the first panic is saved. victor ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
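For anyone hitting the same thing, a minimal sketch of how to collect what Victor is asking for, assuming a default dump/savecore setup (directory names are illustrative):

# check that a dump device is configured and where savecore writes its files
dumpadm

# after the panic and reboot, save the crash dump if it was not saved automatically
savecore -v /var/crash/`uname -n`

# pull the panic message and stack backtrace out of the system log
awk '/panic/,/dump succeeded/' /var/adm/messages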
Re: [zfs-discuss] kernel panic on zpool import
dear all, victor, I am most happy to report that the problems were somewhat hardware-related, caused by a damaged / dangling SATA cable which apparently caused long delays (sometimes working, disk on, disk off, ...) during normal zfs operations. Why the -f produced a kernel panic I'm unsure. Interestingly it all fit some symptoms other people have with a bad uberblock, a defective spanned metadata structure (?) detected after a scrub, etc. anyway, great that you guys answered so quickly. there was 6 TB of data on that pool. I had stress-tested it for a week and, 30 minutes prior to the incident, deleted the old RAID set ... imagine my horror ;) have a good one marc -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic on zpool import
On 11.10.09 12:59, Darren Taylor wrote: I have searched the forums and Google far and wide, but cannot find a fix for the issue I'm currently experiencing. Long story short - I'm now at a point where I cannot even import my zpool (zpool import -f tank) without causing a kernel panic. I'm running OpenSolaris snv_111b and the zpool is version 14. This is the panic from /var/adm/messages (full output attached); Where is the full stack backtrace? I do not see any attachment. victor genunix: [ID 361072 kern.notice] zfs: freeing free segment (offset=3540185931776 size=22528) This is the output I get from zpool import; # zpool import pool: tank id: 15136317365944618902 state: ONLINE status: The pool was last accessed by another system. action: The pool can be imported using its name or numeric identifier and the '-f' flag. see: http://www.sun.com/msg/ZFS-8000-EY config: tank ONLINE raidz1 ONLINE c9t4d0 ONLINE c9t5d0 ONLINE c9t6d0 ONLINE c9t7d0 ONLINE raidz1 ONLINE c9t0d0 ONLINE c9t1d0 ONLINE c9t2d0 ONLINE c9t3d0 ONLINE I tried pulling back some info via this zdb command, but I'm not sure if I'm on the right track here (as zpool import seems to see the zpool without issue). This result is similar for all drives; # zdb -l /dev/dsk/c9t4d0 LABEL 0 failed to unpack label 0 LABEL 1 failed to unpack label 1 LABEL 2 failed to unpack label 2 LABEL 3 failed to unpack label 3 I can also complete zdb -e tank without issues – it lists all my snapshots and various objects without problem (this is still running on the machine at the moment). I have put the following into /etc/system; set zfs:zfs_recover=1 set aok=1 I've also tried mounting the zpool read only with zpool import -f -o ro tank but no luck.. I don't know where to go next – am I meant to try and recover using an older txg? I would be extremely grateful to anyone who can offer advice on how to resolve this issue as the pool contains irreplaceable photos. Unfortunately I have not done any backups for a while as I thought raidz would be my saviour. :( please help ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
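As the follow-up below shows, the "failed to unpack label" output above usually just means zdb was pointed at the wrong device node; for whole-disk vdevs the ZFS labels live on slice 0. A sketch, using the same device name as above:

# read the labels from the slice ZFS actually uses
zdb -l /dev/dsk/c9t4d0s0

# pool-wide, read-only inspection of an exported/unimportable pool
zdb -e tank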
Re: [zfs-discuss] kernel panic on zpool import
Hi Victor, I have tried to re-attach the details from /var/adm/messages. -- This message posted from opensolaris.org

Oct 11 17:16:55 opensolaris unix: [ID 836849 kern.notice]
Oct 11 17:16:55 opensolaris ^Mpanic[cpu0]/thread=ff000b6f7c60:
Oct 11 17:16:55 opensolaris genunix: [ID 361072 kern.notice] zfs: freeing free segment (offset=3540185931776 size=22528)
Oct 11 17:16:55 opensolaris unix: [ID 10 kern.notice]
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f75f0 genunix:vcmn_err+2c ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f76e0 zfs:zfs_panic_recover+ae ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7770 zfs:space_map_remove+13c ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7820 zfs:space_map_load+260 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7860 zfs:metaslab_activate+64 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7920 zfs:metaslab_group_alloc+2b7 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7a00 zfs:metaslab_alloc_dva+295 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7aa0 zfs:metaslab_alloc+9b ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7ad0 zfs:zio_dva_allocate+3e ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b00 zfs:zio_execute+a0 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b60 zfs:zio_notify_parent+a6 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b90 zfs:zio_ready+188 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7bc0 zfs:zio_execute+a0 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7c40 genunix:taskq_thread+193 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7c50 unix:thread_start+8 ()
Oct 11 17:16:55 opensolaris unix: [ID 10 kern.notice]
Oct 11 17:16:55 opensolaris genunix: [ID 672855 kern.notice] syncing file systems...
Oct 11 17:16:55 opensolaris genunix: [ID 904073 kern.notice] done
Oct 11 17:16:56 opensolaris genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Oct 11 17:17:09 opensolaris genunix: [ID 409368 kern.notice] ^M100% done: 168706 pages dumped, compression ratio 3.58,
Oct 11 17:17:09 opensolaris genunix: [ID 851671 kern.notice] dump succeeded
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic on zpool import
i have re run zdb -l /dev/dsk/c9t4d0s0 as i should have the first time (thanks Nicolas). Attached output. -- This message posted from opensolaris.org# zdb -l /dev/dsk/c9t4d0s0 LABEL 0 version=14 name='tank' state=0 txg=119170 pool_guid=15136317365944618902 hostid=290968 hostname='lexx' top_guid=1561201926038510280 guid=11292568128772689834 vdev_tree type='raidz' id=0 guid=1561201926038510280 nparity=1 metaslab_array=23 metaslab_shift=35 ashift=9 asize=4000766230528 is_log=0 children[0] type='disk' id=0 guid=11292568128772689834 path='/dev/dsk/c9t4d0s0' devid='id1,s...@n50014ee2588170a5/a' phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a' whole_disk=1 children[1] type='disk' id=1 guid=10678319508898151547 path='/dev/dsk/c9t5d0s0' devid='id1,s...@n50014ee2032b9b04/a' phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a' whole_disk=1 children[2] type='disk' id=2 guid=16523383997370950474 path='/dev/dsk/c9t6d0s0' devid='id1,s...@n50014ee2032b9b75/a' phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@6,0:a' whole_disk=1 children[3] type='disk' id=3 guid=1710422830365926220 path='/dev/dsk/c9t7d0s0' devid='id1,s...@n50014ee2add68f2c/a' phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@7,0:a' whole_disk=1 LABEL 1 version=14 name='tank' state=0 txg=119170 pool_guid=15136317365944618902 hostid=290968 hostname='lexx' top_guid=1561201926038510280 guid=11292568128772689834 vdev_tree type='raidz' id=0 guid=1561201926038510280 nparity=1 metaslab_array=23 metaslab_shift=35 ashift=9 asize=4000766230528 is_log=0 children[0] type='disk' id=0 guid=11292568128772689834 path='/dev/dsk/c9t4d0s0' devid='id1,s...@n50014ee2588170a5/a' phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a' whole_disk=1 children[1] type='disk' id=1 guid=10678319508898151547 path='/dev/dsk/c9t5d0s0' devid='id1,s...@n50014ee2032b9b04/a' phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a' whole_disk=1 children[2] type='disk' id=2 guid=16523383997370950474 path='/dev/dsk/c9t6d0s0' devid='id1,s...@n50014ee2032b9b75/a' phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@6,0:a' whole_disk=1 children[3] type='disk' id=3 guid=1710422830365926220 path='/dev/dsk/c9t7d0s0' devid='id1,s...@n50014ee2add68f2c/a' phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@7,0:a' whole_disk=1 LABEL 2 version=14 name='tank' state=0 txg=119170 pool_guid=15136317365944618902 hostid=290968 hostname='lexx' top_guid=1561201926038510280 guid=11292568128772689834 vdev_tree type='raidz' id=0 guid=1561201926038510280 nparity=1 metaslab_array=23 metaslab_shift=35 ashift=9 asize=4000766230528 is_log=0 children[0] type='disk' id=0 guid=11292568128772689834 path='/dev/dsk/c9t4d0s0' devid='id1,s...@n50014ee2588170a5/a' phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a' whole_disk=1 children[1] type='disk' id=1 guid=10678319508898151547 path='/dev/dsk/c9t5d0s0' devid='id1,s...@n50014ee2032b9b04/a' phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a' whole_disk=1 children[2] type='disk' id=2 guid=16523383997370950474 path='/dev/dsk/c9t6d0s0'
[zfs-discuss] kernel panic on zpool import
I have searched the forums and Google far and wide, but cannot find a fix for the issue I'm currently experiencing. Long story short - I'm now at a point where I cannot even import my zpool (zpool import -f tank) without causing a kernel panic. I'm running OpenSolaris snv_111b and the zpool is version 14.

This is the panic from /var/adm/messages (full output attached);

genunix: [ID 361072 kern.notice] zfs: freeing free segment (offset=3540185931776 size=22528)

This is the output I get from zpool import;

# zpool import
  pool: tank
    id: 15136317365944618902
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        ONLINE
          raidz1    ONLINE
            c9t4d0  ONLINE
            c9t5d0  ONLINE
            c9t6d0  ONLINE
            c9t7d0  ONLINE
          raidz1    ONLINE
            c9t0d0  ONLINE
            c9t1d0  ONLINE
            c9t2d0  ONLINE
            c9t3d0  ONLINE

I tried pulling back some info via this zdb command, but I'm not sure if I'm on the right track here (as zpool import seems to see the zpool without issue). This result is similar for all drives;

# zdb -l /dev/dsk/c9t4d0
LABEL 0
failed to unpack label 0
LABEL 1
failed to unpack label 1
LABEL 2
failed to unpack label 2
LABEL 3
failed to unpack label 3

I can also complete zdb -e tank without issues – it lists all my snapshots and various objects without problem (this is still running on the machine at the moment).

I have put the following into /etc/system;
set zfs:zfs_recover=1
set aok=1

I've also tried mounting the zpool read only with zpool import -f -o ro tank but no luck.. I don't know where to go next – am I meant to try and recover using an older txg? I would be extremely grateful to anyone who can offer advice on how to resolve this issue as the pool contains irreplaceable photos. Unfortunately I have not done any backups for a while as I thought raidz would be my saviour. :( please help -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic on zpool import
Darren Taylor wrote: I have searched the forums and google wide, but cannot find a fix for the issue I'm currently experiencing. Long story short - I'm now at a point where I cannot even import my zpool (zpool import -f tank) without causing a kernel panic I'm running OpenSolaris snv_111b and the zpool is version 14. This is the panic from /var/adm/messages; (full output attached); genunix: [ID 361072 kern.notice] zfs: freeing free segment (offset=3540185931776 size=22528) Have you tried importing to a system running a more recent build? The problem may have been fixed... -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] kernel panic on zpool import
Hi Ian, I'm currently downloading build 124 to see if that helps... the download is running a bit slow so I won't know until later tomorrow. Just an update that I have also tried the following (forgot to mention above):
* Pulling out each disk - tried mounting in a degraded state - same kernel panic
* Deleting the zpool.cache
Fingers crossed I get something different with the newer build. Very strange, as I don't think this was a hardware issue? -- all the drives appear to be working without issue and zpool import lists all drives as ONLINE without any information pointing to corruption. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
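For what it is worth, later OpenSolaris builds (around b128 and newer, so not the snv_111b/124 discussed here) added a built-in answer to the "recover using an older txg" question: the zpool import -F recovery mode, which discards the last few transactions if that makes the pool importable again. A sketch, assuming the pool name tank:

# dry run: report whether discarding recent transactions would make the pool importable
zpool import -nF tank

# actually roll back the last few transactions and import
zpool import -F tank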
Re: [zfs-discuss] Kernel Panic
Richard Elling wrote: Chris Gerhard wrote: My home server running snv_94 is tipping over with the same assertion when someone lists a particular file: Failed assertions indicate software bugs. Please file one. We learn something new every day! Gavin ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic
Chris Gerhard wrote: My home server running snv_94 is tipping with the same assertion when someone list a particular file: Failed assertions indicate software bugs. Please file one. http://en.wikipedia.org/wiki/Assertion_(computing) -- richard ::status Loading modules: [ unix genunix specfs dtrace cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs md ip hook neti sctp arp usba qlc fctl nca lofs zfs audiosup sd cpc random crypto fcip fcp smbsrv nfs logindmux ptm sppp nsctl sdbc sv ii rdc nsmb ipc mpt emlxs ] ::status debugging crash dump vmcore.17 (64-bit) from pearson operating system: 5.11 snv_94 (i86pc) panic message: assertion failed: 0 == dmu_bonus_hold(os, fuid_obj, FTAG, db), file: ../../comm on/fs/zfs/zfs_fuid.c, line: 116 dump content: kernel pages only $c vpanic() assfail+0x7e(f83e3a10, f83e39f0, 74) zfs_fuid_table_load+0x1ed(ff025a1c2448, 0, ff025a231e88, ff025a231eb0) zfs_fuid_init+0xf8(ff025a231e40, 0) zfs_fuid_find_by_idx+0x3f(ff025a231e40, 40100) zfs_fuid_map_id+0x3f(ff025a231e40, 4010020c1, ff02672d0638, 2) zfs_zaccess_common+0x246(ff02bc62f4b0, 2, ff000cfcabd0, ff000cfcabd4, 0, ff02672d0638) zfs_zaccess+0x114(ff02bc62f4b0, 2, 0, 0, ff02672d0638) zfs_getacl+0x4c(ff02bc62f4b0, ff000cfcadd0, 0, ff02672d0638) zfs_getsecattr+0x81(ff02bf7f9740, ff000cfcadd0, 0, ff02672d0638, 0) fop_getsecattr+0x8f(ff02bf7f9740, ff000cfcadd0, 0, ff02672d0638, 0) cacl+0x5ae(6, 0, 0, ff02bf7f9740, ff000cfcae9c) acl+0x8d(80665d2, 6, 0, 0) sys_syscall32+0x101() ff02bf7f9740::print vnode_t { v_lock = { _opaque = [ 0 ] } v_flag = 0x1 v_count = 0x2 v_data = 0xff02bc62f4b0 v_vfsp = 0xff025a0055d0 v_stream = 0 v_type = 1 (VREG) v_rdev = 0x v_vfsmountedhere = 0 v_op = 0xff02520d2200 v_pages = 0 v_filocks = 0 v_shrlocks = 0 v_nbllock = { _opaque = [ 0 ] } v_cv = { _opaque = 0 } v_locality = 0 v_femhead = 0 v_path = 0xff02859d99c8 /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI v_rdcnt = 0 v_wrcnt = 0 v_mmap_read = 0 v_mmap_write = 0 v_mpssdata = 0 v_fopdata = 0 v_vsd = 0 v_xattrdir = 0 v_count_dnlc = 0x1 } An ls -l of /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI results in the system crashing. Need to investigate this further when I get home ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic
Richard Elling wrote: Chris Gerhard wrote: My home server running snv_94 is tipping with the same assertion when someone list a particular file: Failed assertions indicate software bugs. Please file one. http://en.wikipedia.org/wiki/Assertion_(computing) A colleague pointed out that it is an exact match for bug 6746456 so I will upgrade to a later build and check that out. Alas in the mean time the power supply on the system has failed to I can't check this immediately. If it not fixed then I will file a new bug --chris -- richard ::status Loading modules: [ unix genunix specfs dtrace cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs md ip hook neti sctp arp usba qlc fctl nca lofs zfs audiosup sd cpc random crypto fcip fcp smbsrv nfs logindmux ptm sppp nsctl sdbc sv ii rdc nsmb ipc mpt emlxs ] ::status debugging crash dump vmcore.17 (64-bit) from pearson operating system: 5.11 snv_94 (i86pc) panic message: assertion failed: 0 == dmu_bonus_hold(os, fuid_obj, FTAG, db), file: ../../comm on/fs/zfs/zfs_fuid.c, line: 116 dump content: kernel pages only $c vpanic() assfail+0x7e(f83e3a10, f83e39f0, 74) zfs_fuid_table_load+0x1ed(ff025a1c2448, 0, ff025a231e88, ff025a231eb0) zfs_fuid_init+0xf8(ff025a231e40, 0) zfs_fuid_find_by_idx+0x3f(ff025a231e40, 40100) zfs_fuid_map_id+0x3f(ff025a231e40, 4010020c1, ff02672d0638, 2) zfs_zaccess_common+0x246(ff02bc62f4b0, 2, ff000cfcabd0, ff000cfcabd4, 0, ff02672d0638) zfs_zaccess+0x114(ff02bc62f4b0, 2, 0, 0, ff02672d0638) zfs_getacl+0x4c(ff02bc62f4b0, ff000cfcadd0, 0, ff02672d0638) zfs_getsecattr+0x81(ff02bf7f9740, ff000cfcadd0, 0, ff02672d0638, 0) fop_getsecattr+0x8f(ff02bf7f9740, ff000cfcadd0, 0, ff02672d0638, 0) cacl+0x5ae(6, 0, 0, ff02bf7f9740, ff000cfcae9c) acl+0x8d(80665d2, 6, 0, 0) sys_syscall32+0x101() ff02bf7f9740::print vnode_t { v_lock = { _opaque = [ 0 ] } v_flag = 0x1 v_count = 0x2 v_data = 0xff02bc62f4b0 v_vfsp = 0xff025a0055d0 v_stream = 0 v_type = 1 (VREG) v_rdev = 0x v_vfsmountedhere = 0 v_op = 0xff02520d2200 v_pages = 0 v_filocks = 0 v_shrlocks = 0 v_nbllock = { _opaque = [ 0 ] } v_cv = { _opaque = 0 } v_locality = 0 v_femhead = 0 v_path = 0xff02859d99c8 /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI v_rdcnt = 0 v_wrcnt = 0 v_mmap_read = 0 v_mmap_write = 0 v_mpssdata = 0 v_fopdata = 0 v_vsd = 0 v_xattrdir = 0 v_count_dnlc = 0x1 } An ls -l of /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI results in the system crashing. Need to investigate this further when I get home -- Chris Gerhard. __o __o __o Systems TSC Chief Technologist_`\,`\,`\,_ Sun Microsystems Limited (*)/---/---/ (*) Phone: +44 (0) 1252 426033 (ext 26033) http://blogs.sun.com/chrisg smime.p7s Description: S/MIME Cryptographic Signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic
My home server running snv_94 is tipping with the same assertion when someone list a particular file: ::status Loading modules: [ unix genunix specfs dtrace cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs md ip hook neti sctp arp usba qlc fctl nca lofs zfs audiosup sd cpc random crypto fcip fcp smbsrv nfs logindmux ptm sppp nsctl sdbc sv ii rdc nsmb ipc mpt emlxs ] ::status debugging crash dump vmcore.17 (64-bit) from pearson operating system: 5.11 snv_94 (i86pc) panic message: assertion failed: 0 == dmu_bonus_hold(os, fuid_obj, FTAG, db), file: ../../comm on/fs/zfs/zfs_fuid.c, line: 116 dump content: kernel pages only $c vpanic() assfail+0x7e(f83e3a10, f83e39f0, 74) zfs_fuid_table_load+0x1ed(ff025a1c2448, 0, ff025a231e88, ff025a231eb0) zfs_fuid_init+0xf8(ff025a231e40, 0) zfs_fuid_find_by_idx+0x3f(ff025a231e40, 40100) zfs_fuid_map_id+0x3f(ff025a231e40, 4010020c1, ff02672d0638, 2) zfs_zaccess_common+0x246(ff02bc62f4b0, 2, ff000cfcabd0, ff000cfcabd4, 0, ff02672d0638) zfs_zaccess+0x114(ff02bc62f4b0, 2, 0, 0, ff02672d0638) zfs_getacl+0x4c(ff02bc62f4b0, ff000cfcadd0, 0, ff02672d0638) zfs_getsecattr+0x81(ff02bf7f9740, ff000cfcadd0, 0, ff02672d0638, 0) fop_getsecattr+0x8f(ff02bf7f9740, ff000cfcadd0, 0, ff02672d0638, 0) cacl+0x5ae(6, 0, 0, ff02bf7f9740, ff000cfcae9c) acl+0x8d(80665d2, 6, 0, 0) sys_syscall32+0x101() ff02bf7f9740::print vnode_t { v_lock = { _opaque = [ 0 ] } v_flag = 0x1 v_count = 0x2 v_data = 0xff02bc62f4b0 v_vfsp = 0xff025a0055d0 v_stream = 0 v_type = 1 (VREG) v_rdev = 0x v_vfsmountedhere = 0 v_op = 0xff02520d2200 v_pages = 0 v_filocks = 0 v_shrlocks = 0 v_nbllock = { _opaque = [ 0 ] } v_cv = { _opaque = 0 } v_locality = 0 v_femhead = 0 v_path = 0xff02859d99c8 /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI v_rdcnt = 0 v_wrcnt = 0 v_mmap_read = 0 v_mmap_write = 0 v_mpssdata = 0 v_fopdata = 0 v_vsd = 0 v_xattrdir = 0 v_count_dnlc = 0x1 } An ls -l of /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI results in the system crashing. Need to investigate this further when I get home -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
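For readers who want to poke at a dump like this themselves, a minimal sketch of how the output above is obtained, assuming the dump was saved as unix.17/vmcore.17 under /var/crash/pearson (the exact paths depend on the dumpadm/savecore configuration):

cd /var/crash/pearson
mdb unix.17 vmcore.17
::status     (panic string and dump metadata)
$c           (kernel stack at the time of the panic)
ff02bf7f9740::print vnode_t v_path     (the vnode the assertion fired on, address as shown above)

The parenthetical notes are descriptions, not part of the mdb input.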
Re: [zfs-discuss] Kernel panic at zpool import
Do you guys have any more information about this? I've tried the offset methods, zfs_recover, aok=1, mounting read only, yada yada, with still 0 luck. I have about 3TBs of data on my array, and I would REALLY hate to lose it. Thanks! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic
I can reliably reproduce this panic with a similar stack trace on a newly installed Solaris 10 10/08 system (I know, not OpenSolaris but it appears to be the same problem). I just opened a support case w/ Sun but then discovered what appear to be the specific steps for me to reproduce it. My setup is a Sol10u6 server, with /export/olddata a ZFS filesystem with sharenfs=root=zeus.mattwilson.local zeus.mattwilson.local is an Ubuntu Linux system. I mount the NFS share with no options, just mount athena:/export/olddata /mnt What I think is causing the problem is that if I copy a file, as root, with owner UID 4294967294 to the Solaris NFS share, using the -a option to GNU cp on the Linux box (which, among other things, preserves the owner), the panic occurs. Other files, with more reasonable owners, don't panic the server. In my case I can avoid the problem by fixing the bad owner ID on the file I'm copying, but not sure if this helps with your situation. My stack was: SolarisCAT(vmcore.2/10X) stack unix:vpanic_common+0x165() unix:0xfb84d7c2() genunix:0xfb9f0c63() zfs:zfs_fuid_table_load+0xac() zfs:zfs_fuid_init+0x53() zfs:zfs_fuid_find_by_idx+0x87() zfs:zfs_fuid_map_id+0x47() zfs:zfs_fuid_map_ids+0x42() zfs:zfs_getattr+0xbc() zfs:zfs_shim_getattr+0x15() genunix:fop_getattr+0x25() nfssrv:rfs4_delegated_getattr+0x9() nfssrv:rfs3_setattr+0x19d() nfssrv:common_dispatch+0x5b8() nfssrv:rfs_dispatch+0x21() rpcmod:svc_getreq+0x209() rpcmod:svc_run+0x124() rpcmod:svc_do_run+0x88() nfs:nfssys+0x16a() unix:_sys_sysenter_post_swapgs+0x14b() -- switch to user thread's user stack -- panic string: assertion failed: 0 == dmu_bonus_hold(os, fuid_obj, FTAG, db), file: ../../common/fs/zfs/zfs_fuid.c, line: 95 On Tue, Sep 9, 2008 at 7:56 AM, Mark Shellenbaum [EMAIL PROTECTED] wrote: David Bartley wrote: On Tue, Sep 9, 2008 at 11:43 AM, Mark Shellenbaum [EMAIL PROTECTED] wrote: David Bartley wrote: Hello, We're repeatedly seeing a kernel panic on our disk server. We've been unable to determine exactly how to reproduce it, but it seems to occur fairly frequently (a few times a day). This is happening on both snv91 and snv96. We've run 'zpool scrub' and this has reported no errors. I can try to provide more information if needed. Is there a way to turn on more logging/debugging? -- David -- Have you been using the CIFS server? You should only be going down that path for Windows created files and its trying to load Windows domain SID table. No. We have a bunch of linux NFS clients. The machines mount from the server using a mixture of NFSv3, NFSv4, sys auth, and krb5 auth. What is the history of this file system? Was is created prior to snv_77 and then upgraded? You most likely have a bad uid/gid on one or more files. Can you post the dump so I can download it? -Mark ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Matthew R. Wilson http://www.mattwilson.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
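A sketch of the reproduction described above, run as root on the Linux client (file names are illustrative; the key ingredient is the out-of-range owner preserved by cp -a):

# create a file owned by UID 4294967294 (i.e. (uid_t)-2)
dd if=/dev/zero of=/tmp/badowner bs=1k count=1
chown 4294967294 /tmp/badowner

# mount the ZFS-backed share and copy the file across, preserving ownership
mount athena:/export/olddata /mnt
cp -a /tmp/badowner /mnt/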
Re: [zfs-discuss] Kernel Panic
Matthew R. Wilson wrote: I can reliably reproduce this panic with a similar stack trace on a newly installed Solaris 10 10/08 system (I know, not OpenSolaris but it appears to be the same problem). I just opened a support case w/ Sun but then discovered what appear to be the specific steps for me to reproduce it. My setup is a Sol10u6 server, with /export/olddata a ZFS filesystem with sharenfs=root=zeus.mattwilson.local zeus.mattwilson.local is an Ubuntu Linux system. I mount the NFS share with no options, just mount athena:/export/olddata /mnt What I think is causing the problem is that if I copy a file, as root, with owner UID 4294967294 to the Solaris NFS share, using the -a option to GNU cp on the Linux box (which, among other things, preserves the owner), the panic occurs. Other files, with more reasonable owners, don't panic the server. In my case I can avoid the problem by fixing the bad owner ID on the file I'm copying, but not sure if this helps with your situation. I believe this panic shouldn't happen on OpenSolaris. It has some extra protection to prevent the panic that doesn't exist in the S10 code base. Are there any ACLs on the parent directory that would be inherited to the newly created file you tried to copy? If so what are they? My stack was: SolarisCAT(vmcore.2/10X) stack unix:vpanic_common+0x165() unix:0xfb84d7c2() genunix:0xfb9f0c63() zfs:zfs_fuid_table_load+0xac() zfs:zfs_fuid_init+0x53() zfs:zfs_fuid_find_by_idx+0x87() zfs:zfs_fuid_map_id+0x47() zfs:zfs_fuid_map_ids+0x42() zfs:zfs_getattr+0xbc() zfs:zfs_shim_getattr+0x15() genunix:fop_getattr+0x25() nfssrv:rfs4_delegated_getattr+0x9() nfssrv:rfs3_setattr+0x19d() nfssrv:common_dispatch+0x5b8() nfssrv:rfs_dispatch+0x21() rpcmod:svc_getreq+0x209() rpcmod:svc_run+0x124() rpcmod:svc_do_run+0x88() nfs:nfssys+0x16a() unix:_sys_sysenter_post_swapgs+0x14b() -- switch to user thread's user stack -- panic string: assertion failed: 0 == dmu_bonus_hold(os, fuid_obj, FTAG, db), file: ../../common/fs/zfs/zfs_fuid.c, line: 95 On Tue, Sep 9, 2008 at 7:56 AM, Mark Shellenbaum [EMAIL PROTECTED] wrote: David Bartley wrote: On Tue, Sep 9, 2008 at 11:43 AM, Mark Shellenbaum [EMAIL PROTECTED] wrote: David Bartley wrote: Hello, We're repeatedly seeing a kernel panic on our disk server. We've been unable to determine exactly how to reproduce it, but it seems to occur fairly frequently (a few times a day). This is happening on both snv91 and snv96. We've run 'zpool scrub' and this has reported no errors. I can try to provide more information if needed. Is there a way to turn on more logging/debugging? -- David -- Have you been using the CIFS server? You should only be going down that path for Windows created files and its trying to load Windows domain SID table. No. We have a bunch of linux NFS clients. The machines mount from the server using a mixture of NFSv3, NFSv4, sys auth, and krb5 auth. What is the history of this file system? Was is created prior to snv_77 and then upgraded? You most likely have a bad uid/gid on one or more files. Can you post the dump so I can download it? -Mark ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic
On Sun, Nov 2, 2008 at 4:30 PM, Mark Shellenbaum [EMAIL PROTECTED] wrote: I believe this panic shouldn't happen on OpenSolaris. It has some extra protection to prevent the panic that doesn't exist in the S10 code base. Are there any ACLs on the parent directory that would be inherited by the newly created file you tried to copy? If so, what are they? Nope, no ACL other than regular POSIX mode 755. I did confirm that copying the same file to an snv_99 system does not cause the panic; it looks like the ID gets remapped to the user 'nobody'. Thanks, Matthew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on ZFS snapshot destroy
Hi, I try to destroy a snapshot1 on OpenSolaris (SunOS storage11 5.11 snv_98 i86pc i386 i86pc) and my box reboots, leaving a crash file in /var/crash/storage11. This is reproducible... for this one snapshot1 - other snapshots were destroyable (without a crash). How can I help somebody track down this problem? At the moment, I can't work with this pool. regards Danny P.S.: the snapshot1 depends on a clone1 depending on snapshot2 depending on a zfs-volume (created by zfs create -V ...) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
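A sketch of the dataset layout described in the P.S., with illustrative names, for anyone trying to reproduce the panic:

zfs create -V 10g tank/vol               # the zfs-volume
zfs snapshot tank/vol@snapshot2
zfs clone tank/vol@snapshot2 tank/clone1
zfs snapshot tank/clone1@snapshot1
zfs destroy tank/clone1@snapshot1        # the destroy that reportedly panics the box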
[zfs-discuss] Kernel Panic
Hello, We're repeatedly seeing a kernel panic on our disk server. We've been unable to determine exactly how to reproduce it, but it seems to occur fairly frequently (a few times a day). This is happening on both snv91 and snv96. We've run 'zpool scrub' and this has reported no errors. I can try to provide more information if needed. Is there a way to turn on more logging/debugging? Sep 9 09:32:23 ginseng unix: [ID 836849 kern.notice] Sep 9 09:32:23 ginseng ^Mpanic[cpu1]/thread=ff01598d6820: Sep 9 09:32:23 ginseng genunix: [ID 403854 kern.notice] assertion failed: 0 == dmu_bonus_hold(os, fuid_obj, FTAG, db), file: ../../common/fs/zfs/zfs_fuid.c, line: 116 Sep 9 09:32:23 ginseng unix: [ID 10 kern.notice] Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03010 genunix:assfail+7e () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d030b0 zfs:zfs_fuid_table_load+1ed () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03100 zfs:zfs_fuid_init+f8 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03140 zfs:zfs_fuid_find_by_idx+3f () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d031a0 zfs:zfs_fuid_map_id+3f () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03250 zfs:zfs_zaccess_common+253 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d032b0 zfs:zfs_zaccess_delete+9f () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03310 zfs:zfs_zaccess_rename+64 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03400 zfs:zfs_rename+2e1 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03490 genunix:fop_rename+c2 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03770 nfssrv:rfs3_rename+3ad () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03a70 nfssrv:common_dispatch+439 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03a90 nfssrv:rfs_dispatch+2d () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03b80 rpcmod:svc_getreq+1c6 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03bf0 rpcmod:svc_run+185 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03c30 rpcmod:svc_do_run+85 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03ec0 nfs:nfssys+770 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03f10 unix:brand_sys_sysenter+1e6 () Sep 9 09:32:23 ginseng unix: [ID 10 kern.notice] Sep 9 09:32:23 ginseng genunix: [ID 672855 kern.notice] syncing file systems... Sep 9 09:32:23 ginseng genunix: [ID 904073 kern.notice] done Sep 9 09:32:24 ginseng genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c5d1s1, offset 429391872, content: kernel Sep 9 09:32:41 ginseng genunix: [ID 409368 kern.notice] ^M100% done: 265125 pages dumped, compression ratio 3.52, Sep 9 09:32:41 ginseng genunix: [ID 851671 kern.notice] dump succeeded -- David -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic
David Bartley wrote: Hello, We're repeatedly seeing a kernel panic on our disk server. We've been unable to determine exactly how to reproduce it, but it seems to occur fairly frequently (a few times a day). This is happening on both snv91 and snv96. We've run 'zpool scrub' and this has reported no errors. I can try to provide more information if needed. Is there a way to turn on more logging/debugging? Sep 9 09:32:23 ginseng unix: [ID 836849 kern.notice] Sep 9 09:32:23 ginseng ^Mpanic[cpu1]/thread=ff01598d6820: Sep 9 09:32:23 ginseng genunix: [ID 403854 kern.notice] assertion failed: 0 == dmu_bonus_hold(os, fuid_obj, FTAG, db), file: ../../common/fs/zfs/zfs_fuid.c, line: 116 Sep 9 09:32:23 ginseng unix: [ID 10 kern.notice] Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03010 genunix:assfail+7e () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d030b0 zfs:zfs_fuid_table_load+1ed () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03100 zfs:zfs_fuid_init+f8 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03140 zfs:zfs_fuid_find_by_idx+3f () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d031a0 zfs:zfs_fuid_map_id+3f () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03250 zfs:zfs_zaccess_common+253 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d032b0 zfs:zfs_zaccess_delete+9f () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03310 zfs:zfs_zaccess_rename+64 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03400 zfs:zfs_rename+2e1 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03490 genunix:fop_rename+c2 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03770 nfssrv:rfs3_rename+3ad () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03a70 nfssrv:common_dispatch+439 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03a90 nfssrv:rfs_dispatch+2d () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03b80 rpcmod:svc_getreq+1c6 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03bf0 rpcmod:svc_run+185 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03c30 rpcmod:svc_do_run+85 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03ec0 nfs:nfssys+770 () Sep 9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03f10 unix:brand_sys_sysenter+1e6 () Sep 9 09:32:23 ginseng unix: [ID 10 kern.notice] Sep 9 09:32:23 ginseng genunix: [ID 672855 kern.notice] syncing file systems... Sep 9 09:32:23 ginseng genunix: [ID 904073 kern.notice] done Sep 9 09:32:24 ginseng genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c5d1s1, offset 429391872, content: kernel Sep 9 09:32:41 ginseng genunix: [ID 409368 kern.notice] ^M100% done: 265125 pages dumped, compression ratio 3.52, Sep 9 09:32:41 ginseng genunix: [ID 851671 kern.notice] dump succeeded -- David -- Have you been using the CIFS server? You should only be going down that path for Windows created files and its trying to load Windows domain SID table. -Mark ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel Panic
David Bartley wrote: On Tue, Sep 9, 2008 at 11:43 AM, Mark Shellenbaum [EMAIL PROTECTED] wrote: David Bartley wrote: Hello, We're repeatedly seeing a kernel panic on our disk server. We've been unable to determine exactly how to reproduce it, but it seems to occur fairly frequently (a few times a day). This is happening on both snv91 and snv96. We've run 'zpool scrub' and this has reported no errors. I can try to provide more information if needed. Is there a way to turn on more logging/debugging? -- David -- Have you been using the CIFS server? You should only be going down that path for Windows-created files, where it's trying to load the Windows domain SID table. No. We have a bunch of Linux NFS clients. The machines mount from the server using a mixture of NFSv3, NFSv4, sys auth, and krb5 auth. What is the history of this file system? Was it created prior to snv_77 and then upgraded? You most likely have a bad uid/gid on one or more files. Can you post the dump so I can download it? -Mark ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic at zpool import
A little update on the subject. With the great help of Victor Latushkin, the content of the pools has been recovered. The cause of the problem is still under investigation, but what is clear is that both config objects were corrupted. What has been done to recover the data: Victor has a zfs module which allows pools to be imported in read-only mode, bypassing the reading of the config objects. After installing it he was able to import the pools and we managed to save almost everything apart from a couple of log files. This module seems to be the only way to read the content of the pools in situations like mine, where the pool cannot be imported, and therefore cannot be checked/fixed by scrubbing. I hope Victor will post some sort of instructions along with the module on how to use it. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
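On much later releases this kind of last-resort access was folded into the standard tools as a read-only import, which is conceptually what Victor's module did here (it is not available on the S10u5-era bits in this thread); a sketch, assuming the pool name tank:

zpool import -o readonly=on -f tank
# then copy the data off, e.g. with cp/rsync or by sending existing snapshots to another pool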
Re: [zfs-discuss] Kernel panic at zpool import
From what I can predict, and *nobody* has provided any panic messages to confirm, ZFS likely had difficulty writing. For Solaris 10u5 The panic stack looks pretty much the same as the panic on import, and cannot be correlated to a write failure:

Aug 5 12:01:27 omases11 unix: [ID 836849 kern.notice]
Aug 5 12:01:27 omases11 ^Mpanic[cpu3]/thread=fe800279ac80:
Aug 5 12:01:27 omases11 genunix: [ID 809409 kern.notice] ZFS: bad checksum (read on unknown off 0: zio fe8353c23640 [L0 packed nvlist] 4000L/600P DVA[0]=0:d4200:600 DVA[1]=0:904200:600 fletcher4 lzjb LE contiguous birth=3637241 fill=1 cksum=6a85cbad8b:60029922bbbf:2eb217a6bbefd5:1045aa85ce3521e3): error 50
Aug 5 12:01:27 omases11 unix: [ID 10 kern.notice]
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279aac0 zfs:zfsctl_ops_root+3008f24c ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279aad0 zfs:zio_next_stage+65 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab00 zfs:zio_wait_for_children+49 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab10 zfs:zio_wait_children_done+15 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab20 zfs:zio_next_stage+65 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab60 zfs:zio_vdev_io_assess+84 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab70 zfs:zio_next_stage+65 ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279abd0 zfs:vdev_mirror_io_done+c1 ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279abe0 zfs:zio_vdev_io_done+14 ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279ac60 genunix:taskq_thread+bc ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279ac70 unix:thread_start+8 ()
Aug 5 12:01:28 omases11 unix: [ID 10 kern.notice]
Aug 5 12:01:28 omases11 genunix: [ID 672855 kern.notice] syncing file systems...
Aug 5 12:01:28 omases11 genunix: [ID 733762 kern.notice] 7

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic at zpool import
This panic message seems consistent with bugid 6322646, which was fixed in NV b77 (post S10u5 freeze). http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6322646 -- richard Borys Saulyak wrote: From what I can predict, and *nobody* has provided any panic essages to confirm, ZFS likely had difficulty writing. For Solaris 10u5 Panic stack is looking pretty much the same as panic on imprt, and cannot be correlated to write failure: Aug 5 12:01:27 omases11 unix: [ID 836849 kern.notice] Aug 5 12:01:27 omases11 ^Mpanic[cpu3]/thread=fe800279ac80: Aug 5 12:01:27 omases11 genunix: [ID 809409 kern.notice] ZFS: bad checksum (read on unknown off 0: zio fe8353c23640 [L0 packe d nvlist] 4000L/600P DVA[0]=0:d4200:600 DVA[1]=0:904200:600 fletcher4 lzjb LE contiguous birth=3637241 fill=1 cksum=6a85 cbad8b:60029922bbbf:2eb217a6bbefd5:1045aa85ce3521e3): error 50 Aug 5 12:01:27 omases11 unix: [ID 10 kern.notice] Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279aac0 zfs:zfsctl_ops_root+3008f24c () Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279aad0 zfs:zio_next_stage+65 () Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab00 zfs:zio_wait_for_children+49 () Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab10 zfs:zio_wait_children_done+15 () Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab20 zfs:zio_next_stage+65 () Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab60 zfs:zio_vdev_io_assess+84 () Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab70 zfs:zio_next_stage+65 () Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279abd0 zfs:vdev_mirror_io_done+c1 () Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279abe0 zfs:zio_vdev_io_done+14 () Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279ac60 genunix:taskq_thread+bc () Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279ac70 unix:thread_start+8 () Aug 5 12:01:28 omases11 unix: [ID 10 kern.notice] Aug 5 12:01:28 omases11 genunix: [ID 672855 kern.notice] syncing file systems... Aug 5 12:01:28 omases11 genunix: [ID 733762 kern.notice] 7 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic at zpool import
Borys Saulyak wrote: May I remind you that the issue occurred on Solaris 10, not on OpenSolaris. I believe you. If you review the life cycle of a bug, http://www.sun.com/bigadmin/hubs/documentation/patch/patch-docs/abugslife.pdf then you will recall that bugs are fixed in NV and then backported to Solaris 10 as patches. We would all appreciate a more rapid patch availability process for Solaris 10, but that is a discussion more appropriate for another forum. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic at zpool import
Suppose that ZFS detects an error in the first case. It can't tell the storage array something's wrong, please fix it (since the storage array doesn't provide for this with checksums and intelligent recovery), so all it can do is tell the user this file is corrupt, recover it from backups. Just to remind you, the system was working fine with no sign of any failures. Data got corrupted during the export operation. If the storage was somehow misbehaving I would expect ZFS to complain about it on any operation which did not finish successfully. I had NO issues on the system with quite extensive read/write activity. The system panicked on export and messed everything up such that the pools could not be imported. At what point would ZFS have done better if I had even a raid1 configuration? I assume that this mess would be written to both disks, and how would that help me in recovering? I do understand that having more disks would be better in case of failure of one or several of them. But only if it's related to disks. I'm almost sure the disks were fine during the failure. Is there anything that can be improved, apart from ZFS, to cope with such issues? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic at zpool import
Ask your hardware vendor. The hardware corrupted your data, not ZFS. Right, that's all because of these storage vendors. All problems come from them! Never from ZFS :-) I get a similar answer from them: ask Sun, ZFS is buggy, our storage is always fine. That is really ridiculous! People pay huge money for storage and its support, plus the same for hardware and OS, only to end up with both parties blaming each other with no intention to look deeper. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic at zpool import
Borys Saulyak wrote: Suppose that ZFS detects an error in the first case. It can't tell the storage array something's wrong, please fix it (since the storage array doesn't provide for this with checksums and intelligent recovery), so all it can do is tell the user this file is corrupt, recover it from backups. Just to remind you, the system was working fine with no sign of any failures. Data got corrupted during the export operation. If the storage was somehow misbehaving I would expect ZFS to complain about it on any operation which did not finish successfully.

From what I can predict, and *nobody* has provided any panic messages to confirm, ZFS likely had difficulty writing. For Solaris 10u5 and previous updates, ZFS will panic when writes cannot be completed successfully. This will be clearly logged. For later releases, the policy set in the pool's failmode property will be followed. Or, to say this another way, the only failmode behaviour in Solaris 10u5 or NV builds prior to build 77 (October 2007) is panic. For later releases, the default failmode is wait, but you can change it.

I had NO issues on the system with quite extensive read/write activity. The system panicked on export and messed everything up such that the pools could not be imported. At what point would ZFS have done better if I had even a raid1 configuration? I assume that this mess would be written to both disks, and how would that help me in recovering? I do understand that having more disks would be better in case of failure of one or several of them. But only if it's related to disks. I'm almost sure the disks were fine during the failure. Is there anything that can be improved, apart from ZFS, to cope with such issues?

I think that nobody will be able to pinpoint the cause until someone looks at the messages and fma logs. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
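A sketch of inspecting and changing the behaviour Richard describes, on releases that actually have the failmode property (post-b77 / S10 10/08 and later); the pool name is illustrative:

zpool get failmode tank
zpool set failmode=continue tank     # alternatives: wait (the default) or panic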
Re: [zfs-discuss] Kernel panic at zpool import
Borys Saulyak borys.saulyak at eumetsat.int writes: Your pools have no redundancy... The box is connected to two fabric switches via different HBAs, storage is RAID5, MPxIO is ON, and after all that my pools have no redundancy?!?! As Darren said: no, there is no redundancy that ZFS can use. It is important to understand that your setup _prevents_ ZFS from self-healing itself. You need a ZFS-redundant pool (mirror, raidz or raidz2) or an fs with the attribute copies=2 to enable self-healing. I would recommend you make multiple LUNs visible to ZFS, and create redundant pools out of them. Browse the past 2 years or so of the zfs-discuss@ archives to get an idea of how others with the same kind of hardware as you are doing it. For example, export each disk as a LUN and create multiple raidz vdevs, or create 2 hardware RAID5 arrays and mirror them with ZFS, etc. ...and got corrupted, therefore there is nothing ZFS This is exactly what I would like to know. HOW could this have happened? Ask your hardware vendor. The hardware corrupted your data, not ZFS. I'm just questioning myself: is it really a reliable filesystem as presented, or is it better to keep away from it in a production environment? Consider yourself lucky that the corruption was reported by ZFS. Other filesystems would have returned silently corrupted data and it would have maybe taken you days/weeks to troubleshoot it. As to myself, I use ZFS in production to back up 10+ million files, have seen occurrences of hardware causing data corruption, and have seen ZFS self-heal itself. So yes, I trust it. -marc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
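A sketch of the two options Marc mentions, with illustrative device and dataset names:

# let ZFS own the redundancy: several LUNs in a raidz vdev
zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0

# or, on an existing non-redundant pool, keep two copies of every block of an important fs
# (note: copies=2 only applies to data written after the property is set)
zfs set copies=2 tank/important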
Re: [zfs-discuss] Kernel panic at zpool import
I would recommend you to make multiple LUNs visible to ZFS, and create So, you are saying that ZFS will cope better with failures than any other storage system, right? I'm just trying to imagine... I've got, let's say, 10 disks in the storage. They are currently in a RAID5 configuration and given to my box as one LUN. You suggest to create 10 LUNs instead, and give them to ZFS, where they will be part of one raidz, right? So what sort of protection will I gain by that? What kind of failure will be eliminated? Sorry, but I cannot catch it... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic at zpool import
On Thu, Aug 14, 2008 at 07:42, Borys Saulyak [EMAIL PROTECTED] wrote: I've got, lets say, 10 disks in the storage. They are currently in RAID5 configuration and given to my box as one LUN. You suggest to create 10 LUNs instead, and give them to ZFS, where they will be part of one raidz, right? So what sort of protection will I gain by that? What kind of failure will be eliminated? Sorry, but I cannot catch it... Suppose that ZFS detects an error in the first case. It can't tell the storage array something's wrong, please fix it (since the storage array doesn't provide for this with checksums and intelligent recovery), so all it can do is tell the user this file is corrupt, recover it from backups. In the second case, ZFS can use the parity or mirrored data to reconstruct plausible blocks, and then see if they match the checksum. Once it finds one that matches (which will happen as long as sufficient parity remains), it can write the corrected data back to the disk that had junk on it, and report to the user there were problems over here, but I fixed them. Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic at zpool import
To further clarify Will's point... Your current setup provides excellent hardware protection, but absolutely no data protection. ZFS provides excellent data protection when it has multiple copies of the data blocks (>1 hardware devices). Combine the two, provide >1 hardware devices to ZFS, and you have a really nice solution. If you can spare the space, set up your arrays and things to provide exactly 2 identical LUNs to your ZFS box and create your zpool with those in a mirror. The best of all worlds. On Thu, Aug 14, 2008 at 9:41 AM, Will Murnane [EMAIL PROTECTED] wrote: On Thu, Aug 14, 2008 at 07:42, Borys Saulyak [EMAIL PROTECTED] wrote: I've got, let's say, 10 disks in the storage. They are currently in a RAID5 configuration and given to my box as one LUN. You suggest to create 10 LUNs instead, and give them to ZFS, where they will be part of one raidz, right? So what sort of protection will I gain by that? What kind of failure will be eliminated? Sorry, but I cannot catch it... Suppose that ZFS detects an error in the first case. It can't tell the storage array something's wrong, please fix it (since the storage array doesn't provide for this with checksums and intelligent recovery), so all it can do is tell the user this file is corrupt, recover it from backups. In the second case, ZFS can use the parity or mirrored data to reconstruct plausible blocks, and then see if they match the checksum. Once it finds one that matches (which will happen as long as sufficient parity remains), it can write the corrected data back to the disk that had junk on it, and report to the user there were problems over here, but I fixed them. Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- chris -at- microcozm -dot- net === Si Hoc Legere Scis Nimium Eruditionis Habes ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
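A sketch of the mirrored-LUN layout suggested above, plus the scrub that exercises the self-healing Will describes (device names are illustrative):

# two identical LUNs from the array, mirrored by ZFS
zpool create tank mirror c3t0d0 c3t1d0

# periodic scrubs make ZFS read every block, verify checksums and repair from the other side
zpool scrub tank
zpool status -v tank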
Re: [zfs-discuss] Kernel panic at zpool import
mb == Marc Bevand [EMAIL PROTECTED] writes: mb Ask your hardware vendor. The hardware corrupted your data, mb not ZFS. You absolutely do NOT have adequate basis to make this statement. I would further argue that you are probably wrong, and that I think based on what we know that the pool was probably corrupted by a bug in ZFS. Simply because ZFS is (a) able to detect problems with hardware when they exist, and (b) ringing an alarm bell of some sort, does NOT exhonerate ZFS. and AIUI that is your position. Further, ZFS's ability to use zpool-level redundancy heal problems created by its own bugs is not a cause for celebration or an improvement over filesystems without bugs. The virtue of the self-healing is for when hardware actually does fail. If self-healing also helps with corruption created by bugs in ZFS, that does not shift blame for unhealed bug-corruption back to the hardware, nor make ZFS more robust than a different filesystem without corruption bugs. mb Other filesystems would have returned silently corrupted mb data and it would have maybe taken you days/weeks to mb troubleshoot possibly. very likely, other filesystems would have handled it fine. Boris, have a look at the two links I posted earlier about ``simon sez, import!'' incantations, and required patches. http://opensolaris.org/jive/message.jspa?messageID=192572#194209 http://sunsolve.sun.com/search/document.do?assetkey=1-66-233602-1 panic-on-import, sounds a lot like your problem. Jonathan also posted http://www.opensolaris.org/jive/thread.jspa?messageID=220125 which seems to be incomplete instructions on how to choose a different ueberblock which helped someone else with a corrupted pool, but the OP in that thread never wrote it up in recipe form for ignorant sysadmins like me to follow so it might not be widely useful. In short, ZFS is unstable and prone to corruption, but may improve substantially when patched up to the latest revision. And many fixes are available now, but some which are in SXCE right now will be available in the stable binary-only Solaris not until u6 so we haven't yet gained experience with how much improvement the patches provide. And finally, there is no way to back up a ZFS filesystem with lots of clones which is similarly robust to past Unix backup systems---your best bet for space-efficient backups is to zfs send/recv data onto a separate ZFS pool. In more detail, I think there is some experience here that when a single storage subsystem hosting both ZFS pools and vxfs filesystems goes away, ZFS pools sometimes become corrupt while vxfs rolls its log and continues. so, in stable Sol10u5, ZFS is probably more prone to metadata corruption causing whole-pool-failure than other logging filesystems. some fixes are around the corner, and others are apparently the subject of some philosophical debate. pgpWGngZltSqj.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss