Re: [zfs-discuss] kernel panic during zfs import [UPDATE]

2012-04-17 Thread Carsten John
Hello everybody,

just to let you know what happened in the meantime:

I was able to open a Service Request at Oracle.

The issue is a known bug (Bug 6742788 : assertion panic at: zfs:zap_deref_leaf)

The bug has been fixed (according to Oracle support) since build 164, but there 
is no fix for Solaris 11 available so far (will be fixed in S11U7?).

There is a workaround available that works (partly), but my system crashed 
again when trying to rebuild the offending zfs within the affected zpool.

At the moment I'm waiting for a so-called "interim diagnostic relief" patch.


cu

Carsten

-- 
Max Planck Institut fuer marine Mikrobiologie
- Network Administration -
Celsiustr. 1
D-28359 Bremen
Tel.: +49 421 2028568
Fax.: +49 421 2028565
PGP public key:http://www.mpi-bremen.de/Carsten_John.html
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic during zfs import [UPDATE]

2012-04-17 Thread Enda O'Connor

On 17/04/2012 16:40, Carsten John wrote:

Hello everybody,

just to let you know what happened in the meantime:

I was able to open a Service Request at Oracle.

The issue is a known bug (Bug 6742788 : assertion panic at: zfs:zap_deref_leaf)

The bug has been fixed (according to Oracle support) since build 164, but there 
is no fix for Solaris 11 available so far (will be fixed in S11U7?).

There is a workaround available that works (partly), but my system crashed 
again when trying to rebuild the offending zfs within the affected zpool.

At the moment I'm waiting for a so-called "interim diagnostic relief" patch.


So, are you on S11? Can I see the output of `pkg info entire`?

This bug is fixed in the S11 FCS release, as that is build 175b and the fix went 
in at build 164. So if you have Solaris 11, that CR is fixed.


In Solaris 10 it is fixed in 147440-14/147441-14 (SPARC/x86).
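
(A quick sketch of how one might verify this locally; the commands below are the usual ones, and the grep patterns are only illustrations:)

   $ pkg info entire | grep -i branch        # Solaris 11: the branch shows the build; FCS is 0.175.0
   $ showrev -p | egrep '147440|147441'      # Solaris 10: check whether the kernel patch with the fix is installed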


Enda



cu

Carsten



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic during zfs import [UPDATE]

2012-04-17 Thread Stephan Budach

Hi Carsten,


On 17.04.12 17:40, Carsten John wrote:

Hello everybody,

just to let you know what happened in the meantime:

I was able to open a Service Request at Oracle.

The issue is a known bug (Bug 6742788 : assertion panic at: zfs:zap_deref_leaf)

The bug has been fixed (according to Oracle support) since build 164, but there 
is no fix for Solaris 11 available so far (will be fixed in S11U7?).

There is a workaround available that works (partly), but my system crashed 
again when trying to rebuild the offending zfs within the affected zpool.

At the moment I'm waiting for a so-called "interim diagnostic relief" patch.


cu

Carsten




AFAIK, bug 6742788 is fixed in S11 FCS (release), but you might be hitting this 
bug: 7098658. This bug, according to MOS, is still unresolved. My solution is to 
mount the affected zfs fs in read-only mode when importing the zpool and to set 
it to rw afterwards.
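
(For anyone following along, a rough sketch of what that workaround amounts to; the dataset name is a placeholder, and using `zpool import -N` to delay mounting is an assumption about how the read-only mount gets arranged:)

   $ zpool import -N san_pool                       # import without mounting any datasets
   $ zfs set readonly=on san_pool/home/baddataset   # the fs whose mount triggers the panic
   $ zfs mount -a                                   # mount everything; the affected fs comes up read-only
   $ zfs set readonly=off san_pool/home/baddataset  # flip it back to read-write afterwards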


Cheers,
budy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]

2012-03-30 Thread John D Groenveld
In message 4f735451.2020...@oracle.com, Deepak Honnalli writes:
 Thanks for your reply. I would love to take a look at the core
 file. If there is a way this can somehow be transferred to
 the internal cores server, I can work on the bug.

 I am not sure about the modalities of transferring the core
 file though. I will ask around and see if I can help you here.

How to Upload Data to Oracle Such as Explorer and Core Files [ID 1020199.1]

John
groenv...@acm.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]

2012-03-30 Thread Stephan Budach

On 30.03.12 21:45, John D Groenveld wrote:

In message 4f735451.2020...@oracle.com, Deepak Honnalli writes:

 Thanks for your reply. I would love to take a look at the core
 file. If there is a way this can somehow be transferred to
 the internal cores server, I can work on the bug.

 I am not sure about the modalities of transferring the core
 file though. I will ask around and see if I can help you here.

How to Upload Data to Oracle Such as Explorer and Core Files [ID 1020199.1]

John
groenv...@acm.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



https://supportfiles.sun.com is the place to send those files to.

Cheers,
budy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]

2012-03-30 Thread Carsten John
-Original message-
To: zfs-discuss@opensolaris.org; 
From:   John D Groenveld jdg...@elvis.arl.psu.edu
Sent:   Fri 30-03-2012 21:47
Subject:Re: [zfs-discuss] kernel panic during zfs import [ORACLE should 
notice this]
 In message 4f735451.2020...@oracle.com, Deepak Honnalli writes:
  Thanks for your reply. I would love to take a look at the core
  file. If there is a way this can somehow be transferred to
  the internal cores server, I can work on the bug.
 
  I am not sure about the modalities of transferring the core
  file though. I will ask around and see if I can help you here.
 
 How to Upload Data to Oracle Such as Explorer and Core Files [ID 1020199.1]
 
 John
 groenv...@acm.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

Hi John,

in the meantime I managed to open a service request at Oracle. There is a web 
portal, https://supportfiles.sun.com, where you can upload the files...


cu

Carsten

-- 
Max Planck Institut fuer marine Mikrobiologie
- Network Administration -
Celsiustr. 1
D-28359 Bremen
Tel.: +49 421 2028568
Fax.: +49 421 2028565
PGP public key:http://www.mpi-bremen.de/Carsten_John.html
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic during zfs import

2012-03-28 Thread Deepak Honnalli

Hi Carsten,

This was supposed to be fixed in build 164 of Nevada (6742788). If 
you are still seeing this issue in S11, I think you should raise a bug with 
relevant details. As Paul has suggested, this could also be due to an 
incomplete snapshot.

I have seen interrupted zfs recv's causing weird bugs.

Thanks,
Deepak.

On 03/27/12 12:44 PM, Carsten John wrote:

Hello everybody,

I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic 
during the import of a zpool (some 30TB) containing ~500 zfs filesystems after 
reboot. This causes a reboot loop until I booted single user and removed 
/etc/zfs/zpool.cache.


 From /var/adm/messages:

savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) 
rp=ff002f9cec50 addr=20 occurred in module zfs due to a NULL pointer 
dereference
savecore: [ID 882351 auth.error] Saving compressed system crash dump in 
/var/crash/vmdump.2

This is what mdb tells:

mdb unix.2 vmcore.2
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp 
scsi_vhci zfs mpt sd ip hook neti arp usba uhci sockfs qlc fctl s1394 kssl lofs 
random fcp idm sata fcip cpc crypto ufs logindmux ptm sppp ]
$c
zap_leaf_lookup_closest+0x45(ff0700ca2a98, 0, 0, ff002f9cedb0)
fzap_cursor_retrieve+0xcd(ff0700ca2a98, ff002f9ceed0, ff002f9cef10)
zap_cursor_retrieve+0x195(ff002f9ceed0, ff002f9cef10)
zfs_purgedir+0x4d(ff0721d32c20)
zfs_rmnode+0x57(ff0721d32c20)
zfs_zinactive+0xb4(ff0721d32c20)
zfs_inactive+0x1a3(ff0721d3a700, ff07149dc1a0, 0)
fop_inactive+0xb1(ff0721d3a700, ff07149dc1a0, 0)
vn_rele+0x58(ff0721d3a700)
zfs_unlinked_drain+0xa7(ff07022dab40)
zfsvfs_setup+0xf1(ff07022dab40, 1)
zfs_domount+0x152(ff07223e3c70, ff0717830080)
zfs_mount+0x4e3(ff07223e3c70, ff07223e5900, ff002f9cfe20, 
ff07149dc1a0)
fsop_mount+0x22(ff07223e3c70, ff07223e5900, ff002f9cfe20, 
ff07149dc1a0)
domount+0xd2f(0, ff002f9cfe20, ff07223e5900, ff07149dc1a0, 
ff002f9cfe18)
mount+0xc0(ff0713612c78, ff002f9cfe98)
syscall_ap+0x92()
_sys_sysenter_post_swapgs+0x149()


I can import the pool readonly.

The server is a mirror for our primary file server and is synced via zfs 
send/receive.

I saw a similar effect some time ago on an OpenSolaris box (build 111b). That 
time my final solution was to copy over the read-only mounted stuff to a newly 
created pool. As this is the second time this failure occurs (on different 
machines), I'm really concerned about overall reliability...



Any suggestions?


thx

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic during zfs import

2012-03-28 Thread Carsten John
-Original message-
To: ZFS Discussions zfs-discuss@opensolaris.org; 
From:   Paul Kraus p...@kraus-haus.org
Sent:   Tue 27-03-2012 15:05
Subject:Re: [zfs-discuss] kernel panic during zfs import
 On Tue, Mar 27, 2012 at 3:14 AM, Carsten John cj...@mpi-bremen.de wrote:
  Hallo everybody,
 
  I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic 
 during the import of a zpool (some 30TB) containing ~500 zfs filesystems 
 after 
 reboot. This causes a reboot loop, until booted single user and removed 
 /etc/zfs/zpool.cache.
 
 
  From /var/adm/messages:
 
  savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf 
 Page fault) rp=ff002f9cec50 addr=20 occurred in module zfs due to a 
 NULL 
 pointer dereference
  savecore: [ID 882351 auth.error] Saving compressed system crash dump in 
 /var/crash/vmdump.2
 
 
 I ran into a very similar problem with Solaris 10U9 and the
 replica (zfs send | zfs recv destination) of a zpool of about 25 TB of
 data. The problem was an incomplete snapshot (the zfs send | zfs recv
 had been interrupted). On boot the system was trying to import the
 zpool and as part of that it was trying to destroy the offending
 (incomplete) snapshot. This was zpool version 22 and destruction of
 snapshots is handled as a single TXG. The problem was that the
 operation was running the system out of RAM (32 GB worth). There is a
 fix for this and it is in zpool 26 (or newer), but any snapshots
 created while the zpool is at a version prior to 26 will have the
 problem on-disk. We have support with Oracle and were able to get a
 loaner system with 128 GB RAM to clean up the zpool (it took about 75
 GB RAM to do so).
 
 If you are at zpool 26 or later this is not your problem. If you
 are at zpool  26, then test for an incomplete snapshot by importing
 the pool read only, then `zdb -d zpool | grep '%'` as the incomplete
 snapshot will have a '%' instead of a '@' as the dataset / snapshot
 separator. You can also run the zdb against the _un_imported_ zpool
 using the -e option to zdb.
 
 See the following Oracle Bugs for more information.
 
 CR# 6876953
 CR# 6910767
 CR# 7082249
 
 CR#7082249 has been marked as a duplicate of CR# 6948890
 
 P.S. I have a suspect that the incomplete snapshot was also corrupt in
 some strange way, but could never make a solid determination of that.
 We think what caused the zfs send | zfs recv to be interrupted was
 hitting an e1000g Ethernet device driver bug.
 
 -- 
 {1-2-3-4-5-6-7-}
 Paul Kraus
 - Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
 - Sound Coordinator, Schenectady Light Opera Company (
 http://www.sloctheater.org/ )
 - Technical Advisor, Troy Civic Theatre Company
 - Technical Advisor, RPI Players
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

Hi,


this scenario seems to fit. The machine that was sending the snapshot is on 
OpenSolaris Build 111b (which is running zpool version 14).

I rebooted the receiving machine due to a hanging zfs receive that couldn't 
be killed.

zdb -d -e pool does not give any useful information:

zdb -d -e san_pool   
Dataset san_pool [ZPL], ID 18, cr_txg 1, 36.0K, 11 objects


When importing the pool readonly, I get an error about two datasets:

zpool import -o readonly=on san_pool
cannot set property for 'san_pool/home/someuser': dataset is read-only
cannot set property for 'san_pool/home/someotheruser': dataset is read-only

As this is a mirror machine, I still have the option to destroy the pool and 
copy over the stuff via send/receive from the primary. But nobody knows how 
long this will work until I'm hit again...

If an interrupted send/receive can screw up a 30TB target pool, then 
send/receive isn't an option for replicating data at all; furthermore, it should 
be flagged as "don't use this if your target pool might contain any valuable 
data".

I will reproduce the crash once more and try to file a bug report for S11, as 
recommended by Deepak (not so easy these days...).



thanks



Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] kernel panic during zfs import [ORACLE should notice this]

2012-03-28 Thread Carsten John
-Original message-
To: zfs-discuss@opensolaris.org; 
From:   Deepak Honnalli deepak.honna...@oracle.com
Sent:   Wed 28-03-2012 09:12
Subject:Re: [zfs-discuss] kernel panic during zfs import
 Hi Carsten,
 
  This was supposed to be fixed in build 164 of Nevada (6742788). If 
 you are still seeing this
  issue in S11, I think you should raise a bug with relevant details. 
 As Paul has suggested,
  this could also be due to incomplete snapshot.
 
  I have seen interrupted zfs recv's causing weired bugs.
 
 Thanks,
 Deepak.


Hi Deepak,

I just spent about an hour (or two) trying to file a bug report regarding the 
issue, without success.

Seems to me that I'm too stupid to use this MyOracleSupport portal.

So, as I'm getting paid for keeping systems running and not for clicking through 
Flash-overloaded support portals searching for CSIs, I'm giving the relevant 
information to the list now.

Perhaps someone at Oracle reading the list is able to file a bug report, or 
contact me off-list.



Background:

Machine A
- Sun X4270 
- Opensolaris Build 111b
- zpool version 14
- primary file server
- sending snapshots via zfs send
- direct-attached Sun J4400 SAS JBODs with a total of 40 TB of storage

Machine B
- Sun X4270
- Solaris 11
- zpool version 33
- mirror server
- receiving snapshots via zfs receive
- FC attached Storagetek FLX280 storage 


Incident:

After a zfs send/receive run, machine B had a hanging zfs receive process. To 
get rid of the process, I rebooted the machine. During reboot the kernel 
panicked, resulting in a reboot loop.

To bring up the system, I rebooted single user, removed /etc/zfs/zpool.cache 
and rebooted again.
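
(Sketched out, that recovery amounts to roughly the following; the backup file name is arbitrary:)

   # boot single user, then keep the next boot from auto-importing any pool:
   $ mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad
   $ reboot
   # after the reboot, the pool can be imported manually, e.g. read-only:
   $ zpool import -o readonly=on san_pool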

The damaged pool can be imported readonly, giving a warning:

   $zpool import -o readonly=on san_pool
   cannot set property for 'san_pool/home/someuser': dataset is read-only
   cannot set property for 'san_pool/home/someotheruser': dataset is read-only

The ZFS debugger zdb does not give any additional information:

   $zdb -d -e san_pool
   Dataset san_pool [ZPL], ID 18, cr_txg 1, 36.0K, 11 objects


The issue can be reproduced by trying to import the pool r/w, resulting in a 
kernel panic.


The fmdump utility gives the following information for the relevant UUID:

   $fmdump -Vp -u 91da1503-74c5-67c2-b7c1-d4e245e4d968
   TIME   UUID 
SUNW-MSG-ID
   Mar 28 2012 12:54:26.563203000 91da1503-74c5-67c2-b7c1-d4e245e4d968 
SUNOS-8000-KL

 TIME CLASS ENA
 Mar 28 12:54:24.2698 ireport.os.sunos.panic.dump_available 
0x
 Mar 28 12:54:05.9826 ireport.os.sunos.panic.dump_pending_on_device 
0x

   nvlist version: 0
version = 0x0
class = list.suspect
uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968
code = SUNOS-8000-KL
diag-time = 1332932066 541092
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
__case_state = 0x1
topo-uuid = 3b4117e0-0ac7-cde5-b434-b9735176d591
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = 
sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968
resource = 
sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968
savecore-succcess = 1
dump-dir = /var/crash
dump-files = vmdump.0
os-instance-uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968
panicstr = BAD TRAP: type=e (#pf Page fault) 
rp=ff002f6dcc50 addr=20 occurred in module zfs due to a NULL pointer 
dereference
panicstack = unix:die+d8 () | unix:trap+152b () | 
unix:cmntrap+e6 () | zfs:zap_leaf_lookup_closest+45 () | 
zfs:fzap_cursor_retrieve+cd () | zfs:zap_cursor_retrieve+195 () | 
zfs:zfs_purgedir+4d () |   zfs:zfs_rmnode+57 () | zfs:zfs_zinactive+b4 () | 
zfs:zfs_inactive+1a3 () | genunix:fop_inactive+b1 () | genunix:vn_rele+58 () | 
zfs:zfs_unlinked_drain+a7 () | zfs:zfsvfs_setup+f1 () | zfs:zfs_domount+152 () 
| zfs:zfs_mount+4e3 () | genunix:fsop_mount+22 () | genunix:domount+d2f () | 
genunix:mount+c0 () | genunix:syscall_ap+92 () | unix:brand_sys_sysenter+1cf () 
| 
crashtime = 1332931339
panic-time = March 28, 2012 12:42:19 PM CEST CEST
(end fault-list[0])

fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x4f72ede2 0x2191cbb8


The 'first view' debugger output looks like:

   mdb unix.0 vmcore.0 
   Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp 
scsi_vhci zfs mpt sd ip hook neti arp usba uhci sockfs qlc fctl s1394 kssl lofs 
random idm sppp crypto sata fcip cpc fcp ufs logindmux ptm ]
$c
   zap_leaf_lookup_closest+0x45(ff0728eac588, 0

Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]

2012-03-28 Thread John D Groenveld
In message zarafa.4f7307dd.297a.5713b0445a582...@zarafa.mpi-bremen.de, Carsten John writes:
I just spent about an hour (or two) trying to file a bug report regarding the 
issue without success.

Seems to me, that I'm too stupid to use this MyOracleSupport portal.

So, as I'm getting paid for keeping systems running and not clicking through 
flash overloaded support portals searching for CSIs, I'm giving the relevant 
information to the list now.

If the Flash interface is broken, try the non-Flash MOS site:
URL:http://SupportHTML.Oracle.COM/

John
groenv...@acm.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]

2012-03-28 Thread Deepak Honnalli

Hi Carsten,

Thanks for your reply. I would love to take a look at the core
file. If there is a way this can somehow be transferred to
the internal cores server, I can work on the bug.

I am not sure about the modalities of transferring the core
file though. I will ask around and see if I can help you here.

Thanks,
Deepak.

On Wednesday 28 March 2012 06:15 PM, Carsten John wrote:

-Original message-
To: zfs-discuss@opensolaris.org;
From:   Deepak Honnallideepak.honna...@oracle.com
Sent:   Wed 28-03-2012 09:12
Subject:Re: [zfs-discuss] kernel panic during zfs import

Hi Carsten,

  This was supposed to be fixed in build 164 of Nevada (6742788). If
you are still seeing this
  issue in S11, I think you should raise a bug with relevant details.
As Paul has suggested,
  this could also be due to incomplete snapshot.

  I have seen interrupted zfs recv's causing weired bugs.

Thanks,
Deepak.


Hi Deepak,

I just spent about an hour (or two) trying to file a bug report regarding the 
issue without success.

Seems to me, that I'm too stupid to use this MyOracleSupport portal.

So, as I'm getting paid for keeping systems running and not clicking through 
flash overloaded support portals searching for CSIs, I'm giving the relevant 
information to the list now.

Perhaps, someone at Oracle, reading the list, is able to file a bug report, or 
contact me off list.



Background:

Machine A
- Sun X4270
- Opensolaris Build 111b
- zpool version 14
- primary file server
- sending snapshots via zfs send
- direct attached Sun J4400 SAS JBODs with totally 40 TB storage

Machine B
- Sun X4270
- Solaris 11
- zpool version 33
- mirror server
- receiving snapshots via zfs receive
- FC attached Storagetek FLX280 storage


Incident:

After a zfs send/receive run machine B had a hanging zfs receive process. To 
get rid of the process, I rebooted the machine. During reboot the kernel 
panics, resulting in a reboot loop.

To bring up the system, I rebooted single user, removed /etc/zfs/zpool.cache 
and rebooted again.

The damaged pool can imported readonly, giving a warning:

$zpool import -o readonly=on san_pool
cannot set property for 'san_pool/home/someuser': dataset is read-only
cannot set property for 'san_pool/home/someotheruser': dataset is read-only

The ZFS debugger zdb does not give any additional information:

$zdb -d -e san_pool
Dataset san_pool [ZPL], ID 18, cr_txg 1, 36.0K, 11 objects


The issue can reproduced by trying to import the pool r/w, resulting in a 
kernel panic.


The fmdump utility gives the following information for the relevant UUID:

$fmdump -Vp -u 91da1503-74c5-67c2-b7c1-d4e245e4d968
TIME   UUID 
SUNW-MSG-ID
Mar 28 2012 12:54:26.563203000 91da1503-74c5-67c2-b7c1-d4e245e4d968 
SUNOS-8000-KL

  TIME CLASS ENA
  Mar 28 12:54:24.2698 ireport.os.sunos.panic.dump_available 
0x
  Mar 28 12:54:05.9826 ireport.os.sunos.panic.dump_pending_on_device 
0x

nvlist version: 0
 version = 0x0
 class = list.suspect
 uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968
 code = SUNOS-8000-KL
 diag-time = 1332932066 541092
 de = fmd:///module/software-diagnosis
 fault-list-sz = 0x1
 __case_state = 0x1
 topo-uuid = 3b4117e0-0ac7-cde5-b434-b9735176d591
 fault-list = (array of embedded nvlists)
 (start fault-list[0])
 nvlist version: 0
 version = 0x0
 class = defect.sunos.kernel.panic
 certainty = 0x64
 asru = 
sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968
 resource = 
sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968
 savecore-succcess = 1
 dump-dir = /var/crash
 dump-files = vmdump.0
 os-instance-uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968
 panicstr = BAD TRAP: type=e (#pf Page fault) rp=ff002f6dcc50 addr=20 
occurred in module zfs due to a NULL pointer dereference
 panicstack = unix:die+d8 () | unix:trap+152b () | 
unix:cmntrap+e6 () | zfs:zap_leaf_lookup_closest+45 () | 
zfs:fzap_cursor_retrieve+cd () | zfs:zap_cursor_retrieve+195 () | 
zfs:zfs_purgedir+4d () |   zfs:zfs_rmnode+57 () | zfs:zfs_zinactive+b4 () | 
zfs:zfs_inactive+1a3 () | genunix:fop_inactive+b1 () | genunix:vn_rele+58 () | 
zfs:zfs_unlinked_drain+a7 () | zfs:zfsvfs_setup+f1 () | zfs:zfs_domount+152 () 
| zfs:zfs_mount+4e3 () | genunix:fsop_mount+22 () | genunix:domount+d2f () | 
genunix:mount+c0 () | genunix:syscall_ap+92 () | unix:brand_sys_sysenter+1cf () 
|
 crashtime = 1332931339
 panic-time = March 28, 2012 12:42:19 PM CEST CEST
 (end fault-list[0])

 fault-status = 0x1

Re: [zfs-discuss] kernel panic during zfs import

2012-03-27 Thread Jim Klimov

2012-03-27 11:14, Carsten John wrote:

I saw a similar effect some time ago on an OpenSolaris box (build 111b). That 
time my final solution was to copy over the read-only mounted stuff to a newly 
created pool. As this is the second time this failure occurs (on different 
machines), I'm really concerned about overall reliability...



Any suggestions?


A couple of months ago I reported a similar issue (though with
a different stack trace and code path). I tracked it to code in
the freeing of deduped blocks, where a valid code path could return
a NULL pointer but further routines used the pointer as if it
were always valid - thus a NULL dereference when the pool was
imported RW and tried to release blocks marked for deletion.

Adding a check for non-NULLness in my private rebuild of oi_151a
has fixed the issue. I wouldn't be surprised to see similar
slackness in other parts of the code now. Not checking input
values in routines seems like an arrogant mistake waiting to
fire (and it did for us).

I am not sure how to make a webrev and ultimately a signed-off
contribution upstream, but I posted my patch and research on
the list and in the illumos bug tracker.

I am not sure how you can fix an S11 system, though.
If it is at zpool v28 or older, you can try to import it into
an OpenIndiana installation, perhaps rebuilt with similarly
patched code that would check for NULLs, and fix your pool
(and then reuse it in S11 if you must). The source is there
at http://src.illumos.org and your stack trace should tell you
in which functions you should start looking...
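
(A hedged aside: one way to check the on-disk pool version without importing it is to read a vdev label directly; the device path below is only an example:)

   $ zdb -l /dev/dsk/c0t0d0s0 | grep -w version     # a version <= 28 label can be read by illumos/OpenIndiana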

Good luck,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic during zfs import

2012-03-27 Thread Paul Kraus
On Tue, Mar 27, 2012 at 3:14 AM, Carsten John cj...@mpi-bremen.de wrote:
 Hallo everybody,

 I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic 
 during the import of a zpool (some 30TB) containing ~500 zfs filesystems 
 after reboot. This causes a reboot loop, until booted single user and removed 
 /etc/zfs/zpool.cache.


 From /var/adm/messages:

 savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf 
 Page fault) rp=ff002f9cec50 addr=20 occurred in module zfs due to a 
 NULL pointer dereference
 savecore: [ID 882351 auth.error] Saving compressed system crash dump in 
 /var/crash/vmdump.2


I ran into a very similar problem with Solaris 10U9 and the
replica (zfs send | zfs recv destination) of a zpool of about 25 TB of
data. The problem was an incomplete snapshot (the zfs send | zfs recv
had been interrupted). On boot the system was trying to import the
zpool and as part of that it was trying to destroy the offending
(incomplete) snapshot. This was zpool version 22 and destruction of
snapshots is handled as a single TXG. The problem was that the
operation was running the system out of RAM (32 GB worth). There is a
fix for this and it is in zpool 26 (or newer), but any snapshots
created while the zpool is at a version prior to 26 will have the
problem on-disk. We have support with Oracle and were able to get a
loaner system with 128 GB RAM to clean up the zpool (it took about 75
GB RAM to do so).

    If you are at zpool 26 or later this is not your problem. If you
are at zpool < 26, then test for an incomplete snapshot by importing
the pool read-only, then `zdb -d zpool | grep '%'`, as the incomplete
snapshot will have a '%' instead of a '@' as the dataset / snapshot
separator. You can also run zdb against the _un_imported_ zpool
using the -e option to zdb.
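
(Spelled out as commands, with the pool name from this thread used as a placeholder, the check looks roughly like this:)

   $ zdb -e -d san_pool | grep '%'        # -e reads the labels of an exported/unimported pool
   # or, after importing read-only:
   $ zpool import -o readonly=on san_pool
   $ zdb -d san_pool | grep '%'           # a '%' separator marks an incomplete (interrupted) snapshot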

See the following Oracle Bugs for more information.

CR# 6876953
CR# 6910767
CR# 7082249

CR#7082249 has been marked as a duplicate of CR# 6948890

P.S. I suspect that the incomplete snapshot was also corrupt in
some strange way, but I could never make a solid determination of that.
We think what caused the zfs send | zfs recv to be interrupted was
hitting an e1000g Ethernet device driver bug.

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, Troy Civic Theatre Company
- Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-19 Thread Stu Whitefish


 It seems that obtaining an Oracle support contract or a contract renewal is 
 equally frustrating.

I don't have any axe to grind with Oracle. I'm new to the Solaris thing and 
wanted to see if it was for me.

If I was using this box to make money then sure I wouldn't have any problem 
paying for support. I don't expect
handouts and I don't mind paying.

I trusted ZFS because I heard it's for enterprise use and now I have 200G of 
data offline and not a peep from Oracle.
Looking on the net I found another guy who had the same exact failure.

To my way of thinking, somebody needs to stand up and get this fixed for us and 
make sure it doesn't happen to anybody else. If that happens, I have no grudge 
against Oracle or Solaris. If it doesn't, that's a pretty sour experience for 
someone to go through, and it will definitely make me look at this whole thing 
in another light.

I still believe somebody over there will do the right thing. I don't believe 
Oracle needs to hold people's data hostage to make money.
I am sure they have enough good products and services to make money honestly.

Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-19 Thread Tim Cook
On Fri, Aug 19, 2011 at 4:43 AM, Stu Whitefish swhitef...@yahoo.com wrote:


  It seems that obtaining an Oracle support contract or a contract renewal
 is equally frustrating.

 I don't have any axe to grind with Oracle. I'm new to the Solaris thing and
 wanted to see if it was for me.

 If I was using this box to make money then sure I wouldn't have any problem
 paying for support. I don't expect
 handouts and I don't mind paying.

 I trusted ZFS because I heard it's for enterprise use and now I have 200G
 of data offline and not a peep from Oracle.
 Looking on the net I found another guy who had the same exact failure.

 To my way of thinking somebody needs to standup and get this fixed for us
 and make sure it doesn't happen to anybody
 else. If that happens I have no grudge against Oracle or Solaris. If it
 doesn't that's a pretty sour experience for someone
 to go through and it will definitely make me look at this whole thing in
 another light.

 I still believe somebody over there will do the right thing. I don't
 believe Oracle needs to hold people's data hostage to make money.
 I am sure they have enough good products and services to make money
 honestly.

 Jim



You digitally signed a license agreement stating the following:
*No Technical Support*
Our technical support organization will not provide technical support, phone
support, or updates to you for the Programs licensed under this agreement.

To turn around and keep repeating that they're holding your data hostage
is disingenuous at best.  Nobody is holding your data hostage.  You
voluntarily put it on an operating system that explicitly states it doesn't
offer support from the parent company.  Nobody from Oracle is going to show
up with a patch for you on this mailing list because none of the Oracle
employees want to lose their job and subsequently be subjected to a
lawsuit.  If that's what you're planning on waiting for, I'd suggest you
take a new approach.

Sorry to be a downer, but that's reality.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-19 Thread John D Groenveld
In message 1313687977.77375.yahoomail...@web121903.mail.ne1.yahoo.com, Stu Whitefish writes:
Nope, not a clue how to do that and I have installed Windows on this box instead 
of Solaris since I can't get my data back from ZFS.
I have my two drives the pool is on disconnected so if this ever gets resolved 
I can reinstall Solaris and start learning again.

I believe you can configure VirtualBox for Windows to pass through
the disk with your unimportable rpool to guest OSs.
Can an OpenIndiana or FreeBSD guest import the pool?
Does Solaris 11X crash at the same place when run from within
VirtualBox?
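
(A rough sketch of what that raw-disk passthrough looks like on a Windows host; the file name and drive number are placeholders:)

   VBoxManage internalcommands createrawvmdk -filename C:\vm\rpool-disk1.vmdk -rawdisk \\.\PhysicalDrive1
   # attach the resulting .vmdk to the guest, repeat for the second mirror disk, then try the import in the guest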

John
groenv...@acm.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-18 Thread Thomas Gouverneur
You're probably hitting bug 7056738 - http://wesunsolve.net/bugid/id/7056738
Looks like it's not fixed yet @ oracle anyway...

Were you using crypto on your datasets ?


Regards,

Thomas

On Tue, 16 Aug 2011 09:33:34 -0700 (PDT)
Stu Whitefish swhitef...@yahoo.com wrote:

 - Original Message -
 
  From: Alexander Lesle gro...@tierarzt-mueller.de
  To: zfs-discuss@opensolaris.org
  Cc: 
  Sent: Monday, August 15, 2011 8:37:42 PM
  Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
  inaccessible!
  
  Hello Stu Whitefish and List,
  
  On August, 15 2011, 21:17 Stu Whitefish wrote in [1]:
  
   7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a
   kernel panic, even when booted from different OS versions
  
   Right. I have tried OpenIndiana 151 and Solaris 11 Express (latest
   from Oracle) several times each as well as 2 new installs of Update 8.
  
  When I understand you right is your primary interest to recover your
  data on tank pool.
  
  Have you check the way to boot from a Live-DVD, mount your safe 
  place
  and copy the data on a other machine?
 
 Hi Alexander,
 
 Yes of course...the problem is no version of Solaris can import the pool. 
 Please refer to the first message in the thread.
 
 Thanks,
 
 Jim
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


-- 
Gouverneur Thomas t...@ians.be
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-18 Thread Thomas Gouverneur

Have you already extracted the core file of the kernel crash?
(And, by the way, have you activated a dump device so that such a dump is taken 
at the next reboot...)

Have you also tried applying the latest kernel/zfs patches and then tried 
importing the pool afterwards?



Thomas

On 08/18/2011 06:40 PM, Stu Whitefish wrote:

Hi Thomas,

Thanks for that link. That's very similar but not identical. There's a 
different line number in zfs_ioctl.c; mine and Preston's fail on line 1815. It 
could be because of a difference in levels in that module of course, but the 
traceback is not identical either. Ours show brand_sysenter and the one you 
linked to shows brand_sys_syscall. I don't know what all that means but it is 
different. Anyway at least two of us have identical failures.

I was not using crypto, just a plain jane mirror on 2 drives. Possibly I had 
compression on a few file systems but everything else was allowed to default.

Here are our screenshots in case anybody doesn't want to go through the thread.


http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/

http://prestonconnors.com/zvol_get_stats.jpg


I hope somebody can help with this. It's not a good feeling having so much data 
gone.

Thanks for your help. Oracle, are you listening?

Jim



- Original Message -
   

From: Thomas Gouverneurt...@ians.be
To: zfs-discuss@opensolaris.org
Cc: Stu Whitefishswhitef...@yahoo.com
Sent: Thursday, August 18, 2011 1:57:29 PM
Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
inaccessible!

You're probably hitting bug 7056738 -
http://wesunsolve.net/bugid/id/7056738
Looks like it's not fixed yet @ oracle anyway...

Were you using crypto on your datasets ?


Regards,

Thomas
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-18 Thread Stu Whitefish
 From: Thomas Gouverneur t...@ians.be

 To: zfs-discuss@opensolaris.org
 Cc: 
 Sent: Thursday, August 18, 2011 5:11:16 PM
 Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
 inaccessible!
 
 Have you already extracted the core file of the kernel crash ?

Nope, not a clue how to do that and I have installed Windows on this box 
instead of Solaris since I can't get my data back from ZFS.
I have my two drives the pool is on disconnected so if this ever gets resolved 
I can reinstall Solaris and start learning again.

 (and btw activated dump device for such dumping happen at next reboot...)

This was a development box for me to see how I get along with Solaris. I'm 
afraid I don't have any experience in Solaris to understand your question.

 Have you also tried applying the latest kernel/zfs patches and try importing 
 the pool afterwards ?

Wish I had them, and knew what to do with them if I had them. Somebody on OTN 
noted this is supposed to be fixed by 142910, but I didn't hear back yet whether 
it fixes a pool ZFS won't import or only stops it from happening in the first 
place. I don't have a service contract; as I say, this box was my first try with 
Solaris and it is a homebrew system not on Oracle's support list.

I am sure if there is a patch for this or a way to get my 200G of data back 
some kind soul at Oracle will certainly help me since I lost
my data and getting it back isn't a matter of convenience. What an opportunity 
to generate some old fashioned goodwill!  :-)

Jim

 
 
 Thomas
 
 On 08/18/2011 06:40 PM, Stu Whitefish wrote:
  Hi Thomas,
 
  Thanks for that link. That's very similar but not identical. 
 There's a different line number in zfs_ioctl.c, mine and Preston's fail 
 on line 1815. It could be because of a difference in levels in that module of 
 course, but the traceback is not identical either. Ours show brand_sysenter 
 and 
 the one you linked to shows brand_sys_syscall. I don't know what all that 
 means but it is different. Anyway at least two of us have identical failures.
 
  I was not using crypto, just a plain jane mirror on 2 drives. Possibly I 
 had compression on a few file systems but everything else was allowed to 
 default.
 
  Here are our screenshots in case anybody doesn't want to go through the 
 thread.
 
 
  http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/
 
  http://prestonconnors.com/zvol_get_stats.jpg
 
 
  I hope somebody can help with this. It's not a good feeling having so 
 much data gone.
 
  Thanks for your help. Oracle, are you listening?
 
  Jim
 
 
 
  - Original Message -
     
  From: Thomas Gouverneurt...@ians.be
  To: zfs-discuss@opensolaris.org
  Cc: Stu Whitefishswhitef...@yahoo.com
  Sent: Thursday, August 18, 2011 1:57:29 PM
  Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
 inaccessible!
 
  You're probably hitting bug 7056738 -
  http://wesunsolve.net/bugid/id/7056738
  Looks like it's not fixed yet @ oracle anyway...
 
  Were you using crypto on your datasets ?
 
 
  Regards,
 
  Thomas
       
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
     
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-18 Thread Bob Friesenhahn

On Fri, 19 Aug 2011, Edho Arief wrote:


Asking Oracle for help without support contract would be like shouting
in vacuum space...


It seems that obtaining an Oracle support contract or a contract 
renewal is equally frustrating.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-16 Thread Stu Whitefish
- Original Message -

 From: Alexander Lesle gro...@tierarzt-mueller.de
 To: zfs-discuss@opensolaris.org
 Cc: 
 Sent: Monday, August 15, 2011 8:37:42 PM
 Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
 inaccessible!
 
 Hello Stu Whitefish and List,
 
 On August, 15 2011, 21:17 Stu Whitefish wrote in [1]:
 
  7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a
  kernel panic, even when booted from different OS versions
 
  Right. I have tried OpenIndiana 151 and Solaris 11 Express (latest
  from Oracle) several times each as well as 2 new installs of Update 8.
 
 When I understand you right is your primary interest to recover your
 data on tank pool.
 
 Have you check the way to boot from a Live-DVD, mount your safe 
 place
 and copy the data on a other machine?

Hi Alexander,

Yes of course...the problem is no version of Solaris can import the pool. 
Please refer to the first message in the thread.

Thanks,

Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-16 Thread Stu Whitefish
- Original Message -

 From: John D Groenveld jdg...@elvis.arl.psu.edu
 To: zfs-discuss@opensolaris.org zfs-discuss@opensolaris.org
 Cc: 
 Sent: Monday, August 15, 2011 6:12:37 PM
 Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
 inaccessible!
 
 In message 1313431448.5331.yahoomail...@web121911.mail.ne1.yahoo.com, 
 Stu Whi
 tefish writes:
 I'm sorry, I don't understand this suggestion.
 
 The pool that won't import is a mirror on two drives.
 
 Disconnect all but the two mirrored drives that you must import
 and try to import from a S11X LiveUSB.

Hi John,

Thanks for the suggestion, but it fails the same way. It panics and reboots too 
fast for me to capture the messages but they're the same as what I posted in 
the opening post of this thread.

This is a snap of `zpool import` before I tried importing the pool. Everything 
looks normal, except it's odd that the controller numbers keep changing.

http://imageshack.us/photo/my-images/705/sol11expresslive.jpg/

Thanks,

Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread Stu Whitefish
 On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish

 swhitef...@yahoo.com wrote:
  # zpool import -f tank
 
  http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/
 
 I encourage you to open a support case and ask for an escalation on CR 
 7056738.
 
 -- 
 Mike Gerdts

Hi Mike,

Unfortunately I don't have a support contract. I've been trying to set up a 
development system on Solaris and learn it.
Until this happened, I was pretty happy with it. Even so, I don't have 
supported hardware, so I couldn't buy a contract until I bought another 
machine, and I really have enough machines, so I cannot justify the expense 
right now. And I
refuse to believe Oracle would hold people hostage in a situation like this, 
but I do believe they could generate a lot of
goodwill by fixing this for me and whoever else it happened to and telling us 
what level of Solaris 10 this is fixed at so
this doesn't continue happening. It's a pretty serious failure and I'm not the 
only one who it happened to.

It's incredible but in all the years I have been using computers I don't ever 
recall losing data due to a filesystem or OS issue.
That includes DOS, Windows, Linux, etc.

I cannot believe ZFS on Intel is so fragile that people lose hundreds of gigs 
of data and that's just the way it is. There
must be a way to recover this data and some advice on preventing it from 
happening again.

Thanks,
Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.

Maybe try the following:
1) boot the s10u8 CD into single user mode (when booting from cdrom, choose 
Solaris, then choose single user mode (6))
2) when asked to mount rpool, just say no
3) mkdir /tmp/mnt1 /tmp/mnt2
4) zpool import -f -R /tmp/mnt1 tank
5) zpool import -f -R /tmp/mnt2 rpool


On 8/15/2011 9:12 AM, Stu Whitefish wrote:

On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish
swhitef...@yahoo.com  wrote:

  # zpool import -f tank

  http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/

I encourage you to open a support case and ask for an escalation on CR 7056738.

--
Mike Gerdts

Hi Mike,

Unfortunately I don't have a support contract. I've been trying to set up a 
development system on Solaris and learn it.
Until this happened, I was pretty happy with it. Even so, I don't have 
supported hardware so I couldn't buy a contract
until I bought another machine and I really have enough machines so I cannot 
justify the expense right now. And I
refuse to believe Oracle would hold people hostage in a situation like this, 
but I do believe they could generate a lot of
goodwill by fixing this for me and whoever else it happened to and telling us 
what level of Solaris 10 this is fixed at so
this doesn't continue happening. It's a pretty serious failure and I'm not the 
only one who it happened to.

It's incredible but in all the years I have been using computers I don't ever 
recall losing data due to a filesystem or OS issue.
That includes DOS, Windows, Linux, etc.

I cannot believe ZFS on Intel is so fragile that people lose hundreds of gigs 
of data and that's just the way it is. There
must be a way to recover this data and some advice on preventing it from 
happening again.

Thanks,
Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread Stu Whitefish


Hi. Thanks, I have tried this on update 8 and Sol 11 Express.

The import always results in a kernel panic as shown in the picture.

I did not try an alternate mountpoint though. Would it make that much 
difference?


- Original Message -
 From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. laot...@gmail.com
 To: zfs-discuss@opensolaris.org
 Cc: 
 Sent: Monday, August 15, 2011 3:06:20 PM
 Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
 inaccessible!
 
 may be try the following
 1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris 
 then choose single user mode(6))
 2)when ask to mount rpool just say no
 3)mkdir /tmp/mnt1 /tmp/mnt2
 4)zpool  import -f -R /tmp/mnt1 tank
 5)zpool import -f -R /tmp/mnt2 rpool
 
 
 On 8/15/2011 9:12 AM, Stu Whitefish wrote:
  On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish
  swhitef...@yahoo.com  wrote:
    # zpool import -f tank
 
   http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/
  I encourage you to open a support case and ask for an escalation on CR 
 7056738.
 
  -- 
  Mike Gerdts
  Hi Mike,
 
  Unfortunately I don't have a support contract. I've been trying to 
 set up a development system on Solaris and learn it.
  Until this happened, I was pretty happy with it. Even so, I don't have 
 supported hardware so I couldn't buy a contract
  until I bought another machine and I really have enough machines so I 
 cannot justify the expense right now. And I
  refuse to believe Oracle would hold people hostage in a situation like 
 this, but I do believe they could generate a lot of
  goodwill by fixing this for me and whoever else it happened to and telling 
 us what level of Solaris 10 this is fixed at so
  this doesn't continue happening. It's a pretty serious failure and 
 I'm not the only one who it happened to.
 
  It's incredible but in all the years I have been using computers I 
 don't ever recall losing data due to a filesystem or OS issue.
  That includes DOS, Windows, Linux, etc.
 
  I cannot believe ZFS on Intel is so fragile that people lose hundreds of 
 gigs of data and that's just the way it is. There
  must be a way to recover this data and some advice on preventing it from 
 happening again.
 
  Thanks,
  Jim
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.



On 8/15/2011 11:25 AM, Stu Whitefish wrote:


Hi. Thanks I have tried this on update 8 and Sol 11 Express.

The import always results in a kernel panic as shown in the picture.

I did not try an alternate mountpoint though. Would it make that much 
difference?

try it



- Original Message -

From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.laot...@gmail.com
To: zfs-discuss@opensolaris.org
Cc:
Sent: Monday, August 15, 2011 3:06:20 PM
Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
inaccessible!

may be try the following
1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris
then choose single user mode(6))
2)when ask to mount rpool just say no
3)mkdir /tmp/mnt1 /tmp/mnt2
4)zpool  import -f -R /tmp/mnt1 tank
5)zpool import -f -R /tmp/mnt2 rpool


On 8/15/2011 9:12 AM, Stu Whitefish wrote:

  On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish
  swhitef...@yahoo.com   wrote:

# zpool import -f tank

   http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/

  I encourage you to open a support case and ask for an escalation on CR

7056738.
  -- 
  Mike Gerdts

  Hi Mike,

  Unfortunately I don't have a support contract. I've been trying to

set up a development system on Solaris and learn it.

  Until this happened, I was pretty happy with it. Even so, I don't have

supported hardware so I couldn't buy a contract

  until I bought another machine and I really have enough machines so I

cannot justify the expense right now. And I

  refuse to believe Oracle would hold people hostage in a situation like

this, but I do believe they could generate a lot of

  goodwill by fixing this for me and whoever else it happened to and telling

us what level of Solaris 10 this is fixed at so

  this doesn't continue happening. It's a pretty serious failure and

I'm not the only one who it happened to.

  It's incredible but in all the years I have been using computers I

don't ever recall losing data due to a filesystem or OS issue.

  That includes DOS, Windows, Linux, etc.

  I cannot believe ZFS on Intel is so fragile that people lose hundreds of

gigs of data and that's just the way it is. There

  must be a way to recover this data and some advice on preventing it from

happening again.

  Thanks,
  Jim
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread Stu Whitefish
Unfortunately this panics the same exact way. Thanks for the suggestion though.



- Original Message -
 From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. laot...@gmail.com
 To: zfs-discuss@opensolaris.org
 Cc: 
 Sent: Monday, August 15, 2011 3:06:20 PM
 Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
 inaccessible!
 
 may be try the following
 1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris 
 then choose single user mode(6))
 2)when ask to mount rpool just say no
 3)mkdir /tmp/mnt1 /tmp/mnt2
 4)zpool  import -f -R /tmp/mnt1 tank
 5)zpool import -f -R /tmp/mnt2 rpool

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread LaoTsao
IIRC if you use two HDDs, you can import the zpool.
Can you try to import -R with only two HDDs at a time?

Sent from my iPad
Hung-Sheng Tsao ( LaoTsao) Ph.D

On Aug 15, 2011, at 13:42, Stu Whitefish swhitef...@yahoo.com wrote:

 Unfortunately this panics the same exact way. Thanks for the suggestion 
 though.
 
 
 
 - Original Message -
 From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. laot...@gmail.com
 To: zfs-discuss@opensolaris.org
 Cc: 
 Sent: Monday, August 15, 2011 3:06:20 PM
 Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
 inaccessible!
 
 may be try the following
 1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris 
 then choose single user mode(6))
 2)when ask to mount rpool just say no
 3)mkdir /tmp/mnt1 /tmp/mnt2
 4)zpool  import -f -R /tmp/mnt1 tank
 5)zpool import -f -R /tmp/mnt2 rpool
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread Paul Kraus
I am catching up here and wanted to see if I correctly understand the
chain of events...

1. Install system to pair of mirrored disks (c0t2d0s0 c0t3d0s0),
system works fine
2. add two more disks (c0t0d0s0 c0t1d0s0), create zpool tank, test and
determine these disks are fine
3. copy data to save to rpool (c0t2d0s0 c0t3d0s0)
3. install OS to c0t0d0s0, c0t1d0s0
4. reboot, system still boots from old rpool (c0t2d0s0 c0t3d0s0)
5. change boot device and boot from new OS (c0t0d0s0 c0t1d0s0)
6. cannot import old rpool (c0t2d0s0 c0t3d0s0) with your data

At this point could you still boot from the old rpool (c0t2d0s0 c0t3d0s0) ?

something happens and

7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a
kernel panic, even when booted from different OS versions

Have you been using the same hardware for all of this ?

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Designer: Frankenstein, A New Musical
(http://www.facebook.com/event.php?eid=123170297765140)
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread Stu Whitefish
I'm sorry, I don't understand this suggestion.

The pool that won't import is a mirror on two drives.



- Original Message -
 From: LaoTsao laot...@gmail.com
 To: Stu Whitefish swhitef...@yahoo.com
 Cc: zfs-discuss@opensolaris.org zfs-discuss@opensolaris.org
 Sent: Monday, August 15, 2011 5:50:08 PM
 Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
 inaccessible!
 
 iirc if you use two hdd, you can import the zpool
 can you try to import -R with only two hdd at time
 
 Sent from my iPad
 Hung-Sheng Tsao ( LaoTsao) Ph.D
 
 On Aug 15, 2011, at 13:42, Stu Whitefish swhitef...@yahoo.com wrote:
 
  Unfortunately this panics the same exact way. Thanks for the suggestion 
 though.
 
 
 
  - Original Message -
  From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. 
 laot...@gmail.com
  To: zfs-discuss@opensolaris.org
  Cc: 
  Sent: Monday, August 15, 2011 3:06:20 PM
  Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
 inaccessible!
 
  may be try the following
  1)boot s10u8 cd into single user mode (when boot cdrom, choose Solaris 
  then choose single user mode(6))
  2)when ask to mount rpool just say no
  3)mkdir /tmp/mnt1 /tmp/mnt2
  4)zpool  import -f -R /tmp/mnt1 tank
  5)zpool import -f -R /tmp/mnt2 rpool
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread John D Groenveld
In message 1313431448.5331.yahoomail...@web121911.mail.ne1.yahoo.com, Stu Whi
tefish writes:
I'm sorry, I don't understand this suggestion.

The pool that won't import is a mirror on two drives.

Disconnect all but the two mirrored drives that you must import
and try to import from a S11X LiveUSB.

John
groenv...@acm.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread Stu Whitefish
Hi Paul,

 1. Install system to pair of mirrored disks (c0t2d0s0 c0t3d0s0),

 system works fine

I don't remember at this point which disks were which, but I believe it was 0 
and 1, because during the first install there were only 2 drives in the box (I 
only had 2 drives at the time).

 2. add two more disks (c0t0d0s0 c0t1d0s0), create zpool tank, test and
 determine these disks are fine

Again, probably was on disks 2 and 3 but in principle, correct.

 3. copy data to save to rpool (c0t2d0s0 c0t3d0s0)

I did this in a few steps that probably don't make sense because I had only 2 
500G drives at the beginning when I did my install. Later I got two 320G and 
realized I should have the root pool on the smaller drives. But in the interim, 
I installed the new pair of 320G and moved a bunch of data onto that pool. 
After the initial installation when update 8 first came out, what happened next 
was something like:

1. I created tank mirror on the 2 320G drives and moved data from another 
system on to the tank. After I verified it was good I rebooted the box and 
checked again and everything was healthy, all pools were imported and mounted 
correctly.

2. Then I realized I should install on the 320s and use the 500s for storage so 
I copied everything I had just put on the 320s (tank) onto the 500s (root). I 
rebooted again and verified the data on root was good, then I deleted it from 
tank.

3. I installed a new install on the 320s (formerly tank)

4. I rebooted and it used my old root on the 500s as root, which surprised me 
but makes sense now because it was created as rpool during the very first 
install.

5. I rebooted in single user mode and tried to import the new install. It 
imported fine.

6. I don't know what happened next but I believe after that I rebooted again to 
see why Solaris didn't choose the new install, the tank pool could not be 
imported and I got the panic shown in the screenshot.

 3. install OS to c0t0d0s0, c0t1d0s0
 4. reboot, system still boots from old rpool (c0t2d0s0 c0t3d0s0)

Correct. At some point I read you can change the name of the pool so I imported 
rpool as tank and that much worked. At this point both pools were still good, 
and now the install was correctly called rpool and my tank was called tank.

 5. change boot device and boot from new OS (c0t0d0s0 c0t1d0s0)

That was the surprising thing. I had already changed my BIOS to boot from the 
new pool, but that didn't stop Solaris from using the old install as the root 
pool, I guess because of the name. I thought originally as long as I specified 
the correct boot device I wouldn't have any problem, but even taking the old 
rpool out of the boot sequence and specifying only the newly installed pool as 
boot devices wasn't enough.

 6. cannot import old rpool (c0t2d0s0 c0t3d0s0) with your data
 
 At this point could you still boot from the old rpool (c0t2d0s0 c0t3d0s0) ?

Yes, I could use the newly installed pool to boot from, or import it from shell 
in several versions of Solaris/Sol 11, etc. Of course now I cannot, since I 
have installed so many times over that pool trying to get the other pool 
imported.

 
 something happens and
 
 7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a
 kernel panic, even when booted from different OS versions

Right. I have tried OpenIndiana 151 and Solaris 11 Express (latest from Oracle) 
several times each as well as 2 new installs of Update 8.

 Have you been using the same hardware for all of this ?

Yes, I have. 

Thanks for the help,

Jim


Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread Stu Whitefish
Given I can boot to single user mode and elect not to import or mount any 
pools, and that later I can issue an import against only the pool I need, I 
don't understand how this can help.

Still, given that nothing else seems to help I will try this and get back to 
you tomorrow.

Thanks,

Jim



- Original Message -
 From: John D Groenveld jdg...@elvis.arl.psu.edu
 To: zfs-discuss@opensolaris.org zfs-discuss@opensolaris.org
 Cc: 
 Sent: Monday, August 15, 2011 6:12:37 PM
 Subject: Re: [zfs-discuss] Kernel panic on zpool import. 200G of data 
 inaccessible!
 
 In message 1313431448.5331.yahoomail...@web121911.mail.ne1.yahoo.com, 
 Stu Whi
 tefish writes:
 I'm sorry, I don't understand this suggestion.
 
 The pool that won't import is a mirror on two drives.
 
 Disconnect all but the two mirrored drives that you must import
 and try to import from a S11X LiveUSB.
 
 John
 groenv...@acm.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-15 Thread Alexander Lesle
Hello Stu Whitefish and List,

On August, 15 2011, 21:17 Stu Whitefish wrote in [1]:

 7. cannot import old rpool (c0t2d0s0 c0t3d0s0), any attempt causes a
 kernel panic, even when booted from different OS versions

 Right. I have tried OpenIndiana 151 and Solaris 11 Express (latest
 from Oracle) several times each as well as 2 new installs of Update 8.

If I understand you correctly, your primary interest is to recover the
data on the tank pool.

Have you considered booting from a Live-DVD, mounting your safe place
and copying the data to another machine?
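(For illustration only - a rough sketch of that approach from a live
environment, with pool, snapshot and host names as placeholders; it
obviously only helps if the pool will import at all:)

# zpool import -f -R /a tank                    # import under an alternate root
# zfs snapshot -r tank@rescue
# zfs send -R tank@rescue | ssh user@otherhost zfs receive -d backup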

-- 
Best Regards
Alexander
August, 15 2011

[1] mid:1313435871.14520.yahoomail...@web121919.mail.ne1.yahoo.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! assertion failed: zvol_get_stats(os, nv) == 0

2011-08-05 Thread Stu Whitefish
System: snv_151a 64 bit on Intel.
Error: panic[cpu0] assertion failed: zvol_get_stats(os, nv) == 0,
file: ../../common/fs/zfs/zfs_ioctl.c, line: 1815

Failure first seen on Solaris 10, update 8

History:

I recently received two 320G drives and realized from reading this list it
would have been better if I had done the install on the small drives
but I didn't have them at the time. I added the two 320G drives and created
tank mirror.

I moved some data from other sources to the tank and then decided to go
ahead and do a new install. In preparation for that I moved all the data I
wanted to save onto the rpool mirror and then installed Solaris 10 update 8
again on the 320G drives.

When my system rebooted after the installation, I saw for some reason it
used my tank pool as root. I realize now since it was originally a root pool
and had boot blocks this didn't help. Anyway I shut down, changed the boot
order and then booted into my system. It paniced when trying to access the
tank and instantly rebooted. I had to go through this several times until I
caught a glimpse of one of the first messages:

assertion failed: zvol_get_stats(os, nv)

Here is what my system looks like when I boot into failsafe mode.

# zpool import
pool: rpool
id: 16453600103421700325
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

rpool ONLINE
mirror ONLINE
c0t2d0s0 ONLINE
c0t3d0s0 ONLINE

pool: tank
id: 12861119534757646169
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

tank ONLINE
mirror ONLINE
c0t0d0s0 ONLINE
c0t1d0s0 ONLINE

# zpool import tank
cannot import 'tank': pool may be in use from other system
use '-f' to import anyway

I installed Solaris 11 Express USB via Hiroshi-san's Windows tool. 
Unfortunately it also panics trying to import the pool although zpool import 
shows the pool online with no errors, just like in the above doc.

http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/

and here is an eerily identical photo capture made by somebody with a 
similar/identical 

error. http://prestonconnors.com/zvol_get_stats.jpg

At first I thought it was a copy of my screenshot but I see his terminal is 
white and mine is black.

Looks like the problem has been around since 2009 although my problem is with 
a newly created mirror pool that had plenty of space available (200G in use 
out of about 500G) and no snapshots were taken.

Similar discussion with discouraging lack of follow up:
http://opensolaris.org/jive/message.jspa?messageID=376366

Looks like the defect, it's closed and I see no resolution.

https://defect.opensolaris.org/bz/show_bug.cgi?id=5682

I have about 200G of data on the tank pool, about 100G or so I don't have
anywhere else. I created this pool specifically to make a safe place to
store data that I had accumulated over several years and didn't have
organized yet. I can't believe such a serious bug has been around for two years
and hasn't been fixed. Can somebody please help me get this data back?

Thank you.

Jim 


I joined the forums but I didn't see my post on zfs-discuss mailing list which
seems a lot more active than the forum. Sorry if this is a duplicate for people 
on the mailing list.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-05 Thread Stuart James Whitefish
I am opening a new thread since I found somebody else reported a similar 
failure in May and I didn't see a resolution; hopefully this post will be easier 
to find for people with similar problems. The original thread was 
http://opensolaris.org/jive/thread.jspa?threadID=140861

System: snv_151a 64 bit on Intel.
Error: panic[cpu0] assertion failed: zvol_get_stats(os, nv) == 0,
file: ../../common/fs/zfs/zfs_ioctl.c, line: 1815

Failure first seen on Solaris 10, update 8

History:

I recently received two 320G drives and realized from reading this list it
would have been better if I had done the install on the small drives
but I didn't have them at the time. I added the two 320G drives and created
tank mirror.

I moved some data from other sources to the tank and then decided to go
ahead and do a new install. In preparation for that I moved all the data I
wanted to save onto the rpool mirror and then installed Solaris 10 update 8
again on the 320G drives.

When my system rebooted after the installation, I saw for some reason it
used my tank pool as root. I realize now since it was originally a root pool
and had boot blocks this didn't help. Anyway I shut down, changed the boot
order and then booted into my system. It paniced when trying to access the
tank and instantly rebooted. I had to go through this several times until I
caught a glimpse of one of the first messages:

assertion failed: zvol_get_stats(os, nv)

Here is what my system looks like when I boot into failsafe mode.

# zpool import
pool: rpool
id: 16453600103421700325
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

rpool ONLINE
mirror ONLINE
c0t2d0s0 ONLINE
c0t3d0s0 ONLINE

pool: tank
id: 12861119534757646169
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

tank ONLINE
mirror ONLINE
c0t0d0s0 ONLINE
c0t1d0s0 ONLINE

# zpool import tank
cannot import 'tank': pool may be in use from other system
use '-f' to import anyway

Here is a photo of my screen (hah hah, old-fashioned screen shot) when Sol 11 
starts, now that I have tried importing my pool and it fails constantly.

# zpool import -f tank

http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/

I installed Solaris 11 Express USB via Hiroshi-san's Windows tool. 
Unfortunately it also panics trying to import the pool although zpool import 
shows the pool online with no errors just like in the above doc.

and here is an eerily identical photo capture made by somebody with a 
similar/identical error. http://prestonconnors.com/zvol_get_stats.jpg

At first I thought it was a copy of my screenshot but I see his terminal is 
white and mine is black.

Looks like the problem has been around since 2009 although my problem is with a 
newly created mirror pool that had plenty of space available (200G in use out 
of about 500G) and no snapshots were taken.

Similar discussion with discouraging lack of follow up:
http://opensolaris.org/jive/message.jspa?messageID=376366

Looks like the defect, it's closed and I see no resolution.

https://defect.opensolaris.org/bz/show_bug.cgi?id=5682

I have about 200G of data on the tank pool, about 100G or so I don't have
anywhere else. I created this pool specifically to make a safe place to
store data that I had accumulated over several years and didn't have
organized yet. I can't believe such a serious bug has been around for two years 
and hasn't been fixed. Can somebody please help me get this data back?

Thank you.

Jim
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!

2011-08-05 Thread Mike Gerdts
On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish
swhitef...@yahoo.com wrote:
 # zpool import -f tank

 http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/

I encourage you to open a support case and ask for an escalation on CR 7056738.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on USB disk power loss

2011-01-19 Thread Richard Elling
On Jan 15, 2011, at 10:33 AM, Reginald Beardsley wrote:

 I was copying a filesystem using zfs send | zfs receive and inadvertently 
 unplugged the power to the USB disk that was the destination.   Much to my 
 horror this caused the system to panic.  I recovered fine on rebooting, but 
 it *really* unnerved me.
 
 I don't find anything about this online.  I would expect it would trash the 
 copy operation, but the panic seemed a bit extreme.
 
 It's an Ultra 20 running Solaris 10 Generic_137112-02
 
 I've got a copy of U8 I'm planning to install as the U9 license seems to 
 prohibit my using it.
 
 Suggestions?  I'd like to understand what happened and why the system went 
 down.

Long, long ago the default failure mode for failed writes was panic.
This was changed several years ago with the introduction of the
failmode property.  Since ZFS was ported to Solaris 10, perhaps the
failmode property is not available until you upgrade?  To see:
zpool get all poolname
If there is no failmode property, then upgrade.
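(A minimal sketch, with tank as a placeholder pool name - the failmode
property only exists on pools/builds new enough to have it:)

# zpool get failmode tank
NAME  PROPERTY  VALUE     SOURCE
tank  failmode  wait      default
# zpool set failmode=continue tank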
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on USB disk power loss

2011-01-19 Thread Reginald Beardsley


--- On Wed, 1/19/11, Richard Elling richard.ell...@gmail.com wrote:

 From: Richard Elling richard.ell...@gmail.com
 Subject: Re: [zfs-discuss] kernel panic on USB disk power loss
 To: Reginald Beardsley pulask...@yahoo.com
 Cc: zfs-discuss@opensolaris.org
 Date: Wednesday, January 19, 2011, 8:59 AM
 On Jan 15, 2011, at 10:33 AM,
 Reginald Beardsley wrote:
 
  I was copying a filesystem using zfs send | zfs
 receive and inadvertently unplugged the power to the USB
 disk that was the destination.   Much to my
 horror this caused the system to panic.  I recovered
 fine on rebooting, but it *really* unnerved me.
  
  I don't find anything about this online.  I would
 expect it would trash the copy operation, but the panic
 seemed a bit extreme.
  
  It's an Ultra 20 running Solaris 10 Generic_137112-02
  
  I've got a copy of U8 I'm planning to install as the
 U9 license seems to prohibit my using it.
  
  Suggestions?  I'd like to understand what
 happened and why the system went down.
 
 Long, long ago the default failure mode for failed writes
 was panic.
 This was changed for several years ago with the
 introduction of the
 failmode property.  Since ZFS is ported to Solaris
 10, perhaps the
 failmode property is not available until you upgrade? 
 To see:
     zpool get all poolname
 If there is no failmode property, then upgrade.
  -- richard
 
 

Thanks.  That probably explains it.  The last update on the system was before 
ZFS root was available.  I'm in the long delayed process of upgrading to U8 and 
taking my main network offline w/ just a minimal system connected to the 
Internet.  The browsers are just too vulnerable.

Eventually I'll migrate to OpenIndiana for all my Solaris instances.  But for 
now mirrored ZFS on U8 will have to do since U9 is So Larry's.



  
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] kernel panic on USB disk power loss

2011-01-18 Thread Reginald Beardsley
I was copying a filesystem using zfs send | zfs receive and inadvertently 
unplugged the power to the USB disk that was the destination.   Much to my 
horror this caused the system to panic.  I recovered fine on rebooting, but it 
*really* unnerved me.

I don't find anything about this online.  I would expect it would trash the 
copy operation, but the panic seemed a bit extreme.

It's an Ultra 20 running Solaris 10 Generic_137112-02

I've got a copy of U8 I'm planning to install as the U9 license seems to 
prohibit my using it.

Suggestions?  I'd like to understand what happened and why the system went down.

Thanks,
Reg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Kernel panic after upgrading from snv_138 to snv_140

2010-10-06 Thread Thorsten Heit
Hi,

my machine is a HP ProLiant ML350 G5 with 2 quad-core Xeons, 32GB RAM and a HP 
SmartArray E200i RAID controller with 3x160 and 3x500GB SATA discs connected to 
it. Two of the 160GB discs form the mirrored root pool (rpool), the third 
serves as a temporary data pool called tank, and the three 500G discs form a 
RAIDZ1 pool called daten.

So far I successfully upgraded from OpenSolaris b134 to b138 by manually 
building ONNV. Recently I built b140, installed it, but unfortunately booting 
results in a kernel panic:

...
NOTICE: zfs_parse_bootfs: error 22
Cannot mount root on rpool/187 fstype zfs

panic[cpu0]/thread=fbc2f660: vfs_mountroot: cannot mount root

fbc71ba0 genunix:vfs_mountroot+32e ()
fbc71bd0 genunix:main+136 ()
fbc71be0 unix:_locore_start+92 ()

panic: entering debugger (no dump device, continue to reboot)

Welcome to kmdb
Loaded modules: [ scsi_vhci mac uppc sd unix zfs krtld genunix specfs pcplusmp 
cpu.generic ]
[0]


Before the above attempt with b140, I tried to upgrade to OpenIndiana, but had 
quite the same problem; OI doesn't boot either. See 
http://openindiana.org/pipermail/openindiana-discuss/2010-September/000504.html

Any ideas what is causing this kernel panic?


Regards

Thorsten
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?

2010-09-28 Thread Meilicke, Scott
Brilliant. I set those parameters via /etc/system, rebooted, and the pool
imported with just the -f switch. I had seen this as an option earlier,
although not in that thread, but was not sure it applied to my case.

Scrub is running now. Thank you very much!

-Scott


On 9/23/10 7:07 PM, David Blasingame Oracle david.blasing...@oracle.com
wrote:

 Have you tried setting zfs_recover  aok in /etc/system or setting it with the
 mdb?
 
 Read how to set via /etc/system
 http://opensolaris.org/jive/thread.jspa?threadID=114906
 
 mdb debugger
 http://www.listware.net/201009/opensolaris-zfs/46706-re-zfs-discuss-how-to-set
 -zfszfsrecover1-and-aok1-in-grub-at-startup.html
 
 After you get the variables set and system booted, try importing, then running
 a scrub. 
 
 Dave
 
 On 09/23/10 19:48, Scott Meilicke wrote:
  
 I posted this on the www.nexentastor.org
 forums, but no answer so far, so I apologize if you are seeing this twice. I
 am also engaged with nexenta support, but was hoping to get some additional
 insights here. 
 
 I am running nexenta 3.0.3 community edition, based on 134. The box crashed
 yesterday, and goes into a reboot loop (kernel panic) when trying to import
 my data pool, screenshot attached. What I have tried thus far:
 
 Boot off of DVD, both 3.0.3 and 3.0.4 beta 8. 'zpool import -f data01' causes
 the panic in both cases.
 Boot off of 3.0.4 beta 8, ran zpool import -fF data01
 That gives me a message like Pool data01 returned to its stat as of ...,
 and then panics.
 
 The import -fF does seem to import the pool, but then immediately panic. So
 after booting off of DVD, I can boot from my hard disks, and the system will
 not import the pool because it was last imported from another system.
 
 I have moved /etc/zfs/zfs.cache out of the way, but no luck after a reboot
 and import.
 
 zpool import shows all of my disks are OK, and the pool itself is online.
 
 Is it time to start working with zdb? Any suggestions?
 
 This box is hosting development VMs, so I have some people idling their
 thumbs at the moment.
 
 Thanks everyone,
 
 -Scott
   
  
 
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
 




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?

2010-09-27 Thread Scott Meilicke
I just realized that the email I sent to David and the list did not make the 
list (at least as jive can see it), so here is what I sent on the 23rd:

Brilliant. I set those parameters via /etc/system, rebooted, and the pool 
imported with just the -f switch. I had seen this as an option earlier, 
although not in that thread, but was not sure it applied to my case.

Scrub is running now. Thank you very much! 

-Scott

Update: The scrub finished with zero errors.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?

2010-09-23 Thread David Blasingame Oracle
Have you tried setting zfs_recover and aok in /etc/system or setting them 
with mdb?


Read how to set via /etc/system
http://opensolaris.org/jive/thread.jspa?threadID=114906

mdb debugger
http://www.listware.net/201009/opensolaris-zfs/46706-re-zfs-discuss-how-to-set-zfszfsrecover1-and-aok1-in-grub-at-startup.html

After you get the variables set and system booted, try importing, then 
running a scrub.
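For reference, the settings discussed in those links are typically applied like 
this (a sketch only - they relax assertions and panic behaviour, so they are 
meant for one-off recovery, not normal operation):

  In /etc/system, then reboot:
    set aok=1
    set zfs:zfs_recover=1

  Or on the running kernel with mdb:
    # mdb -kw
    > aok/W 1
    > zfs_recover/W 1
    > $q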


Dave

On 09/23/10 19:48, Scott Meilicke wrote:
I posted this on the www.nexentastor.org forums, but no answer so far, so I apologize if you are seeing this twice. I am also engaged with nexenta support, but was hoping to get some additional insights here. 


I am running nexenta 3.0.3 community edition, based on 134. The box crashed 
yesterday, and goes into a reboot loop (kernel panic) when trying to import my 
data pool, screenshot attached. What I have tried thus far:

Boot off of DVD, both 3.0.3 and 3.0.4 beta 8. 'zpool import -f data01' causes 
the panic in both cases.
Boot off of 3.0.4 beta 8, ran zpool import -fF data01
That gives me a message like Pool data01 returned to its state as of ..., and 
then panics.

The import -fF does seem to import the pool, but then immediately panics. So after booting off of DVD, I can boot from my hard disks, and the system will not import the pool because it was last imported from another system. 


I have moved /etc/zfs/zfs.cache out of the way, but no luck after a reboot and 
import.
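(The cache file is normally /etc/zfs/zpool.cache; moving it aside so that no 
pool is auto-imported at boot typically looks like the following sketch:)

# mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak
# reboot
# zpool import          # pools now have to be imported explicitly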

zpool import shows all of my disks are OK, and the pool itself is online.

Is it time to start working with zdb? Any suggestions?

This box is hosting development VMs, so I have some people idling their thumbs 
at the moment.

Thanks everyone,

-Scott
  



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  



--


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Kernel panic on import / interrupted zfs destroy

2010-08-18 Thread Matthew Ellison
I have a box running snv_134 that had a little boo-boo.

The problem first started a couple of weeks ago with some corruption on two 
filesystems in an 11-disk 10TB raidz2 set.  I ran a couple of scrubs that 
revealed a handful of corrupt files on my 2 de-duplicated zfs filesystems.  No 
biggie.

I thought that my problems had something to do with de-duplication in 134, so I 
went about the process of creating new filesystems and copying over the good 
files to another box.  Every time I touched the bad files I got a filesystem 
error 5.  When trying to delete them manually, I got kernel panics - which 
eventually turned into reboot loops.

I tried installing nexenta on another disk to see if that would allow me to get 
past the reboot loop - which it did.  I finished moving the good files over 
(using rsync, which skipped over the error 5 files, unlike cp or mv), and 
destroyed one of the two filesystems.  Unfortunately, this caused a kernel 
panic in the middle of the destroy operation, which then became another panic / 
reboot loop.

I was able to get in with milestone=none and delete the zfs cache, but now I 
have a new problem:  Any attempt to import the pool results in a panic.  I have 
tried from my snv_134 install, from the live cd, and from nexenta.  I have 
tried various zdb incantations (with aok=1 and zfs:zfs_recover=1), to no avail 
- these error out after a few minutes.  I have even tried another controller.

I have zdb -e -bcsvL running now from 134 (without aok=1) which has been 
running for several hours.  Can zdb recover from this kind of situation (with a 
half-destroyed filesystem that panics the kernel on import?)  What is the 
impact of the above zdb operation without aok=1?  Is there any likelihood of a 
recovery of non-affected filesystems?

Any suggestions?

Regards,

Matthew Ellison
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-08-17 Thread Victor Latushkin

On Jul 9, 2010, at 4:27 AM, George wrote:

 I think it is quite likely to be possible to get
 readonly access to your data, but this requires
 modified ZFS binaries. What is your pool version?
 What build do you have installed on your system disk
 or available as LiveCD?

For the record - using ZFS readonly import code backported to build 134 and 
slightly modified to account for the specific corruptions of this case, we've 
been able to import the pool in readonly mode and George is now backing up his 
data.

As soon as that completes I hope to have a chance to have another look into it 
to see what else we can learn from this case.

regards
victor

 
 [Prompted by an off-list e-mail from Victor asking if I was still having 
 problems]
 
 Thanks for your reply, and apologies for not having replied here sooner - I 
 was going to try something myself (which I'll explain shortly) but have been 
 hampered by a flakey cdrom drive - something I won't have chance to sort 
 until the weekend.
 
 In answer to your question the installed system is running 2009.06 (b111b) 
 and the LiveCD I've been using is b134.
 
 The problem with the Installed system crashing when I tried to run zpool 
 clean I believe is being caused by 
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794136 which 
 makes me think that the same command run from a later version should work 
 fine.
 
 I haven't had any success doing this though and I believe the reason is that 
 several of the ZFS commands won't work if the hostid of the machine to last 
 access the pool is different from the current system (and the pool is 
 exported/faulted), as happens when using a LiveCD. Where I was getting errors 
 about storage2 does not exist I found it was writing errors to the syslog 
 that the pool could not be loaded as it was last accessed by another 
 system. I tried to get round this using the Dtrace hostid changing script I 
 mentioned in one of my earlier messages but this seemed not to be able to 
 fool system processes.
 
 I also tried exporting the pool from the Installed system to see if that 
 would help but unfortunately it didn't. After having exported the pool zfs 
 import run on the Installed system reported The pool can be imported 
 despite missing or damaged devices. however when trying to import it (with 
 or without -f) it refused to import it as one or more devices is currently 
 unavailable. When booting the LiveCD after having exported the pool it still 
 gave errors about having been last accessed by another system.
 
 I couldn't spot any method of modifying the LiveCD image to have a particular 
 hostid so my plan therefore has been to try installing b134 onto the system, 
 setting the hostid under /etc and seeing if things then behaved in a more 
 straightforward fashion, which I haven't managed yet due to the cdrom 
 problems.
 
 I also mentioned in one of my earlier e-mails that I was confused that the 
 Installed system mentioned an unreadable intent log but the LiveCD said the 
 problem was corrupted metadata. This seems to be caused by the functions 
 print_import_config and print_statement_config having slightly different case 
 statements and not a difference in the pool itself.
 
 Hopefully I'll be able to complete the reinstall soon and see if that fixes 
 things or there's a deeper problem.
 
 Thanks again for your help,
 
 George
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-07-08 Thread George
 I think it is quite likely to be possible to get
 readonly access to your data, but this requires
 modified ZFS binaries. What is your pool version?
 What build do you have installed on your system disk
 or available as LiveCD?

[Prompted by an off-list e-mail from Victor asking if I was still having 
problems]

Thanks for your reply, and apologies for not having replied here sooner - I was 
going to try something myself (which I'll explain shortly) but have been 
hampered by a flakey cdrom drive - something I won't have chance to sort until 
the weekend.

In answer to your question the installed system is running 2009.06 (b111b) and 
the LiveCD I've been using is b134.

The problem with the Installed system crashing when I tried to run zpool 
clean I believe is being caused by 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794136 which makes 
me think that the same command run from a later version should work fine.

I haven't had any success doing this though, and I believe the reason is that 
several of the ZFS commands won't work if the hostid of the machine that last 
accessed the pool is different from the current system's (and the pool is 
exported/faulted), as happens when using a LiveCD. Where I was getting errors 
that "storage2 does not exist", I found it was writing errors to the syslog 
saying the pool could not be loaded as it was "last accessed by another system". 
I tried to get round this using the DTrace hostid-changing script I mentioned 
in one of my earlier messages, but this seemed not to be able to fool system 
processes.

I also tried exporting the pool from the Installed system to see if that would 
help but unfortunately it didn't. After having exported the pool, zpool import 
run on the Installed system reported "The pool can be imported despite missing 
or damaged devices.", however when trying to import it (with or without -f) it 
refused to import it as "one or more devices is currently unavailable". When 
booting the LiveCD after having exported the pool it still gave errors about 
having been last accessed by another system.

I couldn't spot any method of modifying the LiveCD image to have a particular 
hostid so my plan therefore has been to try installing b134 onto the system, 
setting the hostid under /etc and seeing if things then behaved in a more 
straightforward fashion, which I haven't managed yet due to the cdrom problems.

I also mentioned in one of my earlier e-mails that I was confused that the 
Installed system mentioned an unreadable intent log but the LiveCD said the 
problem was corrupted metadata. This seems to be caused by the functions 
print_import_config and print_statement_config having slightly different case 
statements and not a difference in the pool itself.

Hopefully I'll be able to complete the reinstall soon and see if that fixes 
things or there's a deeper problem.

Thanks again for your help,

George
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-07-06 Thread Victor Latushkin

On Jul 3, 2010, at 1:20 PM, George wrote:

 Because of that I'm thinking that I should try
 to change the hostid when booted from the CD to be
 the same as the previously installed system to see if
 that helps - unless that's likely to confuse it at
 all...?
 
 I've now tried changing the hostid using the code from 
 http://forums.sun.com/thread.jspa?threadID=5075254 NB: you need to leave this 
 running in a separate terminal.
 
 This changes the start of zpool import to
 
  pool: storage2
id: 14701046672203578408
 state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72
 
 
 but otherwise nothing is changed with respect to trying to import or clear 
 the pool. The pool is 8TB and the machine has 4GB but as far as I can see via 
 top the commands aren't failing due to a lack of memory.
 
 I'm a bit stumped now. The only thing else I can think to try is inserting 
 c9t4d4 (the new drive) and removing c6t4d0 (which should be fine). The 
 problem with this though is that it relies on c7t4d0 (which is faulty) and so 
 it assumes that the errors can be cleared, the replace stopped and the drives 
 swapped back before further errors happen.

I think it is quite likely to be possible to get readonly access to your data, 
but this requires modified ZFS binaries. What is your pool version? What build 
do you have installed on your system disk or available as LiveCD?
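(As an aside: builds from roughly snv_148 onwards added a read-only import 
option, so with a new enough live environment something like the sketch below - 
pool name and altroot are placeholders - may give read-only access without 
modified binaries:)

# zpool import -o readonly=on -R /a storage2
# zfs list -r storage2        # data can be read and copied off, but not modified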

regards
victor

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-07-06 Thread Roy Sigurd Karlsbakk
 I think it is quite likely to be possible to get readonly access to
 your data, but this requires modified ZFS binaries. What is your pool
 version? What build do you have installed on your system disk or
 available as LiveCD?

Sorry, but does this mean if ZFS can't write to the drives, access to the pool 
won't be possible? If so, that's rather scary...

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. 
It is an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases adequate and relevant synonyms exist 
in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-07-06 Thread Victor Latushkin

On Jun 28, 2010, at 11:27 PM, George wrote:

 Again this core dumps when I try to do zpool clear storage2
 
 Does anyone have any suggestions what would be the best course of action now?

Do you have any crashdumps saved? The first one is the most interesting one...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-07-03 Thread George
 Because of that I'm thinking that I should try
 to change the hostid when booted from the CD to be
 the same as the previously installed system to see if
 that helps - unless that's likely to confuse it at
 all...?

I've now tried changing the hostid using the code from 
http://forums.sun.com/thread.jspa?threadID=5075254 (NB: you need to leave this 
running in a separate terminal).

This changes the start of zpool import to

  pool: storage2
id: 14701046672203578408
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72


but otherwise nothing is changed with respect to trying to import or clear the 
pool. The pool is 8TB and the machine has 4GB but as far as I can see via top 
the commands aren't failing due to a lack of memory.

I'm a bit stumped now. The only other thing I can think to try is inserting 
c9t4d4 (the new drive) and removing c6t4d0 (which should be fine). The problem 
with this though is that it relies on c7t4d0 (which is faulty) and so it 
assumes that the errors can be cleared, the replace stopped and the drives 
swapped back before further errors happen.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-07-02 Thread George
 I think I'll try booting from a b134 Live CD and see
 that will let me fix things.

Sadly it appears not - at least not straight away.

Running zpool import now gives

  pool: storage2
id: 14701046672203578408
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

storage2 FAULTED  corrupted data
  raidz1-0   FAULTED  corrupted data
c6t4d2   ONLINE
c6t4d3   ONLINE
c7t4d2   ONLINE
c7t4d3   ONLINE
  raidz1-1   FAULTED  corrupted data
c7t4d0   ONLINE
replacing-1  UNAVAIL  insufficient replicas
  c6t4d0 FAULTED  corrupted data
  c9t4d4 UNAVAIL  cannot open
c7t4d1   ONLINE
c6t4d1   ONLINE

If I do zpool import -f storage2 it complains about devices being faulted and 
suggests destroying the pool.
If I do zpool clean storage2 or zpool clean storage2 c9t4d4 these say that 
storage2 does not exist.
If I do zpool import -nF storage2 this says that the pool was last run on 
another system and prompts for -f.
If I do zpool import -fnF storage2 this appears to quit silently.

I don't really understand why the installed system is very specific about the 
problem being with the intent log (and suggesting it just needs clearing) but 
booting from the b134 CD doesn't pick up on that, unless it's being masked by 
the hostid mismatch error. Because of that I'm thinking that I should try to 
change the hostid when booted from the CD to be the same as the previously 
installed system to see if that helps - unless that's likely to confuse it at 
all...?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-06-30 Thread George
 I suggest you to try running 'zdb -bcsv storage2' and
 show the result.

r...@crypt:/tmp# zdb -bcsv storage2
zdb: can't open storage2: No such device or address

then I tried

r...@crypt:/tmp# zdb -ebcsv storage2
zdb: can't open storage2: File exists

George
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-06-30 Thread Victor Latushkin

On Jun 30, 2010, at 10:48 AM, George wrote:

 I suggest you to try running 'zdb -bcsv storage2' and
 show the result.
 
 r...@crypt:/tmp# zdb -bcsv storage2
 zdb: can't open storage2: No such device or address
 
 then I tried
 
 r...@crypt:/tmp# zdb -ebcsv storage2
 zdb: can't open storage2: File exists

Please try 

zdb -U /dev/null -ebcsv storage2
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-06-30 Thread George
 Please try 
 
 zdb -U /dev/null -ebcsv storage2

r...@crypt:~# zdb -U /dev/null -ebcsv storage2
zdb: can't open storage2: No such device or address

If I try

r...@crypt:~# zdb -C storage2

Then it prints what appears to be a valid configuration but then the same error 
message about being unable to find the device (output attached).

George
-- 
This message posted from opensolaris.org

r...@crypt:~# zdb -C storage2
version=14
name='storage2'
state=0
txg=1807366
pool_guid=14701046672203578408
hostid=8522651
hostname='crypt'
vdev_tree
type='root'
id=0
guid=14701046672203578408
children[0]
type='raidz'
id=0
guid=15861342641545291969
nparity=1
metaslab_array=14
metaslab_shift=35
ashift=9
asize=3999672565760
is_log=0
children[0]
type='disk'
id=0
guid=14390766171745861103
path='/dev/dsk/c9t4d2s0'
devid='id1,s...@n600d0230006c8a5f0c3fd863ea736d00/a'

phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,2:a'
whole_disk=1
DTL=301
children[1]
type='disk'
id=1
guid=14806610527738068493
path='/dev/dsk/c9t4d3s0'
devid='id1,s...@n600d0230006c8a5f0c3fd8514ed8d900/a'

phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,3:a'
whole_disk=1
DTL=300
children[2]
type='disk'
id=2
guid=4272121319363331595
path='/dev/dsk/c10t4d2s0'
devid='id1,s...@n600d0230006c8a5f0c3fd84312aa6d00/a'

phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,2:a'
whole_disk=1
DTL=299
children[3]
type='disk'
id=3
guid=16286569401176941639
path='/dev/dsk/c10t4d4s0'
devid='id1,s...@n600d0230006c8a5f0c3fd8415c62ae00/a'

phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,4:a'
whole_disk=1
DTL=296
children[1]
type='raidz'
id=1
guid=12601468074885676119
nparity=1
metaslab_array=172
metaslab_shift=35
ashift=9
asize=3999672565760
is_log=0
children[0]
type='disk'
id=0
guid=7040280703157905854
path='/dev/dsk/c10t4d0s0'
devid='id1,s...@n600d0230006c8a5f0c3fd83eda0a4a00/a'

phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,0:a'
whole_disk=1
DTL=305
children[1]
type='replacing'
id=1
guid=16928413524184799719
whole_disk=0
children[0]
type='disk'
id=0
guid=9102173991259789741
path='/dev/dsk/c9t4d0s0'

devid='id1,s...@n600d0230006c8a5f0c3fd86eee69a300/a'

phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,0:a'
whole_disk=1
DTL=304
children[1]
type='disk'
id=1
guid=16888611779137638814
path='/dev/dsk/c9t4d4s0'

devid='id1,s...@n600d0230006c8a5f0c3fd8612edc7d00/a'

phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,4:a'
whole_disk=1
DTL=321
children[2]
type='disk'
id=2
guid=4025009484028197162
path='/dev/dsk/c10t4d1s0'
devid='id1,s...@n600d0230006c8a5f0c3fd8609d147700/a'

phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,1:a'
whole_disk=1
DTL=303
children[3]
 

Re: [zfs-discuss] Kernel Panic on zpool clean

2010-06-30 Thread George
Aha:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794136

I think I'll try booting from a b134 Live CD and see if that will let me fix 
things.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-06-29 Thread George
Another related question - 

I have a second enclosure with blank disks which I would like to use to take a 
copy of the existing zpool as a precaution before attempting any fixes. The 
disks in this enclosure are larger than those that the one with a problem.

What would be the best way to do this?

If I were to clone the disks 1:1 would the difference in size cause any 
problems? I also had an idea that I might be able to DD the original disks into 
files on a ZFS on the second enclosure and mount the files but the few results 
I've turned up on the subject seem to say this is a bad idea.
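(A rough sketch of the dd-into-files idea, purely for illustration - device, 
pool and filesystem names are placeholders, and whether this is advisable is 
exactly the open question above:)

# zfs create -o mountpoint=/backup bigpool/images          # filesystem on the second enclosure
# dd if=/dev/rdsk/c9t4d2s0 of=/backup/c9t4d2s0.img bs=1024k # repeat for each member disk
# zpool import -d /backup storage2                          # search only the image directory for labels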
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-06-29 Thread Victor Latushkin

On Jun 29, 2010, at 1:30 AM, George wrote:

 I've attached the output of those commands. The machine is a v20z if that 
 makes any difference.

The stack trace is similar to that of a bug I do not recall right now, and it 
indicates that there's likely a corruption in ZFS metadata.

I suggest you try running 'zdb -bcsv storage2' and show the result.

victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zpool status -v (build 143)

2010-06-28 Thread Andrej Podzimek

I ran 'zpool scrub' and will report what happens once it's finished. (It will 
take pretty long.)


The scrub finished successfully (with no errors) and 'zpool status -v' doesn't 
crash the kernel any more.

Andrej



smime.p7s
Description: S/MIME Cryptographic Signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Kernel Panic on zpool clean

2010-06-28 Thread George
Hi,

I have a machine running 2009.06 with 8 SATA drives in SCSI connected enclosure.

I had a drive fail and accidentally replaced the wrong one, which 
unsurprisingly caused the rebuild to fail. The status of the zpool then ended 
up as:

 pool: storage2
 state: FAULTED
status: An intent log record could not be read.
Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

NAME   STATE READ WRITE CKSUM
storage2   FAULTED  0 0 1  bad intent log
raidz1   ONLINE   0 0 0
c9t4d2 ONLINE   0 0 0
c9t4d3 ONLINE   0 0 0
c10t4d2ONLINE   0 0 0
c10t4d4ONLINE   0 0 0
raidz1   DEGRADED 0 0 6
c10t4d0UNAVAIL  0 0 0  cannot open
replacing  ONLINE   0 0 0
c9t4d0   ONLINE   0 0 0
c10t4d3  ONLINE   0 0 0
c10t4d1ONLINE   0 0 0
c9t4d1 ONLINE   0 0 0

Running zpool clear storage2 caused the machine to dump and reboot.
I've tried removing the spare and putting back the faulty drive to give:

  pool: storage2
 state: FAULTED
status: An intent log record could not be read.
Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

NAME   STATE READ WRITE CKSUM
storage2   FAULTED  0 0 1  bad intent log
raidz1   ONLINE   0 0 0
c9t4d2 ONLINE   0 0 0
c9t4d3 ONLINE   0 0 0
c10t4d2ONLINE   0 0 0
c10t4d4ONLINE   0 0 0
raidz1   DEGRADED 0 0 6
c10t4d0FAULTED  0 0 0  corrupted data
replacing  DEGRADED 0 0 0
c9t4d0   ONLINE   0 0 0
c9t4d4   UNAVAIL  0 0 0  cannot open
c10t4d1ONLINE   0 0 0
c9t4d1 ONLINE   0 0 0

Again this core dumps when I try to do zpool clear storage2

Does anyone have any suggestions what would be the best course of action now?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-06-28 Thread Victor Latushkin

On Jun 28, 2010, at 11:27 PM, George wrote:

 I've tried removing the spare and putting back the faulty drive to give:
 
  pool: storage2
 state: FAULTED
 status: An intent log record could not be read.
Waiting for adminstrator intervention to fix the faulted pool.
 action: Either restore the affected device(s) and run 'zpool online',
or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
 config:
 
NAME   STATE READ WRITE CKSUM
storage2   FAULTED  0 0 1  bad intent log
raidz1   ONLINE   0 0 0
c9t4d2 ONLINE   0 0 0
c9t4d3 ONLINE   0 0 0
c10t4d2ONLINE   0 0 0
c10t4d4ONLINE   0 0 0
raidz1   DEGRADED 0 0 6
c10t4d0FAULTED  0 0 0  corrupted data
replacing  DEGRADED 0 0 0
c9t4d0   ONLINE   0 0 0
c9t4d4   UNAVAIL  0 0 0  cannot open
c10t4d1ONLINE   0 0 0
c9t4d1 ONLINE   0 0 0
 
 Again this core dumps when I try to do zpool clear storage2
 
 Does anyone have any suggestions what would be the best course of action now?

I think first we need to understand why it does not like 'zpool clear', as that 
may provide better understanding of what is wrong.

For that you need to create directory for saving crashdumps e.g. like this

mkdir -p /var/crash/`uname -n`

then run savecore and see if it would save a crash dump into that directory.

If crashdump is there, then you need to perform some basic investigation:

cd /var/crash/`uname -n`

mdb dump number

::status
::stack
::spa -c
::spa -v
::spa -ve
$q

for a start.
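(A hedged sketch of the surrounding steps - paths are the defaults, adjust as 
needed; dumpadm just confirms that a dump device and savecore directory are 
configured at all:)

# dumpadm
# mkdir -p /var/crash/`uname -n`
# savecore -v /var/crash/`uname -n`     # extracts unix.N / vmcore.N from the dump device
# mdb unix.0 vmcore.0                   # or: mdb 0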

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic on zpool clean

2010-06-28 Thread George
I've attached the output of those commands. The machine is a v20z if that makes 
any difference.

Thanks,

George
-- 
This message posted from opensolaris.org

mdb: logging to debug.txt
 ::status
debugging crash dump vmcore.0 (64-bit) from crypt
operating system: 5.11 snv_111b (i86pc)
panic message: 
BAD TRAP: type=e (#pf Page fault) rp=ff00084fc660 addr=0 occurred in module 
unix due to a NULL pointer dereference
dump content: kernel pages only



 ::stack
mutex_enter+0xb()
metaslab_free+0x12e(ff01c9fb3800, ff01cce64668, 1b9528, 0)
zio_dva_free+0x26(ff01cce64608)
zio_execute+0xa0(ff01cce64608)
zio_nowait+0x5a(ff01cce64608)
arc_free+0x197(ff01cf0c80c0, ff01c9fb3800, 1b9528, ff01d389bcf0, 0, 
0)
dsl_free+0x30(ff01cf0c80c0, ff01d389bcc0, 1b9528, ff01d389bcf0, 0, 0
)
dsl_dataset_block_kill+0x293(0, ff01d389bcf0, ff01cf0c80c0, 
ff01d18cfd80)
dmu_objset_sync+0xc4(ff01cffe0080, ff01cf0c80c0, ff01d18cfd80)
dsl_pool_sync+0x1ee(ff01d389bcc0, 1b9528)
spa_sync+0x32a(ff01c9fb3800, 1b9528)
txg_sync_thread+0x265(ff01d389bcc0)
thread_start+8()



 ::spa -c
ADDR STATE NAME
ff01c8df3000ACTIVE rpool

version=000e
name='rpool'
state=
txg=056a6ad1
pool_guid=53825ef3c58abc97
hostid=00820b9b
hostname='crypt'
vdev_tree
type='root'
id=
guid=53825ef3c58abc97
children[0]
type='mirror'
id=
guid=e9b8daed37492cfe
whole_disk=
metaslab_array=0017
metaslab_shift=001d
ashift=0009
asize=001114e0
is_log=
children[0]
type='disk'
id=
guid=ad7e5022f804365a
path='/dev/dsk/c8t0d0s0'
devid='id1,s...@sseagate_st373307lc__3hz76yyd743809wm/a'
phys_path='/p...@0,0/pci1022,7...@a/pci17c2,1...@4/s...@0,0:a'
whole_disk=
DTL=0052
children[1]
type='disk'
id=0001
guid=2f7a03c75a4931ac
path='/dev/dsk/c8t1d0s0'
devid='id1,s...@sseagate_st373307lc__3hz80bdp743793pa/a'
phys_path='/p...@0,0/pci1022,7...@a/pci17c2,1...@4/s...@1,0:a'
whole_disk=
DTL=0050
ff01c9fb3800ACTIVE storage2

version=000e
name='storage2'
state=
txg=001b9406
pool_guid=cc049c0f1321fc28
hostid=00820b9b
hostname='crypt'
vdev_tree
type='root'
id=
guid=cc049c0f1321fc28
children[0]
type='raidz'
id=
guid=dc1ecf18721028c1
nparity=0001
metaslab_array=000e
metaslab_shift=0023
ashift=0009
asize=03a33f10
is_log=
children[0]
type='disk'
id=
guid=c7b64596709ebdef
path='/dev/dsk/c9t4d2s0'
devid='id1,s...@n600d0230006c8a5f0c3fd863ea736d00/a'
phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,2:a'
whole_disk=0001
DTL=012d
children[1]
type='disk'
id=0001
guid=cd7ba5d38162fe0d
path='/dev/dsk/c9t4d3s0'
devid='id1,s...@n600d0230006c8a5f0c3fd8514ed8d900/a'
phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1/s...@4,3:a'
whole_disk=0001
DTL=012c
children[2]
type='disk'
id=0002
guid=3b499fb48e06460b
path='/dev/dsk/c10t4d2s0'
devid='id1,s...@n600d0230006c8a5f0c3fd84312aa6d00/a'
phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,2:a'
whole_disk=0001
DTL=012b
children[3]
type='disk'
id=0003
guid=e205849496e5e447
path='/dev/dsk/c10t4d4s0'
devid='id1,s...@n600d0230006c8a5f0c3fd8415c62ae00/a'
phys_path='/p...@0,0/pci1022,7...@b/pci9005,4...@1,1/s...@4,4:a'
whole_disk=0001
DTL=0128
children[1]
type='raidz'

[zfs-discuss] Kernel panic on zpool status -v (build 143)

2010-06-27 Thread Andrej Podzimek

Hello,

I got a zfs panic on build 143 (installed with onu) in the following unusual 
situation:

1) 'zpool scrub' found a corrupted snapshot on which two BEs were based.
2) I removed the first dependency with 'zfs promote'.
3) I removed the second dependency with 'zfs -pv send ... | zfs -v 
receive ...'
4) 'zfs destroy' said dataset busy when called on the old snapshot. 
So I rebooted.
5) After the reboot, the corrupted snapshot could be successfully 
destroyed.
6) One dataset and two other snapshots created on the way (in (3)) were 
removed.
7) Now 'zpool status -v' *crashed* the kernel.
8) After a reboot, 'zpool status -v' caused a crash again.

I ran 'zpool scrub' and will report what happens once it's finished. (It will 
take pretty long.)

An mdb session output is attached to this message. I can provide the full crash 
dump if you wish. (As for the ::stack at the end, I'm not sure if it's 
meaningful. This is (unfortunately) not a debugging kernel, so the first 6 
arguments should not be stored on the stack.)

Andrej
 ::status
debugging crash dump vmcore.5 (64-bit) from helium
operating system: 5.11 osnet143 (i86pc)
panic message: assertion failed: 0 == dmu_bonus_hold(os, object, dl, 
&dl->dl_dbuf) (0x0 == 0x16), file: ../../common/fs/zfs/dsl_deadlist.c, line: 80
dump content: kernel pages only


 ::msgbuf ! tail -21
panic[cpu4]/thread=ff02d59540a0: 
assertion failed: 0 == dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) (0x0 == 
0x16), file: ../../common/fs/zfs/dsl_deadlist.c, line: 80


ff00106a0a50 genunix:assfail3+c1 ()
ff00106a0ad0 zfs:dsl_deadlist_open+ef ()
ff00106a0b80 zfs:dsl_dataset_get_ref+14c ()
ff00106a0bc0 zfs:dsl_dataset_hold_obj+2d ()
ff00106a0c20 zfs:dsl_dsobj_to_dsname+73 ()
ff00106a0c40 zfs:zfs_ioc_dsobj_to_dsname+23 ()
ff00106a0cc0 zfs:zfsdev_ioctl+176 ()
ff00106a0d00 genunix:cdev_ioctl+45 ()
ff00106a0d40 specfs:spec_ioctl+5a ()
ff00106a0dc0 genunix:fop_ioctl+7b ()
ff00106a0ec0 genunix:ioctl+18e ()
ff00106a0f10 unix:brand_sys_sysenter+1c9 ()

syncing file systems...
 done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
NOTICE: ahci0: ahci_tran_reset_dport port 0 reset port


 ff02d59540a0::whatis  
ff02d59540a0 is allocated as a thread structure


 ff02d59540a0::print kthread_t t_procp | ::print proc_t p_user.u_psargs
p_user.u_psargs = [ zpool status -v rpool ]


 ::stack
vpanic()
assfail3+0xc1(f7a2dff0, 0, f7a2e050, 16, f7a2e028, 50)
dsl_deadlist_open+0xef(ff02f43dd7f0, ff02cff74080, 0)
dsl_dataset_get_ref+0x14c(ff02d2ebacc0, 1b, f7a2865c, 
ff00106a0bd8)
dsl_dataset_hold_obj+0x2d(ff02d2ebacc0, 1b, f7a2865c, 
ff00106a0bd8)
dsl_dsobj_to_dsname+0x73(ff02f5f44000, 1b, ff02f5f44400)
zfs_ioc_dsobj_to_dsname+0x23(ff02f5f44000)
zfsdev_ioctl+0x176(b6, 5a25, 8042130, 13, ff02dae06460, 
ff00106a0de4)
cdev_ioctl+0x45(b6, 5a25, 8042130, 13, ff02dae06460, 
ff00106a0de4)
spec_ioctl+0x5a(ff02d5fd7900, 5a25, 8042130, 13, ff02dae06460, 
ff00106a0de4)
fop_ioctl+0x7b(ff02d5fd7900, 5a25, 8042130, 13, ff02dae06460, 
ff00106a0de4)
ioctl+0x18e(3, 5a25, 8042130)
_sys_sysenter_post_swapgs+0x149()



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Kernel panic - directed here from networking

2009-11-21 Thread David Bond
Hi,

I have been having problems with reboots. It usually happens when I am either 
sending or receiving data on the server; it can be over CIFS, HTTP, or NNTP. So it 
could be a networking problem, but they directed me here or to CIFS, and as it 
happens when I'm not using CIFS (but the service is still running) it's probably 
not CIFS. I have checked for faulty RAM and ran memtest86+ (4.0); it ran through 
multiple times without problems.

The previous thread is http://opensolaris.org/jive/thread.jspa?threadID=116843

I have had 2 reboots today, within 10 minutes of each other.
The previous 2 crashes produced the following:

r...@nas:/var/crash/NAS# echo '$c' | mdb -k 11
page_create_va+0x314(fbc30210, ff016060d000, 2, 53,
ff00048c25d0, ff016060d000)
segkmem_page_create+0x8d(ff016060d000, 2, 4, fbc30210)
segkmem_xalloc+0xc0(ff0146e1f000, 0, 2, 4, 0, fb880cb8)
segkmem_alloc_vn+0xcd(ff0146e1f000, 2, 4, fbc30210)
segkmem_alloc+0x24(ff0146e1f000, 2, 4)
vmem_xalloc+0x546(ff0146e2, 2, 1000, 0, 0, 0)
vmem_alloc+0x161(ff0146e2, 2, 4)
kmem_slab_create+0x81(ff014890f858, 4)
kmem_slab_alloc+0x5b(ff014890f858, 4)
kmem_cache_alloc+0x130(ff014890f858, 4)
zio_buf_alloc+0x2c(2)
vdev_queue_io_to_issue+0x42f(ff014c9985a8, 23)
vdev_queue_io_done+0x61(ff014d1180a8)
zio_vdev_io_done+0x62(ff014d1180a8)
zio_execute+0xa0(ff014d1180a8)
taskq_thread+0x1b7(ff014c716688)
thread_start+8()

r...@nas:/var/crash/NAS# echo '$c' | mdb -k 12
fsflush_do_pages+0x1e4()
fsflush+0x3a6()
thread_start+8()

Any help on finding out the problem would be great.

Thanks
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zfs import (hardware failure)

2009-11-02 Thread Donald Murray, P.Eng.
Hey,


On Sat, Oct 31, 2009 at 5:03 PM, Victor Latushkin
victor.latush...@sun.com wrote:
 Donald Murray, P.Eng. wrote:

 Hi,

 I've got an OpenSolaris 2009.06 box that will reliably panic whenever
 I try to import one of my pools. What's the best practice for
 recovering (before I resort to nuking the pool and restoring from
 backup)?

 Could you please post panic stack backtrace?

 There are two pools on the system: rpool and tank. The rpool seems to
 be fine, since I can boot from a 2009.06 CD and 'zpool import -f
 rpool'; I can also 'zfs scrub rpool', and it doesn't find any errors.
 Hooray! Except I don't care about rpool. :-(

 If I boot from hard disk, the system begins importing zfs pools; once
 it's imported everything I usually have enough time to log in before
 it panics. If I boot from CD and 'zfs import -f tank', it panics.

 I've just started a 'zdb -e tank' which I found on the intertubes
 here: http://opensolaris.org/jive/thread.jspa?threadID=49020. Zdb
 seems to be ... doing something. Not sure _what_ it's doing, but it
 can't be making things worse for me right?

 Yes, zdb only reads, so it cannot make thing worse.

 I'm going to try adding the following to /etc/system, as mentioned
 here: http://opensolaris.org/jive/thread.jspa?threadID=114906
 set zfs:zfs_recover=1
 set aok=1

 Please do not rush with these settings. Let's look at the stack backtrace
 first.

 Regards,
 Victor



I think I've found the cause of my problem. I disconnected one side of
each mirror, rebooted, and imported. The system didn't panic! So one
of the disconnected drives (or cables, or controllers...) was the culprit.

I've since narrowed it down to a single 500GB drive. When that drive is
connected, a zpool import panics the system. When that drive is disconnected,
the pool imports fine.

r...@weyl:~# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid.  Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed after 0h8m with 0 errors on Sun Nov  1 22:11:15 2009
config:

NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
  mirror DEGRADED 0 0 0
7508645614192559694  FAULTED  0 0 0  was
/dev/dsk/c7t0d0s0
c6t1d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c5t1d0   ONLINE   0 0 6  21.2G resilvered
c7t0d0   ONLINE   0 0 0

errors: No known data errors
r...@weyl:~#

The first thing that's jumping out at me: why does the first mirror
think the missing
disk was c7t0d0? I have an old zpool status from before the problem began, and
that disk used to be c6t0d0.

r...@weyl:~# zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  mirrorONLINE   0 0 0
c6t0d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c7t0d0  ONLINE   0 0 0

errors: No known data errors
r...@weyl:~#


Victor has been very helpful, living up to his reputation. Thanks Victor!

If we determine a root cause, I'll update the list.

Things I've learned along the way:
- pools import automatically based on cached information in
/etc/zfs/zpool.cache; if you move zpool.cache elsewhere, none of the
pools will import upon rebooting;
- import problematic pools via 'zpool import -f -R /a poolname';
this doesn't update the cachefile, and mounts the pool on /a (see the sketch below);
- adding the following to /etc/system didn't prevent a hardware-induced panic:
set zfs:zfs_recover=1
set aok=1
- crash dumps are typically saved in /var/crash/$( uname -n )
- beadm is your friend;
- redundancy is your friend (okay, I already knew that);
- if you have a zfs problem, you want Victor Latushkin to be your friend;

Cheers!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Kernel panic on zfs import

2009-10-31 Thread Donald Murray, P.Eng.
Hi,

I've got an OpenSolaris 2009.06 box that will reliably panic whenever
I try to import one of my pools. What's the best practice for
recovering (before I resort to nuking the pool and restoring from
backup)?

There are two pools on the system: rpool and tank. The rpool seems to
be fine, since I can boot from a 2009.06 CD and 'zpool import -f
rpool'; I can also 'zfs scrub rpool', and it doesn't find any errors.
Hooray! Except I don't care about rpool. :-(

If I boot from hard disk, the system begins importing zfs pools; once
it's imported everything I usually have enough time to log in before
it panics. If I boot from CD and 'zpool import -f tank', it panics.

I've just started a 'zdb -e tank' which I found on the intertubes
here: http://opensolaris.org/jive/thread.jspa?threadID=49020. Zdb
seems to be ... doing something. Not sure _what_ it's doing, but it
can't be making things worse for me right?

I'm going to try adding the following to /etc/system, as mentioned
here: http://opensolaris.org/jive/thread.jspa?threadID=114906
set zfs:zfs_recover=1
set aok=1

Suggestions?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on zfs import

2009-10-31 Thread Victor Latushkin

Donald Murray, P.Eng. wrote:

Hi,

I've got an OpenSolaris 2009.06 box that will reliably panic whenever
I try to import one of my pools. What's the best practice for
recovering (before I resort to nuking the pool and restoring from
backup)?


Could you please post panic stack backtrace?


There are two pools on the system: rpool and tank. The rpool seems to
be fine, since I can boot from a 2009.06 CD and 'zpool import -f
rpool'; I can also 'zfs scrub rpool', and it doesn't find any errors.
Hooray! Except I don't care about rpool. :-(

If I boot from hard disk, the system begins importing zfs pools; once
it's imported everything I usually have enough time to log in before
it panics. If I boot from CD and 'zfs import -f tank', it panics.

I've just started a 'zdb -e tank' which I found on the intertubes
here: http://opensolaris.org/jive/thread.jspa?threadID=49020. Zdb
seems to be ... doing something. Not sure _what_ it's doing, but it
can't be making things worse for me right?


Yes, zdb only reads, so it cannot make things worse.


I'm going to try adding the following to /etc/system, as mentioned
here: http://opensolaris.org/jive/thread.jspa?threadID=114906
set zfs:zfs_recover=1
set aok=1


Please do not rush with these settings. Let's look at the stack 
backtrace first.


Regards,
Victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on zpool import

2009-10-16 Thread Marc Althoff
We have had the same problem since today. The pool was to be renamed with 
zpool export; after an import it didn't come back online. An import -f results 
in a kernel panic.

zpool status -v reports a degraded drive also.

I'll also try to supply some traces and logs.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on zpool import

2009-10-16 Thread Victor Latushkin

Marc Althoff wrote:

We have the same problem since of today. The pool was to be renamed width 
zpool export, after an import it didn't come back online. A import -f results in a kernel 
panic.

zpool status -v freports a degraded drive also.

I'll also try to supply som,e traces and logs.
  
Please provide at least a stack trace from the console or /var/adm/messages 
for a start, and please try to make sure that the crash dump from the first panic 
is saved.


victor

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on zpool import

2009-10-16 Thread Marc Althoff
dear all, victor,

i am most happy to report that the problems were somewhat hardware-related, 
caused by a damaged / dangling SATA cable which apparently caused long delays 
(sometimes working, disk on, disk off, ...) during normal zfs operations. Why 
the -f produced a kernel panic I'm unsure. Interestingly it all fit some 
symptoms other people have with a bad uberblock, a defective spanned metadata 
structure (?) detected after a scrub etc.

anyway, great that you guys answered so quickly. there was 6 TB of data on that 
pool. I stress-tested it for a week and 30 minutes prior to the incident 
deleted the old RAID set ... imagine my horror ;)

have a good one
marc
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on zpool import

2009-10-12 Thread Victor Latushkin

On 11.10.09 12:59, Darren Taylor wrote:

I have searched the forums and google wide, but cannot find a fix for the issue 
I'm currently experiencing. Long story short - I'm now at a point where I 
cannot even import my zpool (zpool import -f tank) without causing a kernel 
panic

I'm running OpenSolaris snv_111b and the zpool is version 14. 


This is the panic from /var/adm/messages;  (full output attached);


Where is the full stack backtrace? I do not see any attachment.

victor



genunix: [ID 361072 kern.notice] zfs: freeing free segment 
(offset=3540185931776 size=22528)

This is the output I get from zpool import;

# zpool import
  pool: tank
id: 15136317365944618902
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

tankONLINE
  raidz1ONLINE
c9t4d0  ONLINE
c9t5d0  ONLINE
c9t6d0  ONLINE
c9t7d0  ONLINE
  raidz1ONLINE
c9t0d0  ONLINE
c9t1d0  ONLINE
c9t2d0  ONLINE
c9t3d0  ONLINE

I tried pulling back some info via this zdb command, but i'm not sure if i'm on 
the right track here (as zpool import seems to see the zpool without issue). 
This result is similar from all drives;

# zdb -l /dev/dsk/c9t4d0

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3

I also can complete zdb -e tank without issues – it lists all my snapshots and various objects without problem (this is still running on the machine at the moment) 


I have put the following into /etc/system;

set zfs:zfs_recover=1
set aok=1 

i've also tried mounting the zpool read only with zpool import -f -o ro tank but no luck.. 


I dont know where to go next? – am I meant to try and recover using an older 
txg? E.

I would be extremely grateful to anyone who can offer advice on how to resolve this issue as the pool contains irreplaceable photos. Unfortunately I have not done any backups for a while as I thought raidz would be my savour. :( 


please help

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on zpool import

2009-10-12 Thread Darren Taylor
Hi Victor, I have tried to re-attach the detail from /var/adm/messages
-- 
This message posted from opensolaris.org
Oct 11 17:16:55 opensolaris unix: [ID 836849 kern.notice] 
Oct 11 17:16:55 opensolaris ^Mpanic[cpu0]/thread=ff000b6f7c60: 
Oct 11 17:16:55 opensolaris genunix: [ID 361072 kern.notice] zfs: freeing free 
segment (offset=3540185931776 size=22528)
Oct 11 17:16:55 opensolaris unix: [ID 10 kern.notice] 
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f75f0 
genunix:vcmn_err+2c ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f76e0 
zfs:zfs_panic_recover+ae ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7770 
zfs:space_map_remove+13c ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7820 
zfs:space_map_load+260 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7860 
zfs:metaslab_activate+64 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7920 
zfs:metaslab_group_alloc+2b7 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7a00 
zfs:metaslab_alloc_dva+295 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7aa0 
zfs:metaslab_alloc+9b ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7ad0 
zfs:zio_dva_allocate+3e ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b00 
zfs:zio_execute+a0 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b60 
zfs:zio_notify_parent+a6 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b90 
zfs:zio_ready+188 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7bc0 
zfs:zio_execute+a0 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7c40 
genunix:taskq_thread+193 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7c50 
unix:thread_start+8 ()
Oct 11 17:16:55 opensolaris unix: [ID 10 kern.notice] 
Oct 11 17:16:55 opensolaris genunix: [ID 672855 kern.notice] syncing file 
systems...
Oct 11 17:16:55 opensolaris genunix: [ID 904073 kern.notice]  done
Oct 11 17:16:56 opensolaris genunix: [ID 111219 kern.notice] dumping to 
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Oct 11 17:17:09 opensolaris genunix: [ID 409368 kern.notice] ^M100% done: 
168706 pages dumped, compression ratio 3.58, 
Oct 11 17:17:09 opensolaris genunix: [ID 851671 kern.notice] dump succeeded
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on zpool import

2009-10-12 Thread Darren Taylor
I have re-run zdb -l /dev/dsk/c9t4d0s0 as I should have the first time (thanks 
Nicolas).

Attached output.
-- 
This message posted from opensolaris.org
# zdb -l /dev/dsk/c9t4d0s0

LABEL 0

version=14
name='tank'
state=0
txg=119170
pool_guid=15136317365944618902
hostid=290968
hostname='lexx'
top_guid=1561201926038510280
guid=11292568128772689834
vdev_tree
type='raidz'
id=0
guid=1561201926038510280
nparity=1
metaslab_array=23
metaslab_shift=35
ashift=9
asize=4000766230528
is_log=0
children[0]
type='disk'
id=0
guid=11292568128772689834
path='/dev/dsk/c9t4d0s0'
devid='id1,s...@n50014ee2588170a5/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a'
whole_disk=1
children[1]
type='disk'
id=1
guid=10678319508898151547
path='/dev/dsk/c9t5d0s0'
devid='id1,s...@n50014ee2032b9b04/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a'
whole_disk=1
children[2]
type='disk'
id=2
guid=16523383997370950474
path='/dev/dsk/c9t6d0s0'
devid='id1,s...@n50014ee2032b9b75/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@6,0:a'
whole_disk=1
children[3]
type='disk'
id=3
guid=1710422830365926220
path='/dev/dsk/c9t7d0s0'
devid='id1,s...@n50014ee2add68f2c/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@7,0:a'
whole_disk=1

LABEL 1

version=14
name='tank'
state=0
txg=119170
pool_guid=15136317365944618902
hostid=290968
hostname='lexx'
top_guid=1561201926038510280
guid=11292568128772689834
vdev_tree
type='raidz'
id=0
guid=1561201926038510280
nparity=1
metaslab_array=23
metaslab_shift=35
ashift=9
asize=4000766230528
is_log=0
children[0]
type='disk'
id=0
guid=11292568128772689834
path='/dev/dsk/c9t4d0s0'
devid='id1,s...@n50014ee2588170a5/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a'
whole_disk=1
children[1]
type='disk'
id=1
guid=10678319508898151547
path='/dev/dsk/c9t5d0s0'
devid='id1,s...@n50014ee2032b9b04/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a'
whole_disk=1
children[2]
type='disk'
id=2
guid=16523383997370950474
path='/dev/dsk/c9t6d0s0'
devid='id1,s...@n50014ee2032b9b75/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@6,0:a'
whole_disk=1
children[3]
type='disk'
id=3
guid=1710422830365926220
path='/dev/dsk/c9t7d0s0'
devid='id1,s...@n50014ee2add68f2c/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@7,0:a'
whole_disk=1

LABEL 2

version=14
name='tank'
state=0
txg=119170
pool_guid=15136317365944618902
hostid=290968
hostname='lexx'
top_guid=1561201926038510280
guid=11292568128772689834
vdev_tree
type='raidz'
id=0
guid=1561201926038510280
nparity=1
metaslab_array=23
metaslab_shift=35
ashift=9
asize=4000766230528
is_log=0
children[0]
type='disk'
id=0
guid=11292568128772689834
path='/dev/dsk/c9t4d0s0'
devid='id1,s...@n50014ee2588170a5/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a'
whole_disk=1
children[1]
type='disk'
id=1
guid=10678319508898151547
path='/dev/dsk/c9t5d0s0'
devid='id1,s...@n50014ee2032b9b04/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a'
whole_disk=1
children[2]
type='disk'
id=2
guid=16523383997370950474
path='/dev/dsk/c9t6d0s0'

[zfs-discuss] kernel panic on zpool import

2009-10-11 Thread Darren Taylor
I have searched the forums and google wide, but cannot find a fix for the issue 
I'm currently experiencing. Long story short - I'm now at a point where I 
cannot even import my zpool (zpool import -f tank) without causing a kernel 
panic

I'm running OpenSolaris snv_111b and the zpool is version 14. 

This is the panic from /var/adm/messages;  (full output attached);

genunix: [ID 361072 kern.notice] zfs: freeing free segment 
(offset=3540185931776 size=22528)

This is the output I get from zpool import;

# zpool import
  pool: tank
id: 15136317365944618902
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

tankONLINE
  raidz1ONLINE
c9t4d0  ONLINE
c9t5d0  ONLINE
c9t6d0  ONLINE
c9t7d0  ONLINE
  raidz1ONLINE
c9t0d0  ONLINE
c9t1d0  ONLINE
c9t2d0  ONLINE
c9t3d0  ONLINE

I tried pulling back some info via this zdb command, but I'm not sure if I'm on 
the right track here (as zpool import seems to see the zpool without issue). 
This result is similar from all drives;

# zdb -l /dev/dsk/c9t4d0

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3

I also can complete zdb -e tank without issues – it lists all my snapshots and 
various objects without problem (this is still running on the machine at the 
moment) 

I have put the following into /etc/system;

set zfs:zfs_recover=1
set aok=1 

I've also tried mounting the zpool read-only with zpool import -f -o ro tank 
but no luck... 

I don't know where to go next – am I meant to try and recover using an older 
txg? E.

I would be extremely grateful to anyone who can offer advice on how to resolve 
this issue as the pool contains irreplaceable photos. Unfortunately I have not 
done any backups for a while as I thought raidz would be my saviour. :( 

please help
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on zpool import

2009-10-11 Thread Ian Collins

Darren Taylor wrote:

I have searched the forums and google wide, but cannot find a fix for the issue 
I'm currently experiencing. Long story short - I'm now at a point where I 
cannot even import my zpool (zpool import -f tank) without causing a kernel 
panic

I'm running OpenSolaris snv_111b and the zpool is version 14. 


This is the panic from /var/adm/messages;  (full output attached);

genunix: [ID 361072 kern.notice] zfs: freeing free segment 
(offset=3540185931776 size=22528)

  
Have you tried importing to a system running a more recent build?  The 
problem may have been fixed...


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on zpool import

2009-10-11 Thread Darren Taylor
Hi Ian, I'm currently downloading build 124 to see if that helps... the 
download is running a bit slow so I won't know until later tomorrow. 

Just an update that I have also tried (forgot to mention above):
*  Pulling out each disk - tried mounting in degraded state - same kernel 
panic
*  Deleting the zpool.cache

Fingers crossed I get something different with the newer build. Very strange, 
as I don't think this was a hardware issue -- all the drives appear to be 
working without issue and zpool import lists all drives as ONLINE without any 
information pointing to corruption.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic

2008-11-18 Thread Gavin Maltby


Richard Elling wrote:
 Chris Gerhard wrote:
 My home server running snv_94 is tipping with the same assertion when 
 someone list a particular file:
   
 
 Failed assertions indicate software bugs.  Please file one.

We learn something new every day!

Gavin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic

2008-11-17 Thread Richard Elling
Chris Gerhard wrote:
 My home server running snv_94 is tipping with the same assertion when someone 
 list a particular file:
   

Failed assertions indicate software bugs.  Please file one.
http://en.wikipedia.org/wiki/Assertion_(computing)
 -- richard

 ::status
 Loading modules: [ unix genunix specfs dtrace cpu.generic 
 cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs md ip hook neti sctp arp 
 usba qlc fctl nca lofs zfs audiosup sd cpc random crypto fcip fcp smbsrv nfs 
 logindmux ptm sppp nsctl sdbc sv ii rdc nsmb ipc mpt emlxs ]
   
 ::status
 
 debugging crash dump vmcore.17 (64-bit) from pearson
 operating system: 5.11 snv_94 (i86pc)
 panic message: 
 assertion failed: 0 == dmu_bonus_hold(os, fuid_obj, FTAG, db), file: 
 ../../comm
 on/fs/zfs/zfs_fuid.c, line: 116
 dump content: kernel pages only
   
 $c
 
 vpanic()
 assfail+0x7e(f83e3a10, f83e39f0, 74)
 zfs_fuid_table_load+0x1ed(ff025a1c2448, 0, ff025a231e88, 
 ff025a231eb0)
 zfs_fuid_init+0xf8(ff025a231e40, 0)
 zfs_fuid_find_by_idx+0x3f(ff025a231e40, 40100)
 zfs_fuid_map_id+0x3f(ff025a231e40, 4010020c1, ff02672d0638, 2)
 zfs_zaccess_common+0x246(ff02bc62f4b0, 2, ff000cfcabd0, 
 ff000cfcabd4, 0, ff02672d0638)
 zfs_zaccess+0x114(ff02bc62f4b0, 2, 0, 0, ff02672d0638)
 zfs_getacl+0x4c(ff02bc62f4b0, ff000cfcadd0, 0, ff02672d0638)
 zfs_getsecattr+0x81(ff02bf7f9740, ff000cfcadd0, 0, ff02672d0638, 
 0)
 fop_getsecattr+0x8f(ff02bf7f9740, ff000cfcadd0, 0, ff02672d0638, 
 0)
 cacl+0x5ae(6, 0, 0, ff02bf7f9740, ff000cfcae9c)
 acl+0x8d(80665d2, 6, 0, 0)
 sys_syscall32+0x101()
   
 ff02bf7f9740::print vnode_t
 
 {
 v_lock = {
 _opaque = [ 0 ]
 }
 v_flag = 0x1
 v_count = 0x2
 v_data = 0xff02bc62f4b0
 v_vfsp = 0xff025a0055d0
 v_stream = 0
 v_type = 1 (VREG)
 v_rdev = 0x
 v_vfsmountedhere = 0
 v_op = 0xff02520d2200
 v_pages = 0
 v_filocks = 0
 v_shrlocks = 0
 v_nbllock = {
 _opaque = [ 0 ]
 }
 v_cv = {
 _opaque = 0
 }
 v_locality = 0
 v_femhead = 0 
 v_path = 0xff02859d99c8 
 /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI
 v_rdcnt = 0
 v_wrcnt = 0
 v_mmap_read = 0
 v_mmap_write = 0
 v_mpssdata = 0
 v_fopdata = 0
 v_vsd = 0
 v_xattrdir = 0
 v_count_dnlc = 0x1
 }

 An ls -l of  /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI results in 
 the system crashing.

 Need to investigate this further when I get home
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic

2008-11-17 Thread Chris Gerhard

Richard Elling wrote:

Chris Gerhard wrote:
My home server running snv_94 is tipping with the same assertion when 
someone list a particular file:
  


Failed assertions indicate software bugs.  Please file one.
http://en.wikipedia.org/wiki/Assertion_(computing)


A colleague pointed out that it is an exact match for bug 6746456 so I 
will upgrade to a later build and check that out. Alas, in the meantime 
the power supply on the system has failed, so I can't check this immediately.


If it is not fixed then I will file a new bug.

--chris


-- richard


::status
Loading modules: [ unix genunix specfs dtrace cpu.generic 
cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs md ip hook neti 
sctp arp usba qlc fctl nca lofs zfs audiosup sd cpc random crypto fcip 
fcp smbsrv nfs logindmux ptm sppp nsctl sdbc sv ii rdc nsmb ipc mpt 
emlxs ]
 

::status


debugging crash dump vmcore.17 (64-bit) from pearson
operating system: 5.11 snv_94 (i86pc)
panic message: assertion failed: 0 == dmu_bonus_hold(os, fuid_obj, 
FTAG, db), file: ../../comm

on/fs/zfs/zfs_fuid.c, line: 116
dump content: kernel pages only
 

$c


vpanic()
assfail+0x7e(f83e3a10, f83e39f0, 74)
zfs_fuid_table_load+0x1ed(ff025a1c2448, 0, ff025a231e88, 
ff025a231eb0)

zfs_fuid_init+0xf8(ff025a231e40, 0)
zfs_fuid_find_by_idx+0x3f(ff025a231e40, 40100)
zfs_fuid_map_id+0x3f(ff025a231e40, 4010020c1, 
ff02672d0638, 2)
zfs_zaccess_common+0x246(ff02bc62f4b0, 2, ff000cfcabd0, 
ff000cfcabd4, 0, ff02672d0638)

zfs_zaccess+0x114(ff02bc62f4b0, 2, 0, 0, ff02672d0638)
zfs_getacl+0x4c(ff02bc62f4b0, ff000cfcadd0, 0, ff02672d0638)
zfs_getsecattr+0x81(ff02bf7f9740, ff000cfcadd0, 0, 
ff02672d0638, 0)
fop_getsecattr+0x8f(ff02bf7f9740, ff000cfcadd0, 0, 
ff02672d0638, 0)

cacl+0x5ae(6, 0, 0, ff02bf7f9740, ff000cfcae9c)
acl+0x8d(80665d2, 6, 0, 0)
sys_syscall32+0x101()
 

ff02bf7f9740::print vnode_t


{
v_lock = {
_opaque = [ 0 ]
}
v_flag = 0x1
v_count = 0x2
v_data = 0xff02bc62f4b0
v_vfsp = 0xff025a0055d0
v_stream = 0
v_type = 1 (VREG)
v_rdev = 0x
v_vfsmountedhere = 0
v_op = 0xff02520d2200
v_pages = 0
v_filocks = 0
v_shrlocks = 0
v_nbllock = {
_opaque = [ 0 ]
}
v_cv = {
_opaque = 0
}
v_locality = 0
v_femhead = 0 v_path = 0xff02859d99c8 
/tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI

v_rdcnt = 0
v_wrcnt = 0
v_mmap_read = 0
v_mmap_write = 0
v_mpssdata = 0
v_fopdata = 0
v_vsd = 0
v_xattrdir = 0
v_count_dnlc = 0x1
}

An ls -l of  /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI 
results in the system crashing.


Need to investigate this further when I get home
  





--
Chris Gerhard. __o __o __o
Systems TSC Chief Technologist_`\,`\,`\,_
Sun Microsystems Limited (*)/---/---/ (*)
Phone: +44 (0) 1252 426033 (ext 26033) http://blogs.sun.com/chrisg


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic

2008-11-10 Thread Chris Gerhard
My home server running snv_94 is tipping over with the same assertion when someone 
lists a particular file:

::status
Loading modules: [ unix genunix specfs dtrace cpu.generic 
cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs md ip hook neti sctp arp 
usba qlc fctl nca lofs zfs audiosup sd cpc random crypto fcip fcp smbsrv nfs 
logindmux ptm sppp nsctl sdbc sv ii rdc nsmb ipc mpt emlxs ]
 ::status
debugging crash dump vmcore.17 (64-bit) from pearson
operating system: 5.11 snv_94 (i86pc)
panic message: 
assertion failed: 0 == dmu_bonus_hold(os, fuid_obj, FTAG, db), file: ../../comm
on/fs/zfs/zfs_fuid.c, line: 116
dump content: kernel pages only
 $c
vpanic()
assfail+0x7e(f83e3a10, f83e39f0, 74)
zfs_fuid_table_load+0x1ed(ff025a1c2448, 0, ff025a231e88, 
ff025a231eb0)
zfs_fuid_init+0xf8(ff025a231e40, 0)
zfs_fuid_find_by_idx+0x3f(ff025a231e40, 40100)
zfs_fuid_map_id+0x3f(ff025a231e40, 4010020c1, ff02672d0638, 2)
zfs_zaccess_common+0x246(ff02bc62f4b0, 2, ff000cfcabd0, 
ff000cfcabd4, 0, ff02672d0638)
zfs_zaccess+0x114(ff02bc62f4b0, 2, 0, 0, ff02672d0638)
zfs_getacl+0x4c(ff02bc62f4b0, ff000cfcadd0, 0, ff02672d0638)
zfs_getsecattr+0x81(ff02bf7f9740, ff000cfcadd0, 0, ff02672d0638, 0)
fop_getsecattr+0x8f(ff02bf7f9740, ff000cfcadd0, 0, ff02672d0638, 0)
cacl+0x5ae(6, 0, 0, ff02bf7f9740, ff000cfcae9c)
acl+0x8d(80665d2, 6, 0, 0)
sys_syscall32+0x101()
 ff02bf7f9740::print vnode_t
{
v_lock = {
_opaque = [ 0 ]
}
v_flag = 0x1
v_count = 0x2
v_data = 0xff02bc62f4b0
v_vfsp = 0xff025a0055d0
v_stream = 0
v_type = 1 (VREG)
v_rdev = 0x
v_vfsmountedhere = 0
v_op = 0xff02520d2200
v_pages = 0
v_filocks = 0
v_shrlocks = 0
v_nbllock = {
_opaque = [ 0 ]
}
v_cv = {
_opaque = 0
}
v_locality = 0
v_femhead = 0 
v_path = 0xff02859d99c8 /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI
v_rdcnt = 0
v_wrcnt = 0
v_mmap_read = 0
v_mmap_write = 0
v_mpssdata = 0
v_fopdata = 0
v_vsd = 0
v_xattrdir = 0
v_count_dnlc = 0x1
}

An ls -l of  /tank/fs/shared/pics/MY_Pictures/SMOV0022.AVI results in the 
system crashing.

Need to investigate this further when I get home
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic at zpool import

2008-11-07 Thread Andrew
Do you guys have any more information about this? I've tried the offset 
methods, zfs_recover, aok=1, mounting read only, yada yada, with still 0 luck. 
I have about 3TBs of data on my array, and I would REALLY hate to lose it.

Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic

2008-11-02 Thread Matthew R. Wilson
I can reliably reproduce this panic with a similar stack trace on a
newly installed Solaris 10 10/08 system (I know, not OpenSolaris but
it appears to be the same problem). I just opened a support case w/
Sun but then discovered what appear to be the specific steps for me to
reproduce it.

My setup is a Sol10u6 server, with /export/olddata a ZFS filesystem
with sharenfs=root=zeus.mattwilson.local

zeus.mattwilson.local is an Ubuntu Linux system. I mount the NFS share
with no options, just mount athena:/export/olddata /mnt

What I think is causing the problem is that if I copy a file, as root,
with owner UID 4294967294 to the Solaris NFS share, using the -a
option to GNU cp on the Linux box (which, among other things,
preserves the owner), the panic occurs. Other files, with more
reasonable owners, don't panic the server.
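In other words, from the Linux client the reproduction is roughly (file name made up; mount command and UID as above):

# mount athena:/export/olddata /mnt
# touch /tmp/badfile && chown 4294967294 /tmp/badfile
# cp -a /tmp/badfile /mnt/        # -a preserves the bogus owner; the Solaris server panics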

In my case I can avoid the problem by fixing the bad owner ID on the
file I'm copying, but not sure if this helps with your situation.

My stack was:
SolarisCAT(vmcore.2/10X) stack
unix:vpanic_common+0x165()
unix:0xfb84d7c2()
genunix:0xfb9f0c63()
zfs:zfs_fuid_table_load+0xac()
zfs:zfs_fuid_init+0x53()
zfs:zfs_fuid_find_by_idx+0x87()
zfs:zfs_fuid_map_id+0x47()
zfs:zfs_fuid_map_ids+0x42()
zfs:zfs_getattr+0xbc()
zfs:zfs_shim_getattr+0x15()
genunix:fop_getattr+0x25()
nfssrv:rfs4_delegated_getattr+0x9()
nfssrv:rfs3_setattr+0x19d()
nfssrv:common_dispatch+0x5b8()
nfssrv:rfs_dispatch+0x21()
rpcmod:svc_getreq+0x209()
rpcmod:svc_run+0x124()
rpcmod:svc_do_run+0x88()
nfs:nfssys+0x16a()
unix:_sys_sysenter_post_swapgs+0x14b()
-- switch to user thread's user stack --

panic string:   assertion failed: 0 == dmu_bonus_hold(os, fuid_obj,
FTAG, db), file: ../../common/fs/zfs/zfs_fuid.c, line: 95


On Tue, Sep 9, 2008 at 7:56 AM, Mark Shellenbaum
[EMAIL PROTECTED] wrote:
 David Bartley wrote:
 On Tue, Sep 9, 2008 at 11:43 AM, Mark Shellenbaum
 [EMAIL PROTECTED] wrote:
 David Bartley wrote:
 Hello,

 We're repeatedly seeing a kernel panic on our disk server. We've been
 unable to determine exactly how to reproduce it, but it seems to occur
 fairly frequently (a few times a day). This is happening on both snv91 and
 snv96. We've run 'zpool scrub' and this has reported no errors. I can try 
 to
 provide more information if needed. Is there a way to turn on more
 logging/debugging?

 -- David
 --
 Have you been using the CIFS server?  You should only be going down that
 path for Windows created files and its trying to load Windows domain SID
  table.

 No. We have a bunch of linux NFS clients. The machines mount from the
 server using a mixture of NFSv3, NFSv4, sys auth, and krb5 auth.


 What is the history of this file system?  Was is created prior to snv_77
 and then upgraded?  You most likely have a bad uid/gid on one or more files.

 Can you post the dump so I can download it?

   -Mark
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
Matthew R. Wilson
http://www.mattwilson.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic

2008-11-02 Thread Mark Shellenbaum
Matthew R. Wilson wrote:
 I can reliably reproduce this panic with a similar stack trace on a
 newly installed Solaris 10 10/08 system (I know, not OpenSolaris but
 it appears to be the same problem). I just opened a support case w/
 Sun but then discovered what appear to be the specific steps for me to
 reproduce it.
 
 My setup is a Sol10u6 server, with /export/olddata a ZFS filesystem
 with sharenfs=root=zeus.mattwilson.local
 
 zeus.mattwilson.local is an Ubuntu Linux system. I mount the NFS share
 with no options, just mount athena:/export/olddata /mnt
 
 What I think is causing the problem is that if I copy a file, as root,
 with owner UID 4294967294 to the Solaris NFS share, using the -a
 option to GNU cp on the Linux box (which, among other things,
 preserves the owner), the panic occurs. Other files, with more
 reasonable owners, don't panic the server.
 
 In my case I can avoid the problem by fixing the bad owner ID on the
 file I'm copying, but not sure if this helps with your situation.
 

I believe this panic shouldn't happen on OpenSolaris.  It has some extra 
protection to prevent the panic that doesn't exist in the S10 code base.

Are there any ACLs on the parent directory that would be inherited by 
the newly created file you tried to copy? If so, what are they?
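A quick way to check from the Solaris side, assuming the share path from your mail:

# ls -dv /export/olddata

will list any non-trivial ACL entries on the directory (ls -dV gives the compact form).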


 My stack was:
 SolarisCAT(vmcore.2/10X) stack
 unix:vpanic_common+0x165()
 unix:0xfb84d7c2()
 genunix:0xfb9f0c63()
 zfs:zfs_fuid_table_load+0xac()
 zfs:zfs_fuid_init+0x53()
 zfs:zfs_fuid_find_by_idx+0x87()
 zfs:zfs_fuid_map_id+0x47()
 zfs:zfs_fuid_map_ids+0x42()
 zfs:zfs_getattr+0xbc()
 zfs:zfs_shim_getattr+0x15()
 genunix:fop_getattr+0x25()
 nfssrv:rfs4_delegated_getattr+0x9()
 nfssrv:rfs3_setattr+0x19d()
 nfssrv:common_dispatch+0x5b8()
 nfssrv:rfs_dispatch+0x21()
 rpcmod:svc_getreq+0x209()
 rpcmod:svc_run+0x124()
 rpcmod:svc_do_run+0x88()
 nfs:nfssys+0x16a()
 unix:_sys_sysenter_post_swapgs+0x14b()
 -- switch to user thread's user stack --
 
 panic string:   assertion failed: 0 == dmu_bonus_hold(os, fuid_obj,
 FTAG, db), file: ../../common/fs/zfs/zfs_fuid.c, line: 95
 
 
 On Tue, Sep 9, 2008 at 7:56 AM, Mark Shellenbaum
 [EMAIL PROTECTED] wrote:
 David Bartley wrote:
 On Tue, Sep 9, 2008 at 11:43 AM, Mark Shellenbaum
 [EMAIL PROTECTED] wrote:
 David Bartley wrote:
 Hello,

 We're repeatedly seeing a kernel panic on our disk server. We've been
 unable to determine exactly how to reproduce it, but it seems to occur
 fairly frequently (a few times a day). This is happening on both snv91 and
 snv96. We've run 'zpool scrub' and this has reported no errors. I can try 
 to
 provide more information if needed. Is there a way to turn on more
 logging/debugging?

 -- David
 --
 Have you been using the CIFS server?  You should only be going down that
 path for Windows created files and its trying to load Windows domain SID
  table.
 No. We have a bunch of linux NFS clients. The machines mount from the
 server using a mixture of NFSv3, NFSv4, sys auth, and krb5 auth.

 What is the history of this file system?  Was is created prior to snv_77
 and then upgraded?  You most likely have a bad uid/gid on one or more files.

 Can you post the dump so I can download it?

   -Mark
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 
 
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic

2008-11-02 Thread Matthew R. Wilson
On Sun, Nov 2, 2008 at 4:30 PM, Mark Shellenbaum
[EMAIL PROTECTED] wrote:

 I believe this panic shouldn't happen on OpenSolaris.  It has some extra
 protection to prevent the panic that doesn't exist in the S10 code base.

 Are there any ACLs on the parent directory that would be inherited to the
 newly created file you tried to copy? If so what are they?

Nope, no ACL other than regular POSIX mode 755.

I did confirm that copying the same file to an snv_99 system does not
cause the panic, it looks like the ID gets remapped to the user
'nobody'.

Thanks,
Matthew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on ZFS snapshot destroy

2008-10-08 Thread Daniel Schwager
Hi,

I try to destroy snapshot1 on opensolaris 
SunOS storage11 5.11 snv_98 i86pc i386 i86pc
and my box reboots, leaving a crash file in /var/crash/storage11.

This is reproducible... for this one snapshot1 - other 
snapshots were destroyable (without a crash).

How can I help somebody to track down this problem?
At the moment, I can't work with this pool.

regards
Danny

P.S.: the snapshot1 depends on a clone1 depending on snapshot2 depending on
a zfs-volume (created by zfs create -V ...)
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Kernel Panic

2008-09-09 Thread David Bartley
Hello,

We're repeatedly seeing a kernel panic on our disk server. We've been unable to 
determine exactly how to reproduce it, but it seems to occur fairly frequently 
(a few times a day). This is happening on both snv91 and snv96. We've run 
'zpool scrub' and this has reported no errors. I can try to provide more 
information if needed. Is there a way to turn on more logging/debugging?

Sep  9 09:32:23 ginseng unix: [ID 836849 kern.notice] 
Sep  9 09:32:23 ginseng ^Mpanic[cpu1]/thread=ff01598d6820: 
Sep  9 09:32:23 ginseng genunix: [ID 403854 kern.notice] assertion failed: 0 == 
dmu_bonus_hold(os, fuid_obj, FTAG, db), file: ../../common/fs/zfs/zfs_fuid.c, 
line: 116
Sep  9 09:32:23 ginseng unix: [ID 10 kern.notice] 
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03010 
genunix:assfail+7e ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d030b0 
zfs:zfs_fuid_table_load+1ed ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03100 
zfs:zfs_fuid_init+f8 ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03140 
zfs:zfs_fuid_find_by_idx+3f ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d031a0 
zfs:zfs_fuid_map_id+3f ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03250 
zfs:zfs_zaccess_common+253 ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d032b0 
zfs:zfs_zaccess_delete+9f ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03310 
zfs:zfs_zaccess_rename+64 ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03400 
zfs:zfs_rename+2e1 ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03490 
genunix:fop_rename+c2 ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03770 
nfssrv:rfs3_rename+3ad ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03a70 
nfssrv:common_dispatch+439 ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03a90 
nfssrv:rfs_dispatch+2d ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03b80 
rpcmod:svc_getreq+1c6 ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03bf0 
rpcmod:svc_run+185 ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03c30 
rpcmod:svc_do_run+85 ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03ec0 
nfs:nfssys+770 ()
Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03f10 
unix:brand_sys_sysenter+1e6 ()
Sep  9 09:32:23 ginseng unix: [ID 10 kern.notice] 
Sep  9 09:32:23 ginseng genunix: [ID 672855 kern.notice] syncing file systems...
Sep  9 09:32:23 ginseng genunix: [ID 904073 kern.notice]  done
Sep  9 09:32:24 ginseng genunix: [ID 111219 kern.notice] dumping to 
/dev/dsk/c5d1s1, offset 429391872, content: kernel
Sep  9 09:32:41 ginseng genunix: [ID 409368 kern.notice] ^M100% done: 265125 
pages dumped, compression ratio 3.52, 
Sep  9 09:32:41 ginseng genunix: [ID 851671 kern.notice] dump succeeded

-- David
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic

2008-09-09 Thread Mark Shellenbaum
David Bartley wrote:
 Hello,
 
 We're repeatedly seeing a kernel panic on our disk server. We've been unable 
 to determine exactly how to reproduce it, but it seems to occur fairly 
 frequently (a few times a day). This is happening on both snv91 and snv96. 
 We've run 'zpool scrub' and this has reported no errors. I can try to provide 
 more information if needed. Is there a way to turn on more logging/debugging?
 
 Sep  9 09:32:23 ginseng unix: [ID 836849 kern.notice] 
 Sep  9 09:32:23 ginseng ^Mpanic[cpu1]/thread=ff01598d6820: 
 Sep  9 09:32:23 ginseng genunix: [ID 403854 kern.notice] assertion failed: 0 
 == dmu_bonus_hold(os, fuid_obj, FTAG, db), file: 
 ../../common/fs/zfs/zfs_fuid.c, line: 116
 Sep  9 09:32:23 ginseng unix: [ID 10 kern.notice] 
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03010 
 genunix:assfail+7e ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d030b0 
 zfs:zfs_fuid_table_load+1ed ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03100 
 zfs:zfs_fuid_init+f8 ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03140 
 zfs:zfs_fuid_find_by_idx+3f ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d031a0 
 zfs:zfs_fuid_map_id+3f ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03250 
 zfs:zfs_zaccess_common+253 ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d032b0 
 zfs:zfs_zaccess_delete+9f ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03310 
 zfs:zfs_zaccess_rename+64 ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03400 
 zfs:zfs_rename+2e1 ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03490 
 genunix:fop_rename+c2 ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03770 
 nfssrv:rfs3_rename+3ad ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03a70 
 nfssrv:common_dispatch+439 ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03a90 
 nfssrv:rfs_dispatch+2d ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03b80 
 rpcmod:svc_getreq+1c6 ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03bf0 
 rpcmod:svc_run+185 ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03c30 
 rpcmod:svc_do_run+85 ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03ec0 
 nfs:nfssys+770 ()
 Sep  9 09:32:23 ginseng genunix: [ID 655072 kern.notice] ff0005d03f10 
 unix:brand_sys_sysenter+1e6 ()
 Sep  9 09:32:23 ginseng unix: [ID 10 kern.notice] 
 Sep  9 09:32:23 ginseng genunix: [ID 672855 kern.notice] syncing file 
 systems...
 Sep  9 09:32:23 ginseng genunix: [ID 904073 kern.notice]  done
 Sep  9 09:32:24 ginseng genunix: [ID 111219 kern.notice] dumping to 
 /dev/dsk/c5d1s1, offset 429391872, content: kernel
 Sep  9 09:32:41 ginseng genunix: [ID 409368 kern.notice] ^M100% done: 265125 
 pages dumped, compression ratio 3.52, 
 Sep  9 09:32:41 ginseng genunix: [ID 851671 kern.notice] dump succeeded
 
 -- David
 --

Have you been using the CIFS server?  You should only be going down that 
path for Windows-created files, and it's trying to load the Windows domain SID 
table.
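(A quick way to check whether the CIFS service has ever been enabled on that box -- assuming the usual SMF name -- is:

# svcs -a | grep smb

which shows smb/server and its state if it is installed.)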

  -Mark
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel Panic

2008-09-09 Thread Mark Shellenbaum
David Bartley wrote:
 On Tue, Sep 9, 2008 at 11:43 AM, Mark Shellenbaum
 [EMAIL PROTECTED] wrote:
 David Bartley wrote:
 Hello,

 We're repeatedly seeing a kernel panic on our disk server. We've been
 unable to determine exactly how to reproduce it, but it seems to occur
 fairly frequently (a few times a day). This is happening on both snv91 and
 snv96. We've run 'zpool scrub' and this has reported no errors. I can try to
 provide more information if needed. Is there a way to turn on more
 logging/debugging?

 -- David
 --
 Have you been using the CIFS server?  You should only be going down that
 path for Windows created files and its trying to load Windows domain SID
  table.
 
 No. We have a bunch of linux NFS clients. The machines mount from the
 server using a mixture of NFSv3, NFSv4, sys auth, and krb5 auth.
 

What is the history of this file system?  Was it created prior to snv_77 
and then upgraded?  You most likely have a bad uid/gid on one or more files.
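One way to hunt for such files from the server (path hypothetical) is something like:

# find /export -nouser -o -nogroup

which lists files whose numeric uid/gid no longer maps to a known user or group.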

Can you post the dump so I can download it?

   -Mark
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-27 Thread Borys Saulyak
A little update on the subject.

With the great help of Victor Latushkin the content of the pools has been recovered.
The cause of the problem is still under investigation, but what is clear is that 
both config objects were corrupted. 
What has been done to recover the data:
Victor has a zfs module which allows importing pools in read-only mode, bypassing 
the reading of the config objects.  After installing it he was able to import the pools and 
we managed to save almost everything apart from a couple of log files. This 
module seems to be the only way to read the content of the pools in situations like 
mine, where the pool cannot be imported and therefore cannot be checked/fixed by 
scrubbing. I hope Victor will post some sort of instructions along with the module on 
how to use it.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-19 Thread Borys Saulyak
 From what I can predict, and *nobody* has provided any panic
 messages to confirm, ZFS likely had difficulty writing.  For Solaris 10u5
The panic stack looks pretty much the same as the panic on import, and cannot be 
correlated to a write failure:
Aug  5 12:01:27 omases11 unix: [ID 836849 kern.notice] 
Aug  5 12:01:27 omases11 ^Mpanic[cpu3]/thread=fe800279ac80: 
Aug  5 12:01:27 omases11 genunix: [ID 809409 kern.notice] ZFS: bad checksum 
(read on unknown off 0: zio fe8353c23640 [L0 packe
d nvlist] 4000L/600P DVA[0]=0:d4200:600 DVA[1]=0:904200:600 
fletcher4 lzjb LE contiguous birth=3637241 fill=1 cksum=6a85
cbad8b:60029922bbbf:2eb217a6bbefd5:1045aa85ce3521e3): error 50
Aug  5 12:01:27 omases11 unix: [ID 10 kern.notice] 
Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279aac0 
zfs:zfsctl_ops_root+3008f24c ()
Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279aad0 
zfs:zio_next_stage+65 ()
Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab00 
zfs:zio_wait_for_children+49 ()
Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab10 
zfs:zio_wait_children_done+15 ()
Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab20 
zfs:zio_next_stage+65 ()
Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab60 
zfs:zio_vdev_io_assess+84 ()
Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab70 
zfs:zio_next_stage+65 ()
Aug  5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279abd0 
zfs:vdev_mirror_io_done+c1 ()
Aug  5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279abe0 
zfs:zio_vdev_io_done+14 ()
Aug  5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279ac60 
genunix:taskq_thread+bc ()
Aug  5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279ac70 
unix:thread_start+8 ()
Aug  5 12:01:28 omases11 unix: [ID 10 kern.notice] 
Aug  5 12:01:28 omases11 genunix: [ID 672855 kern.notice] syncing file 
systems...
Aug  5 12:01:28 omases11 genunix: [ID 733762 kern.notice]  7
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-19 Thread Richard Elling
This panic message seems consistent with bugid 6322646, which was
fixed in NV b77 (post S10u5 freeze).
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6322646

 -- richard

Borys Saulyak wrote:
 From what I can predict, and *nobody* has provided any panic
 messages to confirm, ZFS likely had difficulty writing.  For Solaris 10u5
 
 Panic stack is looking pretty much the same as panic on imprt, and cannot be 
 correlated to write failure:
 Aug  5 12:01:27 omases11 unix: [ID 836849 kern.notice] 
 Aug  5 12:01:27 omases11 ^Mpanic[cpu3]/thread=fe800279ac80: 
 Aug  5 12:01:27 omases11 genunix: [ID 809409 kern.notice] ZFS: bad checksum 
 (read on unknown off 0: zio fe8353c23640 [L0 packe
 d nvlist] 4000L/600P DVA[0]=0:d4200:600 DVA[1]=0:904200:600 
 fletcher4 lzjb LE contiguous birth=3637241 fill=1 cksum=6a85
 cbad8b:60029922bbbf:2eb217a6bbefd5:1045aa85ce3521e3): error 50
 Aug  5 12:01:27 omases11 unix: [ID 10 kern.notice] 
 Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279aac0 
 zfs:zfsctl_ops_root+3008f24c ()
 Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279aad0 
 zfs:zio_next_stage+65 ()
 Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab00 
 zfs:zio_wait_for_children+49 ()
 Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab10 
 zfs:zio_wait_children_done+15 ()
 Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab20 
 zfs:zio_next_stage+65 ()
 Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab60 
 zfs:zio_vdev_io_assess+84 ()
 Aug  5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fe800279ab70 
 zfs:zio_next_stage+65 ()
 Aug  5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279abd0 
 zfs:vdev_mirror_io_done+c1 ()
 Aug  5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279abe0 
 zfs:zio_vdev_io_done+14 ()
 Aug  5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279ac60 
 genunix:taskq_thread+bc ()
 Aug  5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fe800279ac70 
 unix:thread_start+8 ()
 Aug  5 12:01:28 omases11 unix: [ID 10 kern.notice] 
 Aug  5 12:01:28 omases11 genunix: [ID 672855 kern.notice] syncing file 
 systems...
 Aug  5 12:01:28 omases11 genunix: [ID 733762 kern.notice]  7
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-19 Thread Richard Elling
Borys Saulyak wrote:
 May I remind you that the issue occurred on Solaris 10, not on OpenSolaris.

   

I believe you.  If you review the life cycle of a bug,
http://www.sun.com/bigadmin/hubs/documentation/patch/patch-docs/abugslife.pdf

then you will recall that bugs are fixed in NV and then
backported to Solaris 10 as patches.  We would all appreciate
a more rapid patch availability process for Solaris 10, but that
is a discussion more appropriate for another forum.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-18 Thread Borys Saulyak
Suppose that ZFS detects an error in the first
 case.  It can't tell
 the storage array something's wrong, please
 fix it (since the
 storage array doesn't provide for this with
 checksums and intelligent
 recovery), so all it can do is tell the user
 this file is corrupt,
 recover it from backups.
Just to remind you: the system was working fine with no sign of any failures. 
The data got corrupted during the export operation. If the storage was somehow misbehaving I 
would expect ZFS to complain about it on any operation which did not finish 
successfully.  I had no issues on the system with quite extensive read/write 
activity. The system panicked on export and messed everything up such that the pools could 
not be imported. At what point would ZFS have done better if I had even a raid1 
configuration? I assume that this mess would have been written to both disks, and how 
would this help me in recovering? I do understand that having more disks would 
be better in the case of failure of one or several of them. But only if it's 
related to disks. I'm almost sure the disks were fine during the failure. Is there 
anything you can improve, apart from ZFS, to cope with such issues?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-18 Thread Borys Saulyak
 Ask your hardware vendor. The hardware corrupted your
 data, not ZFS.
Right, that's all because of these storage vendors. All problems come from 
them! Never from ZFS :-) I have a similar answer from them: ask Sun, ZFS is 
buggy, our storage is always fine. That is really ridiculous! People pay huge 
money for storage and its support, plus the same for hardware and OS, only to end 
up with both parties blaming each other and no intention to look deeper.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-18 Thread Richard Elling
Borys Saulyak wrote:
 Suppose that ZFS detects an error in the first
 case.  It can't tell
 the storage array something's wrong, please
 fix it (since the
 storage array doesn't provide for this with
 checksums and intelligent
 recovery), so all it can do is tell the user
 this file is corrupt,
 recover it from backups.
 
 Just to remind you: the system was working fine with no sign of any
 failures. Data got corrupted during the export operation. If the storage was
 somehow misbehaving I would expect ZFS to complain about it on any operation
 that did not finish successfully.

 From what I can predict, and *nobody* has provided any panic
messages to confirm, ZFS likely had difficulty writing.  For Solaris 10u5
and previous updates, ZFS will panic when writes cannot be completed
successfully.  This will be clearly logged.  For later releases, the policy
set in the pool's failmode property will be followed.  Or, to say this
another way, the only failmode property in Solaris 10u5 or NV builds
prior to build 77 (October 2007) is panic.  For later releases, the 
default
failmode is wait, but you can change it.
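
For illustration, checking and changing that policy looks roughly like this
(the pool name "tank" is a placeholder, and the property only exists on
releases that ship it, i.e. NV build 77 and later):

  # show the current write-failure policy (defaults to "wait" where supported)
  zpool get failmode tank

  # e.g. return EIO to new writes instead of panicking or blocking
  zpool set failmode=continue tank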

  I had NO issues on the system despite quite extensive read/write activity.
 The system panicked on export and messed everything up such that the pools
 could not be imported. At what point would ZFS have done better if I had
 even a raid1 configuration? I assume that this mess would have been written
 to both disks, so how would that have helped me recover? I do understand
 that having more disks would be better in case of failure of one or several
 of them, but only if the failure is related to the disks. I'm almost sure
 the disks were fine during the failure. Is there anything you can improve,
 apart from ZFS, to cope with such issues?

I think that nobody will be able to pinpoint the cause until
someone looks at the messages and fma logs.
 -- richard



Re: [zfs-discuss] Kernel panic at zpool import

2008-08-14 Thread Marc Bevand
Borys Saulyak borys.saulyak at eumetsat.int writes:
 
  Your pools have no redundancy...

 The box is connected to two fabric switches via different HBAs, the storage
 is RAID5, MPxIO is ON, and after all that my pools have no redundancy?!?!

As Darren said: no, there is no redundancy that ZFS can use. It is important
to understand that your setup _prevents_ ZFS from healing itself. You need a
ZFS-redundant pool (mirror, raidz or raidz2) or a filesystem with the
attribute copies=2 to enable self-healing.

I would recommend you to make multiple LUNs visible to ZFS, and create
redundant pools out of them. Browse the past 2 years or so of the zfs-discuss@
archives to get an idea of how others with the same kind of hardware as yours
are doing it. For example, export each disk as a LUN and create multiple
raidz vdevs. Or create 2 hardware raid5 arrays and mirror them with ZFS, etc.
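
As a rough sketch of the kind of layouts meant here (pool and device names
are placeholders, not a recommendation for your particular array):

  # one raidz vdev built from individual LUNs exported by the array
  zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0

  # or mirror two hardware RAID5 LUNs against each other
  zpool create tank mirror c3t0d0 c4t0d0

  # or, on an existing dataset, store two copies of every block
  zfs set copies=2 tank/data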

  ...and got corrupted, therefore there is nothing ZFS
 This is exactly what I would like to know. HOW this could happened? 

Ask your hardware vendor. The hardware corrupted your data, not ZFS.

 I'm just questioning myself. Is it really as reliable a filesystem as
 presented, or is it better to keep away from it in a production environment?

Consider yourself lucky that the corruption was reported by ZFS. Other
filesystems would have returned silently corrupted data and it would maybe
have taken you days/weeks to troubleshoot it. As for myself, I use ZFS in
production to back up 10+ million files, have seen occurrences of hardware
causing data corruption, and have seen ZFS heal itself. So yes, I trust it.

-marc




Re: [zfs-discuss] Kernel panic at zpool import

2008-08-14 Thread Borys Saulyak
 I would recommend you to make multiple LUNs visible
 to ZFS, and create 
So, you are saying that ZFS will cope better with failures than any other
storage system, right? I'm just trying to imagine...
I've got, let's say, 10 disks in the storage. They are currently in a RAID5
configuration and given to my box as one LUN. You suggest creating 10 LUNs
instead and giving them to ZFS, where they will be part of one raidz, right?
So what sort of protection will I gain by that? What kind of failure will be
eliminated? Sorry, but I just don't see it...
 
 


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-14 Thread Will Murnane
On Thu, Aug 14, 2008 at 07:42, Borys Saulyak [EMAIL PROTECTED] wrote:
 I've got, let's say, 10 disks in the storage. They are currently in a RAID5
 configuration and given to my box as one LUN. You suggest creating 10 LUNs
 instead and giving them to ZFS, where they will be part of one raidz, right?
 So what sort of protection will I gain by that? What kind of failure will be
 eliminated? Sorry, but I just don't see it...
Suppose that ZFS detects an error in the first case.  It can't tell
the storage array something's wrong, please fix it (since the
storage array doesn't provide for this with checksums and intelligent
recovery), so all it can do is tell the user this file is corrupt,
recover it from backups.

In the second case, ZFS can use the parity or mirrored data to
reconstruct plausible blocks, and then see if they match the checksum.
 Once it finds one that matches (which will happen as long as
sufficient parity remains), it can write the corrected data back to
the disk that had junk on it, and report to the user there were
problems over here, but I fixed them.
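
A simple way to watch this happen on a redundant pool (the pool name is
illustrative) is to scrub it and then look at the error counters:

  # read and verify every allocated block against its checksum
  zpool scrub tank

  # repaired blocks show up as CKSUM errors that ZFS fixed from redundancy
  zpool status -v tank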

Will


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-14 Thread Chris Cosby
To further clarify Will's point...

Your current setup provides excellent hardware protection, but absolutely no
data protection.
ZFS provides excellent data protection when it has multiple copies of the
data blocks (>1 hardware devices).

Combine the two, provide >1 hardware device to ZFS, and you have a really
nice solution. If you can spare the space, set up your arrays and things to
provide exactly 2 identical LUNs to your ZFS box and create your zpool with
those in a mirror. The best of all worlds.
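
If the pool already lives on a single LUN, it can be converted to such a
mirror in place; the pool and device names here are placeholders:

  # attach a second, equally sized LUN to the existing top-level device
  zpool attach tank c3t0d0 c4t0d0

  # watch the resilver complete; afterwards ZFS has two copies to heal from
  zpool status tank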



-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes


Re: [zfs-discuss] Kernel panic at zpool import

2008-08-14 Thread Miles Nordin
 mb == Marc Bevand [EMAIL PROTECTED] writes:

mb Ask your hardware vendor. The hardware corrupted your data,
mb not ZFS.

You absolutely do NOT have adequate basis to make this statement.

I would further argue that you are probably wrong, and that, based on what
we know, the pool was probably corrupted by a bug in ZFS.  Simply because
ZFS is (a) able to detect problems with hardware when they exist, and (b)
able to ring an alarm bell of some sort, does NOT exonerate ZFS.  And AIUI
that is your position.

Further, ZFS's ability to use zpool-level redundancy to heal problems
created by its own bugs is not a cause for celebration or an
improvement over filesystems without bugs.  The virtue of the
self-healing is for when hardware actually does fail.  If self-healing
also helps with corruption created by bugs in ZFS, that does not shift
blame for unhealed bug-corruption back to the hardware, nor does it make
ZFS more robust than a different filesystem without corruption bugs.

mb Other filesystems would have returned silently corrupted
mb data and it would have maybe taken you days/weeks to
mb troubleshoot

Possibly.  Very likely, other filesystems would have handled it fine.

Borys, have a look at the two links I posted earlier about ``simon
sez, import!'' incantations, and required patches.

  http://opensolaris.org/jive/message.jspa?messageID=192572#194209
  http://sunsolve.sun.com/search/document.do?assetkey=1-66-233602-1

panic-on-import sounds a lot like your problem.  Jonathan also posted
http://www.opensolaris.org/jive/thread.jspa?messageID=220125 which
seems to be incomplete instructions on how to choose a different
ueberblock; it helped someone else with a corrupted pool, but the OP
in that thread never wrote it up in recipe form for ignorant sysadmins
like me to follow, so it might not be widely useful.

In short, ZFS is unstable and prone to corruption, but may improve
substantially when patched up to the latest revision.  Many fixes are
available now, but some that are in SXCE right now will not be available
in the stable binary-only Solaris until u6, so we haven't yet gained
experience with how much improvement the patches provide.  And finally,
there is no way to back up a ZFS filesystem with lots of clones that is
as robust as past Unix backup systems; your best bet for space-efficient
backups is to zfs send/recv the data onto a separate ZFS pool.
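
Something along these lines, assuming a second pool called backuppool and a
build where the -R replication flag is available:

  # recursive snapshot of the pool and all descendant filesystems
  zfs snapshot -r tank@backup1

  # replicate the whole snapshot tree (clones, properties and all)
  zfs send -R tank@backup1 | zfs receive -dF backuppool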

In more detail, I think there is some experience here that when a
single storage subsystem hosting both ZFS pools and vxfs filesystems
goes away, ZFS pools sometimes become corrupt while vxfs rolls its log
and continues.  So, in stable Sol10u5, ZFS is probably more prone to
metadata corruption causing whole-pool failure than other logging
filesystems.  Some fixes are around the corner, and others are
apparently the subject of some philosophical debate.




  1   2   >