Re: [zfs-discuss] Help! System panic when pool imported

2009-10-20 Thread Albert Chin
On Mon, Oct 19, 2009 at 09:02:20PM -0500, Albert Chin wrote:
 On Mon, Oct 19, 2009 at 03:31:46PM -0700, Matthew Ahrens wrote:
  Thanks for reporting this.  I have fixed this bug (6822816) in build  
  127.
 
 Thanks. I just installed OpenSolaris Preview based on 125 and will
 attempt to apply the patch you made to this release and import the pool.

Did the above and the zpool import worked. Thanks!

  --matt
 
  Albert Chin wrote:
  Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
  snapshot a few days ago:
# zfs snapshot a...@b
# zfs clone a...@b tank/a
# zfs clone a...@b tank/b
 
  The system started panicking after I tried:
# zfs snapshot tank/b...@backup
 
  So, I destroyed tank/b:
# zfs destroy tank/b
  then tried to destroy tank/a
# zfs destroy tank/a
 
  Now, the system is in an endless panic loop, unable to import the pool
  at system startup or with zpool import. The panic dump is:
panic[cpu1]/thread=ff0010246c60: assertion failed: 0 ==
  zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx) (0x0 ==
  0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
 
ff00102468d0 genunix:assfail3+c1 ()
ff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
ff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
ff0010246b10 zfs:dsl_pool_sync+196 ()
ff0010246ba0 zfs:spa_sync+32a ()
ff0010246c40 zfs:txg_sync_thread+265 ()
ff0010246c50 unix:thread_start+8 ()
 
  We really need to import this pool. Is there a way around this? We do
  have snv_114 source on the system if we need to make changes to
  usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the zfs
  destroy transaction never completed and it is being replayed, causing
  the panic. This cycle continues endlessly.
 

 
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
 -- 
 albert chin (ch...@thewrittenword.com)
 

-- 
albert chin (ch...@thewrittenword.com)


Re: [zfs-discuss] Help! System panic when pool imported

2009-10-19 Thread Matthew Ahrens
Thanks for reporting this.  I have fixed this bug (6822816) in build 
127.  Here is the evaluation from the bug report:


The problem is that the clone's dsobj does not appear in the origin's 
ds_next_clones_obj. 

The bug can occur under certain circumstances if there was a
botched upgrade when doing zpool upgrade from pool version 10 or
earlier to version 11 or later, while there was a clone in the pool.


The problem is caused because upgrade_clones_cb() failed to call
dmu_buf_will_dirty(origin->ds_dbuf).


This bug can have several effects:

1. assertion failure from dsl_dataset_destroy_sync()
2. assertion failure from dsl_dataset_snapshot_sync()
3. assertion failure from dsl_dataset_promote_sync()
4. incomplete scrub or resilver, potentially leading to data loss

The fix will address the root cause, and also work around all of these 
issues on pools that have already experienced the botched upgrade, 
whether or not they have encountered any of the above effects.


Anyone who may have a botched upgrade should run zpool scrub after 
upgrading to bits with the fix in place (build 127 or later).
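
As a rough sketch of the advice above, the helper below decides whether a
given build string is at or past build 127 before a scrub is started. The
function name has_fix is made up for illustration, and the "snv_NNN" string
format is assumed to be what `uname -v` prints on these builds. A build that
carries a hand-applied patch (for example 125 plus the fix) would still be
reported as "no", so treat the result as a hint only.

```shell
# Build that contains the fix for 6822816, per the evaluation above.
FIXED_BUILD=127

# Print "yes" if the build string in $1 (assumed format: snv_NNN) is at or
# after FIXED_BUILD, "no" if it is older, "unknown" if it cannot be parsed.
has_fix() {
    build=${1#snv_}
    case $build in
        ''|*[!0-9]*) echo unknown; return 1 ;;
    esac
    if [ "$build" -ge "$FIXED_BUILD" ]; then
        echo yes
    else
        echo no
    fi
}

# Possible use on a live system (the pool name "tank" comes from this thread):
#   [ "$(has_fix "$(uname -v)")" = yes ] && zpool scrub tank
#   zpool status -v tank
```

The scrub itself is left commented out so that the block has no side effects
when sourced.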


--matt

Albert Chin wrote:

Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
snapshot a few days ago:
  # zfs snapshot a...@b
  # zfs clone a...@b tank/a
  # zfs clone a...@b tank/b

The system started panicking after I tried:
  # zfs snapshot tank/b...@backup

So, I destroyed tank/b:
  # zfs destroy tank/b
then tried to destroy tank/a
  # zfs destroy tank/a

Now, the system is in an endless panic loop, unable to import the pool
at system startup or with zpool import. The panic dump is:
  panic[cpu1]/thread=ff0010246c60: assertion failed: 0 == zap_remove_int(mos,
ds_prev->ds_phys->ds_next_clones_obj, obj, tx) (0x0 == 0x2), file:
../../common/fs/zfs/dsl_dataset.c, line: 1512

  ff00102468d0 genunix:assfail3+c1 ()
  ff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
  ff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
  ff0010246b10 zfs:dsl_pool_sync+196 ()
  ff0010246ba0 zfs:spa_sync+32a ()
  ff0010246c40 zfs:txg_sync_thread+265 ()
  ff0010246c50 unix:thread_start+8 ()

We really need to import this pool. Is there a way around this? We do
have snv_114 source on the system if we need to make changes to
usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the zfs
destroy transaction never completed and it is being replayed, causing
the panic. This cycle continues endlessly.

  




Re: [zfs-discuss] Help! System panic when pool imported

2009-10-19 Thread Albert Chin
On Mon, Oct 19, 2009 at 03:31:46PM -0700, Matthew Ahrens wrote:
 Thanks for reporting this.  I have fixed this bug (6822816) in build  
 127.

Thanks. I just installed OpenSolaris Preview based on 125 and will
attempt to apply the patch you made to this release and import the pool.

 --matt

 Albert Chin wrote:
 Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
 snapshot a few days ago:
   # zfs snapshot a...@b
   # zfs clone a...@b tank/a
   # zfs clone a...@b tank/b

 The system started panicking after I tried:
   # zfs snapshot tank/b...@backup

 So, I destroyed tank/b:
   # zfs destroy tank/b
 then tried to destroy tank/a
   # zfs destroy tank/a

 Now, the system is in an endless panic loop, unable to import the pool
 at system startup or with zpool import. The panic dump is:
   panic[cpu1]/thread=ff0010246c60: assertion failed: 0 ==
 zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx) (0x0 ==
 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512

   ff00102468d0 genunix:assfail3+c1 ()
   ff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
   ff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
   ff0010246b10 zfs:dsl_pool_sync+196 ()
   ff0010246ba0 zfs:spa_sync+32a ()
   ff0010246c40 zfs:txg_sync_thread+265 ()
   ff0010246c50 unix:thread_start+8 ()

 We really need to import this pool. Is there a way around this? We do
 have snv_114 source on the system if we need to make changes to
 usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the zfs
 destroy transaction never completed and it is being replayed, causing
 the panic. This cycle continues endlessly.

   



-- 
albert chin (ch...@thewrittenword.com)


Re: [zfs-discuss] Help! System panic when pool imported

2009-09-27 Thread Andrew
This is what my /var/adm/messages looks like:

Sep 27 12:46:29 solaria genunix: [ID 403854 kern.notice] assertion failed: ss 
== NULL, file: ../../common/fs/zfs/space_map.c, line: 109
Sep 27 12:46:29 solaria unix: [ID 10 kern.notice]
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a97a0 
genunix:assfail+7e ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9830 
zfs:space_map_add+292 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a98e0 
zfs:space_map_load+3a7 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9920 
zfs:metaslab_activate+64 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a99e0 
zfs:metaslab_group_alloc+2b7 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9ac0 
zfs:metaslab_alloc_dva+295 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9b60 
zfs:metaslab_alloc+9b ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9b90 
zfs:zio_dva_allocate+3e ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9bc0 
zfs:zio_execute+a0 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9c40 
genunix:taskq_thread+193 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9c50 
unix:thread_start+8 ()
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Help! System panic when pool imported

2009-09-27 Thread Albert Chin
On Sun, Sep 27, 2009 at 10:06:16AM -0700, Andrew wrote:
 This is what my /var/adm/messages looks like:
 
 Sep 27 12:46:29 solaria genunix: [ID 403854 kern.notice] assertion failed: ss 
 == NULL, file: ../../common/fs/zfs/space_map.c, line: 109
 Sep 27 12:46:29 solaria unix: [ID 10 kern.notice]
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a97a0 
 genunix:assfail+7e ()
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9830 
 zfs:space_map_add+292 ()
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a98e0 
 zfs:space_map_load+3a7 ()
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9920 
 zfs:metaslab_activate+64 ()
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a99e0 
 zfs:metaslab_group_alloc+2b7 ()
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9ac0 
 zfs:metaslab_alloc_dva+295 ()
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9b60 
 zfs:metaslab_alloc+9b ()
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9b90 
 zfs:zio_dva_allocate+3e ()
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9bc0 
 zfs:zio_execute+a0 ()
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9c40 
 genunix:taskq_thread+193 ()
 Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9c50 
 unix:thread_start+8 ()

I'm not sure that aok=1/zfs:zfs_recover=1 would help you because
zfs_panic_recover isn't in the backtrace (see
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6638754).
Sometimes a Sun zfs engineer shows up on the freenode #zfs channel. I'd
pop up there and ask. There are somewhat similar bug reports at
bugs.opensolaris.org. I'd post a bug report just in case.
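
A rough sketch of the check described above: the function below looks for a
zfs_panic_recover frame in a saved log. The function name is made up for
illustration, and the frame format (module:function+offset) is assumed from
the trace quoted in this thread. It only automates the "is it in the
backtrace" test and says nothing about what aok=1 would do.

```shell
# Print "yes" if the log file named by $1 contains a stack frame for
# zfs_panic_recover, "no" otherwise.
trace_has_panic_recover() {
    if grep -q 'zfs:zfs_panic_recover' "$1"; then
        echo yes
    else
        echo no
    fi
}

# Possible use (path taken from the message above):
#   trace_has_panic_recover /var/adm/messages
```

For the trace quoted above this would print "no", since the frames go from
space_map_add straight to assfail.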

-- 
albert chin (ch...@thewrittenword.com)


Re: [zfs-discuss] Help! System panic when pool imported

2009-09-26 Thread Victor Latushkin

Richard Elling wrote:

Assertion failures indicate bugs. You might try another version of the OS.
In general, they are easy to search for in the bugs database.  A quick
search reveals
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6822816
but that doesn't look like it will help you.  I suggest filing a new bug at
the very least.


I have redispatched 6822816, so it needs to be reevaluated since more 
information is available now.


victor


On Sep 24, 2009, at 10:21 PM, Albert Chin wrote:


Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
snapshot a few days ago:
 # zfs snapshot a...@b
 # zfs clone a...@b tank/a
 # zfs clone a...@b tank/b

The system started panicking after I tried:
 # zfs snapshot tank/b...@backup

So, I destroyed tank/b:
 # zfs destroy tank/b
then tried to destroy tank/a
 # zfs destroy tank/a

Now, the system is in an endless panic loop, unable to import the pool
at system startup or with zpool import. The panic dump is:
 panic[cpu1]/thread=ff0010246c60: assertion failed: 0 ==
zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
(0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512


 ff00102468d0 genunix:assfail3+c1 ()
 ff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
 ff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
 ff0010246b10 zfs:dsl_pool_sync+196 ()
 ff0010246ba0 zfs:spa_sync+32a ()
 ff0010246c40 zfs:txg_sync_thread+265 ()
 ff0010246c50 unix:thread_start+8 ()

We really need to import this pool. Is there a way around this? We do
have snv_114 source on the system if we need to make changes to
usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the zfs
destroy transaction never completed and it is being replayed, causing
the panic. This cycle continues endlessly.

--
albert chin (ch...@thewrittenword.com)


Re: [zfs-discuss] Help! System panic when pool imported

2009-09-25 Thread Albert Chin
On Fri, Sep 25, 2009 at 05:21:23AM +, Albert Chin wrote:
 [[ snip snip ]]
 
 We really need to import this pool. Is there a way around this? We do
 have snv_114 source on the system if we need to make changes to
 usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the zfs
 destroy transaction never completed and it is being replayed, causing
 the panic. This cycle continues endlessly.

What are the implications of adding the following to /etc/system:
  set zfs:zfs_recover=1
  set aok=1

And importing the pool with:
  # zpool import -o ro

-- 
albert chin (ch...@thewrittenword.com)