Darren,

I looked a bit at your dumps... in both cases, the problem is that the
os_phys block that we read from the disk is garbage:

 > 0xffffffff9377b000::print objset_phys_t
{
     os_meta_dnode = {
         dn_type = 0
         dn_indblkshift = 0
         dn_nlevels = 0
         dn_nblkptr = 0
         dn_bonustype = 0
         dn_checksum = 0
         dn_compress = 0
         dn_flags = 0
         dn_datablkszsec = 0x3
         dn_bonuslen = 0
         dn_pad2 = [ 0, 0, 0 ]
         dn_crypt = 0
         dn_maxblkid = 0
         dn_used = 0
         dn_pad3 = [ 0, 0, 0, 0 ]
         dn_blkptr = [
             {
                 blk_dva = [
                     {
                         dva_word = [ 0, 0 ]
                     }
        ...
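
To make the link to the first panic concrete, here is a minimal
sketch (not the real dbuf.c code; the sketch_* names are made up for
illustration) of why a zeroed dn_nlevels, as in the dump above, cannot
satisfy a check of the form "dn->dn_nlevels > level", even at level 0:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for a dnode: only the field used here. */
typedef struct sketch_dnode {
	uint8_t dn_nlevels;	/* block-tree depth; 0 in the dump above */
} sketch_dnode_t;

/*
 * Same shape as the failing assertion: holding a buffer at `level`
 * only makes sense if the dnode has more than `level` levels.
 */
static int
sketch_level_ok(const sketch_dnode_t *dn, int level)
{
	return (dn->dn_nlevels > level);
}
```

With dn_nlevels == 0 the check fails for level 0, which matches the
"(0x0 > 0x0)" printed in the panic string; any dnode with at least
one level passes it.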

I checked the actual arc buf this came from, and it looks the same.  So
the buf was successfully read, and it checksummed, but it doesn't have
good data.  This pretty much says that the problem is on the write side.
When we wrote out the root block in dmu_objset_sync(), we must have
written garbage.  I'm not yet quite sure how this happened... perhaps
something is messed up in your write path changes
(arc_write->zio_write->...), but it's not obvious.  I'll investigate some
more when I get a chance....
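
The reasoning above (good checksum, bad contents, so suspect the
writer) can be illustrated with a small sketch.  This is not the ZFS
checksum code; it is a Fletcher-style stand-in with made-up sketch_*
names, and it only assumes that the checksum is computed over whatever
buffer is handed to the write path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical four-word checksum result, for illustration only. */
typedef struct sketch_cksum {
	uint64_t word[4];
} sketch_cksum_t;

/* Fletcher-style running sums over 32-bit words (a stand-in). */
static sketch_cksum_t
sketch_checksum(const uint32_t *buf, size_t nwords)
{
	uint64_t a = 0, b = 0, c = 0, d = 0;
	sketch_cksum_t ck;
	size_t i;

	for (i = 0; i < nwords; i++) {
		a += buf[i];
		b += a;
		c += b;
		d += c;
	}
	ck.word[0] = a; ck.word[1] = b; ck.word[2] = c; ck.word[3] = d;
	return (ck);
}

/* Read side: recompute and compare against the stored checksum. */
static int
sketch_verify(const uint32_t *buf, size_t nwords, const sketch_cksum_t *stored)
{
	sketch_cksum_t now = sketch_checksum(buf, nwords);

	return (memcmp(&now, stored, sizeof (now)) == 0);
}
```

If the writer checksums a buffer that is already garbage, the read
side verifies it happily: the checksum only proves the bytes read are
the bytes written, not that they were ever valid.  Corruption
introduced after the checksum was computed would fail sketch_verify()
instead.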

-Mark

Darren J Moffat wrote:
> I really need some help on this.  Without help the ZFS crypto project
> is stalled.
> 
> I've updated my bits to the ON gate as of last night.  The way I
> recreate this is slightly different but the assert is still the same.
> 
> I can create a pool with my bits and export it; when I import it I
> get the dn_nlevels assert.
> 
> Please I really need help.
> 
> Darren J Moffat wrote:
> 
>> Using the ZFS crypto bits, see [1] for webrev, which are in sync with ON
>>  as of 2006-09-12 (ie they include the BrandZ stuff and the changes
>> that Eric putback on the 12th).
>>
>> [1] http://cr.grommit.com/~darrenm/zfs-crypto/
>>
>>
>> I created a new pool:
>>
>> # zpool create -f tank c0t1d0
>>
>> I then created four new file systems:
>>
>> # zfs create -o encryption=aes256 tank/cipher-aes256
>> # zfs create -o encryption=aes128 tank/cipher-aes128
>> # zfs create -o encryption=aes192 tank/cipher-aes192
>> # zfs create -o encryption=off tank/clear
>>
>> I listed the encryption property, then I exported the pool.
>>
>> When I did so the machine panic'd thus:
>>
>>
>> panic[cpu1]/thread=ffffffffbe728880: assertion failed: dn->dn_nlevels >
>> level (0x0 > 0x0), file: ../../common/fs/zfs/dbuf.c, line: 1523
>>
>> fffffe8000bb4730 genunix:assfail3+b9 ()
>> fffffe8000bb47d0 zfs:dbuf_hold_impl+329 ()
>> fffffe8000bb4810 zfs:dbuf_hold+2b ()
>> fffffe8000bb48a0 zfs:dnode_hold_impl+bd ()
>> fffffe8000bb48d0 zfs:dnode_hold+2b ()
>> fffffe8000bb4950 zfs:dmu_buf_hold+45 ()
>> fffffe8000bb4a20 zfs:zap_lockdir+58 ()
>> fffffe8000bb4aa0 zfs:zap_lookup+4d ()
>> fffffe8000bb4b10 zfs:dsl_pool_open+94 ()
>> fffffe8000bb4bb0 zfs:spa_load+566 ()
>> fffffe8000bb4c00 zfs:spa_tryimport+90 ()
>> fffffe8000bb4c50 zfs:zfs_ioc_pool_tryimport+31 ()
>> fffffe8000bb4cd0 zfs:zfsdev_ioctl+115 ()
>> fffffe8000bb4d10 genunix:cdev_ioctl+48 ()
>> fffffe8000bb4d50 specfs:spec_ioctl+86 ()
>> fffffe8000bb4db0 genunix:fop_ioctl+37 ()
>> fffffe8000bb4eb0 genunix:ioctl+16b ()
>> fffffe8000bb4f00 unix:brand_sys_syscall32+2a1 ()
>>
>> For some reason savecore didn't grab the dump so I tried again:
>>
>> This time I only got the first two filesystems created and I got
>> a different panic:
>>
>> panic[cpu1]/thread=fffffe800036bc80: assertion failed: (((bp)->blk_birth
>> == 0) ? 0 : ((((((bp)->blk_prop[0]) >> (0)) & ((1ULL << (16)) - 1)) +
>> (1)) << (9))) == db->db_level == 1 ? dn->dn_datablksz :
>> (1<<dn->dn_phys->dn_indblkshift) (0x200 == 0x4000), file:
>> ../../common/fs/zfs/dbuf.c, line: 2186
>>
>> fffffe800036b4d0 genunix:assfail3+b9 ()
>> fffffe800036b820 zfs:dbuf_write_done+920 ()
>> fffffe800036b880 zfs:arc_write_done+1d3 ()
>> fffffe800036ba10 zfs:zio_done+2e4 ()
>> fffffe800036ba40 zfs:zio_next_stage+112 ()
>> fffffe800036ba90 zfs:zio_wait_for_children+56 ()
>> fffffe800036bab0 zfs:zio_wait_children_done+20 ()
>> fffffe800036bae0 zfs:zio_next_stage+112 ()
>> fffffe800036bb30 zfs:zio_vdev_io_assess+140 ()
>> fffffe800036bb60 zfs:zio_next_stage+112 ()
>> fffffe800036bbb0 zfs:vdev_mirror_io_done+377 ()
>> fffffe800036bbd0 zfs:zio_vdev_io_done+26 ()
>> fffffe800036bc60 genunix:taskq_thread+1dc ()
>> fffffe800036bc70 unix:thread_start+8 ()
>>
>> Then on reboot from that panic we see this one:
>>
>> panic[cpu0]/thread=ffffffff870a0c00: assertion failed: dn->dn_nlevels >
>> level (0x0 > 0x0), file: ../../common/fs/zfs/dbuf.c, line: 1523
>>
>> fffffe80005806a0 genunix:assfail3+b9 ()
>> fffffe8000580740 zfs:dbuf_hold_impl+329 ()
>> fffffe8000580780 zfs:dbuf_hold+2b ()
>> fffffe8000580810 zfs:dnode_hold_impl+bd ()
>> fffffe8000580840 zfs:dnode_hold+2b ()
>> fffffe80005808c0 zfs:dmu_buf_hold+45 ()
>> fffffe8000580990 zfs:zap_lockdir+58 ()
>> fffffe8000580a10 zfs:zap_lookup+4d ()
>> fffffe8000580a80 zfs:dsl_pool_open+94 ()
>> fffffe8000580b20 zfs:spa_load+566 ()
>> fffffe8000580b90 zfs:spa_open_common+c5 ()
>> fffffe8000580c00 zfs:spa_get_stats+4a ()
>> fffffe8000580c50 zfs:zfs_ioc_pool_stats+32 ()
>> fffffe8000580cd0 zfs:zfsdev_ioctl+115 ()
>> fffffe8000580d10 genunix:cdev_ioctl+48 ()
>> fffffe8000580d50 specfs:spec_ioctl+86 ()
>> fffffe8000580db0 genunix:fop_ioctl+37 ()
>> fffffe8000580eb0 genunix:ioctl+16b ()
>> fffffe8000580f00 unix:brand_sys_syscall32+2a1 ()
>>
>> ie the same as the panic from the first attempt.
>>
>> So I rebooted into failsafe and cleared the zpool.cache file so I could
>> come back up and get the dump (which I did this time).
>>
>> I then bfu'd the machine to the base ON nightly that I'm in sync with,
>> to check everything is okay in the base and to create the pool there.
>>
>> So I did this:
>>
>> # zpool create -f tank c0t1d0
>> # zfs create tank/clear
>> # zpool export tank
>>
>> Then bfu'd back to the zfs-crypto bits again and rebooted and attempted
>> to import the pool which was created with the base ON bits:
>>
>>
>> banking# zpool import
>>
>> panic[cpu2]/thread=ffffffff82b9c3e0: assertion failed:
>> dn->dn_indblkshift <= 17 (0xb1 <= 0x11), file:
>> ../../common/fs/zfs/dnode.c, line: 136
>>
>> fffffe8000b41850 genunix:assfail3+b9 ()
>> fffffe8000b41890 zfs:dnode_verify+320 ()
>> fffffe8000b418d0 zfs:dnode_special_open+2c ()
>> fffffe8000b41aa0 zfs:dmu_objset_open_impl+3aa ()
>> fffffe8000b41b10 zfs:dsl_pool_open+59 ()
>> fffffe8000b41bb0 zfs:spa_load+566 ()
>> fffffe8000b41c00 zfs:spa_tryimport+90 ()
>> fffffe8000b41c50 zfs:zfs_ioc_pool_tryimport+31 ()
>> fffffe8000b41cd0 zfs:zfsdev_ioctl+115 ()
>> fffffe8000b41d10 genunix:cdev_ioctl+48 ()
>> fffffe8000b41d50 specfs:spec_ioctl+86 ()
>> fffffe8000b41db0 genunix:fop_ioctl+37 ()
>> fffffe8000b41eb0 genunix:ioctl+16b ()
>> fffffe8000b41f00 unix:brand_sys_syscall32+2a1 ()
>>
>>
>> The dumps are available on SWAN from:
>>     /net/borg.sfbay/cube/builds/darrenm/zfs-crypto-dumps
>>
>> [note borg is sparcv9 but the dumps are from an amd64 kernel ]
>>
>> I think I must have missed something with the merging in of the crypto
>> pipeline changes but I can't see what it is.  These panics are in very
>> strange places.
>>
>> I need help, the ZFS crypto project is halted until this is resolved.
>>
>> Thanks in advance.
>>
> 
> 
