2012-02-04 18:27, Jim Klimov wrote:
panicstr = BAD TRAP: type=e (#pf Page fault) rp=ff0010a5e920
addr=30 occurred in module zfs due to a NULL pointer dereference
panicstack = unix:die+dd () | unix:trap+1799 () | unix:cmntrap+e6
() | zfs:ddt_phys_decref+c () | zfs:zio_ddt_free+5c () |
zfs:zio_execute+8d () | genunix:taskq_thread+285 () |
unix:thread_start+8 () | crashtime = 1328314064
panic-time = February 4, 2012 04:07:44 AM MSK MSK
(end fault-list[0])
I've finally looked into the code, and the problem seems
weird indeed. With properly wrong on-disk data the code
can take such a path that it tries to dereference a NULL
pointer. This is a bug, methinks:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zio.c#zio_ddt_free
2100 static int
2101 zio_ddt_free(zio_t *zio)
2102 {
2103 spa_t *spa = zio-io_spa;
2104 blkptr_t *bp = zio-io_bp;
2105 ddt_t *ddt = ddt_select(spa, bp);
2106 ddt_entry_t *dde;
2107 ddt_phys_t *ddp;
2108
2109 ASSERT(BP_GET_DEDUP(bp));
2110 ASSERT(zio-io_child_type == ZIO_CHILD_LOGICAL);
2111
2112 ddt_enter(ddt);
2113 freedde = dde = ddt_lookup(ddt, bp, B_TRUE);
2114 ddp = ddt_phys_select(dde, bp);
2115 ddt_phys_decref(ddp);
2116 ddt_exit(ddt);
2117
2118 return (ZIO_PIPELINE_CONTINUE);
2119 }
The way I see it, this function:
2109) asserts that the dedup bit is set on the blockpointer,
2113) selects a DDT entry matching the BP with ddt_lookup()
Strangely for me, it requests creation of a new DDT
entry if a matching one doesn't exist (B_TRUE) -
even though this is a DDT-free routine.
I doubt this autocreated entry gets a DVA assigned,
but I did not trace the code that far to be certain.
2114) then it tries to find the ddp pointer to the DDT
entry with ddt_phys_select(), which can validly return
NULL if no DDT entries, with same DVA[0] and same
phys_birth as the bp, exist. I speculated above that
an autocreated DDT entry might have no DVA yet, maybe.
I also wonder about cases where phys_birth TXGs would
differ for some reason, even if an otherwise matching
DDT entry exists (BP rewrite in future, some lag in
committing-to-disk today)?
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/ddt.c#ddt_phys_select
2115) ddt_phys_decref() just tries to use ddp as a pointer
to structure without checking if it is NULL itself:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/ddt.c#ddt_phys_decref
309 void
310 ddt_phys_decref(ddt_phys_t *ddp)
311 {
312 ASSERT((int64_t)ddp-ddp_refcnt 0);
313 ddp-ddp_refcnt--;
314 }
315
According to my stacktrace in the kernel panic message,
this should be where the system panics and fails.
Seemingly this should work if on-disk data is consistent.
Since I've had some strangeness regarding corrupt data
blocks pointed to by DDT, with some hickups and reboots
during or soon-after repair attempts, I can't vouch that
my on-disk data is consistent, or that the whole block
tree is readable and accessible to the OS now.
But the code allows for failure is something goes wrong.
I'd add a lot more NULL checks for values (especially
pointers) returned by functions called from zio_ddt_free()
and/or sanity checks for values fed into such functions.
Actually, I'd rewrite this part of zio_ddt_free() like
this:
- freedde = dde = ddt_lookup(ddt, bp, B_TRUE);
+ freedde = dde = ddt_lookup(ddt, bp, B_FALSE);
+ if (dde) {
ddp = ddt_phys_select(dde, bp);
+ if (ddp) {
ddt_phys_decref(ddp);
+ }
+ }
ddt_exit(ddt);
If ddt_lookup() finds no matching DDE, fine - we're freed.
If ddt_phys_select() finds no DDP, not so fine, but that's
not a good reason to kernel-panic. Perhaps log a message
and maybe leave a leaked block? Or relax the phys_birth check?
Am I mistaken anywhere? :)
Thanks,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss