2012-03-27 11:14, Carsten John wrote:
I saw a similar effect some time ago on an OpenSolaris box (build 111b). That
time my final solution was to copy the read-only mounted data over to a newly
created pool. As this is the second time this failure has occurred (on different
machines), I'm really concerned about overall reliability....
A couple of months ago I reported a similar issue (though with
a different stacktrace and code path). I tracked it to the code
that frees deduped blocks, where a valid code path could return
a NULL pointer, but the routines further down used the pointer
as if it were always valid. The result was a NULL dereference
when the pool was imported RW and tried to release blocks marked
for deletion.
Adding a NULL check in my private rebuild of oi_151a
fixed the issue. I wouldn't be surprised to see similar
sloppiness in other parts of the code. Not checking input
values in routines seems like an arrogant mistake waiting to
fire (and it did for us).
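To illustrate the kind of check I mean, here is a minimal sketch in
plain userland C. It is not the actual ZFS code or the patch itself:
the names entry_t, lookup_entry and release_entry are made up for the
example, and the "table" is just an array standing in for whatever
structure the real lookup walks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a table entry; NOT a real ZFS structure. */
typedef struct entry {
    int refcount;
} entry_t;

/*
 * Hypothetical lookup. Returning NULL is a valid outcome here:
 * it simply means "no entry for this index".
 */
entry_t *lookup_entry(entry_t *table, size_t n, size_t idx)
{
    /* Guard against misuse by our own caller (a programming error). */
    assert(table != NULL);

    if (idx >= n)
        return NULL;
    return &table[idx];
}

/*
 * Drop one reference on an entry.
 * Returns 0 on success, -1 if the lookup found nothing.
 */
int release_entry(entry_t *table, size_t n, size_t idx)
{
    entry_t *e = lookup_entry(table, n, idx);

    /*
     * The check that matters: without it, the next line would
     * dereference NULL whenever the lookup legitimately fails.
     */
    if (e == NULL)
        return -1;

    e->refcount--;
    return 0;
}
```

With a two-element table, release_entry(table, 2, 0) decrements the
first entry and returns 0, while release_entry(table, 2, 5) returns -1
instead of crashing. In the kernel the equivalent of that crash is a
panic, which is why a missing check there is so painful.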
I am not sure how to make a webrev and ultimately a signed-off
contribution upstream, but I posted my patch and research on
the list and in the illumos bug tracker.
I am not sure how you can fix an S11 system though.
If it is at zpool v28 or older, you can try to import it into
an OpenIndiana installation, perhaps one rebuilt with similarly
patched code that checks for NULLs, and fix your pool there
(and then reuse it in S11 if you must). The source is available
at http://src.illumos.org and your stacktrace should tell you
in which functions to start looking...
zfs-discuss mailing list