On 05/08/07 16:38, Garrett D'Amore wrote:
[snip]
This sounds like an "edge" case to me. I.e. using kstats to attempt to
locate a kernel bug. (Kstats are unlikely to explain the "why" for a
failure-to-detach, or at least, they are for a driver that is detaching,
since the kstats of interest are probably still in the kernel at detach
time. Generally, clobbering/freeing kstats is one of the last things a
driver does on the way out. Note that in the nemo case this isn't quite
precisely true... but there may be some adjustment we can do there.)
The second question I have is, where is this most useful? If it is only
for DEBUG kernels, I can imagine that we could create a DEBUG behavior
where kstat_delete() doesn't really delete the stat, but puts
it into some kind of historical archive. (Keeping the most recent, or N
most recent, stats from each driver.) This could facilitate debug,
without confusing _administrative_ use, and without interfering with
normal driver use. Further, this functionality could (should?) be made
independent of KSTAT_FLAG_PERSISTENT (or any other flag). And it
doesn't have to be DEBUG kernels only, it could be tunable via an
/etc/system value (historical_kstats = 1 in /etc/system?).
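
To make the archive idea above concrete, here is a rough user-space sketch
in C. It is not kernel code and does not use the real kstat structures:
hist_entry_t, hist_archive(), hist_recent() and HIST_DEPTH are all
hypothetical names, and historical_kstats stands in for the tunable
floated above. For brevity it keeps one global ring of the N most recent
deleted stats rather than N per driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HIST_DEPTH 4    /* hypothetical: how many deleted stats to retain */
#define HIST_NAMELEN 32

/* Hypothetical snapshot of a deleted stat; NOT the real kstat_t layout. */
typedef struct hist_entry {
    char module[HIST_NAMELEN];
    char name[HIST_NAMELEN];
    unsigned long long value;
} hist_entry_t;

static hist_entry_t hist_ring[HIST_DEPTH];
static unsigned int hist_count;     /* total entries archived so far */
static int historical_kstats = 1;   /* hypothetical tunable; 0 disables */

/*
 * What a delete path could call just before freeing a stat: copy a
 * snapshot into the ring, overwriting the oldest entry once it is full.
 */
static void
hist_archive(const char *module, const char *name, unsigned long long value)
{
    hist_entry_t *e;

    assert(module != NULL && name != NULL);
    if (!historical_kstats)
        return;

    e = &hist_ring[hist_count % HIST_DEPTH];
    snprintf(e->module, sizeof (e->module), "%s", module);
    snprintf(e->name, sizeof (e->name), "%s", name);
    e->value = value;
    hist_count++;
}

/*
 * Return the i-th most recent archived entry (0 is the newest), or NULL
 * if fewer than i + 1 entries are currently retained.
 */
static const hist_entry_t *
hist_recent(unsigned int i)
{
    unsigned int kept = hist_count < HIST_DEPTH ? hist_count : HIST_DEPTH;

    if (i >= kept)
        return (NULL);
    return (&hist_ring[(hist_count - 1 - i) % HIST_DEPTH]);
}
```

A per-driver variant would keep one such ring keyed by module name. A real
implementation would also need locking and a way to read the archive from
a crash dump, neither of which is attempted here.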
I'm thinking of production kernels, not DEBUG ones. That is, I am considering
service, etc., having to root-cause a failure based on what evidence can be
found, and not thinking of a developer or similar trying to locate
a kernel bug. Admittedly it would be nicer to have some more well-defined
history trail, but you take what you can get in post-mortem debugging.
Kstats quite often include error count info, reset info, etc., which may
be relevant or at least interesting in debugging failed DRs and the like.
Gavin