Ivan wrote:
You told us that you like questions :-) Here's a new one:
Heh, I'm getting exactly what I asked for. ;)
Could you tell me how the OpenSolaris could be debugged? (Especially the VM part)?
I wish there was a great resource I could point to for this but there isn't one.
Having "grown up" in the Linux world and worked on other systems from VxWorks to NT to LynxOS, I believe that Solaris has the best debugging tools available to you out of the box (NT has better unbundled tools, but they cost a squillion dollars to buy from commercial vendors). We have not done a good job of teaching folks outside of Sun's walls how to use the tools effectively. This is unfortunate.
Most folks in user-land seem to use dbx for debugging and the DTrace pid provider for tracing these days. Java has its own facilities, and it has a DTrace provider too.
Down in the kernel, our tools of choice are mdb for static analysis, kmdb for in-situ debugging, and the DTrace sdt, fbt, and sometimes syscall providers for tracing. The Panic! book (Brown/Drake) was an awesome resource seven years ago, but it is really out of date these days, and I'm not aware of any more up-to-date replacement. Although the tools covered in that book are obsolete, its debugging methodologies are still solid.
An often overlooked (but very powerful) debugging tool is the ASSERT. We use ASSERTs a lot in the kernel to anticipate possible bugs and catch them immediately (by forcing a panic) when an invalid condition is detected. In my development over the last six weeks I've had one bad trap panic, which was the result of not initializing a variable -- after the machine tipped over I ran lint and, sure enough, lint flagged it. :) Otherwise it's been all ASSERT panics, which really speeds up debugging since I know exactly which line of code to go look at for the failure. The idea with ASSERTs is to be your own auditor, anticipating where things might go wrong while you are writing the code. When the system stays up under stress, I can be fairly confident that things are working the way I intended.
(This is a particularly good thing in the VM system, since if *we* screw up, it can corrupt *your* data; that's good reason for paranoia.)
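As a rough illustration of the "be your own auditor" idea, here is a minimal user-land sketch. It is not code from the Solaris kernel: the refobj_t type and the refobj_hold()/refobj_rele() functions are hypothetical names made up for this example, and ASSERT is simply mapped onto the standard C assert() so the sketch compiles outside the kernel (a failed check aborts the program here, where the kernel would panic).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: a user-land stand-in for a kernel-style ASSERT().
 * A failed check aborts the program and reports the file and line.
 */
#define	ASSERT(x)	assert(x)

/* Hypothetical reference-counted object, for illustration only. */
typedef struct refobj {
	int ro_refcnt;		/* number of outstanding holds */
} refobj_t;

/* Take a hold on the object and return the new count. */
int
refobj_hold(refobj_t *rp)
{
	ASSERT(rp != NULL);
	ASSERT(rp->ro_refcnt >= 0);	/* a negative count means corruption */
	return (++rp->ro_refcnt);
}

/* Drop a hold on the object and return the new count. */
int
refobj_rele(refobj_t *rp)
{
	ASSERT(rp != NULL);
	ASSERT(rp->ro_refcnt > 0);	/* releasing with no holds is a bug */
	return (--rp->ro_refcnt);
}
```

Each ASSERT states something the author believes must always be true at that point. If a caller ever releases an object it never held, the failure is reported at the exact line of the violated assumption instead of showing up later as unrelated damage.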
It's hard to come up with a one-size-fits-all answer to a question like "how to debug XYZ". Debugging is one area where different developers need different things from the tools. mdb's modularity allows us to design our own debugging tools for the projects we work on -- this is invaluable, since it can save a lot of time later. For instance, in a project I'm currently on (you'll see more about it when the resource management roadmap is posted soon) I wanted to be able to drop into the kernel debugger and periodically run consistency checks on every page_t in the system. One of our core algorithms runs a state machine on the page_t, and there are only a limited number of valid states. So I wrote a dcmd for mdb in about three hours' time, recompiled the kmdb modules, rebooted, and ran it. In twenty seconds it did what I would never be able to do by hand, which was to check more than 100,000 page_ts for consistency.
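The kind of check such a dcmd performs can be sketched in plain user-land C. To be clear, this is not the dcmd described above and it does not use the mdb module API or the real page_t: the fake_page_t structure, its fields, the three states, and the rules for what counts as a valid combination are all assumptions invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states; the real page_t state machine differs. */
typedef enum { PG_FREE, PG_CACHED, PG_INUSE } pg_state_t;

/* Hypothetical stand-in for a page structure. */
typedef struct fake_page {
	pg_state_t fp_state;	/* current state */
	int fp_lockcnt;		/* number of locks held on the page */
	int fp_mapped;		/* nonzero if the page has mappings */
} fake_page_t;

/* Return nonzero if the fields agree with the page's state. */
int
fake_page_is_consistent(const fake_page_t *pp)
{
	switch (pp->fp_state) {
	case PG_FREE:		/* free pages are unlocked and unmapped */
		return (pp->fp_lockcnt == 0 && pp->fp_mapped == 0);
	case PG_CACHED:		/* cached pages have no mappings */
		return (pp->fp_mapped == 0);
	case PG_INUSE:		/* in-use pages need a sane lock count */
		return (pp->fp_lockcnt >= 0);
	default:		/* any other value is not a valid state */
		return (0);
	}
}

/*
 * Check every page in the array. Returns the number of inconsistent
 * pages and, if first_bad is not NULL and at least one page is bad,
 * stores the index of the first bad page there.
 */
size_t
fake_page_check_all(const fake_page_t *pages, size_t npages,
    size_t *first_bad)
{
	size_t i, nbad = 0;

	assert(pages != NULL || npages == 0);
	for (i = 0; i < npages; i++) {
		if (!fake_page_is_consistent(&pages[i])) {
			if (nbad == 0 && first_bad != NULL)
				*first_bad = i;
			nbad++;
		}
	}
	return (nbad);
}
```

The point of the design is that the set of valid states is small and can be written down once, so a machine can sweep every structure in seconds. A real dcmd would read each page_t from the target through the debugger rather than from a local array.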
If you're interested in debugging in the kernel, the Panic! book is a good place to start (though, as I noted, it's out of date with respect to the tools). Within Sun there is a course on core dump analysis that is more up to date with respect to the tools, but I do not believe there is an equivalent course for anyone outside of Sun (unfortunately, these courses are developed under contract, and the materials are owned by the company that produces them rather than by Sun).
The mdb community and mdb-discuss (which should probably be broadened to "debugging community" and "debugging discussion") would be a great place for further discussions if you have detailed questions.
- Eric
