Werner Almesberger wrote:
> Mike (mwester) wrote:
>> None of the distros, and certainly not the kernel, are stable enough to
>> ensure that any such code would actually remain running -- for every
>> "solution" you come up with to insulate such a daemon from a failure,
>> somebody somewhere will come up with another way to make it fail.
>
> Well, if the kernel is dead, also the current hack wouldn't help you.

"Dead" kernels are relatively easy to fix compared to the problems we deal with now. There are numerous ways the current kernel "injures" applications in ways it shouldn't, and the apps just aren't resilient enough yet. Not to mention the many ways in which one application can injure others, from "SIGKILL" to just running the system out of memory or some other resource. > Besides, that platform-specific code would need to be separated from > the actual driver anyway, so the choice is not between leaving it > alone and replacing it, but between refactoring that mess and getting > rid of it for good. > >> Of course Openmoko can just unilaterally do what you state, and force >> developers for every distro to invest huge amounts of effort in >> developing fail-safe watchdog daemons for their distros. > > Yeah, that would indeed be unreasonable. So I sat down and wrote it > myself. Took about one and a half hours, most of which was spent on > figuring out how to interface with events and i2c-dev, which I've > never done before. > > http://svn.openmoko.org/developers/werner/neodog/ Well, I'm not biting on that bait. I said in my original email that we could all waste time by doing silly stuff like proposing something, and someone else finding a use case where the daemon would fail, and someone else doing something to address that, etc, etc. I'm sure it's good code, and I'm sure you've taken care that it will continue to run when there's no memory left, or when other common resources are exhausted, or access to them is locked up, and I'm sure it's statically linked to protect from package corruption -- but it's still user-space, and I'm quite certain there are still failure modes where it will die along with all userspace, when a simple bit of kernel code can still quite capably run. > I agree with you that the kernel still needs lots of work. But > wasting time on wrapping some more rolls of band-aid around things > we can fix properly doesn't help ... 
Sometimes the band-aid needs to be applied directly to the wound to be
effective.

Regards,
Mike
