On Mar 28, 2013, at 9:39 AM, Jim Klimov <[email protected]> wrote:

> On 2013-03-28 16:18, Sašo Kiselkov wrote:
>> I'm building a system that's relying as much as possible on stock parts,
>> so custom kernel modules and hacking is something I'd like to avoid. I'm
>> not going to be around forever to keep the system going, or to
>> continually work on ways of deploying an old hack on a new install.
> 
> I know *you* do have better contributions to make, but a watchdog driver
> is AFAIK about knowing what byte to write to what IO port to set, reset
> and query the timeout, and possibly configure what the watchdog does
> when the timer expires without updates. This info might be gleaned from
> Linux and BSD drivers for different watchdog chips.
> 
> I think it might be a useful project for a student to make.
> 
> Possibly too low-profile for a GSoC, but good to learn about driver
> development, porting code, etc. And quite useful for the community ;)
> As a result of such a project, we'd get one more kernel-hacker ;)

I've done such work for NetBSD systems.  These things are usually pretty 
trivial from a hardware standpoint.

The harder thing is when these things are exposed as "registers" that are on an 
otherwise bog-standard part.  In that case, you have to either modify an 
existing driver, or come up with some more tricky hack.  (It's easier when this 
function is exposed as a separate PCI function or something like that.  But 
that's very rarely the case with something like this.  Usually they are part of 
the low-level system chipset -- they kind of need to be in order to do something 
like generate an NMI or cause a power reset.)

Then the other side of the problem is determining how you are going to trigger 
this.  The usual thing is to hook this up to a system timer, which will catch 
hard hangs.  But many "apparent" hangs are really not hangs in this sense -- 
there could be a high-priority process that is starving other processes, for 
example, or a deadlock in the filesystem.  Those kinds of "hangs" won't be 
detected by such a deadman.

The ideal design would be a deadman accessible from user space, one that 
allows user processes to configure it and then tickle it to keep it 
alive.  This would allow you to have a critical user space process validate 
that *it* is still serving whatever it needs to.  This kind of task requires a 
little design work -- and probably should be hooked back into some common 
deadman framework.  NetBSD has such a framework if I recall correctly.  This 
project would be in scope for a GSoC effort, because I can see a few other 
options like using the system timer as a deadman (it's already there, btw!) if no 
other hardware watchdog is present.  The framework should abstract all those 
and present a single syscall or ioctl interface to manage it.
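
To make the configure/tickle idea concrete, here is a rough sketch of the 
logic such an interface would expose.  It is a toy model in Python, not the 
NetBSD framework and not any existing illumos API: the class name `Deadman` 
and the methods `configure`, `tickle` and `expired` are made up for 
illustration.  In a real framework the expiry would be acted on by the kernel 
or the watchdog hardware (reset, NMI) rather than polled by the caller.

```python
import time


class Deadman:
    """Toy user-space deadman: expires unless tickled within `timeout` seconds.

    Hypothetical names throughout; this only models the bookkeeping.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the logic can be tested
        self._timeout = None     # None means "not armed"
        self._deadline = None

    def configure(self, timeout):
        """Arm the deadman, or re-arm it with a new timeout (in seconds)."""
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self._timeout = timeout
        self._deadline = self._clock() + timeout

    def tickle(self):
        """Push the deadline out by one full timeout period."""
        if self._timeout is None:
            raise RuntimeError("deadman is not configured")
        self._deadline = self._clock() + self._timeout

    def expired(self):
        """Return True once the deadline has passed without a tickle."""
        return self._deadline is not None and self._clock() >= self._deadline
```

The critical process would call `configure()` once and then `tickle()` only 
after it has verified that it is still doing useful work, so a starved or 
deadlocked service stops tickling even though the system timer keeps running.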

        - Garrett
> 
> //Jim
> 
> 
> _______________________________________________
> oi-dev mailing list
> [email protected]
> http://openindiana.org/mailman/listinfo/oi-dev

