[rtl] Is this asking for trouble or is the bug elsewhere?

Norm Dresner Sun, 19 Dec 1999 10:13:53 -0800

In migrating a program from MSDOS/AMX into RTL, I needed to create a fixed-rate task dispatcher. In the MSDOS version we had a high-rate timer interrupt handler that would, at the appropriate rates, call the AMX kernel to trigger tasks labeled FRT50, FRT20, FRT12_5, etc., which ran at 50 Hz, 20 Hz, 12.5 Hz, ...

The fixed-rate tasks handled soft real-time functions such as monitoring control panels, checking various (not so critical) peripherals, display tasks, etc.

To duplicate this functionality, I created a few "devices" called, appropriately enough, /dev/FRT64, /dev/FRT32, /dev/FRT16, and so on. Each of these input character devices was nothing more than an entry into a waiting queue -- each read blocked until the next strike of the relevant rate controller's clock. This was done with a simple call to interruptible_sleep_on() as shown in Rubini's device driver book. The other end of the rate controller was in a real-time periodic task which, for other reasons, ran at 1280 Hz and subdivided that frequency down into the 64...2 Hz range and, for each striking of a rate's clock, called wake_up_interruptible() for the respective queue.
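For illustration, here is a minimal user-space sketch of how a soft real-time task might pace itself from one of these devices, assuming only what is described above: each one-byte read blocks until the next tick and then returns 1. The names wait_for_tick() and run_at_rate() are hypothetical and are not taken from the actual program.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Block until the rate device behind fd signals its next tick.
 * Returns 0 when a tick arrived, -1 on error or end-of-file.
 * (Hypothetical helper, for illustration only.) */
int wait_for_tick(int fd)
{
    char dummy;

    for (;;) {
        ssize_t n = read(fd, &dummy, 1);   /* blocks until the next strike */
        if (n == 1)
            return 0;                      /* one "byte" means one tick */
        if (n < 0 && errno == EINTR)
            continue;                      /* interrupted by a signal: retry */
        return -1;                         /* real error or EOF */
    }
}

/* Call work() once per tick, for at most max_cycles ticks.
 * Returns the number of cycles actually completed. */
int run_at_rate(int fd, int max_cycles, void (*work)(void))
{
    int done = 0;

    while (done < max_cycles && wait_for_tick(fd) == 0) {
        if (work != NULL)
            work();
        done++;
    }
    return done;
}
```

A 64 Hz task would open /dev/FRT64 read-only and pass the resulting descriptor to run_at_rate(); since the helpers only rely on read(), they can be exercised with a pipe in place of the device.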

That was the theory, and it seemed to work perfectly, each task being dispatched at exactly the right rate -- and, at least when the system wasn't too loaded, without any missed cycles. BUT....

I wrote a monitoring task that displayed the contents of various communications channels in the system (there are hundreds of potential channels, of which only about 70 or so are used). The monitor program allowed the user to specify from 1 to 6 channels to monitor and displayed the results in an nxterm window using ncurses. Again, this seemed to work perfectly until...

I started a few monitoring windows -- since I expect that this is the way the system will actually be used -- and left the machine running while I attended to things on another machine on the network. When I came back a few hours later, the monitoring tasks had all hung. They weren't zombies (at least not as far as top reported); I just couldn't interrupt or kill them, and they weren't consuming any CPU time. The rate scheduler itself wasn't hung: I could start up a few more monitoring windows and they'd behave quite well for a while, until they too hung.

Since the monitors can run for hours without failing, I am positive that there's no static logic failure in the code. The logic in the device driver is quite simple: when the appropriate read request is received and the minor device number (which encodes the rate) is decoded, a call is made to interruptible_sleep_on(). That's all. When the block is broken later, the code returns a 1 to signal that one byte has been transferred, even though no data was really moved. That's all of the processing in the read routine.

When the 1280 Hz clock is handled in the periodic rt-linux task, it is counted down to 64 Hz and then dispatched -- on different phases of the clock so no two rates ever expire at the same time -- by calling wake_up_interruptible(). No other processing is done.
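The countdown with staggered phases can be sketched in plain C as follows. This is an illustration under stated assumptions, not the actual task code: the rate table (64 Hz down to 2 Hz in powers of two) and the choice of giving rate i a phase of i base ticks are both assumed, and rates_due() is a hypothetical name.

```c
#define BASE_HZ   1280   /* rate of the periodic real-time task */
#define NUM_RATES 6

/* Assumed rate table, 64 Hz down to 2 Hz. The phase of rate i is
 * simply i base ticks; this assignment is an illustrative assumption. */
static const unsigned rate_hz[NUM_RATES] = { 64, 32, 16, 8, 4, 2 };

/* Return a bitmask with bit i set if rate_hz[i] strikes on this base tick.
 * Every period is a multiple of 20 ticks and the phases 0..5 are distinct
 * modulo 20, so at most one bit is ever set. */
unsigned rates_due(unsigned long tick)
{
    unsigned mask = 0;
    unsigned i;

    for (i = 0; i < NUM_RATES; i++) {
        unsigned long period = BASE_HZ / rate_hz[i];  /* 20, 40, ..., 640 ticks */
        if (tick % period == i)
            mask |= 1u << i;
    }
    return mask;
}
```

In the periodic task, a set bit i marks the point at which the wake-up for that rate's queue would be issued; over one second (1280 ticks) each rate strikes exactly its nominal number of times.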

The potential problem that worries me is the advisability of calling the normal kernel's wake_up_interruptible() from a real-time task. Does anyone have any experience with this or knowledge that it shouldn't be done?

The environment is Zentropix's version of Red Hat 5.2 with the RTLinux kernel version 0.9J.

Thanks,
Norm