RE: Kernel 2.2: tq_scheduler functions scheduling and waiting

2001-05-29 Thread Arthur Naseef

I have tested with a kernel thread running the tq_scheduler and it
is much more stable.  The kernel still ran into a problem in n_tty.c,
where the compiler optimized out the check "if (!tty)" in
n_tty_set_termios().  I am still investigating the right solution to
this.

As a long-term fix, I will review the 2.4 and latest 2.2 sources.

> Yes.  The situation where one task is on two waitqueues
> is rare, but does happen.  And yes, there is code out there
> which does a bare schedule() and *assumes* that once the
> schedule has returned, the thing it was waiting for has
> indeed occurred.
> 
> Generally this is poor practice - it's safer to loop
> over the schedule() call until the condition you're
> sleeping on has been tested.

I see your point.  It would prevent this type of problem if all code
waiting for conditions made certain those conditions were met.  However,
given the way the kernel works, the check is not necessary unless the
task specifically expects more than one condition to awaken it.  At
least it wasn't until tq_scheduler was introduced.  Actually, that is
not fair either: it only became a problem when functions in
tq_scheduler started "blocking".

It would help me tremendously if these types of limitations and
requirements for working in the kernel were well documented.  It takes
significant effort to determine the requirements, and to verify that
my understanding is correct.

> 
> You really shouldn't be sleeping in this way on tq_scheduler
> if there's any way in which the sleep can take an extended
> period of time.  You may end up putting important kernel
> tasks to sleep.

I agree.  In addition, even if the tq_scheduler function did check for
its own condition, a problem still exists when the task returns to the
code using the first wait queue before that code's condition is met.
The code using the second wait queue would have set the task state to
running and would not set it back (which it couldn't do without knowing
which conditions to check).

> 
> Best to use schedule_task(), or an independent kernel thread.
> 
> -
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Kernel 2.2: tq_scheduler functions scheduling and waiting

2001-05-29 Thread Andrew Morton

Arthur Naseef wrote:
> 
> Andrew:
> 
> Excellent.  I will look at the 2.4 sources.
> 
> In addition to the TASK_ZOMBIE issue you mention, I believe there
> is an issue of false termination of wait queues.  Consider this:
> 
> - Task places itself on a wait queue
> - Calls schedule()
> - tq_scheduler function does the same
> 
> Now, there are two events which could place the task in TASK_RUNNING
> and no clear way to differentiate.  And, since most of the kernel
> code does not check that the wait condition was actually met, this
> could lead to all types of problems, right?
> 

Yes.  The situation where one task is on two waitqueues
is rare, but does happen.  And yes, there is code out there
which does a bare schedule() and *assumes* that once the
schedule has returned, the thing it was waiting for has
indeed occurred.

Generally this is poor practice - it's safer to loop
over the schedule() call until the condition you're
sleeping on has been tested.
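
To make the difference concrete, here is a minimal user-space sketch in
C.  It is not kernel code: sim_schedule(), struct sim and the scripted
wakeups are hypothetical stand-ins invented for this illustration, and
"sleeping" is modelled simply as consuming the next scripted wakeup.  It
only shows why a waiter that re-tests its condition survives a wakeup
meant for someone else, while a bare schedule() caller does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One scripted wakeup; sets_condition is false for a wakeup that was
 * really meant for a different wait queue. (Hypothetical model type.) */
struct wakeup {
    bool sets_condition;
};

/* Toy model of one sleeping task and the wakeups it will receive. */
struct sim {
    const struct wakeup *script; /* wakeups, in delivery order */
    size_t len;                  /* number of scripted wakeups */
    size_t next;                 /* index of the next wakeup */
    bool condition;              /* the thing the task is waiting for */
    int schedule_calls;          /* how many times the task "slept" */
};

/* Stand-in for schedule(): "sleeps" until the next scripted wakeup.
 * Returns false if no wakeup is left (the task would sleep forever). */
static bool sim_schedule(struct sim *s)
{
    assert(s != NULL);
    if (s->next >= s->len)
        return false;
    s->schedule_calls++;
    if (s->script[s->next].sets_condition)
        s->condition = true;
    s->next++;
    return true;
}

/* Bare schedule(): proceeds after the first wakeup, whatever caused it.
 * Returns whether the condition actually held when the task went on. */
static bool wait_bare(struct sim *s)
{
    sim_schedule(s);
    return s->condition;
}

/* Looping form: goes back to sleep until the condition has been tested
 * and found true, or until no wakeups remain in the script. */
static bool wait_looping(struct sim *s)
{
    while (!s->condition) {
        if (!sim_schedule(s))
            break;
    }
    return s->condition;
}
```

With a script of one unrelated wakeup followed by the real one,
wait_bare() carries on with the condition still false, whereas
wait_looping() sleeps a second time and only then continues.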

You really shouldn't be sleeping in this way on tq_scheduler
if there's any way in which the sleep can take an extended
period of time.  You may end up putting important kernel
tasks to sleep.

Best to use schedule_task(), or an independent kernel thread.




RE: Kernel 2.2: tq_scheduler functions scheduling and waiting

2001-05-29 Thread Arthur Naseef

Andrew:

Excellent.  I will look at the 2.4 sources.

In addition to the TASK_ZOMBIE issue you mention, I believe there
is an issue of false termination of wait queues.  Consider this:

- Task places itself on a wait queue
- Calls schedule()
- tq_scheduler function does the same

Now, there are two events which could place the task in TASK_RUNNING
and no clear way to differentiate.  And, since most of the kernel
code does not check that the wait condition was actually met, this
could lead to all types of problems, right?

-art

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Andrew Morton
Sent: Monday, May 28, 2001 10:28 PM
To: Arthur Naseef
Cc: [EMAIL PROTECTED]
Subject: Re: Kernel 2.2: tq_scheduler functions scheduling and waiting


Arthur Naseef wrote:
> 
> All:
> 
> I have been diagnosing kernel panics for over a week and I have
> concerns with the use of tq_scheduler for which I was hoping I
> could get some assistance.
> 
> Is it considered acceptable for functions in the tq_scheduler
> task list to call schedule?  Is it acceptable for such functions
> to wait on wait queues?  What limitations exist?

When a task wants to exit, it cleans up all its stuff,
sets its state to TASK_ZOMBIE and then calls schedule().
The scheduler takes it off the runqueue and the task
is never again executed.  It's just a couple of stack
pages which are waiting for someone in wait4() to release.

But imagine what happens if the TASK_ZOMBIE task hits
schedule() and finds a tq_scheduler task to run.  And that
task calls schedule().  In state TASK_ZOMBIE.  Messy.

At the very least, the schedule() call will never return.

If the tq_scheduler task sets current->state to 
TASK_[UN]INTERRUPTIBLE (as it should) before calling
schedule() then it has overwritten TASK_ZOMBIE and the
task which is trying to exit has become magically
resurrected.  As far as I can tell, the "dead" task
will run again, do the `fake_volatile' thing in do_exit()
and try to go zombie again.

It would be very interesting to change the test in
schedule():

        sti();
-       if (tq_scheduler)
+       if (tq_scheduler && current->state != TASK_ZOMBIE)
                goto handle_tq_scheduler;

It's all rather unpleasant, and tq_scheduler was killed
in 2.4.  I suggest you take a look at all the serial
drivers in 2.4, see how I converted them to use schedule_task().
Someone kindly ported schedule_task() to 2.2.recent, so you
should be able to use that in the same way.




Re: Kernel 2.2: tq_scheduler functions scheduling and waiting

2001-05-28 Thread Andrew Morton

Arthur Naseef wrote:
> 
> All:
> 
> I have been diagnosing kernel panics for over a week and I have
> concerns with the use of tq_scheduler for which I was hoping I
> could get some assistance.
> 
> Is it considered acceptable for functions in the tq_scheduler
> task list to call schedule?  Is it acceptable for such functions
> to wait on wait queues?  What limitations exist?

When a task wants to exit, it cleans up all its stuff,
sets its state to TASK_ZOMBIE and then calls schedule().
The scheduler takes it off the runqueue and the task
is never again executed.  It's just a couple of stack
pages which are waiting for someone in wait4() to release.

But imagine what happens if the TASK_ZOMBIE task hits
schedule() and finds a tq_scheduler task to run.  And that
task calls schedule().  In state TASK_ZOMBIE.  Messy.

At the very least, the schedule() call will never return.

If the tq_scheduler task sets current->state to 
TASK_[UN]INTERRUPTIBLE (as it should) before calling
schedule() then it has overwritten TASK_ZOMBIE and the
task which is trying to exit has become magically
resurrected.  As far as I can tell, the "dead" task
will run again, do the `fake_volatile' thing in do_exit()
and try to go zombie again.

It would be very interesting to change the test in
schedule():

        sti();
-       if (tq_scheduler)
+       if (tq_scheduler && current->state != TASK_ZOMBIE)
                goto handle_tq_scheduler;
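
The effect of that extra test can be sketched with a small user-space
model in C.  Everything here is an assumption made for illustration:
the toy_* names, the state values and toy_schedule_entry() are
hypothetical and model only the tq_scheduler check at the top of
schedule(), not the real 2.2 scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the task states; the values are arbitrary. */
enum toy_state { TOY_RUNNING, TOY_UNINTERRUPTIBLE, TOY_ZOMBIE };

struct toy_task {
    enum toy_state state;
};

/* Type of a function queued on the (modelled) tq_scheduler list. */
typedef void (*toy_tq_fn)(struct toy_task *current_task);

/* A queued function that prepares to sleep: it sets the state of
 * whichever task happens to be current before it would call schedule(). */
static void toy_sleeping_fn(struct toy_task *current_task)
{
    current_task->state = TOY_UNINTERRUPTIBLE;
}

/* Models only the tq_scheduler test on entry to schedule().
 * Returns true if the queued function ran in the context of
 * current_task. With guard_zombie set, an exiting task is skipped. */
static bool toy_schedule_entry(struct toy_task *current_task,
                               toy_tq_fn queued, bool guard_zombie)
{
    assert(current_task != NULL);
    if (queued == NULL)
        return false;
    if (guard_zombie && current_task->state == TOY_ZOMBIE)
        return false;
    queued(current_task);
    return true;
}
```

Without the guard, a task that entered in the zombie state leaves with
its state overwritten, which is the "resurrection" described above.
With the guard, the queued function is left for some other task to run
and the zombie state is preserved.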

It's all rather unpleasant, and tq_scheduler was killed
in 2.4.  I suggest you take a look at all the serial
drivers in 2.4, see how I converted them to use schedule_task().
Someone kindly ported schedule_task() to 2.2.recent, so you
should be able to use that in the same way.
