Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-30 Thread Oleg Nesterov
On 06/29, Oleg Nesterov wrote:
>
> Suppose we have the tasklets T1 and T2, both are scheduled on the
> same CPU. T1 takes some spinlock LOCK.
> 
> Currently it is possible to do
> 
>   spin_lock(LOCK);
>   tasklet_disable(T2);
> 
> With this patch, the above code hangs.

I am stupid. Yes, flush_workqueue() is evil and should not be used, but
if we use workqueues, tasklet_disable() becomes might_sleep() anyway.
This is an incompatible and unavoidable change.

grep, grep...

net/bluetooth/hci_core.c:hci_rx_task()

    read_lock(&hci_task_lock)
    hci_event_packet()
    hci_num_comp_pkts_evt()
        tasklet_disable(&hdev->tx_task)
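
To make the breakage concrete, a minimal sketch of that call chain
(function and field names from net/bluetooth/hci_core.c, bodies elided),
assuming the workqueue-backed tasklet_disable() that may sleep:

/* Sketch only: hci_rx_task() holds a read lock (atomic context);
 * a tasklet_disable() that flushes/waits on a workqueue sleeps here. */
static void hci_rx_task_sketch(struct hci_dev *hdev)
{
        read_lock(&hci_task_lock);              /* no sleeping from here on */
        /* hci_event_packet() -> hci_num_comp_pkts_evt(): */
        tasklet_disable(&hdev->tx_task);        /* might_sleep() under a lock */
        /* ... */
        tasklet_enable(&hdev->tx_task);
        read_unlock(&hci_task_lock);
}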

Oleg.


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

> Also, create_workqueue() is very costly. The last 2 lines should be
> swapped.

Indeed.

The result improves from 3988 nanoseconds to 3975. :-)
Actually, the difference is within statistical variance,
which is about 20 ns.

Alexey


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Oleg Nesterov
(the email address of Matthew Wilcox looks wrong, changed to [EMAIL PROTECTED])

On 06/29, Oleg Nesterov wrote:
>
> Steven, unless you have some objections, could you change tasklet_kill() ?
> 
> > +static inline void tasklet_kill(struct tasklet_struct *t)
> >  {
> > -   return test_bit(TASKLET_STATE_SCHED, &t->state);
> > +   flush_workqueue(ktaskletd_wq);
> >  }
> 
> Just change flush_workqueue(ktaskletd_wq) to cancel_work_sync(&t->work).

Ugh, tasklet_disable() should be changed as well.

> @@ -84,35 +50,35 @@ static inline void tasklet_disable_nosyn
>  static inline void tasklet_disable(struct tasklet_struct *t)
>  {
> tasklet_disable_nosync(t);
> -   tasklet_unlock_wait(t);
> -   smp_mb();
> -}
> -
> -static inline void tasklet_enable(struct tasklet_struct *t)
> -{
> -   smp_mb__before_atomic_dec();
> -   atomic_dec(&t->count);
> +   flush_workqueue(ktaskletd_wq);
> +   /* flush_workqueue should provide us a barrier */
>  }

Suppose we have the tasklets T1 and T2, both are scheduled on the
same CPU. T1 takes some spinlock LOCK.

Currently it is possible to do

spin_lock(LOCK);
tasklet_disable(T2);

With this patch, the above code hangs.
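
A sketch of the hang, assuming a single-threaded ktaskletd_wq as in the
patch (T1, T2 and LOCK as above):

/* Sketch only.  T1's handler takes LOCK: */
static DEFINE_SPINLOCK(LOCK);

static void t1_func(unsigned long data)
{
        spin_lock(&LOCK);       /* blocks: LOCK is held by hang() below */
        /* ... */
        spin_unlock(&LOCK);
}

/* process context; T1 and T2 are both scheduled on this CPU: */
static void hang(struct tasklet_struct *t2)
{
        spin_lock(&LOCK);
        tasklet_disable(t2);    /* flush_workqueue() waits for T1 too,
                                 * but T1 waits for LOCK -> deadlock
                                 * (and we slept under a spinlock) */
        spin_unlock(&LOCK);
}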


The most simple fix is to use wait_on_work(&t->work) instead of
flush_workqueue(). Currently it is static, but we can export it.
This change will speed up tasklet_disable(), btw.
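
A sketch of that simple fix, assuming wait_on_work() gets exported from
kernel/workqueue.c and the tasklet layout of this patch set:

static inline void tasklet_disable(struct tasklet_struct *t)
{
        tasklet_disable_nosync(t);      /* atomic_inc(&t->count) */
        wait_on_work(&t->work);         /* wait only for *this* work item,
                                         * not the whole ktaskletd_wq */
        smp_mb();
}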

A better fix imho is to use cancel_work_sync() again, but this
needs some complications to preserve TASKLET_STATE_PENDING.

This in turn means that cancel_work_sync() should return "int",
not "void". This change makes sense regardless; I'll try to make a
patch on Sunday.

Oleg.



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Oleg Nesterov
On 06/29, Alexey Kuznetsov wrote:
> 
> > If I understand correctly, this is because tasklet_head.list is protected
> > by local_irq_save(), and t could be scheduled on another CPU, so we just
> > can't steal it, yes?
> 
> Yes. All that code is written to avoid synchronization as much as possible.

Thanks!

> 
> > If we use workqueues, we can change the semantics of tasklet_kill() so
> > that it really cancels an already scheduled tasklet.
> > 
> > The question is: would it be the wrong/good change?
> 
> If it does not add another usec to tasklet_schedule(), it would be good.

No, it won't slow down tasklet_schedule(). Instead it will speed up tasklet_kill().


Steven, unless you have some objections, could you change tasklet_kill() ?

> +static inline void tasklet_kill(struct tasklet_struct *t)
>  {
> -   return test_bit(TASKLET_STATE_SCHED, &t->state);
> +   flush_workqueue(ktaskletd_wq);
>  }

Just change flush_workqueue(ktaskletd_wq) to cancel_work_sync(t-work).

Oleg.



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

> What changed? 

softirq remains raised for such a tasklet. In the old times softirq was
processed once per invocation, in schedule() and on syscall exit, and this
was relatively harmless. Since softirqs are very weakly moderated, it now
results in strong cpu hogging.


> And can it be fixed?

With current tasklets this can be fixed only by introducing additional
synchronization cost to tasklet_schedule(). Not good. Better to fix the
caller.
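
A hedged sketch of what "fix the caller" could mean in a driver ('my_priv'
and its fields are hypothetical); the point is simply not to schedule a
tasklet you have disabled:

struct my_priv {
        struct tasklet_struct rx_tasklet;       /* hypothetical driver state */
};

static void rx_ready(struct my_priv *priv)
{
        /* Racy peek at the disable count -- illustration only. */
        if (atomic_read(&priv->rx_tasklet.count))
                return;                 /* disabled: don't re-raise the softirq */
        tasklet_schedule(&priv->rx_tasklet);
}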

If Ingo knows about this, I guess it is already fixed in the -rt tasklet,
or can be fixed. They are really flexible and extensible:
the -rt tasklet cost is so high that even another thousand cycles
will remain unnoticed. :-)

Alexey


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

> If I understand correctly, this is because tasklet_head.list is protected
> by local_irq_save(), and t could be scheduled on another CPU, so we just
> can't steal it, yes?

Yes. All that code is written to avoid synchronization as much as possible.


> If we use workqueues, we can change the semantics of tasklet_kill() so
> that it really cancels an already scheduled tasklet.
> 
> The question is: would it be the wrong/good change?

If it does not add another usec to tasklet_schedule(), it would be good.

Alexey


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Oleg Nesterov
On 06/29, Alexey Kuznetsov wrote:
>
> > Just look at the tasklet_disable() logic.
> 
> Do not count this.

A slightly off-topic question: tasklet_kill(t) doesn't try to steal
t from tasklet_head.list if t was scheduled, but waits until t completes.

If I understand correctly, this is because tasklet_head.list is protected
by local_irq_save(), and t could be scheduled on another CPU, so we just
can't steal it, yes?

If we use workqueues, we can change the semantics of tasklet_kill() so
that it really cancels an already scheduled tasklet.

The question is: would it be the wrong/good change?

Oleg.



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

> again, there is no reason why this couldnt be done in a hardirq context. 
> If a hardirq preempts another hardirq and the first hardirq already 
> processes the 'softnet work', you dont do it from the second one but 
> queue it with the first one. (into the already existing 
> sd->completion_queue for tx work or queue->poll_list for rx work) It 
> would be a simple additional flag in softnet_data.

This is kind of obvious. It is just a description of softnet.


> once we forget about 'hardirq contexts run with irqs disabled', _there 
> is just no technological point for softirqs_. They are an unnecessary 
> abstraction!

The first paragraph describes softirq, nothing else.

I have already understood that when you say "technological",
you mean "terminological". "softirq" is just a term to describe
the "softnet" workflow in an intelligible way. Call it from inside
the irq handler, rather than in irq_exit, and this changes _NOTHING_.

I understood that you are describing the original, pre-historic
softnet model. You just want to replace the softirq run at irq_exit
with an explicit soft{net,scsi,whatever}_call, which could
execute immediately or be queued for later. I hope I am wrong,
because this is... mmm... not progress.


> -rt, local_bh_disable() is a NOP there. How is it done?
...
> Are we talking about the very same thing perhaps, just from a different 
> angle? ;-)

When talking about softnet, yes.

No, when talking about "implementing non-reentrancy via another,
more flexible mechanism". We are not on the same page.
I am afraid even the books are different. :-)

I need to think about this and really read the -rt code; this sounds so crazy
that it might even be correct.

Timeout, we are far out of topic anyway.

Alexey


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

> Not a very accurate measurement (jiffies that is).

Believe it or not, the measurement has nanosecond precision.


> Since the work queue *is* a thread, you are running a busy loop here. Even
> though you call schedule, this thread still may have quota available, and
> will not yield to the work queue.  Unless of course this caller is of lower
> priority. But even then, I'm not sure how quickly schedule() would
> choose the work queue.

Instantly. That's why cnt is printed. Unless cnt==1000000, the result
is invalid.
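
(A back-of-the-envelope check of that precision claim, assuming HZ=250 and
the 1,000,000-iteration loop in the posted code: 3988 ns x 10^6 iterations
is about 3.99 s, or roughly 997 jiffies, and a single jiffy of run-to-run
noise spread over 10^6 iterations is 4 ns, consistent with the ~20 ns
variance quoted elsewhere in the thread.)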


> get it into -mm)  But the thing is that tasklets IMHO are overused.

You preach to the choir. :-)

Alexey


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Oleg Nesterov
On 06/29, Steven Rostedt wrote:
> 
> On Fri, 29 Jun 2007, Alexey Kuznetsov wrote:
> >
> > static void measure_workqueue(void)
> > {
> > int i;
> > int cnt = 0;
> > unsigned long start;
> > DECLARE_WORK(test, do_test_wq, 0);
> > struct workqueue_struct * wq;
> >
> > start = jiffies;
> >
> > wq = create_workqueue("testq");

Also, create_workqueue() is very costly. The last 2 lines should be
swapped, so that creating the workqueue is not included in the timing.

Oleg.



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Jeff Garzik

Steven Rostedt wrote:
> I had very little hope for this magic switch to get into mainline. (maybe
> get it into -mm)  But the thing is that tasklets IMHO are overused.
> As Ingo said, there are probably only 2 or 3 places in the kernel that a
> switch to work queues couldn't solve.

This is purely a guess, backed by zero evidence.

These network drivers were hand-tuned to use tasklets.  Sure it will
WORK as a workqueue, but that says nothing about equivalence.

> Those places could then
> probably be solved by a different design (yes that would take work).

Network driver patches welcome :)

> Tasklets are there because there
> weren't work queues or kthreads at the time the problems that
> tasklets address were being solved.

Completely false, at least in network driver land.  Threads existed and
were used (proof: 8139too, among others).

Kernel threads were not used for hot path network packet shovelling
because they were too damn slow.  Tasklets were single-threaded, fast,
simple and immediate.  Workqueues today are simple and almost-fast.

Jeff




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Duncan Sands
> Old days that was acceptable, you had not gazillion of attempts
> but just a few, but since some time (also old already) it became
> disasterous.

What changed?  And can it be fixed?

Thanks,

Duncan.


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Ingo Molnar

* Alexey Kuznetsov <[EMAIL PROTECTED]> wrote:

> > as i said above (see the underlined sentence), hardirq contexts 
> > already run just fine with hardirqs enabled.
> 
> REENTRANCY PROTECTION! It does not matter _how_ they run, it matters 
> what context they preempt and what that context has to do to prevent 
> that preemption. [...]

again, there is no reason why this couldnt be done in a hardirq context. 
If a hardirq preempts another hardirq and the first hardirq already 
processes the 'softnet work', you dont do it from the second one but 
queue it with the first one. (into the already existing 
sd->completion_queue for tx work or queue->poll_list for rx work) It 
would be a simple additional flag in softnet_data.
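
A rough sketch of that idea (the 'processing' flag is hypothetical;
poll_list mirrors the real softnet_data field; in real code the list
manipulation would need local_irq_save() around it):

struct softnet_data_sketch {
        struct list_head poll_list;     /* rx work, as in softnet_data */
        int processing;                 /* hypothetical: outer hardirq owns it */
};

/* hardirq context, possibly nested: */
static void hardirq_rx(struct softnet_data_sketch *sd, struct list_head *dev)
{
        list_add_tail(dev, &sd->poll_list);     /* queue the rx work */
        if (sd->processing)
                return;                 /* the preempted hardirq will run it */
        sd->processing = 1;
        while (!list_empty(&sd->poll_list)) {
                struct list_head *work = sd->poll_list.next;
                list_del(work);
                /* ... poll the device behind 'work', hardirqs enabled ... */
        }
        sd->processing = 0;
}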

once we forget about 'hardirq contexts run with irqs disabled', _there 
is just no technological point for softirqs_. They are an unnecessary 
abstraction!

once we concede that point, reentrancy protection does not have to be 
done via local_bh_disable(). For example we run just fine without it in 
-rt, local_bh_disable() is a NOP there. How is it done? By controlling 
execution of the _workflow_ that a softirq does. By implementing 
non-reentrancy via another, more flexible mechanism. (and by carefully 
fixing a few _other_, non-workflow assumptions that softnet does/did, 
such as the per-cpu-ness of softnet_data.)

Are we talking about the very same thing perhaps, just from a different 
angle? ;-)

Ingo


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Steven Rostedt

On Fri, 29 Jun 2007, Alexey Kuznetsov wrote:
> Hello!
>
> > I find the 4usecs cost on a P4 interesting and a bit too high - how did
> > you measure it?
>
> Simple and stupid:

Noted ;-)

> static void measure_tasklet0(void)
> {
>   int i;
>   int cnt = 0;
>   DECLARE_TASKLET(test, do_test, 0);
>   unsigned long start = jiffies;

Not a very accurate measurement (jiffies that is).

>
>   for (i=0; i<1000000; i++) {
>           flag = 0;
>           local_bh_disable();
>           tasklet_schedule(&test);
>           local_bh_enable();
>           while (flag == 0) {
>                   schedule();
>                   cnt++;
>           } /*while (flag == 0)*/;
>   }
>   printk("tasklet0: %lu %d\n", jiffies - start, cnt);
> }
>

[...]

>
> static void measure_workqueue(void)
> {
>   int i;
>   int cnt = 0;
>   unsigned long start;
>   DECLARE_WORK(test, do_test_wq, 0);
>   struct workqueue_struct * wq;
>
>   start = jiffies;
>
>   wq = create_workqueue("testq");
>
>   for (i=0; i<1000000; i++) {
>           flag = 0;
>           queue_work(wq, &test);
>           do {
>                   schedule();

Since the work queue *is* a thread, you are running a busy loop here. Even
though you call schedule, this thread still may have quota available, and
will not yield to the work queue.  Unless of course this caller is of lower
priority. But even then, I'm not sure how quickly schedule() would
choose the work queue.



>                   cnt++;
>           } while (flag == 0);
>   }
>   printk("wq: %lu %d\n", jiffies - start, cnt);
>   destroy_workqueue(wq);
> }
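
The busy-wait concern above could be sidestepped with the completion API,
which already existed at the time; a sketch (not from the posted test; 'wq'
and 'test' as in the quoted code):

static DECLARE_COMPLETION(test_done);

static void do_test_wq(void *dummy)
{
        complete(&test_done);                   /* instead of flag = 1 */
}

/* per iteration of the measurement loop: */
INIT_COMPLETION(test_done);                     /* re-arm for this round */
queue_work(wq, &test);
wait_for_completion(&test_done);                /* sleep until the work ran */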
>
>
>

> and is easy to remove.
>
> It is sad that some usb drivers started to use this creepy and
> useless thing.
>
>
> > also, the "be afraid of the hardirq or the process context" mantra is
> > overblown as well. If something is too heavy for a hardirq, _it's too
> > heavy for a tasklet too_. Most hardirqs are (or should be) running with
> > interrupts enabled, which makes their difference to softirqs miniscule.
>
> Incorrect.
>
> The difference between softirqs and hardirqs lies not in their "heaviness".
> It is in reentrancy protection, which has to be done with local_irq_disable(),
> unless networking is isolated from hardirqs. That's all.
> Networking is too hairy to allow to be executed with disabled hardirqs.
> And moving this hairiness to process context requires
> <irony mode> a little </irony mode> more effort than converting tasklets
> to work queues.
>

I do really want to point out something in the Subject line. **RFC**
:-)

I had very little hope for this magic switch to get into mainline. (maybe
get it into -mm)  But the thing is that tasklets IMHO are overused.
As Ingo said, there are probably only 2 or 3 places in the kernel that a
switch to work queues couldn't solve. Those places could then
probably be solved by a different design (yes that would take work).

Tasklets are there and people will continue to use them when they
shouldn't for as long as they exist. Tasklets are there because there
weren't work queues or kthreads at the time the problems that
tasklets address were being solved.

So if you can still keep the same performance without tasklets, I say we
get rid of them. I've also met too many device driver writers that want
the lowest possible latency for their device, and do so by sacrificing the
latency of other things in the system that may be even more critical.

Also note that the more tasklets we have, the higher the latency will be
for other tasklets. There are only two prios you can currently give a
tasklet, so competing devices will need to fight each other without the
admin being able to have as much control over the result.

-- Steve



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

> I felt that three calls to tasklet_disable were better than a gazillion
> calls to spin_(un)lock.

It is not better.

Actually, it also has something equivalent to a spinlock inside.
It raises a flag and waits for completion of already running
tasklets (cf. spin_lock_bh). And if tasklet_schedule happens while
the tasklet is disabled, the softirq tries to take that lock a gazillion
times until the tasklet is reenabled.

In the old days that was acceptable: you got not a gazillion of attempts
but just a few. But for some time now (also long ago already) it has been
disastrous.

It is really better just to avoid calling tasklet_schedule()
when you do not want it to be executed. :-)
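
For reference, a simplified sketch of the retry loop being described,
loosely following kernel/softirq.c's tasklet_action() of that era
(state-bit handling trimmed; 'list' is the per-CPU tasklet list, grabbed
beforehand with irqs disabled):

while (list) {
        struct tasklet_struct *t = list;
        list = list->next;

        if (tasklet_trylock(t)) {
                if (!atomic_read(&t->count)) {
                        t->func(t->data);       /* normal case: run it */
                        tasklet_unlock(t);
                        continue;
                }
                tasklet_unlock(t);
        }
        /* Disabled (or running elsewhere): put it back and raise the
         * softirq again -- the "gazillion of attempts" while disabled. */
        local_irq_disable();
        t->next = __get_cpu_var(tasklet_vec).list;
        __get_cpu_var(tasklet_vec).list = t;
        __raise_softirq_irqoff(TASKLET_SOFTIRQ);
        local_irq_enable();
}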

Alexey


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

> > The difference between softirqs and hardirqs lies not in their 
> > "heaviness". It is in reentrancy protection, which has to be done with 
> > local_irq_disable(), unless networking is isolated from hardirqs. 
> 
> i know that pretty well ;)

You forgot about this again in the next sentence. :-)


> as i said above (see the underlined sentence), hardirq contexts already 
> run just fine with hardirqs enabled.

REENTRANCY PROTECTION! It does not matter _how_ they run, it matters what
context they preempt and what that context has to do to prevent that
preemption. If you still do not get the point, run
sed -e 's/local_bh_/local_irq_/'
over net/* and kill softirqs. Everything will work just fine.
Moreover, if you deal only with a single TCP connection
(and sysctl tcp_low_latency is not set), even hardirq latency will not suck;
all real work is done in process context.


> also, network softirq locking dependencies arent all that magic or 
> complex either: they do not operate on sockets that are already locked 
> by a user context, they are per CPU and they are not preempted by 
> 'themselves', nor are they preempted by certain other softirqs (such as 
> they are not preempted by the timer softirq). Am i missing some point of 
> yours?

I would not say I understood what you wanted to say. :-)

Does my statement about sed match your view? :-)


What I know is that there are no hairy locking dependencies at all,
and there is no magic there. Especially at the level of sockets.
The things which are troublesome are the various shared trees/hash tables
(e.g. socket hash tables), which are modified both by incoming network
packets and by process contexts.

I have some obscure suspicion that the naive dream of the "realtime" folks is
to move all those "bad" things to some kernel threads and to talk
to those threads by passing messages. I hope this suspicion is wrong,
otherwise I would say: go to Mach, pals. :-(

Alexey


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Duncan Sands
Hi,

> > Just look at the tasklet_disable() logic.
> 
> Do not count this.
> 
> Done this way because nobody needed that thing, except for _one_ place
> in the keyboard/console driver, which was very difficult to fix at that
> time, when the vt code was utterly messy and not smp safe at all.
> 
> start_bh_atomic() was successfully killed, but we had to preserve an
> analogue of disable_bh() with the same semantics for some time.
> It is deliberately implemented in a way which does not impact hot paths
> and is easy to remove.
> 
> It is sad that some usb drivers started to use this creepy and
> useless thing.

the usbatm USB driver uses it in the methods for opening and closing a new
network connection, and on device disconnect.  Yes, tasklet_disable could be
eliminated by adding a spinlock.  However, this would mean taking the lock
every time a packet is received or transmitted.  As it is, typically open
occurs once, when the computer boots, and close and disconnect also occur
once each, when the computer shuts down.  I felt that three calls to
tasklet_disable were better than a gazillion calls to spin_(un)lock.
Please feel free to educate me :)
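
For concreteness, a sketch of the spinlock alternative being weighed here
(field names hypothetical; not the actual usbatm code):

struct usbatm_sketch {
        spinlock_t lock;                /* hypothetical */
};

/* hot path, once per packet -- the cost being avoided: */
static void handle_packet(struct usbatm_sketch *ins)
{
        spin_lock_bh(&ins->lock);
        /* ... process one packet ... */
        spin_unlock_bh(&ins->lock);
}

/* rare path (open/close/disconnect), replacing tasklet_disable(): */
static void reconfigure(struct usbatm_sketch *ins)
{
        spin_lock_bh(&ins->lock);
        /* ... the receive path cannot run concurrently here ... */
        spin_unlock_bh(&ins->lock);
}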

Ciao,

Duncan.


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Ingo Molnar

* Alexey Kuznetsov <[EMAIL PROTECTED]> wrote:

> > also, the "be afraid of the hardirq or the process context" mantra 
> > is overblown as well. If something is too heavy for a hardirq, _it's 
> > too heavy for a tasklet too_. Most hardirqs are (or should be) 
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > running with interrupts enabled, which makes their difference to 
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > softirqs miniscule.
> 
> Incorrect.
> 
> The difference between softirqs and hardirqs lies not in their 
> "heaviness". It is in reentrancy protection, which has to be done with 
> local_irq_disable(), unless networking is isolated from hardirqs. 

i know that pretty well ;)

> That's all. Networking is too hairy to allow to be executed with 
> disabled hardirqs. And moving this hairiness to process context 
> requires <irony mode> a little </irony mode> more effort than converting 
> tasklets to work queues.

as i said above (see the underlined sentence), hardirq contexts already 
run just fine with hardirqs enabled. So your dismissal of executing that 
'hairy' bit in hardirq context is not as automatically true as you 
seem to assume, i think.

also, network softirq locking dependencies arent all that magic or 
complex either: they do not operate on sockets that are already locked 
by a user context, they are per CPU and they are not preempted by 
'themselves', nor are they preempted by certain other softirqs (such as 
they are not preempted by the timer softirq). Am i missing some point of 
yours?

Ingo


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

> I find the 4usecs cost on a P4 interesting and a bit too high - how did 
> you measure it?

Simple and stupid:

int flag;

static void do_test(unsigned long dummy)
{
flag = 1;
}

static void do_test_wq(void *dummy)
{
flag = 1;
}

static void measure_tasklet0(void)
{
int i;
int cnt = 0;
DECLARE_TASKLET(test, do_test, 0);
unsigned long start = jiffies;

for (i=0; i<100; i++) {
flag = 0;
local_bh_disable();
tasklet_schedule();
local_bh_enable();
while (flag == 0) {
schedule();
cnt++;
} /*while (flag == 0)*/;
}
printk("tasklet0: %lu %d\n", jiffies - start, cnt);
}

static void measure_tasklet1(void)
{
int i;
int cnt = 0;
DECLARE_TASKLET(test, do_test, 0);
unsigned long start = jiffies;

for (i=0; i<100; i++) {
flag = 0;
local_bh_disable();
tasklet_schedule();
local_bh_enable();
do {
schedule();
cnt++;
} while (flag == 0);
}
printk("tasklet1: %lu %d\n", jiffies - start, cnt);
}

static void measure_workqueue(void)
{
int i;
int cnt = 0;
unsigned long start;
DECLARE_WORK(test, do_test_wq, 0);
struct workqueue_struct * wq;

start = jiffies;

wq = create_workqueue("testq");

for (i=0; i<100; i++) {
flag = 0;
queue_work(wq, );
do {
schedule();
cnt++;
} while (flag == 0);
}
printk("wq: %lu %d\n", jiffies - start, cnt);
destroy_workqueue(wq);
}



> tasklet as an intermediary towards a softirq - what's the technological 
> point in such a splitup?

"... work_struct as intermediary towards a workqueue - what's the technological
point in such a splitup?" Non-sense? Yes, but it is exactly what you said. :-)

softirq is just a context and engine to run something. Exactly like
workqueue task. struct tasklet is work_struct, it is just a thing to run.
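
Spelled out, the analogy looks roughly like this (ktaskletd_wq is the patch
set's queue; the struct layout is a sketch reconstructed from the hunks
quoted in this thread):

/* tasklet == work_struct (a thing to run);
 * workqueue == softirq   (a context/engine that runs things). */
extern struct workqueue_struct *ktaskletd_wq;

struct tasklet_struct {
        struct work_struct work;
        atomic_t count;                 /* non-zero => disabled */
        unsigned long state;
        void (*func)(unsigned long);
        unsigned long data;
};

static inline void tasklet_schedule(struct tasklet_struct *t)
{
        queue_work(ktaskletd_wq, &t->work);     /* engine swap: softirq -> wq */
}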


> workqueues can be per-cpu - for tasklets to be per-cpu you have to 
> open-code them into per-cpu like rcu-tasklets did

I feel I have to repeat: tasklet==work_struct, workqueue==softirq. 

Essentially, you said that workqueues "scale" in the direction of an
increasing number of softirqs. This is _correct_, but the word is different:
"flexible" is the word. As for performance, scalability, blah-blah, workqueues
are definitely worse. And this is OK; you do not need to conceal this.

This is the price which we pay for flexibility and for niceness to realtime.

That's what should be said in the advertisement notes instead of propaganda.



> Just look at the tasklet_disable() logic.

Do not count this.

Done this way because nobody needed that thing, except for _one_ place
in the keyboard/console driver, which was very difficult to fix at that
time, when the vt code was utterly messy and not smp safe at all.

start_bh_atomic() was successfully killed, but we had to preserve an
analogue of disable_bh() with the same semantics for some time.
It is deliberately implemented in a way which does not impact hot paths
and is easy to remove.

It is sad that some usb drivers started to use this creepy and
useless thing.


> also, the "be afraid of the hardirq or the process context" mantra is 
> overblown as well. If something is too heavy for a hardirq, _it's too 
> heavy for a tasklet too_. Most hardirqs are (or should be) running with 
> interrupts enabled, which makes their difference to softirqs miniscule.

Incorrect.

The difference between softirqs and hardirqs lies not in their "heaviness".
It is in reentrancy protection, which has to be done with local_irq_disable(),
unless networking is isolated from hardirqs. That's all.
Networking is too hairy to allow to be executed with disabled hardirqs.
And moving this hairiness to process context requires
<irony mode> a little </irony mode> more effort than converting tasklets
to work queues.


> The most scalable workloads dont involve any (or many) softirq middlemen 
> at all: you queue work straight from the hardirq context to the target 
> process context.

Do you really see something in common between this Holy Grail Quest and
tasklets/workqueues? Come on. :-)

Actually, this is a step backwards. Instead of execution in the correct
context, you create a new dummy context.  This is the place where the goals
of realtime and the Holy Grail Quest split.


> true just as much: tasklets from a totally uninteresting network adapter 
> can kill your latency-sensitive application too.

If I started a nice --22 running process, I signed up for killing the latency
of nice 0 processes. But I did not sign up for killing network/scsi adapters.
"Latency-sensitive applications" use real time priority as

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

 I find the 4usecs cost on a P4 interesting and a bit too high - how did 
 you measure it?

Simple and stupid:

int flag;

static void do_test(unsigned long dummy)
{
flag = 1;
}

static void do_test_wq(void *dummy)
{
flag = 1;
}

static void measure_tasklet0(void)
{
int i;
int cnt = 0;
DECLARE_TASKLET(test, do_test, 0);
unsigned long start = jiffies;

for (i=0; i100; i++) {
flag = 0;
local_bh_disable();
tasklet_schedule(test);
local_bh_enable();
while (flag == 0) {
schedule();
cnt++;
} /*while (flag == 0)*/;
}
printk(tasklet0: %lu %d\n, jiffies - start, cnt);
}

static void measure_tasklet1(void)
{
int i;
int cnt = 0;
DECLARE_TASKLET(test, do_test, 0);
unsigned long start = jiffies;

for (i=0; i100; i++) {
flag = 0;
local_bh_disable();
tasklet_schedule(test);
local_bh_enable();
do {
schedule();
cnt++;
} while (flag == 0);
}
printk(tasklet1: %lu %d\n, jiffies - start, cnt);
}

static void measure_workqueue(void)
{
int i;
int cnt = 0;
unsigned long start;
DECLARE_WORK(test, do_test_wq, 0);
struct workqueue_struct * wq;

start = jiffies;

wq = create_workqueue(testq);

for (i=0; i100; i++) {
flag = 0;
queue_work(wq, test);
do {
schedule();
cnt++;
} while (flag == 0);
}
printk(wq: %lu %d\n, jiffies - start, cnt);
destroy_workqueue(wq);
}



 tasklet as an intermediary towards a softirq - what's the technological 
 point in such a splitup?

... work_struct as intermediary towards a workqueue - what's the technological
point in such a splitup? Non-sense? Yes, but it is exactly what you said. :-)

softirq is just a context and engine to run something. Exactly like
workqueue task. struct tasklet is work_struct, it is just a thing to run.


 workqueues can be per-cpu - for tasklets to be per-cpu you have to 
 open-code them into per-cpu like rcu-tasklets did

I feel I have to repeat: tasklet==work_struct, workqueue==softirq. 

Essentially, you said that workqueues scale in direction of increasing
amount of softirqs. This is _correct_, but the word is different: flexible
is the word. What's about performance,scalability blah-blah, workqueues
are definitely worse. And this is OK, you do not need to conceal this.

 This is the price, which we pay for flexibility and to niceness to realtime.

That's what should be said in adverticement notes instead of propaganda.



 Just look at the tasklet_disable() logic.

Do not count this.

Done this way because nobody needed that thing, except for _one_ place
in keyboard/console driver, which was very difficult to fix that time,
when vt code was utterly messy and not smp safe at all.

start_bh_atomic() was successfully killed, but we had to preserve analogue
of disable_bh() with the same semantics for some time.
It is deliberately implemented in a way, which does not impact hot paths
and is easy to remove.

It is sad that some usb drivers started to use this creepy and
useless thing.


 also, the be afraid of the hardirq or the process context mantra is 
 overblown as well. If something is too heavy for a hardirq, _it's too 
 heavy for a tasklet too_. Most hardirqs are (or should be) running with 
 interrupts enabled, which makes their difference to softirqs miniscule.

Incorrect.

The difference between softirqs and hardirqs lays not in their heavyness.
It is in reentrancy protection, which has to be done with local_irq_disable(),
unless networking is not isolated from hardirqs. That's all.
Networking is too hairy to allow to be executed with disabled hardirqs.
And moving this hairyiness to process context requires
irony mode a little / more efforts than conversion tasklets to work queues.


 The most scalable workloads dont involve any (or many) softirq middlemen 
 at all: you queue work straight from the hardirq context to the target 
 process context.

Do you really see something common between this Holy Grail Quest and
tasklets/workqeueus? Come on. :-)

Actually, this is step backwards. Instead of execution in correct
context, you create a new dummy context.  This is the place, where goals
of realtime and Holy Grail Quest split.


 true just as much: tasklets from a totally uninteresting network adapter 
 can kill your latency-sensitive application too.

If I started nice --22 running process I signed to killing latency
of nice 0 processes. But I did not sign for killing network/scsi adapters.
latency-sensitive application use real time priority as 

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Duncan Sands
Hi,

  Just look at the tasklet_disable() logic.
 
 Do not count this.
 
 Done this way because nobody needed that thing, except for _one_ place
 in keyboard/console driver, which was very difficult to fix that time,
 when vt code was utterly messy and not smp safe at all.
 
 start_bh_atomic() was successfully killed, but we had to preserve analogue
 of disable_bh() with the same semantics for some time.
 It is deliberately implemented in a way, which does not impact hot paths
 and is easy to remove.
 
 It is sad that some usb drivers started to use this creepy and
 useless thing.

the usbatm USB driver uses it in the methods for opening and closing a new 
network
connection, and on device disconnect.  Yes, tasklet_disable could be eliminated 
by
adding a spinlock.  However this would mean taking the lock every time a packet 
is
received or transmitted.  As it is, typically open occurs once, when the 
computer
boots, and close and disconnect also occur once each, when the computer shuts 
down.
I felt that three calls to tasklet_disable were better than a gazillion calls to
spin_(un)lock.  Please feel free to educate me :)

Ciao,

Duncan.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Ingo Molnar

* Alexey Kuznetsov [EMAIL PROTECTED] wrote:

  also, the be afraid of the hardirq or the process context mantra 
  is overblown as well. If something is too heavy for a hardirq, _it's 
  too heavy for a tasklet too_. Most hardirqs are (or should be) 
  
  running with interrupts enabled, which makes their difference to 

  softirqs miniscule.
^^
 
 Incorrect.
 
 The difference between softirqs and hardirqs lays not in their 
 heavyness. It is in reentrancy protection, which has to be done with 
 local_irq_disable(), unless networking is not isolated from hardirqs. 

i know that pretty well ;)

 That's all. Networking is too hairy to allow to be executed with 
 disabled hardirqs. And moving this hairyiness to process context 
 requires irony mode a little / more efforts than conversion 
 tasklets to work queues.

as i said above (see the underlined sentence), hardirq contexts already 
run just fine with hardirqs enabled. So your dismissal of executing that 
'hairy' bit in hardirq context is not that automatically true as you 
seem to assume i think.

also, network softirq locking dependencies arent all that magic or 
complex either: they do not operate on sockets that are already locked 
by a user context, they are per CPU and they are not preempted by 
'themselves', nor are they preempted by certain other softirqs (such as 
they are not preempted by the timer softirq). Am i missing some point of 
yours?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

 I felt that three calls to tasklet_disable were better than a gazillion calls 
 to
 spin_(un)lock.

It is not better.

Actually, it also has something equivalent to spinlock inside.
It raises some flag and waits for completion of already running
tasklets (cf. spin_lock_bh). And if tasklet_schedule happens while
it is disabled, it tries to take that lock gazillion
of times until the tasklet is reenabled back.

Old days that was acceptable, you had not gazillion of attempts
but just a few, but since some time (also old already) it became
disasterous.

It is really better just to avoid calling tasklet_schedule(),
when you do not want it to be executed. :-)

Alexey
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Steven Rostedt

On Fri, 29 Jun 2007, Alexey Kuznetsov wrote:
 Hello!

  I find the 4usecs cost on a P4 interesting and a bit too high - how did
  you measure it?

 Simple and stupid:

Noted ;-)

 static void measure_tasklet0(void)
 {
   int i;
   int cnt = 0;
   DECLARE_TASKLET(test, do_test, 0);
   unsigned long start = jiffies;

Not a very accurate measurement (jiffies that is).


   for (i=0; i100; i++) {
   flag = 0;
   local_bh_disable();
   tasklet_schedule(test);
   local_bh_enable();
   while (flag == 0) {
   schedule();
   cnt++;
   } /*while (flag == 0)*/;
   }
   printk(tasklet0: %lu %d\n, jiffies - start, cnt);
 }


[...]


 static void measure_workqueue(void)
 {
   int i;
   int cnt = 0;
   unsigned long start;
   DECLARE_WORK(test, do_test_wq, 0);
   struct workqueue_struct * wq;

   start = jiffies;

   wq = create_workqueue(testq);

   for (i=0; i100; i++) {
   flag = 0;
   queue_work(wq, test);
   do {
   schedule();

Since the work queue *is* a thread, you are running a busy loop here. Even
though you call schedule, this thread still may have quota available, and
will not yeild to the work queue.  Unless ofcourse this caller is of lower
priority. But even then, I'm not sure how quickly the schedule would
choose the work queue.



   cnt++;
   } while (flag == 0);
   }
   printk(wq: %lu %d\n, jiffies - start, cnt);
   destroy_workqueue(wq);
 }




 and is easy to remove.

 It is sad that some usb drivers started to use this creepy and
 useless thing.


  also, the be afraid of the hardirq or the process context mantra is
  overblown as well. If something is too heavy for a hardirq, _it's too
  heavy for a tasklet too_. Most hardirqs are (or should be) running with
  interrupts enabled, which makes their difference to softirqs miniscule.

 Incorrect.

 The difference between softirqs and hardirqs lays not in their heavyness.
 It is in reentrancy protection, which has to be done with local_irq_disable(),
 unless networking is not isolated from hardirqs. That's all.
 Networking is too hairy to allow to be executed with disabled hardirqs.
 And moving this hairyiness to process context requires
 irony mode a little / more efforts than conversion tasklets to work 
 queues.


I do really want to point out something in the Subject line. **RFC**
:-)

I had very little hope for this magic switch to get into mainline. (maybe
get it into -mm)  But the thing was is that tasklets IMHO are over used.
As Ingo said, there are probably only 2 or 3 places in the kernel that a
a switch to work queue conversion couldn't solve. Those places could then
probably be solved by a different design (yes that would take work).

Tasklets are there and people will continue to use them when they
shouldn't for as long as they exist. Tasklets are there because there
wasn't work queues or kthreads at the time of solving the solution that
tasklets solved.

So if you can still keep the same performance without tasklets, I say we
get rid of them. I've also meet too many device driver writers that want
the lowest possible latency for their device, and do so by sacrificing the
latency of other things in the system that may be even more critical.

Also note, that the more tasklets we have, the higher the latency will be
for other tasklets. There's only two prios you can currently give a
tasklet, so competing devices will need to fight each other without the
admin being able to have as much control over the result.

-- Steve

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Ingo Molnar

* Alexey Kuznetsov [EMAIL PROTECTED] wrote:

  as i said above (see the underlined sentence), hardirq contexts 
  already run just fine with hardirqs enabled.
 
 RENTRANCY PROTECTION! If does not matter _how_ they run, it matters 
 what context they preempt and what that context has to make to prevent 
 that preemption. [...]

again, there is no reason why this couldnt be done in a hardirq context. 
If a hardirq preempts another hardirq and the first hardirq already 
processes the 'softnet work', you dont do it from the second one but 
queue it with the first one. (into the already existing 
sd-completion_queue for tx work or queue-poll_list for rx work) It 
would be a simple additional flag in softnet_data.

once we forget about 'hardirq contexts run with irqs disabled', _there 
is just no technological point for softirqs_. They are an unnecessary 
abstraction!

once we concede that point, reentrancy protection does not have to be 
done via local_bh_disable(). For example we run just fine without it in 
-rt, local_bh_disable() is a NOP there. How is it done? By controlling 
execution of the _workflow_ that a softirq does. By implementing 
non-reentrancy via another, more flexible mechanism. (and by carefully 
fixing a few _other_, non-workflow assumptions that softnet does/did, 
such as the per-cpu-ness of softnet_data.)

Are we talking about the very same thing perhaps, just from a different 
angle? ;-)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

  The difference between softirqs and hardirqs lays not in their 
  heavyness. It is in reentrancy protection, which has to be done with 
  local_irq_disable(), unless networking is not isolated from hardirqs. 
 
 i know that pretty well ;)

You forgot about this again in the next sentence. :-)


 as i said above (see the underlined sentence), hardirq contexts already 
 run just fine with hardirqs enabled.

RENTRANCY PROTECTION! If does not matter _how_ they run, it matters what
context they preempt and what that context has to make to prevent that
preemption. If you still do not get the point, make
sed -e 's/local_bh_/local_irq_/'
over net/* and kill softirqs. Everything will work just fine.
Moreover, if you deal only with a single TCP connection
(and sysctl tcp_low_latency is not set), even hardirq latency will not suck,
all real work is done at process context.


 also, network softirq locking dependencies arent all that magic or 
 complex either: they do not operate on sockets that are already locked 
 by a user context, they are per CPU and they are not preempted by 
 'themselves', nor are they preempted by certain other softirqs (such as 
 they are not preempted by the timer softirq). Am i missing some point of 
 yours?

I would not say I understood what you wanted to say. :-)

Does my statement about sed match your view? :-)


What I know is that there is no hairy locking dependencies at all
and there is no magic there. Especially, on level of sockets.
The things, which are troublesome are various shared trees/hash tables
(e.g. socket hash tables), which are modified both by incoming network
packets and process contexts.

I have some obscure suspicion that naive dream of realtime folks is
to move all those bad things to some kernel threads and to talk
to those threads passing some messages. I hope this suspicion is wrong,
otherwise I would say: go to Mach, pals. :-(

Alexey
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Oleg Nesterov
On 06/29, Steven Rostedt wrote:
 
 On Fri, 29 Jun 2007, Alexey Kuznetsov wrote:
 
  static void measure_workqueue(void)
  {
  int i;
  int cnt = 0;
  unsigned long start;
  DECLARE_WORK(test, do_test_wq, 0);
  struct workqueue_struct * wq;
 
  start = jiffies;
 
  wq = create_workqueue(testq);

Also, create_workqueue() is very costly. The last 2 lines should be
reverted.

Oleg.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Duncan Sands
 Old days that was acceptable, you had not gazillion of attempts
 but just a few, but since some time (also old already) it became
 disasterous.

What changed?  And can it be fixed?

Thanks,

Duncan.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Jeff Garzik

Steven Rostedt wrote:

I had very little hope for this magic switch to get into mainline. (maybe
get it into -mm)  But the thing was is that tasklets IMHO are over used.
As Ingo said, there are probably only 2 or 3 places in the kernel that a
a switch to work queue conversion couldn't solve.


This is purely a guess, backed by zero evidence.

These network drivers were hand-tuned to use tasklets.  Sure it will 
WORK as a workqueue, but that says nothing equivalence.




Those places could then
probably be solved by a different design (yes that would take work).


Network driver patches welcome :)



Tasklets are there because there
wasn't work queues or kthreads at the time of solving the solution that
tasklets solved.


Completely false, at least in network driver land.  Threads existed and 
were used (proof: 8139too, among others).


Kernel threads were not used for hot path network packet shovelling 
because they were too damn slow.  Tasklets were single-threaded, fast, 
simple and immediate.  Workqueues today are simple and almost-fast.


Jeff


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

 Not a very accurate measurement (jiffies that is).

Believe me or not, but the measurement has nanosecond precision.


 Since the work queue *is* a thread, you are running a busy loop here. Even
 though you call schedule, this thread still may have quota available, and
 will not yeild to the work queue.  Unless ofcourse this caller is of lower
 priority. But even then, I'm not sure how quickly the schedule would
 choose the work queue.

Instantly. That's why cnt is printed. Unless cnt==100, the result
is invalid.


 get it into -mm)  But the thing was is that tasklets IMHO are over used.

You preach to a choir. :-)

Alexey
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

 again, there is no reason why this couldnt be done in a hardirq context. 
 If a hardirq preempts another hardirq and the first hardirq already 
 processes the 'softnet work', you dont do it from the second one but 
 queue it with the first one. (into the already existing 
 sd-completion_queue for tx work or queue-poll_list for rx work) It 
 would be a simple additional flag in softnet_data.

This is kind of obvious. It is just description of softnet.


 once we forget about 'hardirq contexts run with irqs disabled', _there 
 is just no technological point for softirqs_. They are an unnecessary 
 abstraction!

The first paragraph describes softirq, nothing else.

I have already understood when you say technological,
you mean terminological. softirq is just a term to describe
softnet workflow in an intelligible way. Call it from inside
irq handler, rather than in irq_exit, this changes _NOTHING_.

I understood that you describe original pre-historic
softnet model. You just want to replace softirq run at irq_exit
with an explicit soft{net,scsi,whatever}_call, which could
execute immediately or can be queued for later. I hope I am wrong,
because this is... mmm... not a progress.


 -rt, local_bh_disable() is a NOP there. How is it done?
...
 Are we talking about the very same thing perhaps, just from a different 
 angle? ;-)

When talking about softnet, yes.

No, when talking about implementing non-reentrancy via another,
more flexible mechanism. We are not on the same page.
I am afraid even the books are different. :-)

I need to think about this and really read -rt code, this sounds so crazy
that it can be even correct.

Timeout, we are far out of topic anyway.

Alexey
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Oleg Nesterov
On 06/29, Alexey Kuznetsov wrote:

  Just look at the tasklet_disable() logic.
 
 Do not count this.

A slightly off-topic question, tasklet_kill(t) doesn't try to steal
t from tasklet_head.list if t was scheduled, but waits until t completes.

If I understand correctly, this is because tasklet_head.list is protected
by local_irq_save(), and t could be scheduled on another CPU, so we just
can't steal it, yes?

If we use worqueues, we can change the semantics of tasklet_kill() so
that it really cancels an already scheduled tasklet.

The question is: would it be the wrong/good change?

Oleg.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

 If I understand correctly, this is because tasklet_head.list is protected
 by local_irq_save(), and t could be scheduled on another CPU, so we just
 can't steal it, yes?

Yes. All that code is written to avoid synchronization as much as possible.


 If we use worqueues, we can change the semantics of tasklet_kill() so
 that it really cancels an already scheduled tasklet.
 
 The question is: would it be the wrong/good change?

If it does not add another usec to tasklet_schedule(), it would be good.

Alexey
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Oleg Nesterov
On 06/29, Alexey Kuznetsov wrote:
 
  If I understand correctly, this is because tasklet_head.list is protected
  by local_irq_save(), and t could be scheduled on another CPU, so we just
  can't steal it, yes?
 
 Yes. All that code is written to avoid synchronization as much as possible.

Thanks!

 
  If we use worqueues, we can change the semantics of tasklet_kill() so
  that it really cancels an already scheduled tasklet.
  
  The question is: would it be the wrong/good change?
 
 If it does not add another usec to tasklet_schedule(), it would be good.

No, it won't slowdown tasklet_schedule(). Instead it will speedup tasklet_kill.


Steven, unless you have some objections, could you change tasklet_kill() ?

 +static inline void tasklet_kill(struct tasklet_struct *t)
  {
 -   return test_bit(TASKLET_STATE_SCHED, t-state);
 +   flush_workqueue(ktaskletd_wq);
  }

Just change flush_workqueue(ktaskletd_wq) to cancel_work_sync(t-work).

Oleg.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello!

 What changed? 

softirq remains raised for such tasklet. Old times softirq was processed
once per invocation, in schedule and on syscall exit and this was relatively
harmless. Since softirqs are very weakly moderated, it results in strong
cpu hogging. 


 And can it be fixed?

With current tasklets this can be fixed only introducing additional
synchronization cost to tasklet schedule. Not good. Better to fix
caller.

If Ingo knows about this, I guess it is already fixed in -rt tasklet
or can be fixed. They are really flexible and extensible:
-rt tasklet cost is so high, that even another thousand of cycles
will remain unnnoticed. :-)

Alexey
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Steven Rostedt


>
> But I guess there is a reason it is still marked experimental...
>
> iq81340mc:/data_dir# ./md0_verify.sh
> kernel BUG at mm/page_alloc.c:363!

Well, at least this uncovered something :-)

I'll look into this too.

-- Steve


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Steven Rostedt

On Thu, 28 Jun 2007, Dan Williams wrote:

> > CONFIG_PREEMPT?
> >
> Everything thus far has been CONFIG_PREEMPT=n (the default for this platform).
>
> With CONFIG_PREEMPT=y the resync is back in the 50MB/s range.

So with upping the prio for the work queue you got back your performance?

>
> [iop-adma: hi-prio workqueue, CONFIG_PREEMPT=y]
> iq81340mc:~# cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid5 sdd[4] sdc[2] sdb[1] sda[0]
>   468872448 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>   [=>...]  recovery =  5.8% (9136404/156290816)
> finish=46.1min speed=53161K/sec
>
> The tasklet configuration stays in 50MB/s ballpark, and the default
> priority (nice -5) workqueue case remains in the 30's with
> CONFIG_PREEMPT=n.

[noted: should be CONFIG_PREEMPT=y]

This is expected. Seems you may have other things running at a higher prio.

-- Steve



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Dan Williams

On 6/28/07, Dan Williams <[EMAIL PROTECTED]> wrote:

> Everything thus far has been CONFIG_PREEMPT=n (the default for this platform).
>
> With CONFIG_PREEMPT=y the resync is back in the 50MB/s range.
>
> [iop-adma: hi-prio workqueue, CONFIG_PREEMPT=y]
> iq81340mc:~# cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid5 sdd[4] sdc[2] sdb[1] sda[0]
>   468872448 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>   [=>...]  recovery =  5.8% (9136404/156290816)
> finish=46.1min speed=53161K/sec
>
> The tasklet configuration stays in 50MB/s ballpark, and the default
> priority (nice -5) workqueue case remains in the 30's with
> CONFIG_PREEMPT=n.


That last line should be CONFIG_PREEMPT=y

But I guess there is a reason it is still marked experimental...

iq81340mc:/data_dir# ./md0_verify.sh
kernel BUG at mm/page_alloc.c:363!
Unable to handle kernel NULL pointer dereference at virtual address 
pgd = b29b8000
[] *pgd=
Internal error: Oops: 805 [#1] PREEMPT
Modules linked in:
CPU: 0Not tainted  (2.6.22-rc6 #144)
PC is at __bug+0x20/0x2c
LR is at 0x403f37a0
pc : [<4002ef08>]lr : [<403f37a0>]psr: 6093
sp : b28a3cc8  ip : 01d8  fp : b28a3cd4
r10: 0001  r9 : 0001  r8 : 
r7 : 004f  r6 : ffe0  r5 : 412919c0  r4 : 412919e0
r3 :   r2 : 403f37a0  r1 : 6093  r0 : 0026
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  Segment user
Control: 0400397f  Table: 729b8018  DAC: 0015
Process md0_verify_kern (pid: 4188, stack limit = 0xb28a2258)
Stack: (0xb28a3cc8 to 0xb28a4000)
3cc0:   b28a3d1c b28a3cd8 40070a90 4002eef4 b28a2000 403f6818
3ce0: 403f682c 4013 403f6838 001f 0010 4129c3c0 2013 15573000
3d00: 4129c3c0  b28a4550  b28a3d2c b28a3d20 40070b54 400706b8
3d20: b28a3d44 b28a3d30 40073e7c 40070b4c 4129c3c0 15574000 b28a3d58 b28a3d48
3d40: 40084734 40073e34 b3e0fdcc b28a3dcc b28a3d5c 40079cd8 400846f8 
3d60:  b39a4860 b28a3ddc  0001 15574000 b28a2000 4040d548
3d80: b28a4550 b3357240 72d9e0ff   15573fff 3ffe 15574000
3da0: 15573fff 4040d548 b28a3ddc b2c15320 b28a2000  b333cc00 b3357240
3dc0: b28a3e04 b28a3dd0 4007d724 40079844 b28a3dd8  0015 4040d548
3de0: b3357240 b28f6920 b28a2000 b3357240 b3357240 b3a4f9c0 b28a3e18 b28a3e08
3e00: 4003f2d8 4007d6b4 b3a462e0 b28a3e5c b28a3e1c 40095e70 4003f2a0 
3e20:  b28a3e3c 0013 b28a3e5c b28a3e3c b333cc10 b3c15cc0 b333cc00
3e40: 0080 b28a3fb0   b28a3f2c b28a3e60 400c8270 400958f0
3e60:  b28a3ec8 b28a2000 b3c15c00 0003  b29e54a0 0014
3e80: 0044 b38b4f00 0002   0001 403f6aac 403f6aac
3ea0: b333cc7c b3a462e0 000200d2  403f6aa8 b28a2000 b333cd30 b28a3f08
3ec0: b28a3ecc 40070c38 4006fdbc 000200d2 403f6aac 0010 41295a60 b28a2000
3ee0: b333cc7c  b333cc00 b333cc00 0003 0001fff3 004f 004f
3f00: b333cc00 403f72a8 b28a2000 400c7e60  b333cc00 b28a3fb0 fffe
3f20: b28a3f5c b28a3f30 40095054 400c7e6c  b4503000 000bf408 b333cc00
3f40:  000bf628 b28a2000 b28a3fb0 b28a3f84 b28a3f60 40096c3c 40094f58
3f60: b4503000 000bf408 b28a3fb0 b4503000 4002b0e4 156da000 b28a3fa4 b28a3f88
3f80: 4002e3cc 40096b24 000bf408 000bf628 000bf788 000b  b28a3fa8
3fa0: 4002af20 4002e39c 000bf408 000bf628 000bf788 000bf628 000bf408 
3fc0: 000bf408 000bf628 000bf788  000bf628 000bf408 156da000 000be848
3fe0: 15651550 3eb7e4ac 0002b4f0 1565158c 6010 000bf788  
Backtrace:
[<4002eee8>] (__bug+0x0/0x2c) from [<40070a90>] (free_hot_cold_page+0x3e4/0x434)
[<400706ac>] (free_hot_cold_page+0x0/0x434) from [<40070b54>]
(free_hot_page+0x14/0x18)
[<40070b40>] (free_hot_page+0x0/0x18) from [<40073e7c>] (put_page+0x54/0x174)
[<40073e28>] (put_page+0x0/0x174) from [<40084734>]
(free_page_and_swap_cache+0x48/0x64)
r5:15574000 r4:4129c3c0
[<400846ec>] (free_page_and_swap_cache+0x0/0x64) from [<40079cd8>]
(unmap_vmas+0x4a0/0x67c)
r4:b3e0fdcc
[<40079838>] (unmap_vmas+0x0/0x67c) from [<4007d724>] (exit_mmap+0x7c/0x158)
[<4007d6a8>] (exit_mmap+0x0/0x158) from [<4003f2d8>] (mmput+0x44/0x100)
[<4003f294>] (mmput+0x0/0x100) from [<40095e70>] (flush_old_exec+0x58c/0x9d8)
r4:b3a462e0
[<400958e4>] (flush_old_exec+0x0/0x9d8) from [<400c8270>]
(load_elf_binary+0x410/0x18fc)
[<400c7e60>] (load_elf_binary+0x0/0x18fc) from [<40095054>]
(search_binary_handler+0x108/0x2b4)
[<40094f4c>] (search_binary_handler+0x0/0x2b4) from [<40096c3c>]
(do_execve+0x124/0x1e4)
[<40096b18>] (do_execve+0x0/0x1e4) from [<4002e3cc>] (sys_execve+0x3c/0x5c)
[<4002e390>] (sys_execve+0x0/0x5c) from [<4002af20>] (ret_fast_syscall+0x0/0x3c)
r7:000b r6:000bf788 r5:000bf628 r4:000bf408
Code: e1a01000 e59f000c eb004d60 e3a03000 (e5833000)
note: md0_verify_kern[4188] exited with preempt_count 4

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Dan Williams

On 6/28/07, Steven Rostedt <[EMAIL PROTECTED]> wrote:

> On Thu, 28 Jun 2007, Dan Williams wrote:
> > >
> > Unfortunately setting the thread to real time priority makes
> > throughput slightly worse.  Instead of floating around 35MB/s the
> > resync speed is stuck around 30MB/s:
>
> That is really strange. If you raise the prio of the work queue it
> gets worse?  Something really strange is happening here?  Are you using
> CONFIG_PREEMPT?


Everything thus far has been CONFIG_PREEMPT=n (the default for this platform).

With CONFIG_PREEMPT=y the resync is back in the 50MB/s range.

[iop-adma: hi-prio workqueue, CONFIG_PREEMPT=y]
iq81340mc:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sdc[2] sdb[1] sda[0]
 468872448 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
 [=>...]  recovery =  5.8% (9136404/156290816)
finish=46.1min speed=53161K/sec

The tasklet configuration stays in 50MB/s ballpark, and the default
priority (nice -5) workqueue case remains in the 30's with
CONFIG_PREEMPT=n.

--
Dan


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Steven Rostedt
On Thu, 28 Jun 2007, Dan Williams wrote:
> >
> Unfortunately setting the thread to real time priority makes
> throughput slightly worse.  Instead of floating around 35MB/s the
> resync speed is stuck around 30MB/s:

That is really strange. If you raise the prio of the work queue it
gets worse?  Something really strange is happening here?  Are you using
CONFIG_PREEMPT?

-- Steve



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Thu, 28 Jun 2007 18:00:01 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> >  with 1.2 usecs and 10,000 
> > irqs/sec the cost is 1.2 msecs/sec, or 0.1%.
> 
> off-by-10 error.

yeah, indeed - 12 msecs and 1.2% :-/

Ingo


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Andrew Morton
On Thu, 28 Jun 2007 18:00:01 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:

>  with 1.2 usecs and 10,000 
> irqs/sec the cost is 1.2 msecs/sec, or 0.1%.

off-by-10 error.


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Dan Williams

On 6/28/07, Steven Rostedt <[EMAIL PROTECTED]> wrote:


> Hi Dan,
>
> On Mon, 25 Jun 2007, Dan Williams wrote:
>
> > Yes you are right, ARM does not flush L1 when prev==next in switch_mm.
> >
> > > Perhaps something else is at fault here.
> > >
> > I'll try and dig a bit deeper...
>
> BTW:
>
>  static int __init iop_adma_init (void)
>  {
> +   iop_adma_workqueue = create_workqueue("iop-adma");
> +   if (!iop_adma_workqueue)
> +   return -ENODEV;
> +
>
> Could you also try upping the prio of all the "iop-adma" threads?


Unfortunately setting the thread to real time priority makes
throughput slightly worse.  Instead of floating around 35MB/s the
resync speed is stuck around 30MB/s:

[ iop-adma: hi-prio workqueue based callbacks ]
iq81340mc:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sdc[2] sdb[1] sda[0]
 468872448 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
 [>]  recovery =  3.9% (6107424/156290816)
finish=84.9min speed=29448K/sec

The iop-adma tasklet cleans up any completed descriptors and in the
process calls any attached callbacks.  For the raid5 resync case the
callback is simply:

static void ops_complete_check(void *stripe_head_ref)
{
   struct stripe_head *sh = stripe_head_ref;
   int pd_idx = sh->pd_idx;

   pr_debug("%s: stripe %llu\n", __FUNCTION__,
   (unsigned long long)sh->sector);

   if (test_and_clear_bit(STRIPE_OP_MOD_DMA_CHECK, &sh->ops.pending) &&
   sh->ops.zero_sum_result == 0)
   set_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);

   set_bit(STRIPE_OP_CHECK, &sh->ops.complete);
   set_bit(STRIPE_HANDLE, &sh->state);
   release_stripe(sh);
}


[ iop-adma: tasklet based callbacks ]
iq81340mc:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sdc[2] sdb[1] sda[0]
 468872448 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
 [=>...]  recovery =  5.1% (8024248/156290816)
finish=47.9min speed=51486K/sec

--
Dan


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Jeff Garzik

Ingo Molnar wrote:

> But it was not me who claimed that 'workqueues are slow'.


The claim was:  slower than tasklets.


> choice. I am just wondering out loud whether this particular tool, in 
> its current usage pattern, makes much technological sense. My claim is: 
> it could very well be that it doesnt make _much_ sense, and in that case 
> we should provide a non-intrusive migration path away in terms of a 
> compatible API wrapper to a saner (albeit by virtue of trying to emulate 
> an existing API, slower) mechanism. The examples cited so far had the 
> tasklet as an intermediary towards a softirq - what's the technological 
> point in such a splitup?


I already answered that in detail.  In sum, a driver cannot define its 
own softirq.  Softirqs are not modular.


Tasklets are the closest thing to softirqs for a driver.


> The most scalable workloads dont involve any (or many) softirq middlemen 
> at all: you queue work straight from the hardirq context to the target 
> process context. And that's what you want to do _anyway_, because you 
> want to create as little locally cached data for the hardirq context, as 
> the target task could easily be on another CPU. (this is generally true 
> for things like block IO, but it's also true for things like network 
> IO.)


> the most scalable solution would be _for the network adapter to figure 
> out the target CPU for the packet_.


I agree completely.  Wanna implement this?  I will kiss your feet, and 
multi-core CPU vendors will worship you as a demi-god.


Until such time, we must deal with the network stack as it exists today, 
and the network drivers as they exist and work today.



> Not many (if any) such adapters 
> exist at the moment. (as it would involve allocating NR_CPUs irqs to 
> that adapter alone.)


Good news:  this is becoming the norm for modern NICs, especially 10Gbps.

Plenty of NICs already exist that support multiple RX rings (presumably 
one per CPU), and newer NICs will raise individual MSI[-X] interrupts 
based on the RX ring into which a packet was received.


In this area, NIC vendors are way ahead of the Linux net stack.

The Linux net stack is unfortunately not threaded enough to sanely deal 
with dividing /flows/ up across multiple CPUs, even if the NIC does 
support multiple transmit and receive queues.   [side note: initial 
multi-queue TX is being worked on, on netdev]



> > Tasklet is single thread by definition and purpose. Those a few places 
> > where people used tasklets to do per-cpu jobs (RCU f.e.) exist just 
> > because they had troubles with allocating new softirq. [...]


> no. The following tale is the true and only history of the RCU tasklet 
> ;-) The RCU guys first used a tasklet, then noticed its bad scalability 
> (a particular VFS-intense benchmark regressed because only a single CPU 
> would do RCU completion on an 8-way box) so they switched it to a 
> per-cpu tasklet - without realizing that a per-cpu tasklet is in essence 
> a softirq. I pointed it out to them (years down the road ...) then the 
> "convert rcu-tasklet to softirq" patch was born.


You focused on the example rather than the key phrase:  tasklet is 
single thread by definition and purpose.


Wanting to change that without analysis of the impact illustrates the 
apples-to-oranges change being proposed.



> outlined above: if you want good scalability, dont use middlemen :-) 
> Figure out the target task as early as possible and let it do as much of 
> the remaining work as possible. _Increasing_ the amount of cached 
> context (by doing delayed processing in tasklets or even softirqs on the 
> same CPU where the hardirq arrived) only increases the cross-CPU cost. 
> Keeping stuff in a softirq only makes (some) sense as long as you have 
> no target task at all (routing, filtering, etc.).


I do not disagree with these theoretical musings :)

I care the most about the "who will do all this work?" question.  In 
network driver land, these changes impact hot paths.  I am lazy, and 
don't care to revisit each network driver hot path and carefully re-tune 
each based on this proposal.  Who is volunteering?


Jeff




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Jeff Garzik

Ingo Molnar wrote:

> my argument was: workqueues are more scalable than tasklets in general.


Here is my argument:  that is totally irrelevant to $subject, when it 
comes to dealing with managing existing [network driver] behavior and 
performance.


My overall objection is the attempt to replace apples with oranges.

Network drivers use tasklets TODAY.  Each driver -- in particular 
acenic, ns83820, and the 10Gbps drivers -- has been carefully tuned to 
use tasklets, hardirqs, and perhaps NAPI too.  Changing to workqueue 
WILL affect network driver hot paths, yet I see no analysis or 
measurement at all of the behavior differences.


If hackers are willing to revisit each network driver, rework the 
tasklet code to something more sane [in your opinion], and TEST it, I 
will review the patches and happily ACK away.


Given that I feel that course of action is unlikely (the preferred 
alternative apparently being "I don't use these drivers, but surely my 
changes are OK anyway"), I do not see how this effort can proceed as is.


Lots of time went into tuning these network drivers for the specific 
thread model they use.  Maybe that thread model is no longer in style. 
Maybe modern machine behavior dictates a different approach.  The point 
is... you don't know.


Jeff




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Oleg Nesterov
On 06/28, Steven Rostedt wrote:
> 
> I also don't see any nice API to have the priority set for a workqueue
> thread from within the kernel. Looks like one needs to be added,
> otherwise, I need to have the wrapper dig into the workqueue structs to
> find the thread that handles the workqueue.

It is not so trivial to implement properly. Note that CPU_UP creates a new
cwq->thread, so somehow workqueue should "remember" its priority. This means
we should record it in workqueue_struct. The most simple way is to add yet
another parameter to __create_workqueue(), but this is nasty.

So, perhaps we should add "long nice" to "struct workqueue_struct", and then

void set_workqueue_nice(struct workqueue_struct *wq, long nice)
{
const cpumask_t *cpu_map = wq_cpu_map(wq);
struct cpu_workqueue_struct *cwq;
int cpu;

wq->nice = nice;

mutex_lock(&workqueue_mutex);

for_each_cpu_mask(cpu, *cpu_map) {
cwq = per_cpu_ptr(wq->cpu_wq, cpu);
if (cwq->thread)
set_user_nice(cwq->thread, nice);
}

mutex_unlock(&workqueue_mutex);
}

We could use for_each_cpu_online() instead, but then we should check
is_single_threaded().
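
For illustration, usage could then look roughly like this (a sketch;
it assumes set_workqueue_nice() above gets exported, takes the queue
name from the RFC series, and the nice value is just an example):

	ktaskletd_wq = create_workqueue("ktaskletd");
	if (!ktaskletd_wq)
		return -ENOMEM;
	set_workqueue_nice(ktaskletd_wq, -19);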

Oleg.



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Ingo Molnar

* Alexey Kuznetsov <[EMAIL PROTECTED]> wrote:

> > the context-switch argument i'll believe if i see numbers. You'll 
> > probably need in excess of tens of thousands of irqs/sec to even be 
> > able to measure its overhead. (workqueues are driven by nice kernel 
> > threads so there's no TLB overhead, etc.)
> 
> It was authors of the patch who were supposed to give some numbers, at 
> least one or two, just to prove the concept. :-)

sure enough! But it was not me who claimed that 'workqueues are slow'.

firstly, i'm not here at all to tell people what tools to use. I'm not 
trying to 'force' people away from a perfectly logical technological 
choice. I am just wondering out loud whether this particular tool, in 
its current usage pattern, makes much technological sense. My claim is: 
it could very well be that it doesnt make _much_ sense, and in that case 
we should provide a non-intrusive migration path away in terms of a 
compatible API wrapper to a saner (albeit by virtue of trying to emulate 
an existing API, slower) mechanism. The examples cited so far had the 
tasklet as an intermediary towards a softirq - what's the technological 
point in such a splitup?

> According to my measurements (maybe, wrong) on 2.5GHz P4 tasklet 
> schedule and execution eats ~300ns, workqueue eats ~4usec. On my 
> 1.8GHz PM notebook (UP kernel), the numbers are 170ns and 1.2usec.

I find the 4usecs cost on a P4 interesting and a bit too high - how did 
you measure it? (any test-patch for it i could try?) But i think even 
your current numbers partly prove my point: with 1.2 usecs and 10,000 
irqs/sec the cost is 1.2 msecs/sec, or 0.1%. And 10K irqs/sec themselves 
will eat up much more CPU time than that already.
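
For reference, one way such a micro-measurement could be structured -
this is not Alexey's actual test, and all names here are invented for
the example: timestamp before scheduling, timestamp in the handler,
average over many iterations.

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/completion.h>
#include <linux/timex.h>
#include <asm/div64.h>

#define LOOPS 100000

static cycles_t t_start;
static u64 t_total;
static DECLARE_COMPLETION(done);

static void nop_fn(struct work_struct *unused)
{
	/* schedule -> execution cost, in cycles */
	t_total += get_cycles() - t_start;
	complete(&done);
}
static DECLARE_WORK(nop_work, nop_fn);

static int __init measure_init(void)
{
	int i;

	for (i = 0; i < LOOPS; i++) {
		t_start = get_cycles();
		schedule_work(&nop_work);
		wait_for_completion(&done);
	}
	do_div(t_total, LOOPS);
	printk(KERN_INFO "avg cycles per schedule+run: %llu\n",
	       (unsigned long long)t_total);
	return 0;
}
module_init(measure_init);
MODULE_LICENSE("GPL");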

> Formally looking awful, this result is positive: tasklets are almost 
> never used in hot paths. I am sure only about one such place: acenic 
> driver uses tasklet to refill rx queue. This generates not more than 
> 3000 tasklet schedules per second. Even on P4 pure workqueue 
> schedule will eat ~1% of bare cpu ticks.

... and the irq cost itself will eat 5-10% of bare CPU ticks already.

> > ... workqueues are also possibly much more scalable
> 
> I cannot figure out - scale in what direction? :-)

workqueues can be per-cpu - for tasklets to be per-cpu you have to 
open-code them into per-cpu like rcu-tasklets did (which in essence 
turns them into more expensive softirqs).

> >  (percpu workqueues
> > are easy without changing anything in your code but the call where 
> > you create the workqueue).
> 
> I do not see how it is related to scalability. And the statement does 
> not even make sense. The patch already uses per-cpu workqueue for 
> tasklets, otherwise it would be a disaster: guaranteed cpu 
> non-locality.

my argument was: workqueues are more scalable than tasklets in general.

Just look at the tasklet_disable() logic. We basically have a per-cpu 
list of tasklets that we poll in tasklet_action:

 static void tasklet_action(struct softirq_action *a)
 {
[...]
while (list) {
struct tasklet_struct *t = list;

list = list->next;

if (tasklet_trylock(t)) {

and if the trylock fails, we just continue to meet this activated 
tasklet again and again, in this nice linear list.
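
For completeness, the tail of that loop - quoted roughly from memory
from 2.6.22's kernel/softirq.c, so treat it as approximate - simply
puts the tasklet back on the list and raises the softirq again:

	local_irq_disable();
	t->next = __get_cpu_var(tasklet_vec).list;
	__get_cpu_var(tasklet_vec).list = t;
	__raise_softirq_irqoff(TASKLET_SOFTIRQ);
	local_irq_enable();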

this happens to work in practice because 1) tasklets are used quite 
rarely! 2) tasklet_disable() is done relatively rarely and nobody truly 
runs tons of the same devices (which depend on a tasklet) on the same 
box, but still it's quite an unhealthy approach. Every time i look at 
the tasklet code it hurts - having fundamental stuff like that in the 
heart of Linux ;-)

also, the "be afraid of the hardirq or the process context" mantra is 
overblown as well. If something is too heavy for a hardirq, _it's too 
heavy for a tasklet too_. Most hardirqs are (or should be) running with 
interrupts enabled, which makes their difference to softirqs miniscule. 

The most scalable workloads dont involve any (or many) softirq middlemen 
at all: you queue work straight from the hardirq context to the target 
process context. And that's what you want to do _anyway_, because you 
want to create as little locally cached data for the hardirq context, as 
the target task could easily be on another CPU. (this is generally true 
for things like block IO, but it's also true for things like network 
IO.)

the most scalable solution would be _for the network adapter to figure 
out the target CPU for the packet_. Not many (if any) such adapters 
exist at the moment. (as it would involve allocating NR_CPUs irqs to 
that adapter alone.)

> Tasklet is single thread by definition and purpose. Those a few places 
> where people used tasklets to do per-cpu jobs (RCU f.e.) exist just 
> because they had troubles with allocating new softirq. [...]

no. The following tale is the true and only history of the RCU tasklet 
;-) The RCU guys first used a tasklet, then noticed its bad scalability 
(a particular VFS-intense benchmark regressed because only a single CPU 
would do RCU completion on an 8-way box) so they switched it to a 
per-cpu tasklet - without realizing that a per-cpu tasklet is in essence 
a softirq. I pointed it out to them (years down the road ...) then the 
"convert rcu-tasklet to softirq" patch was born.

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Steven Rostedt

On Thu, 28 Jun 2007, Alexey Kuznetsov wrote:
> > the context-switch argument i'll believe if i see numbers. You'll
> > probably need in excess of tens of thousands of irqs/sec to even be able
> > to measure its overhead. (workqueues are driven by nice kernel threads
> > so there's no TLB overhead, etc.)
>
> It was authors of the patch who were supposed to give some numbers,
> at least one or two, just to prove the concept. :-)

The problem is that we don't have the hardware that uses tasklets in
critical ways. My original patch series had a debug print in every
function (tasklet_schedule and friends).   I got a few scattered prints on
all my boxes but no flooding of prints. So I can't show that this will
hurt, because on my boxes it does not.
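
Such a debug print is nothing fancy - roughly this kind of hook at the
top of tasklet_schedule() and friends (illustrative only, not the
actual instrumentation from the series):

	printk(KERN_DEBUG "%s: tasklet %p func %p\n",
	       __FUNCTION__, t, t->func);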

>
> You could set realtime priority by default, not a poor nice -5.
> If some network adapters were killed just because I run some task
> with nice --22, it would be just ridiculous.

This is my fault in the patch series. I completely forgot to up the prio.
My next series will include a change where the tasklet work queue will run
at something like prio FIFO 98 (or maybe 99?)
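
In-kernel that presumably boils down to sched_setscheduler() on each
workqueue thread, along these lines (a sketch only; reaching the
cwq->thread pointers is exactly the missing API discussed elsewhere in
this thread):

	struct sched_param param = { .sched_priority = 98 };

	/* for each per-cpu thread of the tasklet workqueue */
	sched_setscheduler(cwq->thread, SCHED_FIFO, &param);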

This is a bit embarrassing that I forgot to do this, since I'm a
real-time developer ;-)

-- Steve



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Jeff Garzik

Alexey Kuznetsov wrote:

> Hello!
>
> > the context-switch argument i'll believe if i see numbers. You'll 
> > probably need in excess of tens of thousands of irqs/sec to even be able 
> > to measure its overhead. (workqueues are driven by nice kernel threads 
> > so there's no TLB overhead, etc.)


> It was authors of the patch who were supposed to give some numbers,
> at least one or two, just to prove the concept. :-)
>
> According to my measurements (maybe, wrong) on 2.5GHz P4 tasklet
> schedule and execution eats ~300ns, workqueue eats ~4usec.
> On my 1.8GHz PM notebook (UP kernel), the numbers are 170ns and 1.2usec.


Thanks :)



> Anyway, all the uses of tasklet should be verified:
>
> The most dubious place is popular Neterion 10Gbit driver, which uses
> tasklet like acenic. But at 10Gbit, multiply acenic numbers and panic. :-)
>
> Also, there exists some hardware which uses tasklets even harder,
> but I have no idea what real frequencies are: f.e. sundance.
>
> The case with acenic/s2io is quite special: normally network drivers
> refill queues in irq handlers. It was Jes Sorensen's observation
> that offloading refilling from irq improves performance, I do not
> remember numbers. Probably, switching to workqueues will not affect
> performance at all, probably it will just collapse, no idea.


CPUs have gotten so fast now that it's quite possible to run the tasklet 
in parallel with the next invocation of the interrupt handler.


But given the amount of tasklet use in network drivers, I do not think 
tasklets can just be magically equated to workqueues, without 
case-by-case analysis.




> Tasklet is single thread by definition and purpose. Those a few places


Indeed!



> > the only remaining argument is latency:


> You could set realtime priority by default, not a poor nice -5.
> If some network adapters were killed just because I run some task
> with nice --22, it would be just ridiculous.


Indeed.

Jeff




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Jeff Garzik

Ingo Molnar wrote:

> * Jeff Garzik <[EMAIL PROTECTED]> wrote:
>
> > Tasklets fill a niche not filled by either workqueues (slower, 
> > requiring context switches, and possibly much latency if all wq's 
> > processes are active) [...]


> ... workqueues are also possibly much more scalable (percpu workqueues 
> are easy without changing anything in your code but the call where you 
> create the workqueue).


All that scalability is just overhead, and overkill, for what 
tasklets/softirqs are used for.



> the context-switch argument i'll believe if i see numbers. You'll 
> probably need in excess of tens of thousands of irqs/sec to even be able 
> to measure its overhead. (workqueues are driven by nice kernel threads 
> so there's no TLB overhead, etc.)


As Alexey said...  I would have thought YOU needed to provide numbers, 
rather than just handwaving as justification for tasklet removal.



> the only remaining argument is latency: but workqueues are already 
> pretty high-prio (with a default priority of nice -5) - and you can 
> increase it even further. You can make it SCHED_FIFO prio 98 if latency 
> is so important.


You skipped the very relevant latency killer:  N threads in wq, and you 
submit the (N+1)th task.


I just cannot see how that is acceptable replacement for a network 
driver that uses tasklets.  Who wants to wait that long for packet RX or TX?



> Tasklets on the other hand are _unconditionally_ 
> high-priority. So this argument is more of an arms-race argument: "i 
> want _my_ processing to be done immediately!". The fact that workqueues 
> can be preempted and that their priorities can be adjusted flexibly is 
> an optional _bonus_, not a disadvantage. If low-prio workqueues hurt 
> your workflow, make them high-prio.


How about letting us stick with a solution that is WORKING now?

Of course tasklets are unconditionally high priority.  So are hardirqs. 
 So are softirqs.  This is not a problem, this is an expected and 
assumed-upon feature of the system.



> > And moving code -back- into hardirq is just the wrong thing to do, 
> > usually.


> agreed - except if the in-tasklet processing is really thin and there's 
> already a softirq layer in the workflow. (which the case was for the 
> example that was cited.) In such a case moving either to the hardirq or 
> to the softirq looks like the right thing - instead of the tasklet 
> intermediary.


Wrong, for all the examples I care about -- drivers.  Network drivers in 
particular.  Just look at the comment in include/linux/interrupt.h if it 
wasn't clear:


/* PLEASE, avoid to allocate new softirqs, if you need not _really_ high
   frequency threaded job scheduling. For almost all the purposes
   tasklets are more than enough. F.e. all serial device BHs et
   al. should be converted to tasklets, not to softirqs.
 */

There is a good reason for this advice, as hinted at by the code 
immediately following the comment:


enum
{
HI_SOFTIRQ=0,
TIMER_SOFTIRQ,
NET_TX_SOFTIRQ,
NET_RX_SOFTIRQ,
BLOCK_SOFTIRQ,
TASKLET_SOFTIRQ,
SCHED_SOFTIRQ,
#ifdef CONFIG_HIGH_RES_TIMERS
HRTIMER_SOFTIRQ,
#endif
};

softirqs cannot really be used by drivers, because they are not modular. 
 They are a scarce resource in any case.


Guess what?  All this is why we have tasklets.

tasklet != workqueue

Jeff




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Alexey Kuznetsov
Hello!

> the context-switch argument i'll believe if i see numbers. You'll 
> probably need in excess of tens of thousands of irqs/sec to even be able 
> to measure its overhead. (workqueues are driven by nice kernel threads 
> so there's no TLB overhead, etc.)

It was authors of the patch who were supposed to give some numbers,
at least one or two, just to prove the concept. :-)

According to my measurements (maybe, wrong) on 2.5GHz P4 tasklet
schedule and execution eats ~300ns, workqueue eats ~4usec.
On my 1.8GHz PM notebook (UP kernel), the numbers are 170ns and 1.2usec.

Formally looking awful, this result is positive: tasklets are almost
never used in hot paths. I am sure only about one such place: acenic
driver uses tasklet to refill rx queue. This generates not more than
3000 tasklet schedules per second. Even on P4 pure workqueue schedule
will eat ~1% of bare cpu ticks.
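
[arithmetic check: 3000 schedules/sec x ~4 usec = ~12 msec of cpu per
second, i.e. on the order of 1%.]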

Anyway, all the uses of tasklet should be verified:

The most dubious place is popular Neterion 10Gbit driver, which uses
tasklet like acenic. But at 10Gbit, multiply acenic numbers and panic. :-)

Also, there exists some hardware which uses tasklets even harder,
but I have no idea what real frequencies are: f.e. sundance.

The case with acenic/s2io is quite special: normally network drivers
refill queues in irq handlers. It was Jes Sorensen's observation
that offloading refilling from irq improves performance, I do not
remember numbers. Probably, switching to workqueues will not affect
performance at all, probably it will just collapse, no idea.


> ... workqueues are also possibly much more scalable

I cannot figure out - scale in what direction? :-)
 

>(percpu workqueues 
> are easy without changing anything in your code but the call where you 
> create the workqueue).

I do not see how it is related to scalability. And the statement
does not even make sense. The patch already uses per-cpu workqueue
for tasklets, otherwise it would be a disaster: guaranteed cpu non-locality.

Tasklet is single thread by definition and purpose. Those a few places
where people used tasklets to do per-cpu jobs (RCU f.e.) exist just because
they had troubles with allocating new softirq. Workqueues do not make
any difference: tasklet is not workqueue, it is work_struct, and you
still will have to allocate array of per-cpu work structs, everything
remains the same.
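
For instance, a hand-rolled per-cpu variant would still look something
like this (a sketch only, all names invented for the example):

static DEFINE_PER_CPU(struct work_struct, my_percpu_work);

static void my_func(struct work_struct *work)
{
	/* per-cpu payload goes here */
}

static void init_my_percpu_work(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		INIT_WORK(&per_cpu(my_percpu_work, cpu), my_func);
}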


> the only remaining argument is latency:

You could set realtime priority by default, not a poor nice -5.
If some network adapters were killed just because I run some task
with nice --22, it would be just ridiculous.

Alexey


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Steven Rostedt

Hi Dan,

On Mon, 25 Jun 2007, Dan Williams wrote:

> Yes you are right, ARM does not flush L1 when prev==next in switch_mm.
>
> > Perhaps something else is at fault here.
> >
> I'll try and dig a bit deeper...

BTW:

 static int __init iop_adma_init (void)
 {
+   iop_adma_workqueue = create_workqueue("iop-adma");
+   if (!iop_adma_workqueue)
+   return -ENODEV;
+

Could you also try upping the prio of all the "iop-adma" threads?

You should see thread names such as (on SMP) "iop-adma/0", "iop-adma/1"
... "iop-adma/N" where N = # of CPUs - 1.

do a "chrt -p -f 98 "  once for each of the thread's PIDs.  The
chrt can be found in the package "util-linux" on Red Hat / Fedora, and in
schedutils on Debian.

It just dawned on me that workqueues don't run at a high priority by
default.  So it's funny that I'm running all my current tasklets as low
priority work queues :-)

But that can certainly be a cause of high latency.  I need to update my
patches to make the workqueue thread a higher priority. All benchmarks on
this patch have been using a low priority work queue.

I also don't see any nice API to have the priority set for a workqueue
thread from within the kernel. Looks like one needs to be added,
otherwise, I need to have the wrapper dig into the workqueue structs to
find the thread that handles the workqueue.

Thanks,

-- Steve


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Ingo Molnar

* Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Tasklets fill a niche not filled by either workqueues (slower, 
> requiring context switches, and possibly much latency if all wq's 
> processes are active) [...]

... workqueues are also possibly much more scalable (percpu workqueues 
are easy without changing anything in your code but the call where you 
create the workqueue).
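
For illustration, the creation-call difference is just this (a sketch;
queue name taken from the RFC series):

	/* single-threaded: one worker thread for all CPUs */
	wq = create_singlethread_workqueue("ktaskletd");

	/* per-cpu: one worker thread per CPU; queue_work() callers
	 * are unchanged */
	wq = create_workqueue("ktaskletd");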

the context-switch argument i'll believe if i see numbers. You'll 
probably need in excess of tens of thousands of irqs/sec to even be able 
to measure its overhead. (workqueues are driven by nice kernel threads 
so there's no TLB overhead, etc.)

the only remaining argument is latency: but workqueues are already 
pretty high-prio (with a default priority of nice -5) - and you can 
increase it even further. You can make it SCHED_FIFO prio 98 if latency 
is so important. Tasklets on the other hand are _unconditionally_ 
high-priority. So this argument is more of an arms-race argument: "i 
want _my_ processing to be done immediately!". The fact that workqueues 
can be preempted and that their priorities can be adjusted flexibly is 
an optional _bonus_, not a disadvantage. If low-prio workqueues hurt 
your workflow, make them high-prio.

> And moving code -back- into hardirq is just the wrong thing to do, 
> usually.

agreed - except if the in-tasklet processing is really thin and there's 
already a softirq layer in the workflow. (which the case was for the 
example that was cited.) In such a case moving either to the hardirq or 
to the softirq looks like the right thing - instead of the tasklet 
intermediary.

Ingo


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Ingo Molnar

* Jeff Garzik [EMAIL PROTECTED] wrote:

 Tasklets fill a niche not filled by either workqueues (slower, 
 requiring context switches, and possibly much latency is all wq's 
 processes are active) [...]

... workqueues are also possibly much more scalable (percpu workqueues 
are easy without changing anything in your code but the call where you 
create the workqueue).

the context-switch argument i'll believe if i see numbers. You'll 
probably need in excess of tens of thousands of irqs/sec to even be able 
to measure its overhead. (workqueues are driven by nice kernel threads 
so there's no TLB overhead, etc.)

the only remaining argument is latency: but workqueues are already 
pretty high-prio (with a default priority of nice -5) - and you can 
increase it even further. You can make it SCHED_FIFO prio 98 if latency 
is so important. Tasklets on the other hand are _unconditionally_ 
high-priority. So this argument is more of an arms-race argument: i 
want _my_ processing to be done immediately!. The fact that workqueues 
can be preempted and that their priorities can be adjusted flexibly is 
an optional _bonus_, not a disadvantage. If low-prio workqueues hurts 
your workflow, make them high-prio.

 And moving code -back- into hardirq is just the wrong thing to do, 
 usually.

agreed - except if the in-tasklet processing is really thin and there's 
already a softirq layer in the workflow. (which the case was for the 
example that was cited.) In such a case moving either to the hardirq or 
to the softirq looks like the right thing - instead of the tasklet 
intermediary.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Steven Rostedt

Hi Dan,

On Mon, 25 Jun 2007, Dan Williams wrote:

 Yes you are right, ARM does not flush L1 when prev==next in switch_mm.

  Perhaps something else is at fault here.
 
 I'll try and dig a bit deeper...

BTW:

 static int __init iop_adma_init (void)
 {
+   iop_adma_workqueue = create_workqueue(iop-adma);
+   if (!iop_adma_workqueue)
+   return -ENODEV;
+

Could you also try upping the prio of all the iop-adma threads?

You should see thread names such as (on SMP) iop-adma/0, iop-adma/1
... iop-adma/N where N = # of CPUs - 1.

do a chrt -p -f 98 pid  once for each of the thread's PIDs.  The
chrt can be found in the package util-linux on Red Hat / Fedora, and in
schedutils on Debian.

It just dawned on me that workqueues don't run at a high priority by
default.  So it's funny that I'm running all my current tasklets as a low
priority work queues :-)

But that can certainly be a cause of high latency.  I need to update my
patches to make the workqueue thread a higher priority. All benchmarks on
this patch have been using a low priority work queue.

I also don't see any nice API to have the priority set for a workqueue
thread from within the kernel. Looks like one needs to be added,
otherwise, I need to have the wrapper dig into the workqueue structs to
find the thread that handles the workqueue.

Thanks,

-- Steve
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Alexey Kuznetsov
Hello!

 the context-switch argument i'll believe if i see numbers. You'll 
 probably need in excess of tens of thousands of irqs/sec to even be able 
 to measure its overhead. (workqueues are driven by nice kernel threads 
 so there's no TLB overhead, etc.)

It was authors of the patch who were supposed to give some numbers,
at least one or two, just to prove the concept. :-)

According to my measurements (maybe, wrong) on 2.5GHz P4 tasklet
schedule and execution eats ~300ns, workqueue eats ~4usec.
On my 1.8GHz PM notebook (UP kernel), the numbers are 170ns and 1.2usec.

Formally looking awful, this result is positive: tasklets are almost
never used in hot paths. I am sure only about one such place: acenic
driver uses tasklet to refill rx queue. This generates not more than
3000 tasklet schedules per second. Even on P4 it pure workqueue schedule
will eat ~1% of bare cpu ticks.

Anyway, all the uses of tasklet should be verified:

The most dubios place is popular Neterion 10Gbit driver, which uses
tasklet like acenic. But at 10Gbit, multiply acenic numbers and panic. :-)

Also, there exists some hardware which uses tasklets even harder,
but I have no idea what real frequencies are: f.e. sundance.

The case with acenic/s2io is quite special: normally network drivers
refill queues in irq handlers. It was Jes Sorensen observation
that offloading refilling from irq improves performance, I do not
remember numbers. Probably, switching to workqueues will not affect
performance at all, probably it will just collapse, no idea.


 ... workqueues are also possibly much more scalable

I cannot figure out - scale in what direction? :-)
 

(percpu workqueues 
 are easy without changing anything in your code but the call where you 
 create the workqueue).

I do not see how it is related to scalability. And the statement
does not even make sense. The patch already uses per-cpu workqueue
for tasklets, otherwise it would be a disaster: guaranteed cpu non-locality.

Tasklet is single thread by definition and purpose. Those a few places
where people used tasklets to do per-cpu jobs (RCU f.e.) exist just because
they had troubles with allocating new softirq. Workqueues do not make
any difference: tasklet is not workqueue, it is work_struct, and you
still will have to allocate array of per-cpu work structs, everything
remains the same.


 the only remaining argument is latency:

You could set realtime prioriry by default, not a poor nice -5.
If some network adapters were killed just because I run some task
with nice --22, it would be just ridiculous.

Alexey
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Jeff Garzik

Ingo Molnar wrote:

* Jeff Garzik [EMAIL PROTECTED] wrote:

Tasklets fill a niche not filled by either workqueues (slower, 
requiring context switches, and possibly much latency is all wq's 
processes are active) [...]


... workqueues are also possibly much more scalable (percpu workqueues 
are easy without changing anything in your code but the call where you 
create the workqueue).


All that scalability is just overhead, and overkill, for what 
tasklets/softirqs are used for.



the context-switch argument i'll believe if i see numbers. You'll 
probably need in excess of tens of thousands of irqs/sec to even be able 
to measure its overhead. (workqueues are driven by nice kernel threads 
so there's no TLB overhead, etc.)


As Alexey said...  I would have thought YOU needed to provide numbers, 
rather than just handwaving as justification for tasklet removal.



the only remaining argument is latency: but workqueues are already 
pretty high-prio (with a default priority of nice -5) - and you can 
increase it even further. You can make it SCHED_FIFO prio 98 if latency 
is so important.


You skipped the very relevant latency killer:  N threads in wq, and you 
submit the (N+1)th task.


I just cannot see how that is acceptable replacement for a network 
driver that uses tasklets.  Who wants to wait that long for packet RX or TX?



Tasklets on the other hand are _unconditionally_ 
high-priority. So this argument is more of an arms-race argument: i 
want _my_ processing to be done immediately!. The fact that workqueues 
can be preempted and that their priorities can be adjusted flexibly is 
an optional _bonus_, not a disadvantage. If low-prio workqueues hurts 
your workflow, make them high-prio.


How about letting us stick with a solution that is WORKING now?

Of course tasklets are unconditionally high priority.  So are hardirqs. 
 So are softirqs.  This is not a problem, this is an expected and 
assumed-upon feature of the system.



And moving code -back- into hardirq is just the wrong thing to do, 
usually.


agreed - except if the in-tasklet processing is really thin and there's 
already a softirq layer in the workflow. (which the case was for the 
example that was cited.) In such a case moving either to the hardirq or 
to the softirq looks like the right thing - instead of the tasklet 
intermediary.


Wrong, for all the examples I care about -- drivers.  Network drivers in 
particular.  Just look at the comment in include/linux/interrupt.h if it 
wasn't clear:


/* PLEASE, avoid to allocate new softirqs, if you need not _really_ high
   frequency threaded job scheduling. For almost all the purposes
   tasklets are more than enough. F.e. all serial device BHs et
   al. should be converted to tasklets, not to softirqs.
 */

There is a good reason for this advice, as hinted at by the code 
immediately following the comment:


enum
{
HI_SOFTIRQ=0,
TIMER_SOFTIRQ,
NET_TX_SOFTIRQ,
NET_RX_SOFTIRQ,
BLOCK_SOFTIRQ,
TASKLET_SOFTIRQ,
SCHED_SOFTIRQ,
#ifdef CONFIG_HIGH_RES_TIMERS
HRTIMER_SOFTIRQ,
#endif
};

softirqs cannot really be used by drivers, because they are not modular. 
 They are a scarce resource in any case.


Guess what?  All this is why we have tasklets.

tasklet != workqueue

Jeff


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Jeff Garzik

Alexey Kuznetsov wrote:

Hello!

the context-switch argument i'll believe if i see numbers. You'll 
probably need in excess of tens of thousands of irqs/sec to even be able 
to measure its overhead. (workqueues are driven by nice kernel threads 
so there's no TLB overhead, etc.)


It was authors of the patch who were supposed to give some numbers,
at least one or two, just to prove the concept. :-)

According to my measurements (maybe, wrong) on 2.5GHz P4 tasklet
schedule and execution eats ~300ns, workqueue eats ~4usec.
On my 1.8GHz PM notebook (UP kernel), the numbers are 170ns and 1.2usec.


Thanks :)



Anyway, all the uses of tasklet should be verified:

The most dubios place is popular Neterion 10Gbit driver, which uses
tasklet like acenic. But at 10Gbit, multiply acenic numbers and panic. :-)

Also, there exists some hardware which uses tasklets even harder,
but I have no idea what real frequencies are: f.e. sundance.

The case with acenic/s2io is quite special: normally network drivers
refill queues in irq handlers. It was Jes Sorensen observation
that offloading refilling from irq improves performance, I do not
remember numbers. Probably, switching to workqueues will not affect
performance at all, probably it will just collapse, no idea.


CPUs have gotten so fast now that its quite possible to run the tasklet 
in parallel with the next invocation of the interrupt handler.


But given the amount of tasklet use in network drivers, I do not think 
tasklets can just be magically equated to workqueues, without 
case-by-case analysis.




Tasklet is single thread by definition and purpose. Those a few places


Indeed!



the only remaining argument is latency:


You could set realtime prioriry by default, not a poor nice -5.
If some network adapters were killed just because I run some task
with nice --22, it would be just ridiculous.


Indeed.

Jeff


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Steven Rostedt

On Thu, 28 Jun 2007, Alexey Kuznetsov wrote:
  the context-switch argument i'll believe if i see numbers. You'll
  probably need in excess of tens of thousands of irqs/sec to even be able
  to measure its overhead. (workqueues are driven by nice kernel threads
  so there's no TLB overhead, etc.)

 It was authors of the patch who were supposed to give some numbers,
 at least one or two, just to prove the concept. :-)

The problem is that we don't have the hardware that uses tasklets in
critical ways. My original patch series had a debug print in every
function (tasklet_schedule and friends).   I got a few scattered prints on
all my boxes but no flooding of prints. So I can't show that this will
hurt, because on my boxes it does not.


 You could set realtime prioriry by default, not a poor nice -5.
 If some network adapters were killed just because I run some task
 with nice --22, it would be just ridiculous.

This is my fault to the patch series. I compelety forgot to up the prio.
My next series will include a change where the tasklet work queue will run
at something like prio FIFO 98 (or maybe 99?)

This is a bit embarrassing that I forgot to do this, since I'm a
real-time developer ;-)

-- Steve

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Ingo Molnar

* Alexey Kuznetsov [EMAIL PROTECTED] wrote:

  the context-switch argument i'll believe if i see numbers. You'll 
  probably need in excess of tens of thousands of irqs/sec to even be 
  able to measure its overhead. (workqueues are driven by nice kernel 
  threads so there's no TLB overhead, etc.)
 
 It was authors of the patch who were supposed to give some numbers, at 
 least one or two, just to prove the concept. :-)

sure enough! But it was not me who claimed that 'workqueues are slow'.

firstly, i'm not here at all to tell people what tools to use. I'm not 
trying to 'force' people away from a perfectly logical technological 
choice. I am just wondering out loud whether this particular tool, in 
its current usage pattern, makes much technological sense. My claim is: 
it could very well be that it doesnt make _much_ sense, and in that case 
we should provide a non-intrusive migration path away in terms of a 
compatible API wrapper to a saner (albeit by virtue of trying to emulate 
an existing API, slower) mechanism. The examples cited so far had the 
tasklet as an intermediary towards a softirq - what's the technological 
point in such a splitup?

 According to my measurements (maybe, wrong) on 2.5GHz P4 tasklet 
 schedule and execution eats ~300ns, workqueue eats ~4usec. On my 
 1.8GHz PM notebook (UP kernel), the numbers are 170ns and 1.2usec.

I find the 4usecs cost on a P4 interesting and a bit too high - how did 
you measure it? (any test-patch for it i could try?) But i think even 
your current numbers partly prove my point: with 1.2 usecs and 10,000 
irqs/sec the cost is 1.2 msecs/sec, or 0.1%. And 10K irqs/sec themselves 
will eat up much more CPU time than that already.

 Formally looking awful, this result is positive: tasklets are almost 
 never used in hot paths. I am sure only about one such place: acenic 
 driver uses tasklet to refill rx queue. This generates not more than 
 3000 tasklet schedules per second. Even on P4 it pure workqueue 
 schedule will eat ~1% of bare cpu ticks.

... and the irq cost itself will eat 5-10% of bare CPU ticks already.

  ... workqueues are also possibly much more scalable
 
 I cannot figure out - scale in what direction? :-)

workqueues can be per-cpu - for tasklets to be per-cpu you have to 
open-code them into per-cpu like rcu-tasklets did (which in essence 
turns them into more expensive softirqs).

> > (percpu workqueues
> > are easy without changing anything in your code but the call where 
> > you create the workqueue).
> 
> I do not see how it is related to scalability. And the statement does 
> not even make sense. The patch already uses per-cpu workqueue for 
> tasklets, otherwise it would be a disaster: guaranteed cpu 
> non-locality.

my argument was: workqueues are more scalable than tasklets in general.

Just look at the tasklet_disable() logic. We basically have a per-cpu 
list of tasklets that we poll in tasklet_action:

 static void tasklet_action(struct softirq_action *a)
 {
 [...]
	while (list) {
		struct tasklet_struct *t = list;

		list = list->next;

		if (tasklet_trylock(t)) {

and if the trylock fails, we just continue to meet this activated 
tasklet again and again, in this nice linear list.

this happens to work in practice because 1) tasklets are used quite 
rarely! 2) tasklet_disable() is done relatively rarely and nobody truly 
runs tons of the same devices (which depend on a tasklet) on the same 
box, but still it's quite an unhealthy approach. Every time i look at 
the tasklet code it hurts - having fundamental stuff like that in the 
heart of Linux ;-)
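
For completeness, the branch elided above - what happens when the
trylock fails (or the tasklet is disabled) - looked roughly like this
in kernel/softirq.c of that era; the tasklet goes straight back onto
the per-cpu list and the softirq is raised again, which is exactly the
re-polling being criticized:

		/* trylock failed or tasklet disabled: re-queue it and
		 * re-raise the softirq, so the list is walked again */
		local_irq_disable();
		t->next = __get_cpu_var(tasklet_vec).list;
		__get_cpu_var(tasklet_vec).list = t;
		__raise_softirq_irqoff(TASKLET_SOFTIRQ);
		local_irq_enable();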

also, the "be afraid of the hardirq or the process context" mantra is 
overblown as well. If something is too heavy for a hardirq, _it's too 
heavy for a tasklet too_. Most hardirqs are (or should be) running with 
interrupts enabled, which makes their difference to softirqs miniscule. 

The most scalable workloads don't involve any (or many) softirq middlemen 
at all: you queue work straight from the hardirq context to the target 
process context. And that's what you want to do _anyway_, because you 
want to create as little locally cached data for the hardirq context, as 
the target task could easily be on another CPU. (this is generally true 
for things like block IO, but it's also true for things like network 
IO.)
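
A minimal sketch of that no-middleman shape (all names are
illustrative, taken from no particular driver):

#include <linux/interrupt.h>
#include <linux/workqueue.h>

struct my_dev {
	struct workqueue_struct *wq;		/* per-device queue */
	struct work_struct completion_work;	/* runs in process context */
};

static irqreturn_t my_isr(int irq, void *dev_id)
{
	struct my_dev *dev = dev_id;

	/* ack the hardware and capture status (omitted), then hand
	 * the rest straight to a thread - no softirq/tasklet between */
	queue_work(dev->wq, &dev->completion_work);
	return IRQ_HANDLED;
}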

the most scalable solution would be _for the network adapter to figure 
out the target CPU for the packet_. Not many (if any) such adapters 
exist at the moment. (as it would involve allocating NR_CPUs irqs to 
that adapter alone.)

> Tasklet is single thread by definition and purpose. Those a few places 
> where people used tasklets to do per-cpu jobs (RCU f.e.) exist just 
> because they had troubles with allocating new softirq. [...]

no. The following tale is the true and only history of the RCU tasklet 
;-) The RCU guys first used a tasklet, then noticed its bad scalability 
(a particular VFS-intense benchmark regressed because only a single CPU 
would do RCU completion on an 8-way box) so they switched it to a 
per-cpu tasklet - without realizing that a per-cpu tasklet is in essence 
a softirq. I pointed it out to them (years down the road ...) then the 
"convert rcu-tasklet to softirq" patch was born.

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Oleg Nesterov
On 06/28, Steven Rostedt wrote:
> 
> I also don't see any nice API to have the priority set for a workqueue
> thread from within the kernel. Looks like one needs to be added,
> otherwise, I need to have the wrapper dig into the workqueue structs to
> find the thread that handles the workqueue.

It is not so trivial to implement properly. Note that CPU_UP creates a new
cwq->thread, so somehow workqueue should remember its priority. This means
we should record it in workqueue_struct. The most simple way is to add yet
another parameter to __create_workqueue(), but this is nasty.

So, perhaps we should add "long nice" to struct workqueue_struct, and then

void set_workqueue_nice(struct workqueue_struct *wq, long nice)
{
	const cpumask_t *cpu_map = wq_cpu_map(wq);
	struct cpu_workqueue_struct *cwq;
	int cpu;

	wq->nice = nice;

	mutex_lock(&workqueue_mutex);

	for_each_cpu_mask(cpu, *cpu_map) {
		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
		if (cwq->thread)
			set_user_nice(cwq->thread, nice);
	}

	mutex_unlock(&workqueue_mutex);
}

We could use for_each_cpu_online() instead, but then we should check
is_single_threaded().
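
Callers would then be one-liners, e.g. (a sketch assuming the helper
above goes in; "my-driver" and the nice value are placeholders):

	struct workqueue_struct *wq = create_workqueue("my-driver");

	if (!wq)
		return -ENOMEM;
	set_workqueue_nice(wq, -5);	/* the proposed helper */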

Oleg.



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Jeff Garzik

Ingo Molnar wrote:

> my argument was: workqueues are more scalable than tasklets in general.


Here is my argument:  that is totally irrelevant to $subject, when it 
comes to dealing with managing existing [network driver] behavior and 
performance.


My overall objection is the attempt to replace apples with oranges.

Network drivers use tasklets TODAY.  Each driver -- in particular 
acenic, ns83820, and the 10Gbps drivers -- has been carefully tuned to 
use tasklets, hardirqs, and perhaps NAPI too.  Changing to workqueue 
WILL affect network driver hot paths, yet I see no analysis or 
measurement at all of the behavior differences.


If hackers are willing to revisit each network driver, rework the 
tasklet code to something more sane [in your opinion], and TEST it, I 
will review the patches and happily ACK away.


Given that I feel that course of action is unlikely (the preferred 
alternative apparently being "I don't use these drivers, but surely my 
changes are OK anyway"), I do not see how this effort can proceed as is.


Lots of time went into tuning these network drivers for the specific 
thread model they use.  Maybe that thread model is no longer in style. 
Maybe modern machine behavior dictates a different approach.  The point 
is... you don't know.


Jeff




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Jeff Garzik

Ingo Molnar wrote:

> But it was not me who claimed that 'workqueues are slow'.


The claim was:  "slower than tasklets."


> choice. I am just wondering out loud whether this particular tool, in 
> its current usage pattern, makes much technological sense. My claim is: 
> it could very well be that it doesn't make _much_ sense, and in that case 
> we should provide a non-intrusive migration path away in terms of a 
> compatible API wrapper to a saner (albeit by virtue of trying to emulate 
> an existing API, slower) mechanism. The examples cited so far had the 
> tasklet as an intermediary towards a softirq - what's the technological 
> point in such a splitup?


I already answered that in detail.  In sum, a driver cannot define its 
own softirq.  Softirqs are not modular.


Tasklets are the closest thing to softirqs for a driver.
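
Hence the familiar driver pattern this thread is about - sketched here
with made-up names:

#include <linux/interrupt.h>

static void my_rx_refill(unsigned long data);
static DECLARE_TASKLET(my_rx_tasklet, my_rx_refill, 0);

static irqreturn_t my_isr(int irq, void *dev_id)
{
	/* a module cannot register a softirq of its own, so it acks
	 * the chip here and defers the rest to its tasklet */
	tasklet_schedule(&my_rx_tasklet);
	return IRQ_HANDLED;
}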


> The most scalable workloads don't involve any (or many) softirq middlemen 
> at all: you queue work straight from the hardirq context to the target 
> process context. And that's what you want to do _anyway_, because you 
> want to create as little locally cached data for the hardirq context, as 
> the target task could easily be on another CPU. (this is generally true 
> for things like block IO, but it's also true for things like network 
> IO.)


> the most scalable solution would be _for the network adapter to figure 
> out the target CPU for the packet_.


I agree completely.  Wanna implement this?  I will kiss your feet, and 
multi-core CPU vendors will worship you as a demi-god.


Until such time, we must deal with the network stack as it exists today, 
and the network drivers as they exist and work today.



> Not many (if any) such adapters 
> exist at the moment. (as it would involve allocating NR_CPUs irqs to 
> that adapter alone.)


Good news:  this is becoming the norm for modern NICs, especially 10Gbps.

Plenty of NICs already exist that support multiple RX rings (presumably 
one per CPU), and newer NICs will raise individual MSI[-X] interrupts 
based on the RX ring into which a packet was received.


In this area, NIC vendors are way ahead of the Linux net stack.

The Linux net stack is unfortunately not threaded enough to sanely deal 
with dividing /flows/ up across multiple CPUs, even if the NIC does 
support multiple transmit and receive queues.   [side note: initial 
multi-queue TX is being worked on, on netdev]
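
The driver side of that looks roughly like the fragment below (a
sketch - the adapter/ring names are invented, and error unwinding is
not shown):

	/* one MSI-X vector per RX ring; each vector can then be bound
	 * to one CPU via /proc/irq/<n>/smp_affinity */
	for (i = 0; i < adapter->num_rx_rings; i++) {
		err = request_irq(adapter->msix_entries[i].vector,
				  my_rx_ring_isr, 0, "mynic-rx",
				  &adapter->rx_ring[i]);
		if (err)
			goto err_free_irqs;	/* unwind (not shown) */
	}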



> > Tasklet is single thread by definition and purpose. Those a few places 
> > where people used tasklets to do per-cpu jobs (RCU f.e.) exist just 
> > because they had troubles with allocating new softirq. [...]


> no. The following tale is the true and only history of the RCU tasklet 
> ;-) The RCU guys first used a tasklet, then noticed its bad scalability 
> (a particular VFS-intense benchmark regressed because only a single CPU 
> would do RCU completion on an 8-way box) so they switched it to a 
> per-cpu tasklet - without realizing that a per-cpu tasklet is in essence 
> a softirq. I pointed it out to them (years down the road ...) then the 
> "convert rcu-tasklet to softirq" patch was born.


You focused on the example rather than the key phrase:  "tasklet is 
single thread by definition and purpose."


Wanting to change that without analysis of the impact illustrates the 
apples-to-oranges change being proposed.



> outlined above: if you want good scalability, don't use middlemen :-) 
> Figure out the target task as early as possible and let it do as much of 
> the remaining work as possible. _Increasing_ the amount of cached 
> context (by doing delayed processing in tasklets or even softirqs on the 
> same CPU where the hardirq arrived) only increases the cross-CPU cost. 
> Keeping stuff in a softirq only makes (some) sense as long as you have 
> no target task at all (routing, filtering, etc.).


I do not disagree with these theoretical musings :)

I care the most about the "who will do all this work?" question.  In 
network driver land, these changes impact hot paths.  I am lazy, and 
don't care to revisit each network driver hot path and carefully re-tune 
each based on this proposal.  Who is volunteering?


Jeff




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Dan Williams

On 6/28/07, Steven Rostedt <[EMAIL PROTECTED]> wrote:
> Hi Dan,
> 
> On Mon, 25 Jun 2007, Dan Williams wrote:
> 
> > Yes you are right, ARM does not flush L1 when prev==next in switch_mm.
> 
> > > Perhaps something else is at fault here.
> > 
> > I'll try and dig a bit deeper...
> 
> BTW:
> 
> >  static int __init iop_adma_init (void)
> >  {
> > +   iop_adma_workqueue = create_workqueue("iop-adma");
> > +   if (!iop_adma_workqueue)
> > +           return -ENODEV;
> > +
> 
> Could you also try upping the prio of all the iop-adma threads?


Unfortunately setting the thread to real time priority makes
throughput slightly worse.  Instead of floating around 35MB/s the
resync speed is stuck around 30MB/s:

[ iop-adma: hi-prio workqueue based callbacks ]
iq81340mc:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sdc[2] sdb[1] sda[0]
 468872448 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
 [>....................]  recovery =  3.9% (6107424/156290816)
finish=84.9min speed=29448K/sec

The iop-adma tasklet cleans up any completed descriptors and in the
process calls any attached callbacks.  For the raid5 resync case the
callback is simply:

static void ops_complete_check(void *stripe_head_ref)
{
	struct stripe_head *sh = stripe_head_ref;
	int pd_idx = sh->pd_idx;

	pr_debug("%s: stripe %llu\n", __FUNCTION__,
		(unsigned long long)sh->sector);

	if (test_and_clear_bit(STRIPE_OP_MOD_DMA_CHECK, &sh->ops.pending) &&
	    sh->ops.zero_sum_result == 0)
		set_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);

	set_bit(STRIPE_OP_CHECK, &sh->ops.complete);
	set_bit(STRIPE_HANDLE, &sh->state);
	release_stripe(sh);
}


[ iop-adma: tasklet based callbacks ]
iq81340mc:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sdc[2] sdb[1] sda[0]
 468872448 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
 [=>..................]  recovery =  5.1% (8024248/156290816)
finish=47.9min speed=51486K/sec

--
Dan


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Andrew Morton
On Thu, 28 Jun 2007 18:00:01 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:

>  with 1.2 usecs and 10,000 
> irqs/sec the cost is 1.2 msecs/sec, or 0.1%.

off-by-10 error.


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Thu, 28 Jun 2007 18:00:01 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> >  with 1.2 usecs and 10,000 
> > irqs/sec the cost is 1.2 msecs/sec, or 0.1%.
> 
> off-by-10 error.

yeah, indeed - 12 msecs and 1.2% :-/

Ingo


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Steven Rostedt
On Thu, 28 Jun 2007, Dan Williams wrote:
> 
> Unfortunately setting the thread to real time priority makes
> throughput slightly worse.  Instead of floating around 35MB/s the
> resync speed is stuck around 30MB/s:

That is really strange. If you raise the prio of the work queue it
gets worse?  Something really strange is happening here.  Are you using
CONFIG_PREEMPT?

-- Steve



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Dan Williams

On 6/28/07, Steven Rostedt <[EMAIL PROTECTED]> wrote:
> On Thu, 28 Jun 2007, Dan Williams wrote:
> > 
> > Unfortunately setting the thread to real time priority makes
> > throughput slightly worse.  Instead of floating around 35MB/s the
> > resync speed is stuck around 30MB/s:
> 
> That is really strange. If you raise the prio of the work queue it
> gets worse?  Something really strange is happening here.  Are you using
> CONFIG_PREEMPT?


Everything thus far has been CONFIG_PREEMPT=n (the default for this platform).

With CONFIG_PREEMPT=y the resync is back in the 50MB/s range.

[iop-adma: hi-prio workqueue, CONFIG_PREEMPT=y]
iq81340mc:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sdc[2] sdb[1] sda[0]
 468872448 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
 [=>..................]  recovery =  5.8% (9136404/156290816)
finish=46.1min speed=53161K/sec

The tasklet configuration stays in 50MB/s ballpark, and the default
priority (nice -5) workqueue case remains in the 30's with
CONFIG_PREEMPT=n.

--
Dan


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Dan Williams

On 6/28/07, Dan Williams <[EMAIL PROTECTED]> wrote:
> Everything thus far has been CONFIG_PREEMPT=n (the default for this platform).
> 
> With CONFIG_PREEMPT=y the resync is back in the 50MB/s range.
> 
> [iop-adma: hi-prio workqueue, CONFIG_PREEMPT=y]
> iq81340mc:~# cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid5 sdd[4] sdc[2] sdb[1] sda[0]
>   468872448 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>   [=>..................]  recovery =  5.8% (9136404/156290816)
> finish=46.1min speed=53161K/sec
> 
> The tasklet configuration stays in 50MB/s ballpark, and the default
> priority (nice -5) workqueue case remains in the 30's with
> CONFIG_PREEMPT=n.


That last line should be CONFIG_PREEMPT=y

But I guess there is a reason it is still marked experimental...

iq81340mc:/data_dir# ./md0_verify.sh
kernel BUG at mm/page_alloc.c:363!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = b29b8000
[00000000] *pgd=00000000
Internal error: Oops: 805 [#1] PREEMPT
Modules linked in:
CPU: 0Not tainted  (2.6.22-rc6 #144)
PC is at __bug+0x20/0x2c
LR is at 0x403f37a0
pc : [<4002ef08>]    lr : [<403f37a0>]    psr: 60000093
sp : b28a3cc8  ip : 01d8  fp : b28a3cd4
r10: 0001  r9 : 0001  r8 : 00000000
r7 : 004f  r6 : ffe0  r5 : 412919c0  r4 : 412919e0
r3 : 00000000  r2 : 403f37a0  r1 : 6093  r0 : 0026
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  Segment user
Control: 0400397f  Table: 729b8018  DAC: 0015
Process md0_verify_kern (pid: 4188, stack limit = 0xb28a2258)
Stack: (0xb28a3cc8 to 0xb28a4000)
3cc0:   b28a3d1c b28a3cd8 40070a90 4002eef4 b28a2000 403f6818
3ce0: 403f682c 4013 403f6838 001f 0010 4129c3c0 2013 15573000
3d00: 4129c3c0  b28a4550  b28a3d2c b28a3d20 40070b54 400706b8
3d20: b28a3d44 b28a3d30 40073e7c 40070b4c 4129c3c0 15574000 b28a3d58 b28a3d48
3d40: 40084734 40073e34 b3e0fdcc b28a3dcc b28a3d5c 40079cd8 400846f8 
3d60:  b39a4860 b28a3ddc  0001 15574000 b28a2000 4040d548
3d80: b28a4550 b3357240 72d9e0ff   15573fff 3ffe 15574000
3da0: 15573fff 4040d548 b28a3ddc b2c15320 b28a2000  b333cc00 b3357240
3dc0: b28a3e04 b28a3dd0 4007d724 40079844 b28a3dd8  0015 4040d548
3de0: b3357240 b28f6920 b28a2000 b3357240 b3357240 b3a4f9c0 b28a3e18 b28a3e08
3e00: 4003f2d8 4007d6b4 b3a462e0 b28a3e5c b28a3e1c 40095e70 4003f2a0 
3e20:  b28a3e3c 0013 b28a3e5c b28a3e3c b333cc10 b3c15cc0 b333cc00
3e40: 0080 b28a3fb0   b28a3f2c b28a3e60 400c8270 400958f0
3e60:  b28a3ec8 b28a2000 b3c15c00 0003  b29e54a0 0014
3e80: 0044 b38b4f00 0002   0001 403f6aac 403f6aac
3ea0: b333cc7c b3a462e0 000200d2  403f6aa8 b28a2000 b333cd30 b28a3f08
3ec0: b28a3ecc 40070c38 4006fdbc 000200d2 403f6aac 0010 41295a60 b28a2000
3ee0: b333cc7c  b333cc00 b333cc00 0003 0001fff3 004f 004f
3f00: b333cc00 403f72a8 b28a2000 400c7e60  b333cc00 b28a3fb0 fffe
3f20: b28a3f5c b28a3f30 40095054 400c7e6c  b4503000 000bf408 b333cc00
3f40:  000bf628 b28a2000 b28a3fb0 b28a3f84 b28a3f60 40096c3c 40094f58
3f60: b4503000 000bf408 b28a3fb0 b4503000 4002b0e4 156da000 b28a3fa4 b28a3f88
3f80: 4002e3cc 40096b24 000bf408 000bf628 000bf788 000b  b28a3fa8
3fa0: 4002af20 4002e39c 000bf408 000bf628 000bf788 000bf628 000bf408 
3fc0: 000bf408 000bf628 000bf788  000bf628 000bf408 156da000 000be848
3fe0: 15651550 3eb7e4ac 0002b4f0 1565158c 6010 000bf788  
Backtrace:
[<4002eee8>] (__bug+0x0/0x2c) from [<40070a90>] (free_hot_cold_page+0x3e4/0x434)
[<400706ac>] (free_hot_cold_page+0x0/0x434) from [<40070b54>]
(free_hot_page+0x14/0x18)
[<40070b40>] (free_hot_page+0x0/0x18) from [<40073e7c>] (put_page+0x54/0x174)
[<40073e28>] (put_page+0x0/0x174) from [<40084734>]
(free_page_and_swap_cache+0x48/0x64)
 r5:15574000 r4:4129c3c0
[<400846ec>] (free_page_and_swap_cache+0x0/0x64) from [<40079cd8>]
(unmap_vmas+0x4a0/0x67c)
 r4:b3e0fdcc
[<40079838>] (unmap_vmas+0x0/0x67c) from [<4007d724>] (exit_mmap+0x7c/0x158)
[<4007d6a8>] (exit_mmap+0x0/0x158) from [<4003f2d8>] (mmput+0x44/0x100)
[<4003f294>] (mmput+0x0/0x100) from [<40095e70>] (flush_old_exec+0x58c/0x9d8)
 r4:b3a462e0
[<400958e4>] (flush_old_exec+0x0/0x9d8) from [<400c8270>]
(load_elf_binary+0x410/0x18fc)
[<400c7e60>] (load_elf_binary+0x0/0x18fc) from [<40095054>]
(search_binary_handler+0x108/0x2b4)
[<40094f4c>] (search_binary_handler+0x0/0x2b4) from [<40096c3c>]
(do_execve+0x124/0x1e4)
[<40096b18>] (do_execve+0x0/0x1e4) from [<4002e3cc>] (sys_execve+0x3c/0x5c)
[<4002e390>] (sys_execve+0x0/0x5c) from [<4002af20>] (ret_fast_syscall+0x0/0x3c)
 r7:000b r6:000bf788 r5:000bf628 r4:000bf408
Code: e1a01000 e59f000c eb004d60 e3a03000 (e5833000)
note: md0_verify_kern[4188] exited with preempt_count 4

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Steven Rostedt

On Thu, 28 Jun 2007, Dan Williams wrote:

> > CONFIG_PREEMPT?
> 
> Everything thus far has been CONFIG_PREEMPT=n (the default for this platform).
> 
> With CONFIG_PREEMPT=y the resync is back in the 50MB/s range.

So with upping the prio for the work queue you got back your performance?


> [iop-adma: hi-prio workqueue, CONFIG_PREEMPT=y]
> iq81340mc:~# cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid5 sdd[4] sdc[2] sdb[1] sda[0]
>   468872448 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>   [=>..................]  recovery =  5.8% (9136404/156290816)
> finish=46.1min speed=53161K/sec
> 
> The tasklet configuration stays in 50MB/s ballpark, and the default
> priority (nice -5) workqueue case remains in the 30's with
> CONFIG_PREEMPT=n.

[noted: should be CONFIG_PREEMPT=y]

This is expected. Seems you may have other things running at a higher prio.

-- Steve



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Steven Rostedt




> But I guess there is a reason it is still marked experimental...
> 
> iq81340mc:/data_dir# ./md0_verify.sh
> kernel BUG at mm/page_alloc.c:363!

Well, at least this uncovered something :-)

I'll look into this too.

-- Steve


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-27 Thread Jeff Garzik

Ingo Molnar wrote:
> so how about the following, different approach: anyone who has a tasklet 
> in any performance-sensitive codepath, please yell now. We'll also do a 
> proactive search for such places. We can convert those places to 
> softirqs, or move them back into hardirq context. Once this is done - 
> and i doubt it will go beyond 1-2 places - we can just mass-convert the 
> other 110 places to the lame but compatible solution of doing them in a 
> global thread context.



Color me unconvinced.

Tasklets fill a niche not filled by either workqueues (slower, requiring 
context switches, and possibly much latency if all wq's processes are 
active) or softirqs (limited number of them, not flexible at all). 
Sure, tasklets kick over to ksoftirqd, but not immediately, and therein 
lies their value.


And moving code -back- into hardirq is just the wrong thing to do, usually.

This proposal is ENTIRELY derived from "not convenient to my project" 
logic AFAICS, rather than the more sound "not needed in the kernel."


Jeff





Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-26 Thread Clemens Ladisch

Ingo Molnar wrote:
> so how about the following, different approach: anyone who has a tasklet
> in any performance-sensitive codepath, please yell now.

ALSA uses quite a few tasklets in the framework and in several drivers.
Since we care very much about low latency, many places use tasklet_hi_*.

It would be possible to convert to some GENERIC_SOFTIRQ mechanism, but
then we'd want to use a softirq that has higher priority than the
'standard' generic softirq, similar to HI_SOFTIRQ vs. TASKLET_SOFTIRQ.

BTW: Is there a reason why HRTIMER_SOFTIRQ is the lowest-priority
softirq instead of being near TIMER_SOFTIRQ?


Regards,
Clemens



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-26 Thread Takashi Iwai
At Tue, 26 Jun 2007 15:03:23 +0200,
Clemens Ladisch wrote:
> 
> Ingo Molnar wrote:
> > so how about the following, different approach: anyone who has a tasklet
> > in any performance-sensitive codepath, please yell now.
> 
> ALSA uses quite a few tasklets in the framework and in several  
> drivers.  Since we
> care very much about low latency, many places use tasklet_hi_*.

I think we can replace tasklets with workqueues in many card-driver
codes, at least.  Many of them use tasklets simply because there was 
no workq at that time.  It's the correct move for such drivers.

But, yes, the code using tasklet in the core part (especially in the
timer part) requires as low latency as possible.


Takashi


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Dan Williams

On 6/25/07, Steven Rostedt <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-06-25 at 18:46 -0700, Dan Williams wrote:
> >
> > Context switches on this platform flush the L1 cache so bouncing
> > between a workqueue and the MD thread is painful.
> 
> Why are context switches between two kernel threads flushing the L1
> cache?  Is this a flaw in the ARM arch?  I would think the only thing
> that needs to be done between a context switch of two kernel threads (or
> even a user thread to a kernel thread) is update the general regs and
> stack. The memory access (page_tables or whatever ARM uses) should stay
> the same.

Yes you are right, ARM does not flush L1 when prev==next in switch_mm.

> Perhaps something else is at fault here.

I'll try and dig a bit deeper...


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Steven Rostedt
On Mon, 2007-06-25 at 18:46 -0700, Dan Williams wrote:
> 
> Context switches on this platform flush the L1 cache so bouncing
> between a workqueue and the MD thread is painful.

Why are context switches between two kernel threads flushing the L1
cache?  Is this a flaw in the ARM arch?  I would think the only thing
that needs to be done between a context switch of two kernel threads (or
even a user thread to a kernel thread) is update the general regs and
stack. The memory access (page_tables or whatever ARM uses) should stay
the same.

Perhaps something else is at fault here.

Thanks for testing!

-- Steve




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Dan Williams

> so how about the following, different approach: anyone who has a tasklet
> in any performance-sensitive codepath, please yell now. We'll also do a
> proactive search for such places. We can convert those places to
> softirqs, or move them back into hardirq context. Once this is done -
> and i doubt it will go beyond 1-2 places - we can just mass-convert the
> other 110 places to the lame but compatible solution of doing them in a
> global thread context.



I have a driver / testcase that reacts negatively to a workqueue
conversion.  This is with the iop-adma driver on an ARM based platform
re-syncing a degraded raid5 array.  The driver is currently in -mm and
it uses tasklets to run a short callback routine upon completion of an
offloaded memcpy or xor operation.  Quick tests show that
write-throughput does not go down too much, but resync speed, as
reported by /proc/mdstat, drops from ~50MB/s to ~30MB/s.

Context switches on this platform flush the L1 cache so bouncing
between a workqueue and the MD thread is painful.

The conversion patch is attached.

--
Dan
diff --git a/drivers/dma/iop-adma.c b/drivers/dma/iop-adma.c
index 5d8a6cf..7e89003 100644
--- a/drivers/dma/iop-adma.c
+++ b/drivers/dma/iop-adma.c
@@ -41,6 +41,8 @@
 #define tx_to_iop_adma_slot(tx) \
 	container_of(tx, struct iop_adma_desc_slot, async_tx)
 
+static struct workqueue_struct *iop_adma_workqueue;
+
 /**
  * iop_adma_free_slots - flags descriptor slots for reuse
  * @slot: Slot to free
@@ -273,9 +275,11 @@ iop_adma_slot_cleanup(struct iop_adma_chan *iop_chan)
 	spin_unlock_bh(&iop_chan->lock);
 }
 
-static void iop_adma_tasklet(unsigned long data)
+static void iop_adma_work_routine(struct work_struct *work)
 {
-	struct iop_adma_chan *chan = (struct iop_adma_chan *) data;
+	struct iop_adma_chan *chan =
+		container_of(work, struct iop_adma_chan, work);
+
 	__iop_adma_slot_cleanup(chan);
 }
 
@@ -370,7 +374,7 @@ retry:
 		goto retry;
 
 	/* try to free some slots if the allocation fails */
-	tasklet_schedule(&iop_chan->irq_tasklet);
+	queue_work(iop_adma_workqueue, &iop_chan->work);
 
 	return NULL;
 }
@@ -704,7 +708,7 @@ iop_adma_prep_dma_zero_sum(struct dma_chan *chan, unsigned int src_cnt,
 static void iop_adma_dependency_added(struct dma_chan *chan)
 {
 	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
-	tasklet_schedule(&iop_chan->irq_tasklet);
+	queue_work(iop_adma_workqueue, &iop_chan->work);
 }
 
 static void iop_adma_free_chan_resources(struct dma_chan *chan)
@@ -785,7 +789,7 @@ static irqreturn_t iop_adma_eot_handler(int irq, void *data)
 
 	dev_dbg(chan->device->common.dev, "%s\n", __FUNCTION__);
 
-	tasklet_schedule(&chan->irq_tasklet);
+	queue_work(iop_adma_workqueue, &chan->work);
 
 	iop_adma_device_clear_eot_status(chan);
 
@@ -798,7 +802,7 @@ static irqreturn_t iop_adma_eoc_handler(int irq, void *data)
 
 	dev_dbg(chan->device->common.dev, "%s\n", __FUNCTION__);
 
-	tasklet_schedule(&chan->irq_tasklet);
+	queue_work(iop_adma_workqueue, &chan->work);
 
 	iop_adma_device_clear_eoc_status(chan);
 
@@ -1244,8 +1248,6 @@ static int __devinit iop_adma_probe(struct platform_device *pdev)
 		ret = -ENOMEM;
 		goto err_free_iop_chan;
 	}
-	tasklet_init(&iop_chan->irq_tasklet, iop_adma_tasklet, (unsigned long)
-		iop_chan);
 
 	/* clear errors before enabling interrupts */
 	iop_adma_device_clear_err_status(iop_chan);
@@ -1268,11 +1270,13 @@ static int __devinit iop_adma_probe(struct platform_device *pdev)
 
 	spin_lock_init(&iop_chan->lock);
 	init_timer(&iop_chan->cleanup_watchdog);
-	iop_chan->cleanup_watchdog.data = (unsigned long) iop_chan;
-	iop_chan->cleanup_watchdog.function = iop_adma_tasklet;
+	iop_chan->cleanup_watchdog.data = (unsigned long) &iop_chan->work;
+	iop_chan->cleanup_watchdog.function = iop_adma_work_routine;
 	INIT_LIST_HEAD(&iop_chan->chain);
 	INIT_LIST_HEAD(&iop_chan->all_slots);
 	INIT_RCU_HEAD(&iop_chan->common.rcu);
+	INIT_WORK(&iop_chan->work, iop_adma_work_routine);
+
 	iop_chan->common.device = dma_dev;
 	list_add_tail(&iop_chan->common.device_node, &dma_dev->channels);
 
@@ -1443,6 +1447,10 @@ static struct platform_driver iop_adma_driver = {
 
 static int __init iop_adma_init (void)
 {
+	iop_adma_workqueue = create_workqueue("iop-adma");
+	if (!iop_adma_workqueue)
+		return -ENODEV;
+
 	/* it's currently unsafe to unload this module */
 	/* if forced, worst case is that rmmod hangs */
 	__unsafe(THIS_MODULE);
diff --git a/include/asm-arm/hardware/iop_adma.h b/include/asm-arm/hardware/iop_adma.h
index 8eb5990..7d8742b 100644
--- a/include/asm-arm/hardware/iop_adma.h
+++ b/include/asm-arm/hardware/iop_adma.h
@@ -67,7 +67,7 @@ struct iop_adma_chan {
 	struct list_head all_slots;
 	struct timer_list cleanup_watchdog;
 	int slots_allocated;
-	struct tasklet_struct irq_tasklet;
+	struct work_struct work;
 };
 
 /**


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Steven Rostedt
On Mon, 2007-06-25 at 18:00 -0600, Jonathan Corbet wrote:
> A couple of days ago I said:
> 
> > The cafe_ccic (OLPC) camera driver uses a tasklet to move frames out of
> > the DMA buffers in the streaming I/O path
> > 
> > Obviously some testing is called for here.  I will make an attempt to do
> > that testing
> 
> I've done that testing - I have an OLPC B3 unit running V2 of the
> tasklet->workqueue patch, and all seems well.  30 FPS to the display and
> no dropped frames.  The tasklets/0 process is running 3-5% CPU, in case
> that's interesting.  For whatever reason, I see about 3% *more* idle
> time when running just mplayer than I did without the patch.
> 
> Consider my minor qualms withdrawn, there doesn't seem to be any trouble
> in this area.

Jon, thanks a lot!

This is great news. I wonder if converting tasklets to work queues also
helps with other softirqs.  Before, softirqs could not preempt a
tasklet, since tasklets run as a softirq. With tasklets as work queues,
what's left as a softirq can now preempt tasklets. Perhaps this can even
help with performance.

-- Steve
  



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Steven Rostedt
On Tue, 2007-06-26 at 01:36 +0200, Stefan Richter wrote:

> I can't speak for Kristian, nor do I have test equipment for isochronous
> applications, but I know that there are people out there who do data
> acquisition on as many FireWire buses as they can stuff boards into
> their boxes.  There are also FireWire cards with 2 or 4 controllers per
> board; and each controller can receive or transmit on several channels.
> 
> Depending on the buffering scheme, there may be one (?) interrupt per
> channel and isochronous cycle.  Or an interrupt when the buffer is
> full.  Some application programmers use large buffers; others want small
> buffers.  An isochronous cycle is 125us.
> 
> Asynchronous I/O can even produce much higher interrupt rates.  I think
> IP over 1394 might indeed cause interrupt rates that are moderately
> higher than 1/125us during normal traffic.  SBP-2 ( = 1394 storage) is
> not as much affected because the bulk of data is transferred without
> interrupts.  So I suppose some eth1394 bandwidth tests with this patch
> series might make sense... alas I'm short of spare time.  (Would be
> interesting to see whether the old ohci1394 driver is blown to bits with
> the patch series; it's an old driver with who-knows what assumptions in
> there.)

Hi, any testing of the patches would be much appreciated. I don't have
access to any boxes that might have problems with running tasklets as
work queues. So if you know others with this equipment, please pass the
patches off to them. It will hopefully help us know if this patch helps,
hurts, or just doesn't make a difference.

> 
> > Workqueue priority can be set, and your handler should 
> > probably be SCHED_FIFO.
> 
> Does this cooperate nicely with a SCHED_FIFO thread of a userspace data
> acquisition program or audio server or the like?

Well, if you put the prio of the work queue higher, it won't be any
different than a tasklet. A tasklet runs at an even higher priority than
any thread on the system.
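
In-kernel, raising a workqueue thread to RT priority would look
something like the sketch below; worker_task stands for the
workqueue's kthread, which - as discussed elsewhere in the thread -
there is not yet a clean API for getting at:

	struct sched_param param = { .sched_priority = 98 };

	/* SCHED_FIFO: preempts every SCHED_NORMAL task, approximating
	 * tasklet-like latency */
	sched_setscheduler(worker_task, SCHED_FIFO, &param);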

-- Steve




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Jonathan Corbet
A couple of days ago I said:

> The cafe_ccic (OLPC) camera driver uses a tasklet to move frames out of
> the DMA buffers in the streaming I/O path
> 
> Obviously some testing is called for here.  I will make an attempt to do
> that testing

I've done that testing - I have an OLPC B3 unit running V2 of the
tasklet->workqueue patch, and all seems well.  30 FPS to the display and
no dropped frames.  The tasklets/0 process is running 3-5% CPU, in case
that's interesting.  For whatever reason, I see about 3% *more* idle
time when running just mplayer than I did without the patch.

Consider my minor qualms withdrawn, there doesn't seem to be any trouble
in this area.

Thanks,

jon

Jonathan Corbet / LWN.net / [EMAIL PROTECTED]


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Stefan Richter
Ingo Molnar wrote:
> regarding workqueues - would it be possible for you to test Steve's 
> patch and get us performance numbers? Do you have any test with tons of 
> tasklet activity that would definitely show the performance impact of 
> workqueues?

I can't speak for Kristian, nor do I have test equipment for isochronous
applications, but I know that there are people out there which do data
acquisition on as many FireWire buses as they can stuff boards into
their boxes.  There are also FireWire cards with 2 or 4 controllers per
board; and each controller can receive or transmit on several channels.

Depending on the buffering scheme, there may be one (?) interrupt per
channel and isochronous cycle.  Or an interrupt when the buffer is
full.  Some application programmers use large buffers; others want small
buffers.  An isochronous cycle is 125us.

Asynchronous I/O can even produce much higher interrupt rates.  I think
IP over 1394 might indeed cause interrupt rates that are moderately
higher than 1/125us during normal traffic.  SBP-2 ( = 1394 storage) is
not as much affected because the bulk of data is transferred without
interrupts.  So I suppose some eth1394 bandwidth tests with this patch
series might make sense... alas I'm short of spare time.  (Would be
interesting to see whether the old ohci1394 driver is blown to bits with
the patch series; it's an old driver with who-knows what assumptions in
there.)

> Workqueue priority can be set, and your handler should 
> probably be SCHED_FIFO.

Does this cooperate nicely with a SCHED_FIFO thread of a userspace data
acquisition program or audio server or the like?
-- 
Stefan Richter
-=-=-=== -==- ==-=-
http://arcgraph.de/sr/


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Ingo Molnar

* Kristian Høgsberg <[EMAIL PROTECTED]> wrote:

> OK, here's a yell.  I'm using tasklets in the new firewire stack for 
> all interrupt handling.  All my interrupt handler does is read out the 
> event mask and schedule the appropriate tasklets.  Most of these 
> tasklets typically just end up scheduling work or completing a 
> completion, so moving it to a workqueue is pretty pointless.  In 
> particular, the isochronous DMA events must be handled with as little 
> latency as possible, so a workqueue in that code path would be pretty 
> bad.

regarding workqueues - would it be possible for you to test Steve's 
patch and get us performance numbers? Do you have any test with tons of 
tasklet activity that would definitely show the performance impact of 
workqueues? Workqueue priority can be set, and your handler should 
probably be SCHED_FIFO.

right now the tasklet-emulation workqueue is globally locked, etc., but 
if you use per-cpu workqueues then you'd probably get better scalability 
than tasklets. (yes, despite the extra scheduling (which only costs ~1 
microsecond) that the workqueue has to do.) Scheduling is pretty cheap, 
the basic overhead of servicing a single interrupt is often 10 times 
more expensive than a context-switch.

> I'm not strongly attached to tasklets, and it sounds like I got it 
> wrong and used the wrong delayed execution mechanism.  But that's just 
> another data point that suggests that there are too many of these.  I 
> guess I need to sit down and look into porting that to softirqs?

i'd like to stress that your approach is completely fine and valid - and 
if what we propose impacts performance negatively without any acceptable 
(and robust) replacement solution offered by us then our patch won't be 
done - simple as that. Softirqs could be an additional (performance) 
advantage on SMP systems with multiple firewire interrupt sources, but 
it would have to be measured too.

Ingo


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Kristian Høgsberg
On Mon, 2007-06-25 at 16:31 -0400, Steven Rostedt wrote:
> On Mon, 2007-06-25 at 16:07 -0400, Kristian Høgsberg wrote:
> 
> > > Maybe we should be looking at something like GENERIC_SOFTIRQ to run
> > > functions that a driver could add. But they would run only on the CPU
> > > that scheduled them, and do not guarantee non-reentrant as tasklets do
> > > today.
> > 
> > Sounds like this will fill the gap.  Of course, this won't reduce the
> > number of delayed-execution mechanisms available...
> 
> I disagree. Adding a generic softirq is not really adding another
> delayed-execution mechanism, it's just extending the softirq.  It would not
> have any different semantics than a normal softirq, except that it would be
> dynamic for modules to use.  A tasklet has different concepts than a
> softirq. It adds non-reentrancy and that the tasklet function can run
> where it wasn't scheduled.

Hmm, yeah, true.  Ok, sounds useful, let me know when you have a patch.
I'll try porting the firewire stack and let you know how it works.

Just to make sure I got this right: softirqs will always be scheduled on
the irq handling CPU and if an interrupt schedules multiple softirqs
from one handler invocation, the softirqs will be executed in the order
they are scheduled?  And the reentrancy that softirqs do not protect
against (as opposed to tasklets) is the case where different CPUs handle
the interrupt and each schedule the same softirq for execution?

Kristian




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Steven Rostedt
On Mon, 2007-06-25 at 22:50 +0200, Tilman Schmidt wrote:

> Ok, I'm reassured. I'll look into converting these to a work queue
> then, although I can't promise when I'll get around to it.
> 
> In fact, if these timing requirements are so easy to meet, perhaps
> it doesn't even need its own work queue, and just making each
> tasklet into a work item and queueing them to the global queue
> with schedule_work() would do? Or am I getting too reckless now?

I'm sure you probably wouldn't have a problem with using just
schedule_work. But that's shared and you don't know with what. A
function in the keventd work queue can call schedule (not recommended,
but with closed source drivers, you'd never know. schedule_work is
EXPORT_SYMBOL not EXPORT_SYMBOL_GPL).

So, if you convert it to work queues, I'd strongly recommend adding a
new instance.
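
I.e. something along these lines (a sketch; the names are illustrative
and not from the Gigaset driver):

	static struct workqueue_struct *gigaset_wq;

	/* private, single-threaded queue: nothing else - including
	 * closed source code - can stall it the way keventd can stall */
	gigaset_wq = create_singlethread_workqueue("gigaset");
	if (!gigaset_wq)
		return -ENOMEM;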

-- Steve




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Tilman Schmidt
On 25.06.2007 19:06, Steven Rostedt wrote:
> On Mon, 2007-06-25 at 18:50 +0200, Tilman Schmidt wrote:
> 
>> The Siemens Gigaset ISDN base driver uses tasklets in its isochronous
>> data paths. [...]
>> Does that qualify as performance sensitive for the purpose of this
>> discussion?
> 
> Actually, no.  16ms, even 8ms is an incredible amount of time. Unless
> you have a thread that is running at a higher priority than the thread
> that handles the work queue performing the task, you would have no
> problems making that deadline.

Ok, I'm reassured. I'll look into converting these to a work queue
then, although I can't promise when I'll get around to it.

In fact, if these timing requirements are so easy to meet, perhaps
it doesn't even need its own work queue, and just making each
tasklet into a work item and queueing them to the global queue
with schedule_work() would do? Or am I getting too reckless now?
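
For what it's worth, the mechanical shape of such a conversion is
small. A sketch against a hypothetical driver struct follows - note
the handler signature changes from an unsigned long cookie to a
struct work_struct pointer, recovered via container_of():

-	struct tasklet_struct rcv_tasklet;
+	struct work_struct rcv_work;

-	tasklet_init(&ucs->rcv_tasklet, rcv_tasklet_fn, (unsigned long) ucs);
+	INIT_WORK(&ucs->rcv_work, rcv_work_fn);

-	tasklet_schedule(&ucs->rcv_tasklet);
+	schedule_work(&ucs->rcv_work);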

Thanks,
Tilman

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
This message consists of 100% recycled bits.
Best before (if unopened): see reverse.





Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Steven Rostedt
On Mon, 2007-06-25 at 16:07 -0400, Kristian Høgsberg wrote:

> > Maybe we should be looking at something like GENERIC_SOFTIRQ to run
> > functions that a driver could add. But they would run only on the CPU
> > that scheduled them, and do not guarantee non-reentrant as tasklets do
> > today.
> 
> Sounds like this will fill the gap.  Of course, this won't reduce the
> number of delayed-execution mechanisms available...

I disagree. Adding a generic softirq is not really adding another
delayed-execution mechanism, it's just extending the softirq.  It would not
have any different semantics than a normal softirq, except that it would be
dynamic for modules to use.  A tasklet has different concepts than a
softirq. It adds non-reentrancy and that the tasklet function can run
where it wasn't scheduled.

Adding a generic softirq would keep the same concepts as a softirq but
just extend the users for it. Tasklets are probably not the best for
critical sections since it can be postponed longer if it was scheduled
on another CPU that is handling a bunch of other tasklets. So the
latency of running the tasklet is higher and you lose a cache advantage
by jumping to another CPU to execute.

Work queues already exist, so all we need to do to replace tasklets is
to extend softirqs for those critical cases that tasklets are used, and
replace the rest with work queues. Moving the non-critical tasklets
to work queues will even lower the latency to execution of the more
critical tasklets.
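
A GENERIC_SOFTIRQ along those lines might look like this - purely
hypothetical, nothing of the sort exists in the tree:

/* hypothetical API sketch */
struct generic_softirq {
	void (*func)(struct generic_softirq *);
	struct generic_softirq *next;	/* per-cpu pending list */
};

/*
 * Queues gs on this CPU's list and raises the softirq; gs->func later
 * runs on the same CPU, in softirq context, with no tasklet-style
 * non-reentrancy guarantee across CPUs.
 */
void generic_softirq_schedule(struct generic_softirq *gs);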

-- Steve




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Kristian Høgsberg
On Mon, 2007-06-25 at 15:11 -0400, Steven Rostedt wrote:
> On Mon, 2007-06-25 at 14:48 -0400, Kristian Høgsberg wrote:
> ...
> > However, I don't really understand how you can discuss a wholesale
> > replacing of tasklets with workqueues, given the very different
> > execution semantics of the two mechanisms.  I would think that others
> > have used tasklets for similar purposes as I have and moving that to
> > workqueues just has to break a bunch of stuff.  I don't know the various
> > places tasklets are used as well as other people in this thread, but I
> > think deprecating them and moving code to either softirqs or workqueues
> > on a case by case basis is a better approach.  That way we also avoid
> > the gross wrappers.
> 
> The gross wrappers were a perfect way to shed light on something that is
> overused, and should most likely be replaced.
> 
> Does your system need to have these functions that are in tasklets need
> to be non-reentrant?  I wonder how many "irq critical" functions used
> tasklets just because adding a softirq requires too much (no generic
> softirq code).  A tasklet is constrained to run on one CPU at a time,
> and it is not guaranteed to run on the CPU it was scheduled on.

When I started the firewire work, I wasn't aware that tasklets were
going away, but I knew that doing too much work in the interrupt handler
was frowned upon, for good reasons.  So I was looking at softirqs vs
tasklets, and since using softirqs means you have to go add yourself to
the big bitmask, I opted for tasklets.  The comment in interrupt.h
directly recommends this.  As it stands, the firewire stack does
actually rely on the non-reentrancy of tasklets, but that's not a
deal-breaker, I can add the necessary locking.

> Perhaps it's time to add a new functionality while removing tasklets.
> Things that are ok to bounce around CPUs (like tasklets do) can most
> likely be replaced by a work queue. But these highly critical tasks
> probably would benefit from being a softirq.
> 
> Maybe we should be looking at something like GENERIC_SOFTIRQ to run
> functions that a driver could add. But they would run only on the CPU
> that scheduled them, and do not guarantee non-reentrant as tasklets do
> today.

Sounds like this will fill the gap.  Of course, this won't reduce the
number of delayed-execution mechanisms available...

Kristian



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Stephen Hemminger
On Mon, 25 Jun 2007 18:50:03 +0200
Tilman Schmidt <[EMAIL PROTECTED]> wrote:

> Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > so how about the following, different approach: anyone who has a tasklet 
> > in any performance-sensitive codepath, please yell now.


Getting rid of tasklets may seem like a good idea. But doing it by changing
them all to workqueues would have bad consequences for networking.

The first issue is that it would change the semantic assumptions in the
places where tasklets are used. Many places "know" that a tasklet runs in soft
irq context so extra locking is not needed.
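
A concrete instance of such an assumption (illustrative, from no
particular driver): per-cpu statistics touched from a tasklet need no
lock, because softirq context can neither migrate nor be preempted -
run the same increment from a work item and it becomes racy:

#include <linux/percpu.h>

struct my_stats { unsigned long completions; };
static DEFINE_PER_CPU(struct my_stats, my_stats);

static void my_done_tasklet(unsigned long data)
{
	/* safe only in softirq context: preemption is off, so we stay
	 * on this CPU for the whole read-modify-write */
	__get_cpu_var(my_stats).completions++;
}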

The performance overhead of changing to workqueues could also be disastrous for
some devices. There are 10G device drivers that use tasklets to handle transmit
completion.


Here is a more detailed list of how network devices are using tasklets:

Receive packet handling: ifb, ppp, ipw2200, ipw2100
Receive buffer refill: acenic, s2io
Receive & Transmit: sc9031, sundance
Transmit buffer allocation: smc91x
Phy handling: skge

Sorry, if you are going to get rid of tasklets, you need to fix all the
network drivers first.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Steven Rostedt
On Mon, 2007-06-25 at 14:48 -0400, Kristian Høgsberg wrote:

> OK, here's a yell.  I'm using tasklets in the new firewire stack for all

Thanks for speaking up!

> interrupt handling.  All my interrupt handler does is read out the event
> mask and schedule the appropriate tasklets.  Most of these tasklets
> typically just end up scheduling work or completing a completion, so
> moving it to a workqueue is pretty pointless.  In particular, the
> isochronous DMA events must be handled with as little latency as
> possible, so a workqueue in that code path would be pretty bad.
> 
> I'm not strongly attached to tasklets, and it sounds like I got it wrong
> and used the wrong delayed execution mechanism.  But that's just another
> data point that suggests that there are too many of these.  I guess I
> need to sit down and look into porting that to softirqs?
> 
> However, I don't really understand how you can discuss a wholesale
> replacing of tasklets with workqueues, given the very different
> execution semantics of the two mechanisms.  I would think that others
> have used tasklets for similar purposes as I have and moving that to
> workqueues just has to break a bunch of stuff.  I don't know the various
> places tasklets are used as well as other people in this thread, but I
> think deprecating them and moving code to either softirqs or workqueues
> on a case by case basis is a better approach.  That way we also avoid
> the gross wrappers.

The gross wrappers were a perfect way to shed light on something that is
overused, and should most likely be replaced.

Does your system need to have these functions that are in tasklets need
to be non-reentrant?  I wonder how many "irq critical" functions used
tasklets just because adding a softirq requires too much (no generic
softirq code).  A tasklet is constrained to run on one CPU at a time,
and it is not guaranteed to run on the CPU it was scheduled on.

Perhaps it's time to add new functionality while removing tasklets.
Things that are OK to bounce around CPUs (as tasklets do) can most
likely be replaced by a workqueue, but the highly critical tasks would
probably benefit from being a softirq.

Maybe we should be looking at something like GENERIC_SOFTIRQ to run
functions that a driver could add. They would run only on the CPU
that scheduled them, and would not guarantee non-reentrancy as
tasklets do today.
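
Roughly this kind of interface, say (purely a sketch of the idea; the
names are invented and nothing like it exists today):

	/* Hypothetical GENERIC_SOFTIRQ interface, for illustration only. */
	struct softirq_work {
		struct softirq_work *next;
		void (*func)(struct softirq_work *work);
	};

	/*
	 * Queue work on the current CPU; func runs later from the
	 * generic softirq on that same CPU.  Unlike a tasklet there is
	 * no SCHED/RUN serialization, so the same function may run
	 * concurrently on other CPUs and must cope with that itself.
	 */
	void softirq_work_queue(struct softirq_work *work);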

I think I even found a bug in a driver that was trying to get around the
non-reentrancy of a tasklet (will post this soon).

It's looking like only a few tasklets have this critical requirement,
and those are the candidates to move to a more generic softirq.  The
rest of the tasklets would be switched to workqueues; this gross
wrapper of mine can do that in the meantime, so we can find the ones
that should be converted to a generic softirq.

-- Steve




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Kristian Høgsberg
On Fri, 2007-06-22 at 23:59 +0200, Ingo Molnar wrote:
> so how about the following, different approach: anyone who has a tasklet 
> in any performance-sensitive codepath, please yell now. We'll also do a 
> proactive search for such places. We can convert those places to 
> softirqs, or move them back into hardirq context. Once this is done - 
> and i doubt it will go beyond 1-2 places - we can just mass-convert the 
> other 110 places to the lame but compatible solution of doing them in a 
> global thread context.

OK, here's a yell.  I'm using tasklets in the new firewire stack for all
interrupt handling.  All my interrupt handler does is read out the event
mask and schedule the appropriate tasklets.  Most of these tasklets
typically just end up scheduling work or completing a completion, so
moving it to a workqueue is pretty pointless.  In particular, the
isochronous DMA events must be handled with as little latency as
possible, so a workqueue in that code path would be pretty bad.
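
In other words, roughly this shape (a simplified sketch rather than
the real code; the register and event names are made up):

	static irqreturn_t fw_irq_handler(int irq, void *data)
	{
		struct fw_card *card = data;
		u32 events = reg_read(card, REG_EVENT);	/* read and ack */

		if (!events)
			return IRQ_NONE;

		if (events & EVENT_ISO_RX)
			tasklet_schedule(&card->iso_rx_tasklet);
		if (events & EVENT_TX_COMPLETE)
			tasklet_schedule(&card->tx_tasklet);

		return IRQ_HANDLED;
	}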

I'm not strongly attached to tasklets, and it sounds like I got it wrong
and used the wrong delayed execution mechanism.  But that's just another
data point that suggests that there are too many of these.  I guess I
need to sit down and look into porting that to softirqs?

However, I don't really understand how you can discuss a wholesale
replacing of tasklets with workqueues, given the very different
execution semantics of the two mechanisms.  I would think that others
have used tasklets for similar purposes as I have and moving that to
workqueues just has to break a bunch of stuff.  I don't know the various
places tasklets are used as well as other people in this thread, but I
think deprecating them and moving code to either softirqs or workqueues
on a case by case basis is a better approach.  That way we also avoid
the gross wrappers.

Kristian




Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Steven Rostedt
On Mon, 2007-06-25 at 18:50 +0200, Tilman Schmidt wrote:

> The Siemens Gigaset ISDN base driver uses tasklets in its isochronous
> data paths. These will be scheduled for each completion of an isochronous
> URB, or every 8 msec for each of the four isochronous pipes if both B
> channels are connected. The driver uses three URBs for each pipe, always
> maintaining two in flight while processing the third one. So the tasklet
> has to run within 16 ms from being scheduled in order to avoid packet
> loss (in the receive path) or data underrun (in the transmit path).
> 
> Does that qualify as performance sensitive for the purpose of this
> discussion?

Actually, no.  16 ms, even 8 ms, is an incredible amount of time.
Unless you have a thread running at a higher priority than the thread
that handles the workqueue performing the task, you would have no
problem making that deadline.  If you did miss the deadline without
having ridiculously high-prio tasks, I would think that you would miss
your deadline with tasklets as well, unless the large latency has to
do with preempt_disable; but a latency that large would IMHO be a bug.

-- Steve
 



Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-25 Thread Tilman Schmidt
Ingo Molnar <[EMAIL PROTECTED]> wrote:
> so how about the following, different approach: anyone who has a tasklet 
> in any performance-sensitive codepath, please yell now.

The Siemens Gigaset ISDN base driver uses tasklets in its isochronous
data paths. These will be scheduled for each completion of an isochronous
URB, or every 8 msec for each of the four isochronous pipes if both B
channels are connected. The driver uses three URBs for each pipe, always
maintaining two in flight while processing the third one. So the tasklet
has to run within 16 ms from being scheduled in order to avoid packet
loss (in the receive path) or data underrun (in the transmit path).
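
Much simplified, the pattern looks like this (a sketch only, not the
actual driver code; isoc_pipe and process_isoc_data() are stand-ins
for the real structures and payload handling):

	/* The URB completion handler runs in interrupt context and
	 * only hands the finished URB over to the tasklet. */
	static void isoc_complete(struct urb *urb)
	{
		struct isoc_pipe *pipe = urb->context;

		pipe->done_urb = urb;
		tasklet_schedule(&pipe->bh);
	}

	/* Must run within ~16 ms, while the other two URBs complete,
	 * or the pipe runs dry. */
	static void isoc_bh(unsigned long data)
	{
		struct isoc_pipe *pipe = (struct isoc_pipe *)data;
		struct urb *urb = pipe->done_urb;

		process_isoc_data(pipe, urb);		/* decode payload */
		if (usb_submit_urb(urb, GFP_ATOMIC) < 0)
			pipe->errors++;			/* resubmit failed */
	}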

Does that qualify as performance sensitive for the purpose of this
discussion?

Thanks,
Tilman

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
This message consists of 100% recycled bits.
If unopened, best before: (see reverse side)


