Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kernel connector.

2005-04-02 Thread Evgeniy Polyakov
On Fri, 1 Apr 2005 11:11:34 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> >  CBUS was designed to provide very fast _insert_ operation.
> 
> I just don't see any point in doing this.  If the aggregate system load is
> unchanged (possibly increased) then why is cbus desirable?
> 
> The only advantage I can see is that it permits irq-context messaging.

It is one advantage, but not the main one.

> >  It is needed not only for fork() accounting, but for any
> >  fast path, when we do not want to slow process down just to
> >  send notification about it, instead we can create such a notification,
> >  and deliver it later.
> 
> And when we deliver it later, we slow processes down!
> 
> >  Why do we defer all work from HW IRQ into BH context?
> 
> To enable more interrupts to come in while we're doing that work.

Exactly(!) for this reason CBUS was designed and implemented - 
to allow more inserts to arrive and to process _them_ later, so that
the fast path is not slowed down; the goal was never to make the delivery
itself faster.
 
> >  Because while we are in HW IRQ we can not perform other tasks,
> >  so with connector and CBUS we have the same situation.
> 
> I agree that being able to send from irq context would be handy.  If we had
> any code which wants to do that, and at present we do not.

That is not the main advantage.

> But I fail to see any advantage in moving work out of fork() context, into
> kthread context and incurring additional context switch overhead.  Apart
> from conceivably better CPU cache utilisation.

And not even that is the main reason.

> The fact that deferred delivery can cause an arbitrarily large backlog and,
> ultimately, deliberate message droppage or oom significantly impacts the
> reliability of the whole scheme and means that well-designed code must use
> synchronous delivery _anyway_.  The deferred delivery should only be used
> from IRQ context for low-bandwidth delivery rates.

Not at all - OOM cannot happen with a limited queue length.
CBUS falls back to the direct cn_netlink_send() in that case.

Cache utilisation and the ability to send events from any context are
significant issues, but they are not the most important reasons.
The ability to avoid slowing down fast paths - that is the main reason,
even at a higher delivery price.
Consider a situation where one wants a notification for each new write
system call. Let's say that without such a notification it takes about
1 second to write one page from userspace; with notification sending,
which is not so fast, it would take 1.5 seconds. With CBUS, write()
still costs 1 second, and later, when we do not care about write
performance and the scheduler decides to run the CBUS thread, those
notifications take an additional 0.7 seconds instead of 0.5 and are
delivered.
But if one requires not the delayed fact of the notification but an
almost immediate event - one can still use the connector's direct methods.
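To make that bounded deferral concrete, a minimal sketch of such an insert
path could look as follows. Everything here - the queue, the lock, the
CBUS_MAX_QLEN limit and the cbus_event structure - is an assumption for
illustration, not the actual CBUS patch; cn_netlink_send() is the existing
connector call used as the synchronous fallback.

/*
 * Illustrative sketch only -- not the actual CBUS code.  A bounded
 * queue is assumed; when it is full, the insert falls back to the
 * synchronous cn_netlink_send(), so the backlog cannot grow without
 * limit (and hence cannot OOM the machine).
 */
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/spinlock.h>
#include <linux/connector.h>

#define CBUS_MAX_QLEN	1000		/* assumed queue limit */

struct cbus_event {
	struct list_head	list;
	struct cn_msg		msg;	/* payload would follow in real code */
};

static LIST_HEAD(cbus_queue);
static DEFINE_SPINLOCK(cbus_lock);
static unsigned int cbus_qlen;

/* Fast-path insert: queue the event and return immediately. */
static int cbus_insert(struct cn_msg *msg)
{
	struct cbus_event *ev;
	unsigned long flags;

	ev = kmalloc(sizeof(*ev), GFP_ATOMIC);
	if (!ev)
		return cn_netlink_send(msg, 0, GFP_ATOMIC);
	memcpy(&ev->msg, msg, sizeof(*msg));

	spin_lock_irqsave(&cbus_lock, flags);
	if (cbus_qlen >= CBUS_MAX_QLEN) {
		/* Queue full: deliver synchronously instead of letting
		 * the backlog grow. */
		spin_unlock_irqrestore(&cbus_lock, flags);
		kfree(ev);
		return cn_netlink_send(msg, 0, GFP_ATOMIC);
	}
	list_add_tail(&ev->list, &cbus_queue);
	cbus_qlen++;
	spin_unlock_irqrestore(&cbus_lock, flags);
	return 0;
}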

> >  While we are sending a low priority event, we stop actual work,
> >  which is not acceptable in many situations.
> 
> Have you tested the average forking rate on a uniprocessor machine with
> this patch?  If the forking continues for (say) one second, does the patch
> provide any performance benefit?

Yes, as I said, I ran the CBUS test on a non-SMP setup too 
[it is still an SMP machine, but with the nosmp kernel option and an SMP kernel].

On my nearest test SMP machine it took ~930-950 msec on average for a fork
for both processors, so it is about 1850-1900 msec per processor 
[both with CBUS plus the fork connector and with no fork connector compiled in].
With CBUS and 1 CPU, fork() + exit() takes about 1780-1800 msec.

I can rerun the test on Monday on different [and faster] machines, 
if you want.

Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kernel connector.

2005-04-01 Thread Andrew Morton
Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> Andrew, CBUS is not intended to be faster than connector itself, 
>  it is just not possible, since it calls connector's methods
>  with some preparation, which takes time.

Right - it's simply transferring work from one place to another.

>  CBUS was designed to provide very fast _insert_ operation.

I just don't see any point in doing this.  If the aggregate system load is
unchanged (possibly increased) then why is cbus desirable?

The only advantage I can see is that it permits irq-context messaging.

>  It is needed not only for fork() accounting, but for any
>  fast path, when we do not want to slow process down just to
>  send notification about it, instead we can create such a notification,
>  and deliver it later.

And when we deliver it later, we slow processes down!

>  Why do we defer all work from HW IRQ into BH context?

To enable more interrupts to come in while we're doing that work.

>  Because while we are in HW IRQ we can not perform other tasks,
>  so with connector and CBUS we have the same situation.

I agree that being able to send from irq context would be handy.  If we had
any code which wants to do that, and at present we do not.

But I fail to see any advantage in moving work out of fork() context, into
kthread context and incurring additional context switch overhead.  Apart
from conceivably better CPU cache utilisation.

The fact that deferred delivery can cause an arbitrarily large backlog and,
ultimately, deliberate message droppage or oom significantly impacts the
reliability of the whole scheme and means that well-designed code must use
synchronous delivery _anyway_.  The deferred delivery should only be used
from IRQ context for low-bandwidth delivery rates.

>  While we are sending a low priority event, we stop actual work,
>  which is not acceptable in many situations.

Have you tested the average forking rate on a uniprocessor machine with
this patch?  If the forking continues for (say) one second, does the patch
provide any performance benefit?


Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kernel connector.

2005-04-01 Thread Evgeniy Polyakov
On Fri, 2005-04-01 at 03:20 -0800, Andrew Morton wrote: 
> Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> >
> > On Fri, 2005-04-01 at 02:30 -0800, Andrew Morton wrote:
> > > Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > keventd does very hard jobs on some of my test machines which 
> > > > > > for example route big amount of traffic.
> > > > > 
> > > > > As I said - that's going to cause _your_ kernel thread to be slowed 
> > > > > down as
> > > > > well.
> > > > 
> > > > Yes, but it does not solve peak performance issues - all scheduled
> > > > jobs can run one after another which will decrease insert performance.
> > > 
> > > Well the keventd handler would simply process all currently-queued
> > > messages.  It's not as if you'd only process one event per keventd 
> > > callout.
> > 
> > But that will hurt insert performance.
> 
> Why?  All it involves is one schedule_work() on the insert side.  And that
> will involve just a single test_bit() in the great majority of cases
> because the work will already be pending.

Here is an example:
schedule_work();
keventd then calls cbus_process(), which has 2 variants:
1. process all pending events;
2. process only a limited number of them.

In the first case insert performance suffers very noticeably,
because actual delivery takes some time, so one pass will
take time_for_one_delivery * number_of_events_to_be_delivered;
since in a peak number_of_events_to_be_delivered may be very high,
it will take too much time to flush the event queue and deliver all 
messages.

In the second case we finish our work in a predictable time,
but that does not help us with keventd, which may have [and in practice has]
caught a new schedule_work() and thus will run cbus_process() again
without any time window after the previous delivery.

That time window is _very_ helpful for insert performance
and thus for low latencies.

> > Processing all messages without splitting them up into pieces noticeably
> > slows insert operation down.
> 
> What does "splitting them up into pieces" mean?  They're single messages
> end-to-end.  You've been discussing batching of messages, which is the
> opposite?

There is a queue of single event messages; if we process them all in one
shot, then there is no time to insert a new event [each CPU is running a
keventd thread] until all previous ones are sent.
So we split that queue into pieces of [currently] 10 messages each,
and send only those, then we sleep for some time, during which new inserts
can be completed, then process the next 10...
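A minimal sketch of that budget-plus-timeout loop, reusing the hypothetical
cbus_queue, cbus_lock and cbus_event from the earlier sketch (the bundle size
of 10 and the short sleep come from this discussion; everything else is an
assumption, not the actual patch):

/* Each pass sends at most CBUS_BUDGET queued events and then sleeps,
 * leaving a window in which new inserts run without competing with
 * delivery.  Needs <linux/sched.h> and <linux/delay.h> in addition to
 * the headers from the previous sketch. */
#define CBUS_BUDGET	10

static int cbus_event_thread(void *data)
{
	struct cbus_event *ev;
	unsigned long flags;
	int i;

	while (!signal_pending(current)) {
		for (i = 0; i < CBUS_BUDGET; i++) {
			spin_lock_irqsave(&cbus_lock, flags);
			if (list_empty(&cbus_queue)) {
				spin_unlock_irqrestore(&cbus_lock, flags);
				break;
			}
			ev = list_entry(cbus_queue.next,
					struct cbus_event, list);
			list_del(&ev->list);
			cbus_qlen--;
			spin_unlock_irqrestore(&cbus_lock, flags);

			cn_netlink_send(&ev->msg, 0, GFP_KERNEL);
			kfree(ev);
		}
		/* This pause between bundles is the "time window" that
		 * keeps the insert path fast during peaks. */
		msleep_interruptible(10);
	}
	return 0;
}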

> > > (please remind me why cbus exists, btw.  What functionality does it offer
> > > which the connector code doesn't?)
> > 
> > The main goal of CBUS is insert operation performance.
> > Anyone who wants to have maximum delivery speed should use direct
> > connector's
> > methods instead.
> 
> Delivery speed is not the same thing as insertion speed.  All the insertion
> does is to place the event onto an internal queue.  We still need to do a
> context switch and send the thing.  Provided there's a reasonable amount of
> batching, the CPU consumption will be the same in both cases.

Sending is slow [in comparison to insertion], so it can be deferred
for better latency.
The context switch and the actual sending happen after the main function
(like fork(), or any other fast-path function) has already finished,
so we do not hurt its performance.

> > > > > Introducing an up-to-ten millisecond latency seems a lot worse than 
> > > > > some
> > > > > reduction in peak bandwidth - it's not as if pumping 10 
> > > > > events/sec is a
> > > > > likely use case.  Using prepare_to_wait/finish_wait will provide some
> > > > > speedup in SMP environments due to reduced cacheline transfers.
> > > > 
> > > > It is a question actually...
> > > > If we allow peak processing, then we _definitely_ will have insert 
> > > > performance degradation, it was observed in my tests.
> > > 
> > > I don't understand your terminology.  What is "peak processing" and how
> > > does it differ from maximum insertion rate?
> > > 
> > > > The main goal of CBUS was exactly insert speed
> > > 
> > > Why?
> > 
> > To allow connector usage in really fast paths.
> > If one cares about _delivery_ speed then one should use
> > cn_netlink_send().
> 
> We care about both insertion and delivery!  Sure, simply sticking the event
> onto a queue is faster than delivering it.  But we still need to deliver it
> sometime so there's no aggregate gain.  Still confused.

If one needs low-latency events during peaks - use CBUS; it will smooth
the peaks, since the actual delivery will be postponed.

There is no _aggregate_ gain - only immediate low latency from deferring
the work.

> > > > - so
> > > it somehow must smooth out performance peaks, and thus
> > > the above budget was introduced.
> > > 
> > > An up-to-ten-millisecond latency between the kernel-side queueing of a
> > > message and the delivery of that message to userspace sounds like an
> > > awfully bad restriction to me.  Even one millisecond will have impact in
> > > some scenarios.

Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kernel connector.

2005-04-01 Thread Andrew Morton
Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> On Fri, 2005-04-01 at 02:30 -0800, Andrew Morton wrote:
> > Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> > >
> > > > > keventd does very hard jobs on some of my test machines which 
> > > > > for example route big amount of traffic.
> > > > 
> > > > As I said - that's going to cause _your_ kernel thread to be slowed 
> > > > down as
> > > > well.
> > > 
> > > Yes, but it does not solve peak performance issues - all scheduled
> > > jobs can run one after another which will decrease insert performance.
> > 
> > Well the keventd handler would simply process all currently-queued
> > messages.  It's not as if you'd only process one event per keventd callout.
> 
> But that will hurt insert performance.

Why?  All it involves is one schedule_work() on the insert side.  And that
will involve just a single test_bit() in the great majority of cases
because the work will already be pending.

> Processing all messages without splitting them up into pieces noticeably
> slows insert operation down.

What does "splitting them up into pieces" mean?  They're single messages
end-to-end.  You've been discussing batching of messages, which is the
opposite?

> > (please remind me why cbus exists, btw.  What functionality does it offer
> > which the connector code doesn't?)
> 
> The main goal of CBUS is insert operation performance.
> Anyone who wants to have maximum delivery speed should use direct
> connector's
> methods instead.

Delivery speed is not the same thing as insertion speed.  All the insertion
does is to place the event onto an internal queue.  We still need to do a
context switch and send the thing.  Provided there's a reasonable amount of
batching, the CPU consumption will be the same in both cases.

> > > > Introducing an up-to-ten millisecond latency seems a lot worse than some
> > > > reduction in peak bandwidth - it's not as if pumping 10 events/sec 
> > > > is a
> > > > likely use case.  Using prepare_to_wait/finish_wait will provide some
> > > > speedup in SMP environments due to reduced cacheline transfers.
> > > 
> > > It is a question actually...
> > > If we allow peak processing, then we _definitely_ will have insert 
> > > performance degradation, it was observed in my tests.
> > 
> > I don't understand your terminology.  What is "peak processing" and how
> > does it differ from maximum insertion rate?
> > 
> > > The main goal of CBUS was exactly insert speed
> > 
> > Why?
> 
> To allow connector usage in really fast paths.
> If one cares about _delivery_ speed then one should use
> cn_netlink_send().

We care about both insertion and delivery!  Sure, simply sticking the event
onto a queue is faster than delivering it.  But we still need to deliver it
sometime so there's no aggregate gain.  Still confused.

> > > - so
> > > it somehow must smooth out performance peaks, and thus
> > > the above budget was introduced.
> > 
> > An up-to-ten-millisecond latency between the kernel-side queueing of a
> > message and the delivery of that message to userspace sounds like an
> > awfully bad restriction to me.  Even one millisecond will have impact in
> > some scenarios.
> 
> If you care about delivery speed stronger than insertion, 
> then you do not need to use CBUS, but cn_netlink_send() instead.
> 
> I will test smaller values, but doubt it will have better insert
> performance.

I fail to see why it is desirable to defer the delivery.  The delivery has
to happen some time, and we've now incurred additional context switch
overhead.

> Consider fork() connector - it is better for userspace
> to return from system call as soon as possible, while information
> delivering
> about that event will be delayed.

Why?  The amount of CPU required to deliver the information regarding a
fork is the same in either case.  Probably more, in the deferred case.
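For concreteness, the two options being weighed for a fork() notification
might look roughly like this. This is only an illustration: the event layout
and the CN_IDX_FORK/CN_VAL_FORK identifiers are made up for the example,
cn_netlink_send() is the existing connector call, and cbus_insert() refers to
the hypothetical deferred insert sketched earlier in the thread, not to the
actual fork connector code.

#include <linux/types.h>
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/connector.h>

#define CN_IDX_FORK	0x1	/* placeholder id, not a registered value */
#define CN_VAL_FORK	0x1	/* placeholder id, not a registered value */

static void fork_notify(pid_t parent, pid_t child)
{
	__u8 buf[sizeof(struct cn_msg) + 2 * sizeof(pid_t)];
	struct cn_msg *msg = (struct cn_msg *)buf;
	pid_t *pids = (pid_t *)msg->data;

	memset(buf, 0, sizeof(buf));
	msg->id.idx = CN_IDX_FORK;
	msg->id.val = CN_VAL_FORK;
	msg->len = 2 * sizeof(pid_t);
	pids[0] = parent;
	pids[1] = child;

	/* Synchronous: the fork() path itself pays the delivery cost... */
	cn_netlink_send(msg, 0, GFP_ATOMIC);

	/* ...or deferred: only the cheap queue insert happens here and a
	 * delivery thread sends the event later:
	 *
	 *	cbus_insert(msg);
	 */
}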

> No one says that queueing is done at a much higher rate than delivery;
> it only smooths sharp peaks when it is unacceptable to wait until
> delivery is finished.

Maybe cbus gave better lmbench numbers because the forking was happening on
one CPU and the event delivery was pushed across to the other one.  OK for
a microbenchmark, but dubious for the real world.

I can see that there might be CPU consumption improvements due to the
batching of work and hence more effective CPU cache utilisation.  That
would need to be carefully measured, and you'd get the same effect by
simply doing a list_add+schedule_work for each insertion.
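For comparison, the list_add-plus-schedule_work variant suggested here might
look roughly as follows, reusing the same hypothetical queue; the keventd
handler simply drains everything that is queued at the time it runs (sketch
only, written against the old workqueue API where the handler takes a void
pointer; needs <linux/workqueue.h>):

static void cbus_work_handler(void *data);
static DECLARE_WORK(cbus_work, cbus_work_handler, NULL);

/* keventd callout: drain everything that is currently queued. */
static void cbus_work_handler(void *data)
{
	struct cbus_event *ev;
	unsigned long flags;

	for (;;) {
		spin_lock_irqsave(&cbus_lock, flags);
		if (list_empty(&cbus_queue)) {
			spin_unlock_irqrestore(&cbus_lock, flags);
			break;
		}
		ev = list_entry(cbus_queue.next, struct cbus_event, list);
		list_del(&ev->list);
		cbus_qlen--;
		spin_unlock_irqrestore(&cbus_lock, flags);

		cn_netlink_send(&ev->msg, 0, GFP_KERNEL);
		kfree(ev);
	}
}

/* Insert side: queue the event and poke keventd.  When the work is
 * already pending, schedule_work() amounts to little more than the
 * test_bit() mentioned above. */
static int cbus_insert_via_keventd(struct cn_msg *msg)
{
	int err = cbus_insert(msg);	/* enqueue helper from the earlier sketch */

	if (!err)
		schedule_work(&cbus_work);
	return err;
}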


> > > > > I did not try that case with the keventd but with one kernel thread 
> > > > > it was tested and showed worse performance.
> > > > 
> > > > But your current implementation has only one kernel thread?
> > > 
> > > It has a budget and timeout between each bundle processing.
> > > keventd does not allow to create such a timeout between each bundle
> > > processing.
> > 
> > Yes, there's batching there.  But I don't understand why the ability to
> > internally queue events at a high rate is so much more important than the
> > latency which that batching will introduce.

Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kernel connector.

2005-04-01 Thread Evgeniy Polyakov
On Fri, 2005-04-01 at 02:30 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> >
> > > > keventd does very hard jobs on some of my test machines which 
> > > > for example route big amount of traffic.
> > > 
> > > As I said - that's going to cause _your_ kernel thread to be slowed down 
> > > as
> > > well.
> > 
> > Yes, but it does not solve peak performance issues - all scheduled
> > jobs can run one after another which will decrease insert performance.
> 
> Well the keventd handler would simply process all currently-queued
> messages.  It's not as if you'd only process one event per keventd callout.

But that will hurt insert performance.
Processing all messages without splitting them up into pieces noticeably
slows the insert operation down.

> > > I mean, it's just a choice between two ways of multiplexing the CPU.  One
> > > is via a context switch in schedule() and the other is via list traversal
> > > in run_workqueue().  The latter will be faster.
> > 
> > But in the case of a separate thread one can control the execution process;
> > if it is called from the work queue, then insert requests
> > can appear one after another in a very short interval,
> > so their processing will hurt insert performance.
> 
> Why does insert performance matter so much?  These things still have to be
> sent up to userspace.
> 
> (please remind me why cbus exists, btw.  What functionality does it offer
> which the connector code doesn't?)

The main goal of CBUS is insert operation performance.
Anyone who wants maximum delivery speed should use the connector's
direct methods instead.

> > > > > Plus keventd is thread-per-cpu and quite possibly would be faster.
> > > > 
> > > > I experimented with several usage cases for CBUS and it was proven 
> > > > to be the fastest case when only one sending thread exists which manages
> > > > only very limited amount of messages at a time [like 10 in CBUS
> > > > currently]
> > > 
> > > Maybe that's because the cbus data structures are insufficiently
> > > percpuified.  On really big machines that single kernel thread will be a
> > > big bottleneck.
> > 
> > It is not because of messages itself, but because of its peaks,
> > if there is a peak then above mechanism will smooth it into
> > several pieces [for example 10 in each bundle, that value should be
> > changeable in run-time, will place it into TODO],
> > with keventd there is no guarantee that next peak will be processed
> > not just after the current one, but after some timeout.
> 
> keventd should process all the currently-queued messages.  If messages are
> being queued quickly then that will be a lot of messages on each keventd
> callout.

But for maximum _insert_ performance, one should not process _all_ messages
at once, even if there are many of them.
One needs to split a peak number of messages into pieces and process each
piece after the previous one and after some timeout.

> > > Introducing an up-to-ten millisecond latency seems a lot worse than some
> > > reduction in peak bandwidth - it's not as if pumping 10 events/sec is 
> > > a
> > > likely use case.  Using prepare_to_wait/finish_wait will provide some
> > > speedup in SMP environments due to reduced cacheline transfers.
> > 
> > It is a question actually...
> > If we allow peak processing, then we _definitely_ will have insert 
> > performance degradation, it was observed in my tests.
> 
> I don't understand your terminology.  What is "peak processing" and how
> does it differ from maximum insertion rate?
> 
> > The main goal of CBUS was exactly insert speed
> 
> Why?

To allow connector usage in really fast paths.
If one cares about _delivery_ speed then one should use
cn_netlink_send().

> > - so
> > it somehow must smooth out performance peaks, and thus
> > the above budget was introduced.
> 
> An up-to-ten-millisecond latency between the kernel-side queueing of a
> message and the delivery of that message to userspace sounds like an
> awfully bad restriction to me.  Even one millisecond will have impact in
> some scenarios.

If you care about delivery speed more than about insertion speed,
then you do not need to use CBUS - use cn_netlink_send() instead.

I will test smaller values, but I doubt they will give better insert
performance.

> > It is similar to NAPI in some abstract way, but with different aims - 
> > NAPI for speed improvement, but here we have peak smoothness.
> > 
> > > > If too many deferred insert works will be called simultaneously
> > > > [which may happen with keventd] it will slow down insert operations
> > > > noticeably.
> > > 
> > > What is a "deferred insert work"?  Insertion is always synchronous?
> > 
> > Insert is synchronous in one CPU, but actual message delivering is
> > deferred.
> 
> OK, so why does it matter that "If too many deferred insert works will be
> called [which may happen with keventd] it will slow down insert operations
> noticeably"?
> 
> There's no point in being able to internally queue messages at a higher
> frequency than we can deliver them to userspace.  Confused.
> 

Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kernel connector.

2005-04-01 Thread Andrew Morton
Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> > > keventd does very hard jobs on some of my test machines which 
> > > for example route big amount of traffic.
> > 
> > As I said - that's going to cause _your_ kernel thread to be slowed down as
> > well.
> 
> Yes, but it does not solve peak performance issues - all scheduled
> jobs can run one after another which will decrease insert performance.

Well the keventd handler would simply process all currently-queued
messages.  It's not as if you'd only process one event per keventd callout.

> > I mean, it's just a choice between two ways of multiplexing the CPU.  One
> > is via a context switch in schedule() and the other is via list traversal
> > in run_workqueue().  The latter will be faster.
> 
> But in the case of a separate thread one can control the execution process;
> if it is called from the work queue, then insert requests
> can appear one after another in a very short interval,
> so their processing will hurt insert performance.

Why does insert performance matter so much?  These things still have to be
sent up to userspace.

(please remind me why cbus exists, btw.  What functionality does it offer
which the connector code doesn't?)

> > > > Plus keventd is thread-per-cpu and quite possibly would be faster.
> > > 
> > > I experimented with several usage cases for CBUS and it was proven 
> > > to be the fastest case when only one sending thread exists which manages
> > > only very limited amount of messages at a time [like 10 in CBUS
> > > currently]
> > 
> > Maybe that's because the cbus data structures are insufficiently
> > percpuified.  On really big machines that single kernel thread will be a
> > big bottleneck.
> 
> It is not because of messages itself, but because of its peaks,
> if there is a peak then above mechanism will smooth it into
> several pieces [for example 10 in each bundle, that value should be
> changeable in run-time, will place it into TODO],
> with keventd there is no guarantee that next peak will be processed
> not just after the current one, but after some timeout.

keventd should process all the currently-queued messages.  If messages are
being queued quickly then that will be a lot of messages on each keventd
callout.

> > Introducing an up-to-ten millisecond latency seems a lot worse than some
> > reduction in peak bandwidth - it's not as if pumping 10 events/sec is a
> > likely use case.  Using prepare_to_wait/finish_wait will provide some
> > speedup in SMP environments due to reduced cacheline transfers.
> 
> It is a question actually...
> If we allow peak processing, then we _definitely_ will have insert 
> performance degradation, it was observed in my tests.

I don't understand your terminology.  What is "peak processing" and how
does it differ from maximum insertion rate?

> The main goal of CBUS was exactly insert speed

Why?

> - so
> it somehow must smooth out performance peaks, and thus
> the above budget was introduced.

An up-to-ten-millisecond latency between the kernel-side queueing of a
message and the delivery of that message to userspace sounds like an
awfully bad restriction to me.  Even one millisecond will have impact in
some scenarios.

> It is similar to NAPI in some abstract way, but with different aims - 
> NAPI for speed improvement, but here we have peak smoothness.
> 
> > > If too many deferred insert works will be called simultaneously
> > > [which may happen with keventd] it will slow down insert operations
> > > noticeably.
> > 
> > What is a "deferred insert work"?  Insertion is always synchronous?
> 
> Insert is synchronous in one CPU, but actual message delivering is
> deferred.

OK, so why does it matter that "If too many deferred insert works will be
called [which may happen with keventd] it will slow down insert operations
noticeably"?

There's no point in being able to internally queue messages at a higher
frequency than we can deliver them to userspace.  Confused.

> > > I did not try that case with the keventd but with one kernel thread 
> > > it was tested and showed worse performance.
> > 
> > But your current implementation has only one kernel thread?
> 
> It has a budget and timeout between each bundle processing.
> keventd does not allow to create such a timeout between each bundle
> processing.

Yes, there's batching there.  But I don't understand why the ability to
internally queue events at a high rate is so much more important than the
latency which that batching will introduce.

(And keventd _does_ allow such batching.  schedule_delayed_work()).
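A short sketch of what that schedule_delayed_work() batching could look like,
under the same assumptions as the earlier sketches (cbus_flush_queue() stands
in for a drain routine like the handler shown before, and the old
three-argument DECLARE_WORK form is used again):

static void cbus_delayed_handler(void *data);
static DECLARE_WORK(cbus_delayed_work, cbus_delayed_handler, NULL);

static void cbus_delayed_handler(void *data)
{
	cbus_flush_queue();	/* hypothetical: drain the queue via cn_netlink_send() */
}

static int cbus_insert_delayed(struct cn_msg *msg)
{
	int err = cbus_insert(msg);	/* enqueue helper from the earlier sketch */

	/* A no-op if the work is already pending, so the fast path stays
	 * cheap; the delay provides the batching window (HZ / 100 is
	 * roughly the 10 ms interval discussed above). */
	if (!err)
		schedule_delayed_work(&cbus_delayed_work, HZ / 100);
	return err;
}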



Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kernel connector.

2005-04-01 Thread Evgeniy Polyakov
On Fri, 2005-04-01 at 01:25 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> >
> > On Thu, 2005-03-31 at 23:59 -0800, Andrew Morton wrote:
> > > Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> > > >
> > > > On Thu, 2005-03-31 at 23:26 -0800, Andrew Morton wrote:
> > > > > Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > > +static int cbus_event_thread(void *data)
> > > > > >  > > +{
> > > > > >  > > +  int i, non_empty = 0, empty = 0;
> > > > > >  > > +  struct cbus_event_container *c;
> > > > > >  > > +
> > > > > >  > > +  daemonize(cbus_name);
> > > > > >  > > +  allow_signal(SIGTERM);
> > > > > >  > > +  set_user_nice(current, 19);
> > > > > >  > 
> > > > > >  > Please use the kthread api for managing this thread.
> > > > > >  > 
> > > > > >  > Is a new kernel thread needed?
> > > > > > 
> > > > > >  Logic behind cbus is following: 
> > > > > >  1. make insert operation return as soon as possible,
> > > > > >  2. deferring actual message delivering to the safe time
> > > > > > 
> > > > > >  That thread does second point.
> > > > > 
> > > > > But does it need a new thread rather than using the existing keventd?
> > > > 
> > > > Yes, it is much cleaner [especially from performance tuning point] 
> > to use own kernel thread than postpone all work to the queued work.
> > > > 
> > > 
> > > Why?  Unless keventd is off doing something else (rare), it should be
> > > exactly equivalent.  And if keventd _is_ off doing something else then 
> > > that
> > > will slow down this kernel thread too, of course.
> > 
> > keventd does very hard jobs on some of my test machines which 
> > for example route big amount of traffic.
> 
> As I said - that's going to cause _your_ kernel thread to be slowed down as
> well.

Yes, but it does not solve peak performance issues - all scheduled
jobs can run one after another which will decrease insert performance.

> I mean, it's just a choice between two ways of multiplexing the CPU.  One
> is via a context switch in schedule() and the other is via list traversal
> in run_workqueue().  The latter will be faster.

But in the case of a separate thread one can control the execution process;
if it is called from the work queue, then insert requests
can appear one after another in a very short interval,
so their processing will hurt insert performance.

> > > Plus keventd is thread-per-cpu and quite possibly would be faster.
> > 
> > I experimented with several usage cases for CBUS and it was proven 
> > to be the fastest case when only one sending thread exists which manages
> > only very limited amount of messages at a time [like 10 in CBUS
> > currently]
> 
> Maybe that's because the cbus data structures are insufficiently
> percpuified.  On really big machines that single kernel thread will be a
> big bottleneck.

It is not because of the messages themselves, but because of their peaks:
if there is a peak, the above mechanism will smooth it into
several pieces [for example 10 in each bundle; that value should be
changeable at run-time, I will place it in the TODO],
while with keventd there is no guarantee that the next peak will be processed
only after some timeout rather than right after the current one.

> > without direct awakening [that is why wake_up() is commented in
> > cbus_insert()].
> 
> You mean the
> 
>   interruptible_sleep_on_timeout(&cbus_wait_queue, 10);
> 
> ?  (That should be HZ/100, btw).
> 
> That seems a bit kludgy - someone could queue 1 messages in that time,
> although they'll probably run out of memory first, if it's doing
> GFP_ATOMIC.

GFP_ATOMIC issues will be resolved first.

> Introducing an up-to-ten millisecond latency seems a lot worse than some
> reduction in peak bandwidth - it's not as if pumping 10 events/sec is a
> likely use case.  Using prepare_to_wait/finish_wait will provide some
> speedup in SMP environments due to reduced cacheline transfers.

It is a question, actually...
If we allow peak processing, then we will _definitely_ have insert
performance degradation; it was observed in my tests.
The main goal of CBUS was exactly insert speed - so
it somehow must smooth out performance peaks, and thus
the above budget was introduced.
It is similar to NAPI in some abstract way, but with a different aim -
NAPI is for speed improvement, while here we want peak smoothness.

> > If too many deferred insert works will be called simultaneously
> > [which may happen with keventd] it will slow down insert operations
> > noticeably.
> 
> What is a "deferred insert work"?  Insertion is always synchronous?

Insert is synchronous in one CPU, but the actual message delivery is
deferred.

> > I did not try that case with the keventd but with one kernel thread 
> > it was tested and showed worse performance.
> 
> But your current implementation has only one kernel thread?

It has a budget and a timeout between processing each bundle.
keventd does not allow creating such a timeout between
bundles.

-- 
Evgeniy 

Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kernel connector.

2005-04-01 Thread Andrew Morton
Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> On Thu, 2005-03-31 at 23:59 -0800, Andrew Morton wrote:
> > Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> > >
> > > On Thu, 2005-03-31 at 23:26 -0800, Andrew Morton wrote:
> > > > Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > > +static int cbus_event_thread(void *data)
> > > > >  > > +{
> > > > >  > > +int i, non_empty = 0, empty = 0;
> > > > >  > > +struct cbus_event_container *c;
> > > > >  > > +
> > > > >  > > +daemonize(cbus_name);
> > > > >  > > +allow_signal(SIGTERM);
> > > > >  > > +set_user_nice(current, 19);
> > > > >  > 
> > > > >  > Please use the kthread api for managing this thread.
> > > > >  > 
> > > > >  > Is a new kernel thread needed?
> > > > > 
> > > > >  Logic behind cbus is following: 
> > > > >  1. make insert operation return as soon as possible,
> > > > >  2. deferring actual message delivering to the safe time
> > > > > 
> > > > >  That thread does second point.
> > > > 
> > > > But does it need a new thread rather than using the existing keventd?
> > > 
> > > Yes, it is much cleaner [especially from performance tuning point] 
> > > to use own kernel thread than postpone all work to the queued work.
> > > 
> > 
> > Why?  Unless keventd is off doing something else (rare), it should be
> > exactly equivalent.  And if keventd _is_ off doing something else then that
> > will slow down this kernel thread too, of course.
> 
> keventd does very hard jobs on some of my test machines which 
> for example route big amount of traffic.

As I said - that's going to cause _your_ kernel thread to be slowed down as
well.

I mean, it's just a choice between two ways of multiplexing the CPU.  One
is via a context switch in schedule() and the other is via list traversal
in run_workqueue().  The latter will be faster.

> > Plus keventd is thread-per-cpu and quite possibly would be faster.
> 
> I experimented with several usage cases for CBUS and it was proven 
> to be the fastest case when only one sending thread exists which manages
> only very limited amount of messages at a time [like 10 in CBUS
> currently]

Maybe that's because the cbus data structures are insufficiently
percpuified.  On really big machines that single kernel thread will be a
big bottleneck.

> without direct awakening [that is why wake_up() is commented in
> cbus_insert()].

You mean the

interruptible_sleep_on_timeout(&cbus_wait_queue, 10);

?  (That should be HZ/100, btw).

That seems a bit kludgy - someone could queue 1 messages in that time,
although they'll probably run out of memory first, if it's doing
GFP_ATOMIC.

Introducing an up-to-ten millisecond latency seems a lot worse than some
reduction in peak bandwidth - it's not as if pumping 10 events/sec is a
likely use case.  Using prepare_to_wait/finish_wait will provide some
speedup in SMP environments due to reduced cacheline transfers.
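A rough illustration of the prepare_to_wait()/finish_wait() pattern being
suggested for the delivery thread's sleep; the wait-queue name comes from the
quoted snippet, while the list_empty() check against the hypothetical
cbus_queue and everything else is assumed:

#include <linux/wait.h>
#include <linux/sched.h>

static DECLARE_WAIT_QUEUE_HEAD(cbus_wait_queue);

static void cbus_wait_for_events(void)
{
	DEFINE_WAIT(wait);

	/* Mark ourselves as sleeping before checking the queue, so a
	 * wake_up() from the insert path cannot be missed. */
	prepare_to_wait(&cbus_wait_queue, &wait, TASK_INTERRUPTIBLE);
	if (list_empty(&cbus_queue))
		schedule_timeout(HZ / 100);	/* ~10 ms rather than 10 jiffies */
	finish_wait(&cbus_wait_queue, &wait);
}

/* The insert path would pair this with a wake_up(&cbus_wait_queue)
 * after queueing an event. */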

> If too many deferred insert works will be called simultaneously
> [which may happen with keventd] it will slow down insert operations
> noticeably.

What is a "deferred insert work"?  Insertion is always synchronous?

> I did not try that case with the keventd but with one kernel thread 
> it was tested and showed worse performance.

But your current implementation has only one kernel thread?



Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kernel connector.

2005-04-01 Thread Evgeniy Polyakov
On Thu, 2005-03-31 at 23:59 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> >
> > On Thu, 2005-03-31 at 23:26 -0800, Andrew Morton wrote:
> > > Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > +static int cbus_event_thread(void *data)
> > > >  > > +{
> > > >  > > +  int i, non_empty = 0, empty = 0;
> > > >  > > +  struct cbus_event_container *c;
> > > >  > > +
> > > >  > > +  daemonize(cbus_name);
> > > >  > > +  allow_signal(SIGTERM);
> > > >  > > +  set_user_nice(current, 19);
> > > >  > 
> > > >  > Please use the kthread api for managing this thread.
> > > >  > 
> > > >  > Is a new kernel thread needed?
> > > > 
> > > >  Logic behind cbus is following: 
> > > >  1. make insert operation return as soon as possible,
> > > >  2. deferring actual message delivering to the safe time
> > > > 
> > > >  That thread does second point.
> > > 
> > > But does it need a new thread rather than using the existing keventd?
> > 
> > Yes, it is much cleaner [especially from performance tuning point] 
> > to use own kernel thread than postpone all work to the queued work.
> > 
> 
> Why?  Unless keventd is off doing something else (rare), it should be
> exactly equivalent.  And if keventd _is_ off doing something else then that
> will slow down this kernel thread too, of course.

keventd does very heavy work on some of my test machines, which
for example route a large amount of traffic.

> Plus keventd is thread-per-cpu and quite possibly would be faster.

I experimented with several usage cases for CBUS, and the fastest
proved to be the one where only one sending thread exists, which handles
only a very limited number of messages at a time [like 10 in CBUS
currently],
without direct awakening [that is why wake_up() is commented out in
cbus_insert()].
If too many deferred insert works are called simultaneously
[which may happen with keventd], it will slow insert operations down
noticeably.
I did not try that case with keventd, but it was tested with one kernel
thread and showed worse performance.

-- 
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski




Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kernel connector.

2005-04-01 Thread Andrew Morton
Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> On Thu, 2005-03-31 at 23:26 -0800, Andrew Morton wrote:
> > Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> > >
> > > > > +static int cbus_event_thread(void *data)
> > >  > > +{
> > >  > > +int i, non_empty = 0, empty = 0;
> > >  > > +struct cbus_event_container *c;
> > >  > > +
> > >  > > +daemonize(cbus_name);
> > >  > > +allow_signal(SIGTERM);
> > >  > > +set_user_nice(current, 19);
> > >  > 
> > >  > Please use the kthread api for managing this thread.
> > >  > 
> > >  > Is a new kernel thread needed?
> > > 
> > >  Logic behind cbus is following: 
> > >  1. make insert operation return as soon as possible,
> > >  2. deferring actual message delivering to the safe time
> > > 
> > >  That thread does second point.
> > 
> > But does it need a new thread rather than using the existing keventd?
> 
> Yes, it is much cleaner [especially from performance tuning point] 
> to use own kernel thread than postpone all work to the queued work.
> 

Why?  Unless keventd is off doing something else (rare), it should be
exactly equivalent.  And if keventd _is_ off doing something else then that
will slow down this kernel thread too, of course.

Plus keventd is thread-per-cpu and quite possibly would be faster.
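For reference, the kthread API requested in the quoted review comment would be
used roughly like this (a sketch under the same assumptions as the earlier
ones; cbus_process_bundle() stands in for the budget-limited drain loop):

#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/err.h>

static struct task_struct *cbus_task;

static int cbus_thread_fn(void *data)
{
	while (!kthread_should_stop()) {
		cbus_process_bundle();		/* hypothetical drain helper */
		msleep_interruptible(10);	/* pause between bundles */
	}
	return 0;
}

static int cbus_start_thread(void)
{
	cbus_task = kthread_run(cbus_thread_fn, NULL, "cbus");
	return IS_ERR(cbus_task) ? PTR_ERR(cbus_task) : 0;
}

static void cbus_stop_thread(void)
{
	kthread_stop(cbus_task);
}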
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-04-01 Thread Andrew Morton
Evgeniy Polyakov [EMAIL PROTECTED] wrote:

 On Thu, 2005-03-31 at 23:26 -0800, Andrew Morton wrote:
  Evgeniy Polyakov [EMAIL PROTECTED] wrote:
  
 +static int cbus_event_thread(void *data)
  +{
  +int i, non_empty = 0, empty = 0;
  +struct cbus_event_container *c;
  +
  +daemonize(cbus_name);
  +allow_signal(SIGTERM);
  +set_user_nice(current, 19);
 
 Please use the kthread api for managing this thread.
 
 Is a new kernel thread needed?
   
Logic behind cbus is following: 
1. make insert operation return as soon as possible,
2. deferring actual message delivering to the safe time
   
That thread does second point.
  
  But does it need a new thread rather than using the existing keventd?
 
 Yes, it is much cleaner [especially from performance tuning point] 
 to use own kernel thread than pospone all work to the queued work.
 

Why?  Unless keventd is off doing something else (rare), it should be
exactly equivalent.  And if keventd _is_ off doing something else then that
will slow down this kernel thread too, of course.

Plus keventd is thread-per-cpu and quite possibly would be faster.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-04-01 Thread Evgeniy Polyakov
On Thu, 2005-03-31 at 23:59 -0800, Andrew Morton wrote:
 Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 
  On Thu, 2005-03-31 at 23:26 -0800, Andrew Morton wrote:
   Evgeniy Polyakov [EMAIL PROTECTED] wrote:
   
  +static int cbus_event_thread(void *data)
   +{
   +  int i, non_empty = 0, empty = 0;
   +  struct cbus_event_container *c;
   +
   +  daemonize(cbus_name);
   +  allow_signal(SIGTERM);
   +  set_user_nice(current, 19);
  
  Please use the kthread api for managing this thread.
  
  Is a new kernel thread needed?

 Logic behind cbus is following: 
 1. make insert operation return as soon as possible,
 2. deferring actual message delivering to the safe time

 That thread does second point.
   
   But does it need a new thread rather than using the existing keventd?
  
  Yes, it is much cleaner [especially from performance tuning point] 
  to use own kernel thread than pospone all work to the queued work.
  
 
 Why?  Unless keventd is off doing something else (rare), it should be
 exactly equivalent.  And if keventd _is_ off doing something else then that
 will slow down this kernel thread too, of course.

keventd does very hard jobs on some of my test machines which 
for example route big amount of traffic.

 Plus keventd is thread-per-cpu and quite possibly would be faster.

I experimented with several usage cases for CBUS and it was proven 
to be the fastest case when only one sending thread exists which manages
only very limited amount of messages at a time [like 10 in CBUS
currently]
without direct awakening [that is why wake_up() is commented in
cbus_insert()].
If too many deferred insert works will be called simultaneously
[which may happen with keventd] it will slow down insert operations
noticeably.
I did not try that case with the keventd but with one kernel thread 
it was tested and showed worse performance.

-- 
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


signature.asc
Description: This is a digitally signed message part


Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-04-01 Thread Andrew Morton
Evgeniy Polyakov [EMAIL PROTECTED] wrote:

 On Thu, 2005-03-31 at 23:59 -0800, Andrew Morton wrote:
  Evgeniy Polyakov [EMAIL PROTECTED] wrote:
  
   On Thu, 2005-03-31 at 23:26 -0800, Andrew Morton wrote:
Evgeniy Polyakov [EMAIL PROTECTED] wrote:

   +static int cbus_event_thread(void *data)
+{
+int i, non_empty = 0, empty = 0;
+struct cbus_event_container *c;
+
+daemonize(cbus_name);
+allow_signal(SIGTERM);
+set_user_nice(current, 19);
   
   Please use the kthread api for managing this thread.
   
   Is a new kernel thread needed?
 
  Logic behind cbus is following: 
  1. make insert operation return as soon as possible,
  2. deferring actual message delivering to the safe time
 
  That thread does second point.

But does it need a new thread rather than using the existing keventd?
   
   Yes, it is much cleaner [especially from performance tuning point] 
   to use own kernel thread than pospone all work to the queued work.
   
  
  Why?  Unless keventd is off doing something else (rare), it should be
  exactly equivalent.  And if keventd _is_ off doing something else then that
  will slow down this kernel thread too, of course.
 
 keventd does very hard jobs on some of my test machines which 
 for example route big amount of traffic.

As I said - that's going to cause _your_ kernel thread to be slowed down as
well.

I mean, it's just a choice between two ways of multiplexing the CPU.  One
is via a context switch in schedule() and the other is via list traversal
in run_workqueue().  The latter will be faster.

  Plus keventd is thread-per-cpu and quite possibly would be faster.
 
 I experimented with several usage cases for CBUS and it was proven 
 to be the fastest case when only one sending thread exists which manages
 only very limited amount of messages at a time [like 10 in CBUS
 currently]

Maybe that's because the cbus data structures are insufficiently
percpuified.  On really big machines that single kernel thread will be a
big bottleneck.

 without direct awakening [that is why wake_up() is commented in
 cbus_insert()].

You mean the

interruptible_sleep_on_timeout(cbus_wait_queue, 10);

?  (That should be HZ/100, btw).

That seems a bit kludgy - someone could queue 1 messages in that time,
although they'll probably run out of memory first, if it's doing
GFP_ATOMIC.

Introducing an up-to-ten millisecond latency seems a lot worse than some
reduction in peak bandwidth - it's not as if pumping 10 events/sec is a
likely use case.  Using prepare_to_wait/finish_wait will provide some
speedup in SMP environments due to reduced cacheline transfers.

 If too many deferred insert works will be called simultaneously
 [which may happen with keventd] it will slow down insert operations
 noticeably.

What is a deferred insert work?  Insertion is always synchronous?

 I did not try that case with the keventd but with one kernel thread 
 it was tested and showed worse performance.

But your current implementation has only one kernel thread?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-04-01 Thread Evgeniy Polyakov
On Fri, 2005-04-01 at 01:25 -0800, Andrew Morton wrote:
 Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 
  On Thu, 2005-03-31 at 23:59 -0800, Andrew Morton wrote:
   Evgeniy Polyakov [EMAIL PROTECTED] wrote:
   
On Thu, 2005-03-31 at 23:26 -0800, Andrew Morton wrote:
 Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 
+static int cbus_event_thread(void *data)
 +{
 +  int i, non_empty = 0, empty = 0;
 +  struct cbus_event_container *c;
 +
 +  daemonize(cbus_name);
 +  allow_signal(SIGTERM);
 +  set_user_nice(current, 19);

Please use the kthread api for managing this thread.

Is a new kernel thread needed?
  
   Logic behind cbus is following: 
   1. make insert operation return as soon as possible,
   2. deferring actual message delivering to the safe time
  
   That thread does second point.
 
 But does it need a new thread rather than using the existing keventd?

Yes, it is much cleaner [especially from performance tuning point] 
to use own kernel thread than pospone all work to the queued work.

   
   Why?  Unless keventd is off doing something else (rare), it should be
   exactly equivalent.  And if keventd _is_ off doing something else then 
   that
   will slow down this kernel thread too, of course.
  
  keventd does very hard jobs on some of my test machines which 
  for example route big amount of traffic.
 
 As I said - that's going to cause _your_ kernel thread to be slowed down as
 well.

Yes, but it does not solve peak performance issues - all scheduled
jobs can run one after another which will decrease insert performance.

 I mean, it's just a choice between two ways of multiplexing the CPU.  One
 is via a context switch in schedule() and the other is via list traversal
 in run_workqueue().  The latter will be faster.

But in case of separate thread one can control execution process,
if it will be called from work queue then insert requests 
can appear one after another in a very interval,
so theirs processing will hurt insert performance.

   Plus keventd is thread-per-cpu and quite possibly would be faster.
  
  I experimented with several usage cases for CBUS and it was proven 
  to be the fastest case when only one sending thread exists which manages
  only very limited amount of messages at a time [like 10 in CBUS
  currently]
 
 Maybe that's because the cbus data structures are insufficiently
 percpuified.  On really big machines that single kernel thread will be a
 big bottleneck.

It is not because of messages itself, but becouse of it's peaks,
if there is a peak then above mechanism will smooth it into
several pieces [for example 10 in each bundle, that value should be
changeable in run-time, will place it into TODO],
with keventd there is no guarantee that next peak will be processed
not just after the current one, but after some timeout.

  without direct awakening [that is why wake_up() is commented in
  cbus_insert()].
 
 You mean the
 
   interruptible_sleep_on_timeout(cbus_wait_queue, 10);
 
 ?  (That should be HZ/100, btw).
 
 That seems a bit kludgy - someone could queue 1 messages in that time,
 although they'll probably run out of memory first, if it's doing
 GFP_ATOMIC.

GFP_ATOMIC issues will be resolved first.

 Introducing an up-to-ten millisecond latency seems a lot worse than some
 reduction in peak bandwidth - it's not as if pumping 10 events/sec is a
 likely use case.  Using prepare_to_wait/finish_wait will provide some
 speedup in SMP environments due to reduced cacheline transfers.

It is a question actually...
If we allow peak processing, then we _definitely_ will have insert 
performance degradation, it was observed in my tests.
The main goal of CBUS was exactly insert speed - so
it somehow must smooth shape performance peaks, and thus
above budget was introdyced.
It is similar to NAPI in some abstract way, but with different aims - 
NAPI for speed improovement, but here we have peak smootheness.

  If too many deferred insert works will be called simultaneously
  [which may happen with keventd] it will slow down insert operations
  noticeably.
 
 What is a deferred insert work?  Insertion is always synchronous?

Insert is synchronous in one CPU, but actuall message delivering is
deferred.

  I did not try that case with the keventd but with one kernel thread 
  it was tested and showed worse performance.
 
 But your current implementation has only one kernel thread?

It has a budget and timeout between each bundle processing.
keventd does not allow to create such a timeout between each bundle
processing.

-- 
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


signature.asc
Description: This is a digitally signed message part


Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-04-01 Thread Andrew Morton
Evgeniy Polyakov [EMAIL PROTECTED] wrote:

   keventd does very hard jobs on some of my test machines which 
   for example route big amount of traffic.
  
  As I said - that's going to cause _your_ kernel thread to be slowed down as
  well.
 
 Yes, but it does not solve peak performance issues - all scheduled
 jobs can run one after another which will decrease insert performance.

Well the keventd handler would simply process all currently-queued
messages.  It's not as if you'd only process one event per keventd callout.

  I mean, it's just a choice between two ways of multiplexing the CPU.  One
  is via a context switch in schedule() and the other is via list traversal
  in run_workqueue().  The latter will be faster.
 
 But in case of separate thread one can control execution process,
 if it will be called from work queue then insert requests 
 can appear one after another in a very interval,
 so theirs processing will hurt insert performance.

Why does insert performance matter so much?  These things still have to be
sent up to userspace.

(please remind me why cbus exists, btw.  What functionality does it offer
which the connector code doesn't?)

Plus keventd is thread-per-cpu and quite possibly would be faster.
   
   I experimented with several usage cases for CBUS and it was proven 
   to be the fastest case when only one sending thread exists which manages
   only very limited amount of messages at a time [like 10 in CBUS
   currently]
  
  Maybe that's because the cbus data structures are insufficiently
  percpuified.  On really big machines that single kernel thread will be a
  big bottleneck.
 
 It is not because of the messages themselves, but because of their peaks;
 if there is a peak then the above mechanism will smooth it into
 several pieces [for example 10 in each bundle, that value should be
 changeable at run-time, I will place it into the TODO],
 with keventd there is no guarantee that the next peak will be processed
 after some timeout rather than immediately after the current one.

keventd should process all the currently-queued messages.  If messages are
being queued quickly then that will be a lot of messages on each keventd
callout.

  Introducing an up-to-ten millisecond latency seems a lot worse than some
  reduction in peak bandwidth - it's not as if pumping 10 events/sec is a
  likely use case.  Using prepare_to_wait/finish_wait will provide some
  speedup in SMP environments due to reduced cacheline transfers.
 
 It is a question actually...
 If we allow peak processing, then we _definitely_ will have insert 
 performance degradation, it was observed in my tests.

I don't understand your terminology.  What is peak processing and how
does it differ from maximum insertion rate?

 The main goal of CBUS was exactly insert speed

Why?

 - so
 it must somehow smooth out performance peaks, and thus
 the above budget was introduced.

An up-to-ten-millisecond latency between the kernel-side queueing of a
message and the delivery of that message to userspace sounds like an
awfully bad restriction to me.  Even one millisecond will have impact in
some scenarios.

 It is similar to NAPI in some abstract way, but with different aims -
 NAPI is for speed improvement, while here we aim for peak smoothness.
 
   If too many deferred insert works will be called simultaneously
   [which may happen with keventd] it will slow down insert operations
   noticeably.
  
  What is a deferred insert work?  Insertion is always synchronous?
 
 Insert is synchronous on one CPU, but the actual message delivery is
 deferred.

OK, so why does it matter that If too many deferred insert works will be
called [which may happen with keventd] it will slow down insert operations
noticeably?

There's no point in being able to internally queue messages at a higher
frequency than we can deliver them to userspace.  Confused.

   I did not try that case with the keventd but with one kernel thread 
   it was tested and showed worse performance.
  
  But your current implementation has only one kernel thread?
 
 It has a budget and a timeout between each bundle it processes.
 keventd does not allow creating such a timeout between
 bundles.

Yes, there's batching there.  But I don't understand why the ability to
internally queue events at a high rate is so much more important than the
latency which that batching will introduce.

(And keventd _does_ allow such batching.  schedule_delayed_work()).
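
For concreteness, a minimal sketch of the schedule_delayed_work() style of
batching being pointed at here, written against the 2.6.11-era workqueue
convention (handlers take a void * argument). cbus_process_some() is a
hypothetical helper that delivers at most a given number of queued events and
returns how many are still pending; it is not part of the posted patch.

#include <linux/workqueue.h>

#define BUNDLE_SIZE  10          /* events delivered per callout */
#define BUNDLE_DELAY (HZ / 100)  /* pause before the next bundle */

static void deliver_bundle(void *unused);
static DECLARE_WORK(delivery_work, deliver_bundle, NULL);

static void deliver_bundle(void *unused)
{
    /* Flush one bounded bundle; if events remain, come back after a
     * delay instead of draining everything in a single long run. */
    if (cbus_process_some(BUNDLE_SIZE) > 0)
        schedule_delayed_work(&delivery_work, BUNDLE_DELAY);
}

The insert path would only need to call schedule_work(&delivery_work) after
queueing an event; the delayed re-arm above is what would provide the
inter-bundle window argued over in this exchange.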



Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-04-01 Thread Evgeniy Polyakov
On Fri, 2005-04-01 at 02:30 -0800, Andrew Morton wrote:
 Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 
keventd does very hard jobs on some of my test machines which 
for example route big amount of traffic.
   
   As I said - that's going to cause _your_ kernel thread to be slowed down 
   as
   well.
  
  Yes, but it does not solve peak performance issues - all scheduled
  jobs can run one after another which will decrease insert performance.
 
 Well the keventd handler would simply process all currently-queued
 messages.  It's not as if you'd only process one event per keventd callout.

But that will hurt insert performance.
Processing all messages without splitting them up into pieces noticeably
slows the insert operation down.

   I mean, it's just a choice between two ways of multiplexing the CPU.  One
   is via a context switch in schedule() and the other is via list traversal
   in run_workqueue().  The latter will be faster.
  
  But in the case of a separate thread one can control the execution process;
  if it is called from a work queue then insert requests
  can appear one after another within a very short interval,
  so their processing will hurt insert performance.
 
 Why does insert performance matter so much?  These things still have to be
 sent up to userspace.
 
 (please remind me why cbus exists, btw.  What functionality does it offer
 which the connector code doesn't?)

The main goal of CBUS is insert operation performance.
Anyone who wants to have maximum delivery speed should use direct
connector's
methods instead.

 Plus keventd is thread-per-cpu and quite possibly would be faster.

I experimented with several usage cases for CBUS and it was proven 
to be the fastest case when only one sending thread exists which manages
only very limited amount of messages at a time [like 10 in CBUS
currently]
   
   Maybe that's because the cbus data structures are insufficiently
   percpuified.  On really big machines that single kernel thread will be a
   big bottleneck.
  
  It is not because of the messages themselves, but because of their peaks;
  if there is a peak then the above mechanism will smooth it into
  several pieces [for example 10 in each bundle, that value should be
  changeable at run-time, I will place it into the TODO],
  with keventd there is no guarantee that the next peak will be processed
  after some timeout rather than immediately after the current one.
 
 keventd should process all the currently-queued messages.  If messages are
 being queued quickly then that will be a lot of messages on each keventd
 callout.

But for maximum _insert_ performance, one should not process _all_ messages
at once, even if there are many of them.
One needs to split a peak number of messages into pieces and process each
piece after the previous one, with some timeout in between.

   Introducing an up-to-ten millisecond latency seems a lot worse than some
   reduction in peak bandwidth - it's not as if pumping 10 events/sec is 
   a
   likely use case.  Using prepare_to_wait/finish_wait will provide some
   speedup in SMP environments due to reduced cacheline transfers.
  
  It is a question actually...
  If we allow peak processing, then we _definitely_ will have insert 
  performance degradation, it was observed in my tests.
 
 I don't understand your terminology.  What is peak processing and how
 does it differ from maximum insertion rate?
 
  The main goal of CBUS was exactly insert speed
 
 Why?

To allow connector usage in really fast paths.
If one cares about _delivery_ speed then one should use
cn_netlink_send().

  - so
  it must somehow smooth out performance peaks, and thus
  the above budget was introduced.
 
 An up-to-ten-millisecond latency between the kernel-side queueing of a
 message and the delivery of that message to userspace sounds like an
 awfully bad restriction to me.  Even one millisecond will have impact in
 some scenarios.

If you care about delivery speed more than about insertion speed,
then you do not need CBUS - use cn_netlink_send() instead.

I will test smaller values, but doubt it will have better insert
performance.

  It is similar to NAPI in some abstract way, but with different aims -
  NAPI is for speed improvement, while here we aim for peak smoothness.
  
If too many deferred insert works will be called simultaneously
[which may happen with keventd] it will slow down insert operations
noticeably.
   
   What is a deferred insert work?  Insertion is always synchronous?
  
  Insert is synchronous on one CPU, but the actual message delivery is
  deferred.
 
 OK, so why does it matter that If too many deferred insert works will be
 called [which may happen with keventd] it will slow down insert operations
 noticeably?
 
 There's no point in being able to internally queue messages at a higher
 frequency than we can deliver them to userspace.  Confused.

Consider the fork() connector - it is better for userspace
to return from the system call as soon as possible, while the information
about that event is delivered later.
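
Restating that point as code, a fast-path caller effectively chooses between
two delivery styles. The sketch below is illustrative only: the policy flag is
invented, while cbus_insert() and cn_netlink_send() are used as in the posted
patch and announcement.

#include <linux/connector.h>

/* A notification hook sitting in a hot path.  With CBUS the hot path only
 * pays for a queue insert and returns; with the plain connector call the
 * full netlink send happens before the hot path can continue. */
static void hot_path_notify(struct cn_msg *msg, int need_immediate_delivery)
{
    if (need_immediate_delivery)
        cn_netlink_send(msg, 0);    /* delivery cost paid right here */
    else
        cbus_insert(msg);           /* cheap insert, delivery deferred */
}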

Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-04-01 Thread Andrew Morton
Evgeniy Polyakov [EMAIL PROTECTED] wrote:

 On Fri, 2005-04-01 at 02:30 -0800, Andrew Morton wrote:
  Evgeniy Polyakov [EMAIL PROTECTED] wrote:
  
 keventd does very hard jobs on some of my test machines which 
 for example route big amount of traffic.

As I said - that's going to cause _your_ kernel thread to be slowed 
down as
well.
   
   Yes, but it does not solve peak performance issues - all scheduled
   jobs can run one after another which will decrease insert performance.
  
  Well the keventd handler would simply process all currently-queued
  messages.  It's not as if you'd only process one event per keventd callout.
 
 But that will hurt insert performance.

Why?  All it involves is one schedule_work() on the insert side.  And that
will involve just a single test_bit() in the great majority of cases
because the work will already be pending.

 Processing all messages without splitting them up into pieces noticeably
 slows the insert operation down.

What does splitting them up into pieces mean?  They're single messages
end-to-end.  You've been discussing batching of messages, which is the
opposite?

  (please remind me why cbus exists, btw.  What functionality does it offer
  which the connector code doesn't?)
 
 The main goal of CBUS is insert operation performance.
 Anyone who wants to have maximum delivery speed should use direct
 connector's
 methods instead.

Delivery speed is not the same thing as insertion speed.  All the insertion
does is to place the event onto an internal queue.  We still need to do a
context switch and send the thing.  Provided there's a reasonable amount of
batching, the CPU consumption will be the same in both cases.

Introducing an up-to-ten millisecond latency seems a lot worse than some
reduction in peak bandwidth - it's not as if pumping 10 events/sec 
is a
likely use case.  Using prepare_to_wait/finish_wait will provide some
speedup in SMP environments due to reduced cacheline transfers.
   
   It is a question actually...
   If we allow peak processing, then we _definitely_ will have insert 
   performance degradation, it was observed in my tests.
  
  I don't understand your terminology.  What is peak processing and how
  does it differ from maximum insertion rate?
  
   The main goal of CBUS was exactly insert speed
  
  Why?
 
 To allow connector usage in really fast paths.
 If one cares about _delivery_ speed then one should use
 cn_netlink_send().

We care about both insertion and delivery!  Sure, simply sticking the event
onto a queue is faster than delivering it.  But we still need to deliver it
sometime so there's no aggregate gain.  Still confused.

   - so
   it must somehow smooth out performance peaks, and thus
   the above budget was introduced.
  
  An up-to-ten-millisecond latency between the kernel-side queueing of a
  message and the delivery of that message to userspace sounds like an
  awfully bad restriction to me.  Even one millisecond will have impact in
  some scenarios.
 
 If you care about delivery speed more than about insertion speed,
 then you do not need CBUS - use cn_netlink_send() instead.
 
 I will test smaller values, but doubt it will have better insert
 performance.

I fail to see why it is desirable to defer the delivery.  The delivery has
to happen some time, and we've now incurred additional context switch
overhead.

 Consider the fork() connector - it is better for userspace
 to return from the system call as soon as possible, while the information
 about that event is delivered later.

Why?  The amount of CPU required to deliver the information regarding a
fork is the same in either case.  Probably more, in the deferred case.

 No one says that queueing is done at a much higher rate than delivering;
 it only smooths sharp peaks when it is unacceptable to wait until
 delivery is finished.

Maybe cbus gave better lmbench numbers because the forking was happening on
one CPU and the event delivery was pushed across to the other one.  OK for
a microbenchmark, but dubious for the real world.

I can see that there might be CPU consumption improvements due to the
batching of work and hence more effective CPU cache utilisation.  That
would need to be carefully measured, and you'd get the same effect by
simply doing a list_add+schedule_work for each insertion.
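
As a rough sketch of the list_add+schedule_work() scheme described here
(2.6.11-era workqueue style; the queue, lock and function names are
placeholders, while struct cbus_event and cn_netlink_send() are taken from the
posted patch):

#include <linux/workqueue.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/slab.h>
#include <linux/connector.h>

static LIST_HEAD(event_queue);          /* placeholder queue */
static DEFINE_SPINLOCK(event_lock);

static void drain_events(void *unused);
static DECLARE_WORK(drain_work, drain_events, NULL);

/* Insert side: an O(1) list append plus schedule_work(), which amounts to
 * little more than a test_bit() when the work item is already pending. */
static void queue_event(struct cbus_event *event)
{
    unsigned long flags;

    spin_lock_irqsave(&event_lock, flags);
    list_add_tail(&event->event_entry, &event_queue);
    spin_unlock_irqrestore(&event_lock, flags);

    schedule_work(&drain_work);
}

/* keventd side: deliver whatever has accumulated since the last callout. */
static void drain_events(void *unused)
{
    unsigned long flags;

    for (;;) {
        struct cbus_event *event = NULL;

        spin_lock_irqsave(&event_lock, flags);
        if (!list_empty(&event_queue)) {
            event = list_entry(event_queue.next,
                               struct cbus_event, event_entry);
            list_del(&event->event_entry);
        }
        spin_unlock_irqrestore(&event_lock, flags);

        if (!event)
            break;

        cn_netlink_send(&event->msg, 0);
        kfree(event);
    }
}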


 I did not try that case with the keventd but with one kernel thread 
 it was tested and showed worse performance.

But your current implementation has only one kernel thread?
   
   It has a budget and a timeout between each bundle it processes.
   keventd does not allow creating such a timeout between
   bundles.
  
  Yes, there's batching there.  But I don't understand why the ability to
  internally queue events at a high rate is so much more important than the
  latency which that batching will introduce.
 
 With such a mechanism we may use event notification in really fast
 paths.
 We always defer 

Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-04-01 Thread Evgeniy Polyakov
On Fri, 2005-04-01 at 03:20 -0800, Andrew Morton wrote: 
 Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 
  On Fri, 2005-04-01 at 02:30 -0800, Andrew Morton wrote:
   Evgeniy Polyakov [EMAIL PROTECTED] wrote:
   
  keventd does very hard jobs on some of my test machines which 
  for example route big amount of traffic.
 
 As I said - that's going to cause _your_ kernel thread to be slowed 
 down as
 well.

Yes, but it does not solve peak performance issues - all scheduled
jobs can run one after another which will decrease insert performance.
   
   Well the keventd handler would simply process all currently-queued
   messages.  It's not as if you'd only process one event per keventd 
   callout.
  
  But that will hurt insert performance.
 
 Why?  All it involves is one schedule_work() on the insert side.  And that
 will involve just a single test_bit() in the great majority of cases
 because the work will already be pending.

Here is an example:
schedule_work();
keventd->cbus_process(), which has 2 variants:
1. process all pending events.
2. process only a number of them.

In the first case we hurt insert performance very noticeably,
because actual delivery takes some time, so one pass will
take time_for_one_delivery * number_of_events_to_be_delivered;
since at a peak number_of_events_to_be_delivered may be very high,
it will take too much time to flush the event queue and deliver all
messages.

In the second case we finish our work in predictable time,
but that does not help us with keventd, which may have [and has] caught
a new schedule_work(), and thus will run cbus_process() again
without a time window after the previous delivery.

That time window is _very_ helpful for insert performance
and thus for low latencies.

  Processing all messages without splitting them up into pieces noticeably
  slows the insert operation down.
 
 What does splitting them up into pieces mean?  They're single messages
 end-to-end.  You've been discussing batching of messages, which is the
 opposite?

There is a queue of single event messages; if we process them all in one
shot, then there is no time to insert a new event [each CPU is running the
keventd thread] until all previous ones are sent.
So we split that queue into pieces of [currently] 10 messages each,
and send only them, then we sleep for some time, in which new inserts
can be completed, then process the next 10...

   (please remind me why cbus exists, btw.  What functionality does it offer
   which the connector code doesn't?)
  
  The main goal of CBUS is insert operation performance.
  Anyone who wants to have maximum delivery speed should use direct
  connector's
  methods instead.
 
 Delivery speed is not the same thing as insertion speed.  All the insertion
 does is to place the event onto an internal queue.  We still need to do a
 context switch and send the thing.  Provided there's a reasonable amount of
 batching, the CPU consumption will be the same in both cases.

Sending is slow [in comparison to insertion], so it can be deferred
for better latency.
The context switch and the actual sending happen after the main function
[like fork(), or any other fast path] has already finished, so we do not
hurt its performance.

 Introducing an up-to-ten millisecond latency seems a lot worse than 
 some
 reduction in peak bandwidth - it's not as if pumping 10 
 events/sec is a
 likely use case.  Using prepare_to_wait/finish_wait will provide some
 speedup in SMP environments due to reduced cacheline transfers.

It is a question actually...
If we allow peak processing, then we _definitely_ will have insert 
performance degradation, it was observed in my tests.
   
   I don't understand your terminology.  What is peak processing and how
   does it differ from maximum insertion rate?
   
The main goal of CBUS was exactly insert speed
   
   Why?
  
  To allow connector usage in really fast paths.
  If one cares about _delivery_ speed then one should use
  cn_netlink_send().
 
 We care about both insertion and delivery!  Sure, simply sticking the event
 onto a queue is faster than delivering it.  But we still need to deliver it
 sometime so there's no aggregate gain.  Still confused.

If one needs low latency in peaks - use CBUS, it will smooth
the shape, since the actual delivery is postponed.

There is no _aggregate_ gain - only immediate low latency, with the work
deferred.

- so
it must somehow smooth out performance peaks, and thus
the above budget was introduced.
   
   An up-to-ten-millisecond latency between the kernel-side queueing of a
   message and the delivery of that message to userspace sounds like an
   awfully bad restriction to me.  Even one millisecond will have impact in
   some scenarios.
  
  If you care about delivery speed more than about insertion speed,
  then you do not need CBUS - use cn_netlink_send() instead.
  
  I will test smaller values, but doubt it will 

Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-04-01 Thread Andrew Morton
Evgeniy Polyakov [EMAIL PROTECTED] wrote:

 Andrew, CBUS is not intended to be faster than connector itself, 
  it is just not possible, since it calls connector's methods
  with some preparation, which takes time.

Right - it's simply transferring work from one place to another.

  CBUS was designed to provide very fast _insert_ operation.

I just don't see any point in doing this.  If the aggregate system load is
unchanged (possibly increased) then why is cbus desirable?

The only advantage I can see is that it permits irq-context messaging.

  It is needed not only for fork() accounting, but for any
  fast path, when we do not want to slow process down just to
  send notification about it, instead we can create such a notification,
  and deliver it later.

And when we deliver it later, we slow processes down!

  Why do we defer all work from HW IRQ into BH context?

To enable more interrupts to come in while we're doing that work.

  Because while we are in HW IRQ we can not perform other tasks,
  so with connector and CBUS we have the same situation.

I agree that being able to send from irq context would be handy.  If we had
any code which wants to do that, and at present we do not.

But I fail to see any advantage in moving work out of fork() context, into
kthread context and incurring additional context switch overhead.  Apart
from conceivably better CPU cache utilisation.

The fact that deferred delivery can cause an arbitrarily large backlog and,
ultimately, deliberate message droppage or oom significantly impacts the
reliability of the whole scheme and means that well-designed code must use
synchronous delivery _anyway_.  The deferred delivery should only be used
from IRQ context for low-bandwidth delivery rates.

  While we are sending a low priority event, we stops actuall work,
  which is not acceptible in many situations.

Have you tested the average forking rate on a uniprocessor machine with
this patch?  If the forking continues for (say) one second, does the patch
provide any performance benefit?


Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-03-31 Thread Evgeniy Polyakov
On Thu, 2005-03-31 at 23:26 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> >
> > > > +static int cbus_event_thread(void *data)
> >  > > +{
> >  > > +  int i, non_empty = 0, empty = 0;
> >  > > +  struct cbus_event_container *c;
> >  > > +
> >  > > +  daemonize(cbus_name);
> >  > > +  allow_signal(SIGTERM);
> >  > > +  set_user_nice(current, 19);
> >  > 
> >  > Please use the kthread api for managing this thread.
> >  > 
> >  > Is a new kernel thread needed?
> > 
> >  The logic behind cbus is the following: 
> >  1. make the insert operation return as soon as possible,
> >  2. defer the actual message delivery to a safe time.
> > 
> >  That thread handles the second point.
> 
> But does it need a new thread rather than using the existing keventd?

Yes, it is much cleaner [especially from the performance tuning point of view]
to use our own kernel thread than to postpone all the work to a work queue.

-- 
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski




Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-03-31 Thread Andrew Morton
Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> > > +static int cbus_event_thread(void *data)
>  > > +{
>  > > +int i, non_empty = 0, empty = 0;
>  > > +struct cbus_event_container *c;
>  > > +
>  > > +daemonize(cbus_name);
>  > > +allow_signal(SIGTERM);
>  > > +set_user_nice(current, 19);
>  > 
>  > Please use the kthread api for managing this thread.
>  > 
>  > Is a new kernel thread needed?
> 
>  The logic behind cbus is the following: 
>  1. make the insert operation return as soon as possible,
>  2. defer the actual message delivery to a safe time.
> 
>  That thread handles the second point.

But does it need a new thread rather than using the existing keventd?


Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-03-31 Thread Evgeniy Polyakov
On Thu, 2005-03-31 at 16:27 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> >
> > I'm pleased to announce CBUS - ultra fast (for insert operations)
> > message bus.
> 
> > +static int cbus_enqueue(struct cbus_event_container *c, struct cn_msg *msg)
> > +{
> > +   int err;
> > +   struct cbus_event *event;
> > +   unsigned long flags;
> > +
> > +   event = kmalloc(sizeof(*event) + msg->len, GFP_ATOMIC);
> 
> Using GFP_ATOMIC is a bit lame.  It would be better to require the caller
> to pass in the gfp_flags.  Or simply require that all callers not hold
> locks and use GFP_KERNEL.

A new API with GFP flags provided by the caller is the next step in the
connector's TODO list.

> > +static int cbus_process(struct cbus_event_container *c, int all)
> > +{
> > +   struct cbus_event *event;
> > +   int len, i, num;
> > +   
> > +   if (list_empty(&c->event_list))
> > +   return 0;
> > +
> > +   if (all)
> > +   len = c->qlen;
> > +   else
> > +   len = 1;
> > +
> > +   num = 0;
> > +   for (i=0; i<len; ++i) {
> > +   event = cbus_dequeue(c);
> > +   if (!event)
> > +   continue;
> > +
> > +   cn_netlink_send(&event->msg, 0);
> > +   num++;
> > +
> > +   kfree(event);
> > +   }
> > +   
> > +   return num;
> > +}
> 
> It might be cleaner to pass in an item count rather than a boolean `all'
> here.  Then again, it seems racy.

It was originally called something like
we_are_at_the_end_and_must_process_all_events_remain, 
so cbus_process() could be called from the ->exit() routine.
So I decided to call it that way, but I'm not particularly attached to
it.

> The initial list_empty() call could fail to detect new events due to lack
> of locking and memory barriers.

It is perfectly normal, and locking does not exist here for performance
reasons.
cbus_process() is too low priority in comparison with insert operation,
so it can easily miss one entry and process it next time.
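
For readers following the argument, the pattern in question looks roughly like
the annotated restatement below (this mirrors the patch's dequeue path and the
explanation above; it is not new code):

/* The emptiness test is done without the lock, so it can miss an event that
 * is being inserted concurrently; the design accepts that, since such an
 * event is simply picked up on a later delivery pass.  The removal itself
 * still happens under the per-queue lock. */
static struct cbus_event *cbus_dequeue(struct cbus_event_container *c)
{
    struct cbus_event *event;
    unsigned long flags;

    if (list_empty(&c->event_list))     /* unlocked peek */
        return NULL;

    spin_lock_irqsave(&c->event_lock, flags);
    event = __cbus_dequeue(c);          /* locked removal */
    spin_unlock_irqrestore(&c->event_lock, flags);

    return event;
}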

> We conventionally code for loops as
> 
>   for (i = 0; i < len; i++)

Grrr

> > +static int cbus_event_thread(void *data)
> > +{
> > +   int i, non_empty = 0, empty = 0;
> > +   struct cbus_event_container *c;
> > +
> > +   daemonize(cbus_name);
> > +   allow_signal(SIGTERM);
> > +   set_user_nice(current, 19);
> 
> Please use the kthread api for managing this thread.
> 
> Is a new kernel thread needed?

The logic behind cbus is the following: 
1. make the insert operation return as soon as possible,
2. defer the actual message delivery to a safe time.

That thread handles the second point.

> > +   while (!cbus_need_exit) {
> > +   if (empty || non_empty == 0 || non_empty > 10) {
> > +   interruptible_sleep_on_timeout(&cbus_wait_queue, 10);
> 
> interruptible_sleep_on_timeout() is heavily deprecated and is racy without
> external locking (it pretty much has to be the BKL).  Use 
> wait_event_timeout().

Ok.

> > +int __devinit cbus_init(void)
> > +{
> > +   int i, err = 0;
> > +   struct cbus_event_container *c;
> > +   
> > +   for_each_cpu(i) {
> > +   c = &per_cpu(cbus_event_list, i);
> > +   cbus_init_event_container(c);
> > +   }
> > +
> > +   init_completion(&cbus_thread_exited);
> > +
> > +   cbus_pid = kernel_thread(cbus_event_thread, NULL, CLONE_FS | 
> > CLONE_FILES);
> 
> Using the kthread API would clean this up.
> 
> > +   if (IS_ERR((void *)cbus_pid)) {
> 
> The weird cast here might not even work at all on 64-bit architectures.  It
> depends if they sign extend ints when casting them to pointers.  I guess
> they do.  If cbus_pid is indeed an s32.
> 
> Much better to do
> 
>   if (cbus_pid < 0)

I will do it after the above issues are resolved.

> > +void __devexit cbus_fini(void)
> > +{
> > +   int i;
> > +   struct cbus_event_container *c;
> > +
> > +   cbus_need_exit = 1;
> > +   kill_proc(cbus_pid, SIGTERM, 0);
> > +   wait_for_completion(&cbus_thread_exited);
> > +   
> > +   for_each_cpu(i) {
> > +   c = &per_cpu(cbus_event_list, i);
> > +   cbus_fini_event_container(c);
> > +   }
> > +}
> 
> I think this is racy.  What stops new events from being queued while this
> function is in progress?

cbus_insert() should check the need_exit flag - a patch exists,
but against my tree, so I will wait until CBUS shows up in public,
so I can resync with it.
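
As a rough illustration of that fix (an assumption about the still-unpublished
follow-up patch, not its actual contents - the return value and the
get_cpu_var() usage are guesses), the insert path would refuse new events once
teardown has begun:

#include <linux/connector.h>
#include <linux/percpu.h>

int cbus_insert(struct cn_msg *msg)
{
    struct cbus_event_container *c;
    int err;

    if (cbus_need_exit)         /* cbus_fini() is in progress */
        return -ENODEV;

    c = &get_cpu_var(cbus_event_list);
    err = cbus_enqueue(c, msg); /* queue for deferred delivery */
    put_cpu_var(cbus_event_list);

    return err;
}

A complete fix would additionally have to make sure that inserts which passed
the check before the flag was set have finished before cbus_fini() frees the
per-CPU queues.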

-- 
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski




Re: [1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-03-31 Thread Andrew Morton
Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> I'm pleased to announce CBUS - ultra fast (for insert operations)
> message bus.

> +static int cbus_enqueue(struct cbus_event_container *c, struct cn_msg *msg)
> +{
> + int err;
> + struct cbus_event *event;
> + unsigned long flags;
> +
> + event = kmalloc(sizeof(*event) + msg->len, GFP_ATOMIC);

Using GFP_ATOMIC is a bit lame.  It would be better to require the caller
to pass in the gfp_flags.  Or simply require that all callers not hold
locks and use GFP_KERNEL.
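
As an illustration of that review comment (not code from the patch - the body
below is a guess at the enqueue path, whose posted copy is truncated, and
gfp_t is the type name later kernels use for allocation flags):

static int cbus_enqueue(struct cbus_event_container *c, struct cn_msg *msg,
                        gfp_t gfp_mask)
{
    struct cbus_event *event;
    unsigned long flags;

    /* Let the caller pick the allocation context: process-context
     * callers can pass GFP_KERNEL instead of always dipping into the
     * GFP_ATOMIC reserves. */
    event = kmalloc(sizeof(*event) + msg->len, gfp_mask);
    if (!event)
        return -ENOMEM;

    memcpy(&event->msg, msg, sizeof(*msg) + msg->len);

    spin_lock_irqsave(&c->event_lock, flags);
    __cbus_enqueue(c, event);
    spin_unlock_irqrestore(&c->event_lock, flags);

    return 0;
}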

> +static int cbus_process(struct cbus_event_container *c, int all)
> +{
> + struct cbus_event *event;
> + int len, i, num;
> + 
> + if (list_empty(&c->event_list))
> + return 0;
> +
> + if (all)
> + len = c->qlen;
> + else
> + len = 1;
> +
> + num = 0;
> + for (i=0; i<len; ++i) {
> + event = cbus_dequeue(c);
> + if (!event)
> + continue;
> +
> + cn_netlink_send(&event->msg, 0);
> + num++;
> +
> + kfree(event);
> + }
> + 
> + return num;
> +}

It might be cleaner to pass in an item count rather than a boolean `all'
here.  Then again, it seems racy.

The initial list_empty() call could fail to detect new events due to lack
of locking and memory barriers.

We conventionally code for loops as

for (i = 0; i < len; i++)

> +static int cbus_event_thread(void *data)
> +{
> + int i, non_empty = 0, empty = 0;
> + struct cbus_event_container *c;
> +
> + daemonize(cbus_name);
> + allow_signal(SIGTERM);
> + set_user_nice(current, 19);

Please use the kthread api for managing this thread.

Is a new kernel thread needed?
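
For reference, a minimal outline of the kthread-style management being asked
for - kthread_run()/kthread_stop() in place of daemonize(), kernel_thread(),
kill_proc() and the completion. This only sketches the suggestion, not the
patch; the loop body is reduced to a placeholder sleep.

#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/err.h>

static struct task_struct *cbus_task;

static int cbus_event_thread(void *data)
{
    while (!kthread_should_stop()) {
        /* drain queued events in bounded bundles here, then pause */
        set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(HZ / 100);
    }
    return 0;
}

static int cbus_start_thread(void)
{
    cbus_task = kthread_run(cbus_event_thread, NULL, "cbus");
    return IS_ERR(cbus_task) ? PTR_ERR(cbus_task) : 0;
}

static void cbus_stop_thread(void)
{
    kthread_stop(cbus_task);    /* wakes the thread and waits for its exit */
}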

> + while (!cbus_need_exit) {
> + if (empty || non_empty == 0 || non_empty > 10) {
> + interruptible_sleep_on_timeout(&cbus_wait_queue, 10);

interruptible_sleep_on_timeout() is heavily deprecated and is racy without
external locking (it pretty much has to be the BKL).  Use wait_event_timeout().
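
A sketch of that substitution inside the delivery loop; the wake-up condition
and the cbus_have_pending_events() helper are assumptions made for
illustration, and the interruptible variant is shown only because the original
sleep was interruptible (the thread handles SIGTERM) - the plain
wait_event_timeout() named above simply drops the signal handling:

    /* instead of: interruptible_sleep_on_timeout(&cbus_wait_queue, HZ / 100); */
    wait_event_interruptible_timeout(cbus_wait_queue,
                                     cbus_need_exit ||
                                     cbus_have_pending_events(),
                                     HZ / 100);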

> +int __devinit cbus_init(void)
> +{
> + int i, err = 0;
> + struct cbus_event_container *c;
> + 
> + for_each_cpu(i) {
> + c = &per_cpu(cbus_event_list, i);
> + cbus_init_event_container(c);
> + }
> +
> + init_completion(&cbus_thread_exited);
> +
> + cbus_pid = kernel_thread(cbus_event_thread, NULL, CLONE_FS | 
> CLONE_FILES);

Using the kthread API would clean this up.

> + if (IS_ERR((void *)cbus_pid)) {

The weird cast here might not even work at all on 64-bit architectures.  It
depends if they sign extend ints when casting them to pointers.  I guess
they do.  If cbus_pid is indeed an s32.

Much better to do

if (cbus_pid < 0)

> +void __devexit cbus_fini(void)
> +{
> + int i;
> + struct cbus_event_container *c;
> +
> + cbus_need_exit = 1;
> + kill_proc(cbus_pid, SIGTERM, 0);
> + wait_for_completion(&cbus_thread_exited);
> + 
> + for_each_cpu(i) {
> + c = &per_cpu(cbus_event_list, i);
> + cbus_fini_event_container(c);
> + }
> +}

I think this is racy.  What stops new events from being queued while this
function is in progress?



[1/1] CBUS: new very fast (for insert operations) message bus based on kenel connector.

2005-03-20 Thread Evgeniy Polyakov
Hello, developers.

I'm pleased to announce CBUS - ultra fast (for insert operations)
message bus.

This message bus allows message passing between different agents
using connector's infrastructure.
It is extremely fast for insert operations so it can be used in performance
critical paths instead of direct connector's method calls.

CBUS uses per-CPU variables and thus allows message reordering;
the caller must be prepared for that (and use the CPU id in its messages).

Usage is very simple - just call cbus_insert(struct cn_msg *msg);
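
For illustration, a caller might use it roughly like this. The connector
idx/val numbers and the payload are invented for the example, and it assumes
(as the enqueue path suggests) that the message is copied into the queue, so a
message built on the stack is fine.

#include <linux/connector.h>
#include <linux/string.h>

static void example_notify(u32 value)
{
    struct {
        struct cn_msg msg;
        u32 payload;
    } packet;

    memset(&packet, 0, sizeof(packet));
    packet.msg.id.idx = 0x123;          /* made-up connector index */
    packet.msg.id.val = 0x456;          /* made-up connector value */
    packet.msg.len = sizeof(packet.payload);
    packet.payload = value;

    cbus_insert(&packet.msg);           /* fast insert; delivery is deferred */
}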

A benchmark with a modified fork connector and a fork bomb on a 2-way system
did not show any differences between vanilla 2.6.11 and CBUS.

--- ./drivers/connector/Kconfig.orig2005-03-20 11:11:27.0 +0300
+++ ./drivers/connector/Kconfig 2005-03-20 11:15:16.0 +0300
@@ -10,4 +10,18 @@
  Connector support can also be built as a module.  If so, the module
  will be called cn.ko.
 
+config CBUS
+   tristate "CBUS - ultra fast (for insert operations) message bus based 
on connector"
+   depends on CONNECTOR
+   ---help---
+ This message bus allows message passing between different agents
+ using connector's infrastructure.
+ It is extremely fast for insert operations so it can be used in 
performance
+ critical paths instead of direct connector's method calls.
+
+ CBUS uses per-CPU variables and thus allows message reordering,
+ caller must be prepared (and use the CPU id in its messages).
+ 
+ CBUS support can also be built as a module.  If so, the module
+ will be called cbus.
 endmenu
--- ./drivers/connector/Makefile.orig   2005-03-20 11:10:59.0 +0300
+++ ./drivers/connector/Makefile2005-03-20 11:11:17.0 +0300
@@ -1,2 +1,3 @@
 obj-$(CONFIG_CONNECTOR)+= cn.o
+obj-$(CONFIG_CBUS) += cbus.o
 cn-objs:= cn_queue.o connector.o
--- /dev/null   2004-09-17 14:58:06.0 +0400
+++ ./drivers/connector/cbus.c  2005-03-20 11:09:25.0 +0300
@@ -0,0 +1,247 @@
+/*
+ * cbus.c
+ * 
+ * 2005 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/connector.h>
+#include <linux/list.h>
+#include <linux/moduleparam.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Evgeniy Polyakov <[EMAIL PROTECTED]>");
+MODULE_DESCRIPTION("Ultrafast message bus based on kernel connector.");
+
+static DEFINE_PER_CPU(struct cbus_event_container, cbus_event_list);
+static int cbus_pid, cbus_need_exit;
+static struct completion cbus_thread_exited;
+static DECLARE_WAIT_QUEUE_HEAD(cbus_wait_queue);
+
+static char cbus_name[] = "cbus";
+
+struct cbus_event_container
+{
+   struct list_headevent_list;
+   spinlock_t  event_lock;
+   int qlen;
+};
+
+struct cbus_event
+{
+   struct list_headevent_entry;
+   u32 cpu;
+   struct cn_msg   msg;
+};
+
+static inline struct cbus_event *__cbus_dequeue(struct cbus_event_container *c)
+{
+   struct list_head *next = c->event_list.next;
+
+   list_del(next);
+   c->qlen--;
+
+   if (c->qlen < 0) {
+   printk(KERN_ERR "%s: qlen=%d after dequeue on CPU%u.\n",
+   cbus_name, c->qlen, smp_processor_id());
+   c->qlen = 0;
+   }
+   
+   return list_entry(next, struct cbus_event, event_entry);
+}
+
+static inline struct cbus_event *cbus_dequeue(struct cbus_event_container *c)
+{
+   struct cbus_event *event;
+   unsigned long flags;
+   
+   if (list_empty(&c->event_list))
+   return NULL;
+   
+   spin_lock_irqsave(&c->event_lock, flags);
+   event = __cbus_dequeue(c);
+   spin_unlock_irqrestore(&c->event_lock, flags);
+
+   return event;
+}
+
+static inline void __cbus_enqueue(struct cbus_event_container *c, struct 
cbus_event *event)
+{
+   list_add_tail(&event->event_entry, &c->event_list);
+   c->qlen++;
+}
+
+static int cbus_enqueue(struct cbus_event_container *c, struct cn_msg *msg)
+{
+   int err;
+   struct cbus_event *event;
+   unsigned long flags;
+
+   event = kmalloc(sizeof(*event) + msg->len, GFP_ATOMIC);
+   if (!event) {
+   err 
