Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-15 Thread Peter Zijlstra
On Mon, Jul 14, 2014 at 12:50:50PM -0700, Tim Chen wrote:

 There is a generic multi-buffer infrastructure portion that manages
 pulling and queuing jobs on the crypto workqueue, and it is separated out
 in patch 1 of the patchset.

There's one very weird multi-line comment in that patch.

 The other portions are algorithm specific: they define the
 algorithm-specific data structures and do the crypto computation
 for a particular algorithm, mostly in assembly and C glue code.
 The infrastructure code is meant to be reused for other similar
 multi-buffer algorithms.

The flushing part that uses the sched thing is sha1 specific, even
though it strikes me as not being so. Flushing buffers on idle seems
like a 'generic' thing.

 We use nr_running_cpu to check whether there are other tasks running on
 the *current* cpu (not on another cpu),

And yet, the function allows you to do exactly that..

 to decide if we should flush
 and compute the crypto jobs accumulated.  If there's nobody else running,
 we can take advantage of available cpu cycles on the cpu we are running
 on to do computation on the existing jobs in a SIMD manner.
 Waiting a bit longer may accumulate more jobs to process in parallel
 in a single SIMD instruction, but incurs more delay.

So you already have an idle notifier (which is x86 only, we should fix
that I suppose), and you then double check there really isn't anything
else running.

How much, if anything, does that second check buy you? There's just not
a single word on that.

Also, there is not a word on the latency vs throughput tradeoff you
make. I can imagine that for very short idle durations you lose, not
win, with this thing.

So for now I still see no reason for doing this.

Also, I wonder about SMT, the point of this is to make best use of the
SIMD pipelines, does it still make sense to use siblings at the same
time even though you're running hand crafted ASM to stuff the pipelines
to the brim? Should this thing be SMT aware and not gather queues for
both siblings?




Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-15 Thread Peter Zijlstra
On Tue, Jul 15, 2014 at 11:50:45AM +0200, Peter Zijlstra wrote:
 So you already have an idle notifier (which is x86 only, we should fix
 that I suppose), and you then double check there really isn't anything
 else running.

Note that we've already done a large part of the expense of going idle
by the time we call that idle notifier -- specifically, we've
reprogrammed the clock to stop the tick.

It's really wasteful to then generate work again, which means we have
to reprogram the clock again, etc.




Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-15 Thread Peter Zijlstra
On Tue, Jul 15, 2014 at 04:45:25PM +0200, Mike Galbraith wrote:
 
 3.0.101-default   3.753363 usecs/loop -- avg 3.770737  530.4 KHz   1.000
 3.14.10-default   4.145348 usecs/loop -- avg 4.139987  483.1 KHz    .910   1.000
 3.15.4-default    4.355594 usecs/loop -- avg 4.351961  459.6 KHz    .866    .951   1.000
 3.16.0-default    4.537279 usecs/loop -- avg 4.543532  440.2 KHz    .829    .911    .957

 3.0.101-smp       3.692377 usecs/loop -- avg 3.690774  541.9 KHz   1.000
 3.14.10-smp       4.010009 usecs/loop -- avg 4.009019  498.9 KHz    .920
 3.15.4-smp        3.882398 usecs/loop -- avg 3.884095  514.9 KHz    .950
 3.16.0-master     4.061003 usecs/loop -- avg 4.058244  492.8 KHz    .909

Urgh,.. I need to go fix that :/




Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-15 Thread Mike Galbraith
On Tue, 2014-07-15 at 21:03 +0200, Peter Zijlstra wrote: 
 On Tue, Jul 15, 2014 at 08:06:55PM +0200, Mike Galbraith wrote:
  On Tue, 2014-07-15 at 16:53 +0200, Peter Zijlstra wrote: 
   On Tue, Jul 15, 2014 at 04:45:25PM +0200, Mike Galbraith wrote:

3.0.101-default   3.753363 usecs/loop -- avg 3.770737  530.4 KHz   1.000
3.14.10-default   4.145348 usecs/loop -- avg 4.139987  483.1 KHz    .910   1.000
3.15.4-default    4.355594 usecs/loop -- avg 4.351961  459.6 KHz    .866    .951   1.000
3.16.0-default    4.537279 usecs/loop -- avg 4.543532  440.2 KHz    .829    .911    .957

3.0.101-smp       3.692377 usecs/loop -- avg 3.690774  541.9 KHz   1.000
3.14.10-smp       4.010009 usecs/loop -- avg 4.009019  498.9 KHz    .920
3.15.4-smp        3.882398 usecs/loop -- avg 3.884095  514.9 KHz    .950
3.16.0-master     4.061003 usecs/loop -- avg 4.058244  492.8 KHz    .909
   
   Urgh,.. I need to go fix that :/
  
  I'm poking about.  It's not just one thing 'course, just lots of change
  adding up to less than wonderful.  Idle changes are costing some, for
  obese config, avg goop.  The pick_next_task() reorganization appears
  to be costing, but eyeballing, I can see no excuse for that at all.
 
 How is the idle stuff costing? A cpu-affine pipe-test should pretty much
 peg a cpu at 100%, right? Or did I misunderstand and are you running a
 loose pipe-test?

Exactly, I'm measuring the cost of popping in and out of idle while
trying to reclaim overlap (which doesn't exist in pipe-test case).

-Mike



Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-15 Thread Thomas Gleixner
On Tue, 15 Jul 2014, Tim Chen wrote:
 On Tue, 2014-07-15 at 14:59 +0200, Thomas Gleixner wrote:
  On Tue, 15 Jul 2014, Peter Zijlstra wrote:
  
   On Tue, Jul 15, 2014 at 11:50:45AM +0200, Peter Zijlstra wrote:
So you already have an idle notifier (which is x86 only, we should fix
that I suppose), and you then double check there really isn't anything
else running.
   
    Note that we've already done a large part of the expense of going idle
    by the time we call that idle notifier -- specifically, we've
    reprogrammed the clock to stop the tick.
    
    It's really wasteful to then generate work again, which means we have
    to reprogram the clock again, etc.
  
  Doing anything which is not related to idle itself in the idle
  notifier is just plain wrong.
 
 I don't like kicking the multi-buffer job flush via the idle_notifier
 path either.  I'll try another version of the patch that does this in
 the multi-buffer job handler path.
  
  
  If that stuff wants to utilize idle slots, we really need to come up
  with a generic and general solution. Otherwise we'll grow those warts
  all over the architecture space, with slightly different ways of
  wrecking the world, and then some.
  
  This whole attitude of people thinking that they need their own
  specialized scheduling around the real scheduler is a PITA. All this
  stuff is just damaging any sensible approach to power saving, load
  balancing, etc.
  
  What we really want is infrastructure, which allows the scheduler to
  actively query the async work situation and based on the results
  actively decide when to process it and where.
 
 I agree with you.  It would be great if we had such infrastructure.

You are heartily invited to come up with that. :)


Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-14 Thread Tim Chen
On Mon, 2014-07-14 at 12:16 +0200, Peter Zijlstra wrote:
 On Fri, Jul 11, 2014 at 01:33:04PM -0700, Tim Chen wrote:
  This function will help a thread decide if it wants to do work
  that can be delayed, to accumulate more tasks for more efficient
  batch processing later.
  
  However, if no other tasks are running on the cpu, it can take
  advantage of the available cpu cycles to complete the tasks
  for immediate processing to minimize delay; otherwise it will yield.
 
 Ugh.. and ignore topology and everything else.
 
 Yet another scheduler on top of the scheduler.
 
 We have the padata muck, also only ever used by crypto.
 We have the workqueue nonsense, used all over the place.
 And we have btrfs doing their own padata-like muck.
 And I'm sure there's at least one more out there, just because.
 
 Why do we want yet another thing?
 
 I'm inclined to go NAK and get people to reduce the amount of async
 queueing and processing crap.

The multi-buffer class of crypto algorithms is by nature
asynchronous.  The algorithm gathers several crypto jobs and
puts the buffer from each job into a data lane of the SIMD register.
This allows for parallel processing and increases throughput.
The gathering of the crypto jobs is an async process, and
queuing is necessary for this class of algorithm.
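
As a rough sketch of that batching (illustrative only; MB_LANES and all
struct/function names below are invented for this example and are not
the patchset's actual API), the manager gathers independent jobs until
every SIMD lane holds a buffer, then one vector pass advances all of
them at once:

	/* one crypto job: a buffer waiting to be processed */
	struct mb_job {
		const u8	*buf;
		unsigned int	len;
	};

	#define MB_LANES 8	/* e.g. eight 32-bit lanes of an AVX2 register */

	struct mb_mgr {
		struct mb_job	*lane[MB_LANES];	/* one pending job per SIMD lane */
		int		nr_jobs;
	};

	/* process every occupied lane with one SIMD pass (stub for brevity) */
	static void mb_flush_jobs(struct mb_mgr *mgr)
	{
		/* real code would run e.g. an 8-lane sha1 block transform here */
		mgr->nr_jobs = 0;
	}

	/* gather jobs; kick off a SIMD pass once every lane is occupied */
	static void mb_submit(struct mb_mgr *mgr, struct mb_job *job)
	{
		mgr->lane[mgr->nr_jobs++] = job;
		if (mgr->nr_jobs == MB_LANES)
			mb_flush_jobs(mgr);
	}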

Tim



Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-14 Thread Peter Zijlstra
On Mon, Jul 14, 2014 at 09:10:14AM -0700, Tim Chen wrote:
 On Mon, 2014-07-14 at 12:16 +0200, Peter Zijlstra wrote:
  On Fri, Jul 11, 2014 at 01:33:04PM -0700, Tim Chen wrote:
   This function will help a thread decide if it wants to do work
   that can be delayed, to accumulate more tasks for more efficient
   batch processing later.
   
   However, if no other tasks are running on the cpu, it can take
   advantage of the available cpu cycles to complete the tasks
   for immediate processing to minimize delay; otherwise it will yield.
  
  Ugh.. and ignore topology and everything else.
  
  Yet another scheduler on top of the scheduler.
  
  We have the padata muck, also only ever used by crypto.
  We have the workqueue nonsense, used all over the place.
  And we have btrfs doing their own padata-like muck.
  And I'm sure there's at least one more out there, just because.
  
  Why do we want yet another thing?
  
  I'm inclined to go NAK and get people to reduce the amount of async
  queueing and processing crap.
 
 The multi-buffer class of crypto algorithms is by nature
 asynchronous.  The algorithm gathers several crypto jobs and
 puts the buffer from each job into a data lane of the SIMD register.
 This allows for parallel processing and increases throughput.
 The gathering of the crypto jobs is an async process, and
 queuing is necessary for this class of algorithm.

How is that related to me saying we've got too much of this crap
already?




Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-14 Thread Tim Chen
On Mon, 2014-07-14 at 18:14 +0200, Peter Zijlstra wrote:
 On Mon, Jul 14, 2014 at 09:10:14AM -0700, Tim Chen wrote:
  On Mon, 2014-07-14 at 12:16 +0200, Peter Zijlstra wrote:
   On Fri, Jul 11, 2014 at 01:33:04PM -0700, Tim Chen wrote:
This function will help a thread decide if it wants to do work
that can be delayed, to accumulate more tasks for more efficient
batch processing later.

However, if no other tasks are running on the cpu, it can take
advantage of the available cpu cycles to complete the tasks
for immediate processing to minimize delay; otherwise it will yield.
   
   Ugh.. and ignore topology and everything else.
   
   Yet another scheduler on top of the scheduler.
   
   We have the padata muck, also only ever used by crypto.
   We have the workqueue nonsense, used all over the place.
   And we have btrfs doing their own padata-like muck.
   And I'm sure there's at least one more out there, just because.
   
   Why do we want yet another thing?
   
   I'm inclined to go NAK and get people to reduce the amount of async
   queueing and processing crap.
  
  The multi-buffer class of crypto algorithms is by nature
  asynchronous.  The algorithm gathers several crypto jobs and
  puts the buffer from each job into a data lane of the SIMD register.
  This allows for parallel processing and increases throughput.
  The gathering of the crypto jobs is an async process, and
  queuing is necessary for this class of algorithm.
 
 How is that related to me saying we've got too much of this crap
 already?

I was trying to explain that the algorithm is implemented this way
because of its batching nature.

There is a whole class of async algorithms that can provide
substantial speedup by doing batch processing, and they use workqueues.
The multi-buffer sha1 version has a 2.2x speedup over the existing
AVX2 version, and can have even more speedup when AVX3
comes around.  Workqueue is a natural way to implement
this.  I don't think a throughput speedup of 2.2x is crap.

We are not inventing anything new, but are asking for a
very simple helper function to know if there's something else
running on our cpu, to help us make a better decision
about whether we should flush the batched jobs immediately.

Also, the asynchronous crypto interface is already used substantially
in crypto and has a well-established infrastructure.

Thanks.

Tim




Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-14 Thread Tim Chen
On Sat, 2014-07-12 at 13:25 +0400, Kirill Tkhai wrote:

   
  +unsigned long nr_running_cpu(int cpu)
  +{
  +	if (cpumask_test_cpu(cpu, cpu_online_mask))
  +		return cpu_rq(cpu)->nr_running;
  +	else
  +		return 0;
  +}
  +
 
 Offline cpu should have nr_running equal to 0. We park the last
 enqueued thread (migration_thread) at the end of take_cpu_down().
 
 So, it's enough to return cpu_rq(cpu)->nr_running.

Thanks.  This seems reasonable.

Tim



Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-14 Thread Peter Zijlstra
On Mon, Jul 14, 2014 at 10:05:34AM -0700, Tim Chen wrote:
 I was trying to explain why the algorithm is implemented this way
 because of its batching nature.
 
 There is a whole class of async algorithms that can provide
 substantial speedup by doing batch processing, and they use workqueues.
 The multi-buffer sha1 version has a 2.2x speedup over the existing
 AVX2 version, and can have even more speedup when AVX3
 comes around.  Workqueue is a natural way to implement
 this.  I don't think a throughput speedup of 2.2x is crap.
 
 We are not inventing anything new, but are asking for a
 very simple helper function to know if there's something else
 running on our cpu, to help us make a better decision
 about whether we should flush the batched jobs immediately.
 
 Also, the asynchronous crypto interface is already used substantially
 in crypto and has a well-established infrastructure.

The crap I was talking about is that there's a metric ton of 'async'
interfaces, all different.

Your multi-buffer thing isn't generic either, it seems limited to sha1.
It does not reuse padata, it does not extend workqueues, it does not
remove the btrfs nonsense; it adds yet another thing.




Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-14 Thread Tim Chen
On Mon, 2014-07-14 at 20:17 +0200, Peter Zijlstra wrote:
 On Mon, Jul 14, 2014 at 10:05:34AM -0700, Tim Chen wrote:
  I was trying to explain that the algorithm is implemented this way
  because of its batching nature.
  
  There is a whole class of async algorithms that can provide
  substantial speedup by doing batch processing, and they use workqueues.
  The multi-buffer sha1 version has a 2.2x speedup over the existing
  AVX2 version, and can have even more speedup when AVX3
  comes around.  Workqueue is a natural way to implement
  this.  I don't think a throughput speedup of 2.2x is crap.
  
  We are not inventing anything new, but are asking for a
  very simple helper function to know if there's something else
  running on our cpu, to help us make a better decision
  about whether we should flush the batched jobs immediately.
  
  Also, the asynchronous crypto interface is already used substantially
  in crypto and has a well-established infrastructure.
 
 The crap I was talking about is that there's a metric ton of 'async'
 interfaces, all different.

Async interfaces, when used appropriately, actually speed things up
substantially for crypto.  We actually have a case with
ecryptfs not using the async crypto interface, causing the cpu to stall
and slowing things down substantially with AES-NI.  And the async
interface with a workqueue speeds things up (30% to 35% on encryption
with an SSD).
http://marc.info/?l=ecryptfs-users&m=136520541407248
http://www.spinics.net/lists/ecryptfs/msg00228.html
 
 
 Your multi-buffer thing isn't generic either, it seems limited to sha1.

We actually have many other multi-buffer crypto algorithms already
published for encryption and other IPSec usages, so the
multi-buffer approach is not limited to SHA1.
We hope to port those to the kernel crypto library eventually.
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-multi-buffer-ipsec-implementations-ia-processors-paper.pdf
 

 It does not reuse padata, 
padata tries to speed things up by parallelizing jobs across *multiple*
cpus, whereas multi-buffer tries to speed things up by using multiple
data lanes of a SIMD register on a *single* cpu.
These two usages are complementary but not the same.

 it does not extend workqueues, 
Why do I need to extend workqueues if the existing ones already
meet my needs?

 it does not
 remove the btrfs nonsense, 
Not much I can do about btrfs as I don't understand the issues there.

 it adds yet another thing.


Thanks.

Tim






Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-14 Thread Peter Zijlstra
On Mon, Jul 14, 2014 at 12:08:28PM -0700, Tim Chen wrote:
 On Mon, 2014-07-14 at 20:17 +0200, Peter Zijlstra wrote:

  Your multi-buffer thing isn't generic either, it seems limited to sha1.
 
 We actually have many other multi-buffer crypto algorithms already
 published for encryption and other IPSec usages, so the
 multi-buffer approach is not limited to SHA1.
 We hope to port those to the kernel crypto library eventually.
 http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-multi-buffer-ipsec-implementations-ia-processors-paper.pdf
  

That's all nice and such; but the code as I've seen in these patches is
very much sha1 specific. The mb part isn't separated out.

  It does not reuse padata, 
 padata tries to speed things up by parallelizing jobs across *multiple*
 cpus, whereas multi-buffer tries to speed things up by using multiple
 data lanes of a SIMD register on a *single* cpu.
 These two usages are complementary but not the same.

And if it's single cpu, wth do you need that nr_running thing for another
cpu for?

Also, this difference wasn't clear to me.

I still loathe all the async work, because it makes a mockery of
accounting etc., but that's a story for another day I suppose :-(






Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-14 Thread Tim Chen
On Mon, 2014-07-14 at 21:15 +0200, Peter Zijlstra wrote:
 On Mon, Jul 14, 2014 at 12:08:28PM -0700, Tim Chen wrote:
  On Mon, 2014-07-14 at 20:17 +0200, Peter Zijlstra wrote:
 
   Your multi-buffer thing isn't generic either, it seems limited to sha1.
  
  We actually have many other multi-buffer crypto algorithms already
  published for encryption and other IPSec usages, so the
  multi-buffer approach is not limited to SHA1.
  We hope to port those to the kernel crypto library eventually.
  http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-multi-buffer-ipsec-implementations-ia-processors-paper.pdf
   
 
 That's all nice and such; but the code as I've seen in these patches is
 very much sha1 specific. The mb part isn't separated out.

There is a generic multi-buffer infrastructure portion that manages
pulling and queuing jobs on the crypto workqueue, and it is separated out
in patch 1 of the patchset.  The other portions are algorithm specific:
they define the algorithm-specific data structures and do the crypto
computation for a particular algorithm, mostly in assembly and C glue
code.  The infrastructure code is meant to be reused for other similar
multi-buffer algorithms.

 
   It does not reuse padata, 
  padata tries to speed things up by parallelizing jobs across *multiple*
  cpus, whereas multi-buffer tries to speed things up by using multiple
  data lanes of a SIMD register on a *single* cpu.
  These two usages are complementary but not the same.
 
 And if it's single cpu, wth do you need that nr_running thing for another
 cpu for?

We use nr_running_cpu to check whether there are other tasks running on
the *current* cpu (not on another cpu), to decide if we should flush
and compute the crypto jobs accumulated.  If there's nobody else running,
we can take advantage of available cpu cycles on the cpu we are running
on to do computation on the existing jobs in a SIMD manner.
Waiting a bit longer may accumulate more jobs to process in parallel
in a single SIMD instruction, but incurs more delay.
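
Concretely, the decision could look like this minimal sketch (it
assumes the nr_running_cpu() helper from this patch and a cpu-bound
worker; mb_maybe_flush(), mb_flush_jobs() and struct mb_mgr are
illustrative stand-ins, not the patchset's actual names):

	/* illustrative stand-ins, not the patchset's actual names */
	struct mb_mgr;
	void mb_flush_jobs(struct mb_mgr *mgr);

	/* flush accumulated jobs when this cpu has nothing else to run */
	static void mb_maybe_flush(struct mb_mgr *mgr)
	{
		/* nr_running counts us too, so <= 1 means we're alone here */
		if (nr_running_cpu(smp_processor_id()) <= 1)
			mb_flush_jobs(mgr);	/* use the spare cycles now */
		else
			cond_resched();		/* keep batching; let others run */
	}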

 
 Also, this difference wasn't clear to me.
 
 I still loathe all the async work, because it makes a mockery of
 accounting etc., but that's a story for another day I suppose :-(
 
 

Thanks.

Tim




Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-14 Thread Tim Chen
On Sat, 2014-07-12 at 07:21 -0700, Tadeusz Struk wrote:
 On 07/11/2014 01:33 PM, Tim Chen wrote:
  +unsigned long nr_running_cpu(int cpu)
  +{
  +	if (cpumask_test_cpu(cpu, cpu_online_mask))
  +		return cpu_rq(cpu)->nr_running;
  +	else
  +		return 0;
  +}
  +
 EXPORT_SYMBOL?

Yes, thanks.

Tim



Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

2014-07-12 Thread Kirill Tkhai
On 12.07.2014 00:33, Tim Chen wrote:
 This function will help a thread decide if it wants to do work
 that can be delayed, to accumulate more tasks for more efficient
 batch processing later.
 
 However, if no other tasks are running on the cpu, it can take
 advantage of the available cpu cycles to complete the tasks
 for immediate processing to minimize delay; otherwise it will yield.
 
 Signed-off-by: Tim Chen tim.c.c...@linux.intel.com
 ---
  include/linux/sched.h | 1 +
  kernel/sched/core.c   | 8 
  2 files changed, 9 insertions(+)
 
 diff --git a/include/linux/sched.h b/include/linux/sched.h
 index 7cb07fd..0884250 100644
 --- a/include/linux/sched.h
 +++ b/include/linux/sched.h
 @@ -168,6 +168,7 @@ extern int nr_threads;
  DECLARE_PER_CPU(unsigned long, process_counts);
  extern int nr_processes(void);
  extern unsigned long nr_running(void);
 +extern unsigned long nr_running_cpu(int cpu);
  extern unsigned long nr_iowait(void);
  extern unsigned long nr_iowait_cpu(int cpu);
  extern unsigned long this_cpu_load(void);
 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index 9cae286..d5bb8e6 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -2283,6 +2283,14 @@ unsigned long nr_running(void)
   return sum;
  }
  
 +unsigned long nr_running_cpu(int cpu)
 +{
 +	if (cpumask_test_cpu(cpu, cpu_online_mask))
 +		return cpu_rq(cpu)->nr_running;
 +	else
 +		return 0;
 +}
 +

Offline cpu should have nr_running equal to 0. We park the last
enqueued thread (migration_thread) at the end of take_cpu_down().

So, it's enough to return cpu_rq(cpu)->nr_running.
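
With that simplification (and folding in the EXPORT_SYMBOL that
Tadeusz asked about elsewhere in the thread), the helper would reduce
to something like this sketch:

	unsigned long nr_running_cpu(int cpu)
	{
		/* an offline cpu's runqueue already reports nr_running == 0 */
		return cpu_rq(cpu)->nr_running;
	}
	EXPORT_SYMBOL(nr_running_cpu);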

  unsigned long long nr_context_switches(void)
  {
   int i;
 

Thanks,
Kirill