Re: [PATCH-v3 1/4] idr: Percpu ida

2013-08-28 Thread Kent Overstreet
On Tue, Aug 20, 2013 at 02:31:57PM -0700, Andrew Morton wrote:
 On Fri, 16 Aug 2013 23:09:06 + Nicholas A. Bellinger 
 n...@linux-iscsi.org wrote:
  +   /*
  +* Bitmap of cpus that (may) have tags on their percpu freelists:
  +* steal_tags() uses this to decide when to steal tags, and which cpus
  +* to try stealing from.
  +*
  +* It's ok for a freelist to be empty when its bit is set - steal_tags()
  +* will just keep looking - but the bitmap _must_ be set whenever a
  +* percpu freelist does have tags.
  +*/
  +   unsigned long   *cpus_have_tags;
 
 Why not cpumask_t?

I hadn't encountered it before - looks like it's probably what I want.

I don't see any explanation for the parallel set of operations for
working on cpumasks - e.g. next_cpu()/cpumask_next(). For now I'm going
with the cpumask_* versions, is that what I want?o

If you can have a look at the fixup patch that'll be most appreciated.

  +   struct {
  +   spinlock_t  lock;
  +   /*
  +* When we go to steal tags from another cpu (see steal_tags()),
  +* we want to pick a cpu at random. Cycling through them every
  +* time we steal is a bit easier and more or less equivalent:
  +*/
  +   unsignedcpu_last_stolen;
  +
  +   /* For sleeping on allocation failure */
  +   wait_queue_head_t   wait;
  +
  +   /*
  +* Global freelist - it's a stack where nr_free points to the
  +* top
  +*/
  +   unsignednr_free;
  +   unsigned*freelist;
  +   } cacheline_aligned_in_smp;
 
 Why the cacheline_aligned_in_smp?

It's separating the RW stuff that isn't always touched from the RO stuff
that's used on every allocation.

 
  +};
  
  ...
 
  +
  +/* Percpu IDA */
  +
  +/*
  + * Number of tags we move between the percpu freelist and the global 
  freelist at
  + * a time
 
 between a percpu freelist would be more accurate?

No, because when we're stealing tags we always grab all of the remote
percpu freelist's tags - IDA_PCPU_BATCH_MOVE is only used when moving
to/from the global freelist.

 
  + */
  +#define IDA_PCPU_BATCH_MOVE32U
  +
  +/* Max size of percpu freelist, */
  +#define IDA_PCPU_SIZE  ((IDA_PCPU_BATCH_MOVE * 3) / 2)
  +
  +struct percpu_ida_cpu {
  +   spinlock_t  lock;
  +   unsignednr_free;
  +   unsignedfreelist[];
  +};
 
 Data structure needs documentation.  There's one of these per cpu.  I
 guess nr_free and freelist are clear enough.  The presence of a lock
 in a percpu data structure is a surprise.  It's for cross-cpu stealing,
 I assume?

Yeah, I'll add some comments.

  +static inline void alloc_global_tags(struct percpu_ida *pool,
  +struct percpu_ida_cpu *tags)
  +{
  +   move_tags(tags-freelist, tags-nr_free,
  + pool-freelist, pool-nr_free,
  + min(pool-nr_free, IDA_PCPU_BATCH_MOVE));
  +}
 
 Document this function?

Will do

  +   while (1) {
  +   spin_lock(pool-lock);
  +
  +   /*
  +* prepare_to_wait() must come before steal_tags(), in case
  +* percpu_ida_free() on another cpu flips a bit in
  +* cpus_have_tags
  +*
  +* global lock held and irqs disabled, don't need percpu lock
  +*/
  +   prepare_to_wait(pool-wait, wait, TASK_UNINTERRUPTIBLE);
  +
  +   if (!tags-nr_free)
  +   alloc_global_tags(pool, tags);
  +   if (!tags-nr_free)
  +   steal_tags(pool, tags);
  +
  +   if (tags-nr_free) {
  +   tag = tags-freelist[--tags-nr_free];
  +   if (tags-nr_free)
  +   set_bit(smp_processor_id(),
  +   pool-cpus_have_tags);
  +   }
  +
  +   spin_unlock(pool-lock);
  +   local_irq_restore(flags);
  +
  +   if (tag = 0 || !(gfp  __GFP_WAIT))
  +   break;
  +
  +   schedule();
  +
  +   local_irq_save(flags);
  +   tags = this_cpu_ptr(pool-tag_cpu);
  +   }
 
 What guarantees that this wait will terminate?

It seems fairly clear to me from the break statement a couple lines up;
if we were passed __GFP_WAIT we terminate iff we succesfully allocated a
tag. If we weren't passed __GFP_WAIT we never actually sleep.

I can add a comment if you think it needs one.

  +   finish_wait(pool-wait, wait);
  +   return tag;
  +}
  +EXPORT_SYMBOL_GPL(percpu_ida_alloc);
  +
  +/**
  + * percpu_ida_free - free a tag
  + * @pool: pool @tag was allocated from
  + * @tag: a tag previously allocated with percpu_ida_alloc()
  + *
  + * Safe to be called from interrupt context.
  + */
  +void percpu_ida_free(struct percpu_ida *pool, 

[PATCH] percpu ida: Switch to cpumask_t, add some comments

2013-08-28 Thread Kent Overstreet
Fixup patch, addressing Andrew's review feedback:

Signed-off-by: Kent Overstreet k...@daterainc.com
---
 include/linux/idr.h |  2 +-
 lib/idr.c   | 38 +-
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index f0db12b..cdf39be 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -267,7 +267,7 @@ struct percpu_ida {
 * will just keep looking - but the bitmap _must_ be set whenever a
 * percpu freelist does have tags.
 */
-   unsigned long   *cpus_have_tags;
+   cpumask_t   cpus_have_tags;
 
struct {
spinlock_t  lock;
diff --git a/lib/idr.c b/lib/idr.c
index 26495e1..15c021c 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -1178,7 +1178,13 @@ EXPORT_SYMBOL(ida_init);
 #define IDA_PCPU_SIZE  ((IDA_PCPU_BATCH_MOVE * 3) / 2)
 
 struct percpu_ida_cpu {
+   /*
+* Even though this is percpu, we need a lock for tag stealing by remote
+* CPUs:
+*/
spinlock_t  lock;
+
+   /* nr_free/freelist form a stack of free IDs */
unsignednr_free;
unsignedfreelist[];
 };
@@ -1209,21 +1215,21 @@ static inline void steal_tags(struct percpu_ida *pool,
unsigned cpus_have_tags, cpu = pool-cpu_last_stolen;
struct percpu_ida_cpu *remote;
 
-   for (cpus_have_tags = bitmap_weight(pool-cpus_have_tags, nr_cpu_ids);
+   for (cpus_have_tags = cpumask_weight(pool-cpus_have_tags);
 cpus_have_tags * IDA_PCPU_SIZE  pool-nr_tags / 2;
 cpus_have_tags--) {
-   cpu = find_next_bit(pool-cpus_have_tags, nr_cpu_ids, cpu);
+   cpu = cpumask_next(cpu, pool-cpus_have_tags);
 
-   if (cpu == nr_cpu_ids)
-   cpu = find_first_bit(pool-cpus_have_tags, nr_cpu_ids);
+   if (cpu = nr_cpu_ids)
+   cpu = cpumask_first(pool-cpus_have_tags);
 
-   if (cpu == nr_cpu_ids)
+   if (cpu = nr_cpu_ids)
BUG();
 
pool-cpu_last_stolen = cpu;
remote = per_cpu_ptr(pool-tag_cpu, cpu);
 
-   clear_bit(cpu, pool-cpus_have_tags);
+   cpumask_clear_cpu(cpu, pool-cpus_have_tags);
 
if (remote == tags)
continue;
@@ -1246,6 +1252,10 @@ static inline void steal_tags(struct percpu_ida *pool,
}
 }
 
+/*
+ * Pop up to IDA_PCPU_BATCH_MOVE IDs off the global freelist, and push them 
onto
+ * our percpu freelist:
+ */
 static inline void alloc_global_tags(struct percpu_ida *pool,
 struct percpu_ida_cpu *tags)
 {
@@ -1317,8 +1327,8 @@ int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp)
if (tags-nr_free) {
tag = tags-freelist[--tags-nr_free];
if (tags-nr_free)
-   set_bit(smp_processor_id(),
-   pool-cpus_have_tags);
+   cpumask_set_cpu(smp_processor_id(),
+   pool-cpus_have_tags);
}
 
spin_unlock(pool-lock);
@@ -1363,8 +1373,8 @@ void percpu_ida_free(struct percpu_ida *pool, unsigned 
tag)
spin_unlock(tags-lock);
 
if (nr_free == 1) {
-   set_bit(smp_processor_id(),
-   pool-cpus_have_tags);
+   cpumask_set_cpu(smp_processor_id(),
+   pool-cpus_have_tags);
wake_up(pool-wait);
}
 
@@ -1398,7 +1408,6 @@ EXPORT_SYMBOL_GPL(percpu_ida_free);
 void percpu_ida_destroy(struct percpu_ida *pool)
 {
free_percpu(pool-tag_cpu);
-   kfree(pool-cpus_have_tags);
free_pages((unsigned long) pool-freelist,
   get_order(pool-nr_tags * sizeof(unsigned)));
 }
@@ -1428,7 +1437,7 @@ int percpu_ida_init(struct percpu_ida *pool, unsigned 
long nr_tags)
 
/* Guard against overflow */
if (nr_tags  (unsigned) INT_MAX + 1) {
-   pr_err(tags.c: nr_tags too large\n);
+   pr_err(percpu_ida_init(): nr_tags too large\n);
return -EINVAL;
}
 
@@ -1442,11 +1451,6 @@ int percpu_ida_init(struct percpu_ida *pool, unsigned 
long nr_tags)
 
pool-nr_free = nr_tags;
 
-   pool-cpus_have_tags = kzalloc(BITS_TO_LONGS(nr_cpu_ids) *
-  sizeof(unsigned long), GFP_KERNEL);
-   if (!pool-cpus_have_tags)
-   goto err;
-
pool-tag_cpu = __alloc_percpu(sizeof(struct percpu_ida_cpu) +
   IDA_PCPU_SIZE * sizeof(unsigned),
   sizeof(unsigned));
-- 
1.8.4.rc3

___

Re: [PATCH-v3 1/4] idr: Percpu ida

2013-08-28 Thread Andrew Morton
On Wed, 28 Aug 2013 12:53:17 -0700 Kent Overstreet k...@daterainc.com wrote:

   + while (1) {
   + spin_lock(pool-lock);
   +
   + /*
   +  * prepare_to_wait() must come before steal_tags(), in case
   +  * percpu_ida_free() on another cpu flips a bit in
   +  * cpus_have_tags
   +  *
   +  * global lock held and irqs disabled, don't need percpu lock
   +  */
   + prepare_to_wait(pool-wait, wait, TASK_UNINTERRUPTIBLE);
   +
   + if (!tags-nr_free)
   + alloc_global_tags(pool, tags);
   + if (!tags-nr_free)
   + steal_tags(pool, tags);
   +
   + if (tags-nr_free) {
   + tag = tags-freelist[--tags-nr_free];
   + if (tags-nr_free)
   + set_bit(smp_processor_id(),
   + pool-cpus_have_tags);
   + }
   +
   + spin_unlock(pool-lock);
   + local_irq_restore(flags);
   +
   + if (tag = 0 || !(gfp  __GFP_WAIT))
   + break;
   +
   + schedule();
   +
   + local_irq_save(flags);
   + tags = this_cpu_ptr(pool-tag_cpu);
   + }
  
  What guarantees that this wait will terminate?
 
 It seems fairly clear to me from the break statement a couple lines up;
 if we were passed __GFP_WAIT we terminate iff we succesfully allocated a
 tag. If we weren't passed __GFP_WAIT we never actually sleep.

OK ;)  Let me rephrase.  What guarantees that a tag will become available?

If what we have here is an open-coded __GFP_NOFAIL then that is
potentially problematic.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] percpu ida: Switch to cpumask_t, add some comments

2013-08-28 Thread Andrew Morton
On Wed, 28 Aug 2013 12:55:17 -0700 Kent Overstreet k...@daterainc.com wrote:

 Fixup patch, addressing Andrew's review feedback:

Looks reasonable.

  lib/idr.c   | 38 +-

I still don't think it should be in this file.

You say that some as-yet-unmerged patches will tie the new code into
the old ida code.  But will it do it in a manner which requires that
the two reside in the same file?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH-v3 1/4] idr: Percpu ida

2013-08-28 Thread Andrew Morton
On Wed, 28 Aug 2013 13:44:54 -0700 Kent Overstreet k...@daterainc.com wrote:

What guarantees that this wait will terminate?
   
   It seems fairly clear to me from the break statement a couple lines up;
   if we were passed __GFP_WAIT we terminate iff we succesfully allocated a
   tag. If we weren't passed __GFP_WAIT we never actually sleep.
  
  OK ;)  Let me rephrase.  What guarantees that a tag will become available?
  
  If what we have here is an open-coded __GFP_NOFAIL then that is
  potentially problematic.
 
 It's the same semantics as a mempool, really - it'll succeed when a tag
 gets freed.

OK, that's reasonable if the code is being used to generate IO tags -
we expect the in-flight tags to eventually be returned.

But if a client of this code is using the allocator for something
totally different, there is no guarantee that the act of waiting will
result in any tags being returned.

(These are core design principles/constraints which should be
explicitly documented in a place where future readers will see them!)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] percpu ida: Switch to cpumask_t, add some comments

2013-08-28 Thread Andrew Morton
On Wed, 28 Aug 2013 14:00:10 -0700 Kent Overstreet k...@daterainc.com wrote:

 On Wed, Aug 28, 2013 at 01:25:50PM -0700, Andrew Morton wrote:
  On Wed, 28 Aug 2013 12:55:17 -0700 Kent Overstreet k...@daterainc.com 
  wrote:
  
   Fixup patch, addressing Andrew's review feedback:
  
  Looks reasonable.
  
lib/idr.c   | 38 +-
  
  I still don't think it should be in this file.
  
  You say that some as-yet-unmerged patches will tie the new code into
  the old ida code.  But will it do it in a manner which requires that
  the two reside in the same file?
 
 Not require, no - but it's just intimate enough with my ida rewrite that
 I think it makes sense; it makes some use of stuff that should be
 internal to the ida code.
 
 Mostly just sharing the lock though, since I got rid of the ida
 interfaces that don't do locking, but percpu ida needs a lock that also
 covers what ida needs.
 
 It also makes use of a ganged allocation interface, but there's no real
 reason ida can't expose that, it's just unlikely to be useful to
 anything but percpu ida.
 
 The other reason I think it makes sense to live in idr.c is more for
 users of the code; as you pointed out as far as the user's perspective
 percpu ida isn't doing anything fundamentally different from ida, so I
 think it makes sense for the code to live in the same place as a
 kindness to future kernel developers who are trying to find their way
 around the various library code.

I found things to be quite the opposite - it took 5 minutes of staring,
head-scratching, double-checking and penny-dropping before I was
confident that the newly-added code actually has nothing at all to do
with the current code.  Putting it in the same file was misleading, and
I got misled.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH-v3 1/4] idr: Percpu ida

2013-08-28 Thread Kent Overstreet
On Wed, Aug 28, 2013 at 01:50:42PM -0700, Andrew Morton wrote:
 On Wed, 28 Aug 2013 13:44:54 -0700 Kent Overstreet k...@daterainc.com wrote:
 
 What guarantees that this wait will terminate?

It seems fairly clear to me from the break statement a couple lines up;
if we were passed __GFP_WAIT we terminate iff we succesfully allocated a
tag. If we weren't passed __GFP_WAIT we never actually sleep.
   
   OK ;)  Let me rephrase.  What guarantees that a tag will become available?
   
   If what we have here is an open-coded __GFP_NOFAIL then that is
   potentially problematic.
  
  It's the same semantics as a mempool, really - it'll succeed when a tag
  gets freed.
 
 OK, that's reasonable if the code is being used to generate IO tags -
 we expect the in-flight tags to eventually be returned.
 
 But if a client of this code is using the allocator for something
 totally different, there is no guarantee that the act of waiting will
 result in any tags being returned.

Yeah, and I did wonder a bit whether the waiting mechanism belonged in
the percpu ida code; arguably (certainly just looking at this code, not
any of the users) if it belongs in this code it should be common to
regular ida, not specific to percpu ida.

For now I've just decided to punt on changing that for now, since all
the percpu ida users I've come across do want the waiting mechanism, but
none of the regular ida users that I've looked at want it. There's
probably a reason for that I haven't thought of yet.

 (These are core design principles/constraints which should be
 explicitly documented in a place where future readers will see them!)

*nod* I suppose it should be said explicitly that the gfp_t parameter
indicates whether or not to wait until a _tag_ is available, and not
some internal memory allocation or something.

How's this look?

diff --git a/lib/idr.c b/lib/idr.c
index 15c021c..a3f8e9a 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -1288,6 +1288,11 @@ static inline unsigned alloc_local_tag(struct percpu_ida 
*pool,
  * Safe to be called from interrupt context (assuming it isn't passed
  * __GFP_WAIT, of course).
  *
+ * @gfp indicates whether or not to wait until a free id is available (it's not
+ * used for internal memory allocations); thus if passed __GFP_WAIT we may 
sleep
+ * however long it takes until another thread frees an id (same semantics as a
+ * mempool).
+ *
  * Will not fail if passed __GFP_WAIT.
  */
 int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH-v3 1/4] idr: Percpu ida

2013-08-28 Thread Andrew Morton
On Wed, 28 Aug 2013 14:12:17 -0700 Kent Overstreet k...@daterainc.com wrote:

 How's this look?
 
 diff --git a/lib/idr.c b/lib/idr.c
 index 15c021c..a3f8e9a 100644
 --- a/lib/idr.c
 +++ b/lib/idr.c
 @@ -1288,6 +1288,11 @@ static inline unsigned alloc_local_tag(struct 
 percpu_ida *pool,
   * Safe to be called from interrupt context (assuming it isn't passed
   * __GFP_WAIT, of course).
   *
 + * @gfp indicates whether or not to wait until a free id is available (it's 
 not
 + * used for internal memory allocations); thus if passed __GFP_WAIT we may 
 sleep
 + * however long it takes until another thread frees an id (same semantics as 
 a
 + * mempool).

Looks good.  Mentioning the mempool thing is effective - people
understand that.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] percpu ida: Switch to cpumask_t, add some comments

2013-08-28 Thread Kent Overstreet
On Wed, Aug 28, 2013 at 02:10:19PM -0700, Andrew Morton wrote:
 On Wed, 28 Aug 2013 14:00:10 -0700 Kent Overstreet k...@daterainc.com wrote:
 
  On Wed, Aug 28, 2013 at 01:25:50PM -0700, Andrew Morton wrote:
   On Wed, 28 Aug 2013 12:55:17 -0700 Kent Overstreet k...@daterainc.com 
   wrote:
   
Fixup patch, addressing Andrew's review feedback:
   
   Looks reasonable.
   
 lib/idr.c   | 38 +-
   
   I still don't think it should be in this file.
   
   You say that some as-yet-unmerged patches will tie the new code into
   the old ida code.  But will it do it in a manner which requires that
   the two reside in the same file?
  
  Not require, no - but it's just intimate enough with my ida rewrite that
  I think it makes sense; it makes some use of stuff that should be
  internal to the ida code.
  
  Mostly just sharing the lock though, since I got rid of the ida
  interfaces that don't do locking, but percpu ida needs a lock that also
  covers what ida needs.
  
  It also makes use of a ganged allocation interface, but there's no real
  reason ida can't expose that, it's just unlikely to be useful to
  anything but percpu ida.
  
  The other reason I think it makes sense to live in idr.c is more for
  users of the code; as you pointed out as far as the user's perspective
  percpu ida isn't doing anything fundamentally different from ida, so I
  think it makes sense for the code to live in the same place as a
  kindness to future kernel developers who are trying to find their way
  around the various library code.
 
 I found things to be quite the opposite - it took 5 minutes of staring,
 head-scratching, double-checking and penny-dropping before I was
 confident that the newly-added code actually has nothing at all to do
 with the current code.  Putting it in the same file was misleading, and
 I got misled.

Ok... and I could see how the fact that it currently _doesn't_ have
anything to do with the existing code would be confusing...

Do you think that if/when it's making use of the ida rewrite it'll be
ok? Or would you still prefer to have it in a new file (and if so, any
preference on the naming?)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] percpu ida: Switch to cpumask_t, add some comments

2013-08-28 Thread Andrew Morton
On Wed, 28 Aug 2013 14:23:58 -0700 Kent Overstreet k...@daterainc.com wrote:

  I found things to be quite the opposite - it took 5 minutes of staring,
  head-scratching, double-checking and penny-dropping before I was
  confident that the newly-added code actually has nothing at all to do
  with the current code.  Putting it in the same file was misleading, and
  I got misled.
 
 Ok... and I could see how the fact that it currently _doesn't_ have
 anything to do with the existing code would be confusing...
 
 Do you think that if/when it's making use of the ida rewrite it'll be
 ok? Or would you still prefer to have it in a new file

I'm constitutionally reluctant to ever assume that any out-of-tree code
will be merged.  Maybe you'll get hit by a bus, and maybe the code
sucks ;)

Are you sure that the two things are so tangled together that they must
live in the same file?  If there's some nice layering between ida and
percpu_ida then perhaps such a physical separation would remain
appropriate?

 (and if so, any preference on the naming?)

percpu_ida.c?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization