Re: [PATCH v3 01/10] sched: Provide sparsemask, a reduced contention bitmap

2018-12-06 Thread Omar Sandoval
On Thu, Dec 06, 2018 at 11:07:46AM -0500, Steven Sistare wrote:
> On 11/27/2018 8:19 PM, Omar Sandoval wrote:
> > On Tue, Nov 27, 2018 at 10:16:56AM -0500, Steven Sistare wrote:
> >> On 11/9/2018 7:50 AM, Steve Sistare wrote:
> >>> From: Steve Sistare 
> >>>
> >>> Provide struct sparsemask and functions to manipulate it.  A sparsemask is
> >>> a sparse bitmap.  It reduces cache contention vs the usual bitmap when many
> >>> threads concurrently set, clear, and visit elements, by reducing the number
> >>> of significant bits per cacheline.  For each 64 byte chunk of the mask,
> >>> only the first K bits of the first word are used, and the remaining bits
> >>> are ignored, where K is a creation time parameter.  Thus a sparsemask that
> >>> can represent a set of N elements is approximately (N/K * 64) bytes in
> >>> size.
> >>>
> >>> Signed-off-by: Steve Sistare 
> >>> ---
> >>>  include/linux/sparsemask.h | 260 +
> >>>  lib/Makefile   |   2 +-
> >>>  lib/sparsemask.c   | 142 +
> >>>  3 files changed, 403 insertions(+), 1 deletion(-)
> >>>  create mode 100644 include/linux/sparsemask.h
> >>>  create mode 100644 lib/sparsemask.c
> >>
> >> Hi Peter and Ingo,
> >>   I need your opinion: would you prefer that I keep the new sparsemask type,
> >> or fold it into the existing sbitmap type?  There is some overlap between the
> >> two, but mostly in trivial one-line functions. The main differences are:
> > 
> > Adding Jens and myself.
> > 
> >>   * sparsemask defines iterators that allow an inline loop body, like cpumask,
> >>   whereas the sbitmap iterator forces us to define a callback function for
> >>   the body, which is awkward.
> >>
> >>   * sparsemask is slightly more efficient.  The struct and variable length
> >>   bitmap are allocated contiguously,
> > 
> > That just means you have the pointer indirection elsewhere :) The users
> > of sbitmap embed it in whatever structure they have.
>  
> Yes, the sparsemask can be embedded in one place, but in my use case I also
> cache pointers to the mask from elsewhere, and those sites incur the cost of
> 2 indirections to perform bitmap operations.
> 
> >>   and sbitmap uses an extra field "depth"
> >>   per bitmap cacheline.
> > 
> > The depth field is memory which would otherwise be unused, and it's only
> > used for sbitmap_get(), so it doesn't have any cost if you're using it
> > like a cpumask.
> > 
> >>   * The order of arguments is different for the sparsemask accessors and
> >>   sbitmap accessors.  sparsemask mimics cpumask which is used extensively
> >>   in the sched code.
> >>
> >>   * Much of the sbitmap code supports queueing, sleeping, and waking on bit
> >>   allocation, which is N/A for scheduler load balancing.  However, we
> >>   can call the basic functions which do not use queueing.
> >>
> >> I could add the sparsemask iterators to sbitmap (90 lines), and define
> >> a thin layer to change the argument order to mimic cpumask, but that
> >> essentially recreates sparsemask.
> > 
> > We only use sbitmap_for_each_set() in a few places. Maybe a for_each()
> > style macro would be cleaner for those users, too, in which case I
> > wouldn't be opposed to changing it. The cpumask argument order thing is
> > annoying, though.
> > 
> >> Also, pushing sparsemask into sbitmap would limit our freedom to evolve the
> >> type to meet the future needs of sched, as sbitmap has its own maintainer,
> >> and is used by drivers, so changes to its API and ABI will be frowned upon.
> > 
> > It's a generic data structure, so of course Jens and I have no problem
> > with changing it to meet more needs :) Personally, I'd prefer to only
> > have one data structure for this, but I suppose it depends on whether
> > Peter and Ingo think the argument order is important enough.
> 
> The argument order is a minor thing, not a blocker to adoption, but efficiency
> is important in the core scheduler code.  I actually did the work to add a
> for_each macro with an inline body to sbitmap, and converted my patches to use
> sbitmap.  But then I noticed your very recent patch adding the cleared word to
> each cacheline, which must be loaded and ANDed with each bitset word in the
> for_each traversal, adding more overhead which we don't need for the scheduler
> use case, on top of the extra indirection noted above.  You might add more such
> things in the future (a "deferred set" word?) to support the needs of the block
> drivers who are the intended clients of sbitmap.
> 
> Your sbitmap is more than a simple bitmap abstraction, and for the scheduler we
> just need simple.  Therefore, I propose to trim sparsemask to the bare minimum,
> and move it to kernel/sched for use by sched only.  It was 400 lines, but will
> be 200, and 80 of those are comments.
> 
> If anyone objects, please speak now.

Yes, after the recent 

Re: [PATCH v3 01/10] sched: Provide sparsemask, a reduced contention bitmap

2018-12-06 Thread Steven Sistare
On 11/27/2018 8:19 PM, Omar Sandoval wrote:
> On Tue, Nov 27, 2018 at 10:16:56AM -0500, Steven Sistare wrote:
>> On 11/9/2018 7:50 AM, Steve Sistare wrote:
>>> From: Steve Sistare 
>>>
>>> Provide struct sparsemask and functions to manipulate it.  A sparsemask is
>>> a sparse bitmap.  It reduces cache contention vs the usual bitmap when many
>>> threads concurrently set, clear, and visit elements, by reducing the number
>>> of significant bits per cacheline.  For each 64 byte chunk of the mask,
>>> only the first K bits of the first word are used, and the remaining bits
>>> are ignored, where K is a creation time parameter.  Thus a sparsemask that
>>> can represent a set of N elements is approximately (N/K * 64) bytes in
>>> size.
>>>
>>> Signed-off-by: Steve Sistare 
>>> ---
>>>  include/linux/sparsemask.h | 260 +
>>>  lib/Makefile   |   2 +-
>>>  lib/sparsemask.c   | 142 +
>>>  3 files changed, 403 insertions(+), 1 deletion(-)
>>>  create mode 100644 include/linux/sparsemask.h
>>>  create mode 100644 lib/sparsemask.c
>>
>> Hi Peter and Ingo,
>>   I need your opinion: would you prefer that I keep the new sparsemask type,
>> or fold it into the existing sbitmap type?  There is some overlap between the
>> two, but mostly in trivial one-line functions. The main differences are:
> 
> Adding Jens and myself.
> 
>>   * sparsemask defines iterators that allow an inline loop body, like cpumask,
>>   whereas the sbitmap iterator forces us to define a callback function for
>>   the body, which is awkward.
>>
>>   * sparsemask is slightly more efficient.  The struct and variable length
>>   bitmap are allocated contiguously,
> 
> That just means you have the pointer indirection elsewhere :) The users
> of sbitmap embed it in whatever structure they have.
 
Yes, the sparsemask can be embedded in one place, but in my use case I also
cache pointers to the mask from elsewhere, and those sites incur the cost of
2 indirections to perform bitmap operations.
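
To make the cost concrete, here is a sketch (rq->cached_sb and rq->cached_mask
are made-up fields; sbitmap keeps its words in a separately allocated ->map
array, while sparsemask embeds them after the header):

	/* sbitmap: two dependent pointer loads before reaching the words */
	struct sbitmap *sb = rq->cached_sb;	/* load 1: the cached pointer */
	unsigned long w = sb->map[0].word;	/* load 2: sb->map, then the word */

	/* sparsemask: the bits follow the header in the same allocation */
	struct sparsemask *m = rq->cached_mask;
	unsigned long v = m->bits[0];		/* one load, then the word */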

>>   and sbitmap uses an extra field "depth"
>>   per bitmap cacheline.
> 
> The depth field is memory which would otherwise be unused, and it's only
> used for sbitmap_get(), so it doesn't have any cost if you're using it
> like a cpumask.
> 
>>   * The order of arguments is different for the sparsemask accessors and
>>   sbitmap accessors.  sparsemask mimics cpumask which is used extensively
>>   in the sched code.
>>
>>   * Much of the sbitmap code supports queueing, sleeping, and waking on bit
>>   allocation, which is N/A for scheduler load balancing.  However, we
>>   can call the basic functions which do not use queueing.
>>
>> I could add the sparsemask iterators to sbitmap (90 lines), and define
>> a thin layer to change the argument order to mimic cpumask, but that
>> essentially recreates sparsemask.
> 
> We only use sbitmap_for_each_set() in a few places. Maybe a for_each()
> style macro would be cleaner for those users, too, in which case I
> wouldn't be opposed to changing it. The cpumask argument order thing is
> annoying, though.
> 
>> Also, pushing sparsemask into sbitmap would limit our freedom to evolve the
>> type to meet the future needs of sched, as sbitmap has its own maintainer,
>> and is used by drivers, so changes to its API and ABI will be frowned upon.
> 
> It's a generic data structure, so of course Jens and I have no problem
> with changing it to meet more needs :) Personally, I'd prefer to only
> have one data structure for this, but I suppose it depends on whether
> Peter and Ingo think the argument order is important enough.

The argument order is a minor thing, not a blocker to adoption, but efficiency
is important in the core scheduler code.  I actually did the work to add a
for_each macro with an inline body to sbitmap, and converted my patches to use
sbitmap.  But then I noticed your very recent patch adding the cleared word to
each cacheline, which must be loaded and ANDed with each bitset word in the
for_each traversal, adding more overhead which we don't need for the scheduler
use case, on top of the extra indirection noted above.  You might add more such
things in the future (a "deferred set" word?) to support the needs of the block
drivers who are the intended clients of sbitmap.
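
For reference, after that patch each step of the traversal does roughly the
following per word (a simplified sketch, not the exact sbitmap code):

	struct sbitmap_word *sw = &sb->map[0];
	unsigned long live = sw->word & ~sw->cleared;	/* extra load plus AND */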

Your sbitmap is more than a simple bitmap abstraction, and for the scheduler we
just need simple.  Therefore, I propose to trim sparsemask to the bare minimum,
and move it to kernel/sched for use by sched only.  It was 400 lines, but will
be 200, and 80 of those are comments.

If anyone objects, please speak now.

- Steve

>> FWIW, here is the amount of code involved:
>>
>> include/linux/sbitmap.h
>>   250 lines basic operations
>>   284 lines for queueing
>>   ---
>>   534 lines total
>>
>> lib/sbitmap.c
>>   201 lines basic operations
>>   380 lines for queueing
>>   ---
>>   581 lines total
>>
>> include/linux/sparsemask.h
>>   260 lines total
>>   https://lkml.org/lkml/2018/11/9/1176
>>
>> lib/sparsemask.c
>>   142 lines total
>>   https://lkml.org/lkml/2018/11/9/1176

Re: [PATCH v3 01/10] sched: Provide sparsemask, a reduced contention bitmap

2018-11-27 Thread Omar Sandoval
On Tue, Nov 27, 2018 at 10:16:56AM -0500, Steven Sistare wrote:
> On 11/9/2018 7:50 AM, Steve Sistare wrote:
> > From: Steve Sistare 
> > 
> > Provide struct sparsemask and functions to manipulate it.  A sparsemask is
> > a sparse bitmap.  It reduces cache contention vs the usual bitmap when many
> > threads concurrently set, clear, and visit elements, by reducing the number
> > of significant bits per cacheline.  For each 64 byte chunk of the mask,
> > only the first K bits of the first word are used, and the remaining bits
> > are ignored, where K is a creation time parameter.  Thus a sparsemask that
> > can represent a set of N elements is approximately (N/K * 64) bytes in
> > size.
> > 
> > Signed-off-by: Steve Sistare 
> > ---
> >  include/linux/sparsemask.h | 260 
> > +
> >  lib/Makefile   |   2 +-
> >  lib/sparsemask.c   | 142 +
> >  3 files changed, 403 insertions(+), 1 deletion(-)
> >  create mode 100644 include/linux/sparsemask.h
> >  create mode 100644 lib/sparsemask.c
> 
> Hi Peter and Ingo,
>   I need your opinion: would you prefer that I keep the new sparsemask type, 
> or fold it into the existing sbitmap type?  There is some overlap between the 
> two, but mostly in trivial one-line functions. The main differences are:

Adding Jens and myself.

>   * sparsemask defines iterators that allow an inline loop body, like cpumask,
>   whereas the sbitmap iterator forces us to define a callback function for
>   the body, which is awkward.
>
>   * sparsemask is slightly more efficient.  The struct and variable length
>   bitmap are allocated contiguously,

That just means you have the pointer indirection elsewhere :) The users
of sbitmap embed it in whatever structure they have.

>   and sbitmap uses an extra field "depth"
>   per bitmap cacheline.

The depth field is memory which would otherwise be unused, and it's only
used for sbitmap_get(), so it doesn't have any cost if you're using it
like a cpumask.
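
For context, the per-cacheline layout is roughly this (simplified from
include/linux/sbitmap.h, not the exact definition):

	struct sbitmap_word {
		unsigned long word;	/* the bits themselves */
		unsigned long depth;	/* number of bits used in this word;
					 * only consulted by sbitmap_get() */
	} ____cacheline_aligned_in_smp;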

>   * The order of arguments is different for the sparsemask accessors and
>   sbitmap accessors.  sparsemask mimics cpumask which is used extensively
>   in the sched code.
> 
>   * Much of the sbitmap code supports queueing, sleeping, and waking on bit
>   allocation, which is N/A for scheduler load balancing.  However, we
>   can call the basic functions which do not use queueing.
> 
> I could add the sparsemask iterators to sbitmap (90 lines), and define
> a thin layer to change the argument order to mimic cpumask, but that
> essentially recreates sparsemask.

We only use sbitmap_for_each_set() in a few places. Maybe a for_each()
style macro would be cleaner for those users, too, in which case I
> wouldn't be opposed to changing it. The cpumask argument order thing is
> annoying, though.

> Also, pushing sparsemask into sbitmap would limit our freedom to evolve the
> type to meet the future needs of sched, as sbitmap has its own maintainer,
> and is used by drivers, so changes to its API and ABI will be frowned upon.

It's a generic data structure, so of course Jens and I have no problem
with changing it to meet more needs :) Personally, I'd prefer to only
> have one data structure for this, but I suppose it depends on whether
Peter and Ingo think the argument order is important enough.

> FWIW, here is the amount of code involved:
> 
> include/linux/sbitmap.h
>   250 lines basic operations
>   284 lines for queueing
>   ---
>   534 lines total
> 
> lib/sbitmap.c
>   201 lines basic operations
>   380 lines for queueing
>   ---
>   581 lines total
> 
> include/linux/sparsemask.h
>   260 lines total
>   https://lkml.org/lkml/2018/11/9/1176
> 
> lib/sparsemask.c
>   142 lines total
>   https://lkml.org/lkml/2018/11/9/1176
> 
> - Steve


Re: [PATCH v3 01/10] sched: Provide sparsemask, a reduced contention bitmap

2018-11-27 Thread Steven Sistare
On 11/9/2018 7:50 AM, Steve Sistare wrote:
> From: Steve Sistare 
> 
> Provide struct sparsemask and functions to manipulate it.  A sparsemask is
> a sparse bitmap.  It reduces cache contention vs the usual bitmap when many
> threads concurrently set, clear, and visit elements, by reducing the number
> of significant bits per cacheline.  For each 64 byte chunk of the mask,
> only the first K bits of the first word are used, and the remaining bits
> are ignored, where K is a creation time parameter.  Thus a sparsemask that
> can represent a set of N elements is approximately (N/K * 64) bytes in
> size.
> 
> Signed-off-by: Steve Sistare 
> ---
>  include/linux/sparsemask.h | 260 
> +
>  lib/Makefile   |   2 +-
>  lib/sparsemask.c   | 142 +
>  3 files changed, 403 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/sparsemask.h
>  create mode 100644 lib/sparsemask.c

Hi Peter and Ingo,
  I need your opinion: would you prefer that I keep the new sparsemask type, 
or fold it into the existing sbitmap type?  There is some overlap between the 
two, but mostly in trivial one-line functions. The main differences are:

  * sparsemask defines iterators that allow an inline loop body, like cpumask,
  whereas the sbitmap iterator forces us to define a callback function for
  the body, which is awkward (see the sketch after this list).

  * sparsemask is slightly more efficient.  The struct and variable length
  bitmap are allocated contiguously, and sbitmap uses an extra field "depth"
  per bitmap cacheline.

  * The order of arguments is different for the sparsemask accessors and
  sbitmap accessors.  sparsemask mimics cpumask which is used extensively
  in the sched code (also sketched below).

  * Much of the sbitmap code supports queueing, sleeping, and waking on bit
  allocation, which is N/A for scheduler load balancing.  However, we
  can call the basic functions which do not use queueing.
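
As an illustration, the iterator and argument-order differences look roughly
like this (for_each_sparse, sparsemask_set, elem, and do_something are
placeholder names, not the final API; sbitmap_for_each_set() and
sbitmap_set_bit() are the existing sbitmap calls):

	/* inline loop body, cpumask style */
	for_each_sparse(elem, mask)
		do_something(elem);

	/* sbitmap style: the body must live in a separate callback */
	static bool visit(struct sbitmap *sb, unsigned int bit, void *data)
	{
		do_something(bit);
		return true;		/* keep iterating */
	}

	sbitmap_for_each_set(sb, visit, NULL);

	/* argument order: cpumask-style accessors put the element first */
	sparsemask_set(elem, mask);	/* mirrors cpumask_set_cpu(cpu, mask) */
	sbitmap_set_bit(sb, bit);	/* sbitmap puts the map first */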

I could add the sparsemask iterators to sbitmap (90 lines), and define
a thin layer to change the argument order to mimic cpumask, but that
essentially recreates sparsemask.

Also, pushing sparsemask into sbitmap would limit our freedom to evolve the
type to meet the future needs of sched, as sbitmap has its own maintainer,
and is used by drivers, so changes to its API and ABI will be frowned upon.

FWIW, here is the amount of code involved:

include/linux/sbitmap.h
  250 lines basic operations
  284 lines for queueing
  ---
  534 lines total

lib/sbitmap.c
  201 lines basic operations
  380 lines for queueing
  ---
  581 lines total

include/linux/sparsemask.h
  260 lines total
  https://lkml.org/lkml/2018/11/9/1176

lib/sparsemask.c
  142 lines total
  https://lkml.org/lkml/2018/11/9/1176

- Steve


[PATCH v3 01/10] sched: Provide sparsemask, a reduced contention bitmap

2018-11-09 Thread Steve Sistare
From: Steve Sistare 

Provide struct sparsemask and functions to manipulate it.  A sparsemask is
a sparse bitmap.  It reduces cache contention vs the usual bitmap when many
threads concurrently set, clear, and visit elements, by reducing the number
of significant bits per cacheline.  For each 64 byte chunk of the mask,
only the first K bits of the first word are used, and the remaining bits
are ignored, where K is a creation time parameter.  Thus a sparsemask that
can represent a set of N elements is approximately (N/K * 64) bytes in
size.

Signed-off-by: Steve Sistare 
---
 include/linux/sparsemask.h | 260 +
 lib/Makefile   |   2 +-
 lib/sparsemask.c   | 142 +
 3 files changed, 403 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/sparsemask.h
 create mode 100644 lib/sparsemask.c

diff --git a/include/linux/sparsemask.h b/include/linux/sparsemask.h
new file mode 100644
index 000..d36a3be
--- /dev/null
+++ b/include/linux/sparsemask.h
@@ -0,0 +1,260 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * sparsemask.h - sparse bitmap operations
+ *
+ * Copyright (c) 2018 Oracle Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __LINUX_SPARSEMASK_H
+#define __LINUX_SPARSEMASK_H
+
+#include 
+#include 
+#include 
+
+/*
+ * A sparsemask is a sparse bitmap.  It reduces cache contention vs the usual
+ * bitmap when many threads concurrently set, clear, and visit elements.  For
+ * each 64 byte chunk of the mask, only the first K bits of the first word are
+ * used, and the remaining bits are ignored, where K is a creation time
+ * parameter.  Thus a sparsemask that can represent a set of N elements is
+ * approximately (N/K * 64) bytes in size.
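+ *
+ * For example, with K = 8 and N = 512, the mask occupies 512/8 = 64 chunks
+ * of 64 bytes each, about 4 KB, versus 64 bytes for a dense 512-bit bitmap;
+ * the extra space spreads concurrent writers across more cachelines.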
+ *
+ * Clients pass and receive element numbers in the public API, and the
+ * implementation translates them to bit numbers to perform the bitmap
+ * operations.
+ *
+ * This file is partially derived from cpumask.h, and the public sparsemask
+ * operations are drop-in replacements for cpumask operations. However,
+ * sparsemask has no dependency on CPU definitions and can be used to
+ * represent any kind of elements.
+ */
+
+struct sparsemask {
+   short nelems;   /* current number of elements */
+   short density;  /* store 2^density elements per chunk */
+   unsigned long bits[0];  /* embedded array of chunks */
+};
+
+/* The maximum value for density, which implicitly defines the chunk size */
+
+#define _SMASK_DENSITY_MAX 6
+
+#define SMASK_DENSITY_TO_BYTES(density)	(1U << (density))
+#define SMASK_DENSITY_TO_ELEMS(density)	(1U << (density))
+
+/* The number of elements/bits/bytes/longs in a chunk */
+
+#define SMASK_ELEMS(mask)	SMASK_DENSITY_TO_ELEMS((mask)->density)
+#define SMASK_BYTES		SMASK_DENSITY_TO_BYTES(_SMASK_DENSITY_MAX)
+#define SMASK_BITS		(SMASK_BYTES * BITS_PER_BYTE)
+#define SMASK_LONGS		(SMASK_BYTES / sizeof(long))
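+
+/*
+ * For example, density = 3 gives SMASK_ELEMS = 8 elements per 64-byte
+ * chunk; SMASK_BYTES, SMASK_BITS, and SMASK_LONGS describe the chunk
+ * itself: 64, 512, and 8 (with 8-byte longs), regardless of density.
+ */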
+
+/*
+ * Translate element index @elem to a bit/byte/long index.
+ * @density: the density of a chunk.
+ */
+
+#define _SMASK_ELEM_TO_BIT(elem, density)  \
+   ((elem) / SMASK_DENSITY_TO_ELEMS(density) * SMASK_BITS +\
+(elem) % SMASK_DENSITY_TO_ELEMS(density))
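+/* e.g. density = 3: element 21 maps to bit 21/8 * 512 + 21%8 = 1029 */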
+
+#define _SMASK_ELEM_TO_BYTE(elem, density) \
+   (_SMASK_ELEM_TO_BIT(elem, density) / BITS_PER_BYTE)
+
+#define _SMASK_ELEM_TO_LONG(elem, density) \
+   (_SMASK_ELEM_TO_BYTE(elem, density) / sizeof(long))
+
+/* Translate @bit/@byte/@long index to an element index */
+
+#define _SMASK_BIT_TO_ELEM(bit, density)   \
+   ((bit) / SMASK_BITS * SMASK_DENSITY_TO_ELEMS(density) + \
+(bit) % SMASK_BITS)
+
+#define _SMASK_BYTE_TO_ELEM(byte, density) \
+   _SMASK_BIT_TO_ELEM((byte) * BITS_PER_BYTE, density)
+
+#define _SMASK_LONG_TO_ELEM(index, density)\
+   _SMASK_BYTE_TO_ELEM((index) * sizeof(long), density)
+
+/* Same translations as above, but taking sparsemask @m instead of density */
+
+#define SMASK_ELEM_TO_BYTE(elem, m)	_SMASK_ELEM_TO_BYTE(elem, (m)->density)
+#define SMASK_ELEM_TO_BIT(elem, m)	_SMASK_ELEM_TO_BIT(elem, (m)->density)
+#define SMASK_ELEM_TO_LONG(elem, m)	_SMASK_ELEM_TO_LONG(elem, (m)->density)
+#define SMASK_BYTE_TO_ELEM(byte, m)	_SMASK_BYTE_TO_ELEM(byte, (m)->density)
+#define SMASK_BIT_TO_ELEM(bit, m)	_SMASK_BIT_TO_ELEM(bit, (m)->density)
+#define SMASK_LONG_TO_ELEM(index, m)	_SMASK_LONG_TO_ELEM(index, (m)->density)
