Re: [PATCH] ipc: optimize semget/shmget/msgget for lots of keys

2017-08-09 Thread Guillaume Knispel
On Mon, Aug 07, 2017 at 11:21:03AM -0700, Davidlohr Bueso wrote:
> On Mon, 31 Jul 2017, Guillaume Knispel wrote:
> > struct ipc_ids {
> > int in_use;
> > unsigned short seq;
> > +   bool tables_initialized;
> 
> So this is really ugly to have, but I understand why you added it. I
> wonder what folks would think if we just panic() in the rhashtable_init()
> ENOMEM case, and convert the EINVALs to WARNs. This way the function
> would always be called successfully. This is similar to what futex_init
> does, with the underlying hash table allocator panicking. sems and msg
> would probably have to be converted to pure_initcall, but hey, we could
> at least get the symmetry back.

I think we could only afford to panic() on ENOMEM during boot, but
ipc_init_ids() is also called through create_ipc_ns() on namespace
creation. Besides, I would not be very comfortable with only warning on
EINVAL but continuing execution using potentially uninitialized data.
Granted, this will probably never happen in production, but the intent
was to leave the system usable (except that sysv ipc objects could no
longer be created), with no risk of an additional crash, for cases like
people hacking on rhashtable who merely introduce a correctly reported
error while testing their modifications.

Cheers!
Guillaume


Re: [PATCH] ipc: optimize semget/shmget/msgget for lots of keys

2017-08-07 Thread Davidlohr Bueso

On Mon, 31 Jul 2017, Guillaume Knispel wrote:

struct ipc_ids {
int in_use;
unsigned short seq;
+   bool tables_initialized;


So this is really ugly to have, but I understand why you added it. I
wonder what folks would think if we just panic() in the rhashtable_init()
ENOMEM case, and convert the EINVALs to WARNs. This way the function
would always be called successfully. This is similar to what futex_init
does, with the underlying hash table allocator panicking. sems and msg
would probably have to be converted to pure_initcall, but hey, we could
at least get the symmetry back.


static int __init ipc_ns_init(void)
{
-   shm_init_ns(&init_ipc_ns);
-   return 0;
+   const int err = shm_init_ns(&init_ipc_ns);
+   WARN(err, "ipc: sysV shm_init_ns failed: %d\n", err);


nit: s/sysV/sysv


+   return err;
}


Thanks,
Davidlohr


Re: [PATCH] ipc: optimize semget/shmget/msgget for lots of keys

2017-08-07 Thread Davidlohr Bueso

On Thu, 03 Aug 2017, Guillaume Knispel wrote:


In linux/init.h I saw that a pure_initcall is reserved for initializing
variables only and must have no dependency on anything else; from that,
plus the "pure" in the name, I concluded that we should not e.g. allocate
memory in a pure_initcall. However, I see that net_ns_init() calls
kmem_cache_create() and others, so maybe we can keep ipc_ns_init() as
a pure_initcall?


Yeah, I don't see this as a limitation wrt link order. Among others,
filelocks also do this. Not to mention futexes with alloc_large_system_hash().
So let's just keep this as is.

Thanks,
Davidlohr


Re: [PATCH] ipc: optimize semget/shmget/msgget for lots of keys

2017-08-03 Thread Guillaume Knispel
On Wed, Aug 02, 2017 at 01:06:44PM -0700, Davidlohr Bueso wrote:
> On Mon, 31 Jul 2017, Guillaume Knispel wrote:
> >static int __init ipc_init(void)
> >{
> >-sem_init();
> >-msg_init();
> >+int err_sem, err_msg;
> >+
> >+err_sem = sem_init();
> >+WARN(err_sem, "ipc: sysV sem_init failed: %d\n", err_sem);
> >+err_msg = msg_init();
> >+WARN(err_msg, "ipc: sysV msg_init failed: %d\n", err_msg);
> > shm_init();
> 
> This shows the ugliness of the underlying ipc init asymmetry. Specifically,
> 140d0b2108f (Do 'shm_init_ns()' in an early pure_initcall) was the final
> nail in the coffin to fix an exit_shm() race.
> 
> While normally we could just initialize the ipc_ids fields statically and
> be over with initcall dependencies, your patch will require inits be done
> dynamically for the rhashtable_init(). Oh well.
> 
> Also, why do you do this?
> 
> >-pure_initcall(ipc_ns_init);
> >+core_initcall(ipc_ns_init);

In linux/init.h I saw that a pure_initcall is reserved for initializing
variables only and must have no dependency on anything else; from that,
plus the "pure" in the name, I concluded that we should not e.g. allocate
memory in a pure_initcall. However, I see that net_ns_init() calls
kmem_cache_create() and others, so maybe we can keep ipc_ns_init() as
a pure_initcall?

Guillaume


Re: [PATCH] ipc: optimize semget/shmget/msgget for lots of keys

2017-08-02 Thread Davidlohr Bueso

On Mon, 31 Jul 2017, Guillaume Knispel wrote:

static int __init ipc_init(void)
{
-   sem_init();
-   msg_init();
+   int err_sem, err_msg;
+
+   err_sem = sem_init();
+   WARN(err_sem, "ipc: sysV sem_init failed: %d\n", err_sem);
+   err_msg = msg_init();
+   WARN(err_msg, "ipc: sysV msg_init failed: %d\n", err_msg);
shm_init();


This shows the ugliness of the underlying ipc init asymmetry. Specifically,
140d0b2108f (Do 'shm_init_ns()' in an early pure_initcall) was the final
nail in the coffin to fix an exit_shm() race.

While normally we could just initialize the ipc_ids fields statically and
be done with initcall dependencies, your patch requires the inits to be
done dynamically because of rhashtable_init(). Oh well.

Also, why do you do this?


-pure_initcall(ipc_ns_init);
+core_initcall(ipc_ns_init);


Thanks,
Davidlohr


Re: [PATCH] ipc: optimize semget/shmget/msgget for lots of keys

2017-08-01 Thread Davidlohr Bueso

On Tue, 01 Aug 2017, Guillaume Knispel wrote:


On Mon, Jul 31, 2017 at 08:45:58AM -0700, Davidlohr Bueso wrote:

On Mon, 31 Jul 2017, Guillaume Knispel wrote:
>ipc_findkey() scanned all objects to look for the wanted key. This is
>slow when using a high number of keys, for example on an i5 laptop the
>following loop took 17 s, with last semget calls taking ~1 ms each.

I would argue that this is not the common case.


Well, Linux allows for 32000 objects, and if you want to allocate them
with keys, this initial (possibly spread over time) cost is irreducible,
and it is O(n²).

Besides, I maintain a program which, in some of its versions, uses tens
of thousands of semaphore sets with keys, and destroys and creates new
ones all the time.


Not impossible, just not the common case.


On 4.13-rc3 without and with the patch, the following loop takes on
my laptop, according to clock_gettime CLOCK_MONOTONIC calls not
shown here, for each value of KEYS starting right after a reboot
with initially 0 semaphore sets:

for (int i = 0, k = 0x424242; i < KEYS; ++i)
        semget(k++, 1, IPC_CREAT | 0600);

              total         total     max single   max single
  KEYS      without          with   call without    call with

     1          3.5           4.9 µs          3.5          4.9
    10          7.6           8.6 µs          3.7          4.7
    32         16.2          15.9 µs          4.3          5.3
   100         72.9          41.8 µs          3.7          4.7
  1000      5,630.0         502.0 µs            *            *
 10000  1,340,000.0       7,240.0 µs            *            *
 31900 17,600,000.0      22,200.0 µs            *            *

Repeating the test immediately (before the next reboot) for the same
value of KEYS gives the times without creation (lookup only):

              total         total     max single   max single
  KEYS      without          with   call without    call with

     1          2.1           2.5 µs          2.1          2.5
    10          4.5           4.8 µs          2.2          2.3
    32         13.0          10.8 µs          2.3          2.8
   100         82.9          25.1 µs            *          2.3
  1000      5,780.0         217.0 µs            *            *
 10000  1,470,000.0       2,520.0 µs            *            *
 31900 17,400,000.0       7,810.0 µs            *            *

*: unreliable measure: high variance

This is both on a laptop and within a VM, so even where I have not noted
high variance the figures are not very precise (especially for long
runs) but we can still see the tendencies.

I did one last benchmark, this time running each semget() in a new
process (and still only measuring the time taken by this syscall) and
got those figures (in a single run on each kernel) in µs:

creation:
              total         total
  KEYS      without          with

     1          3.7           5.0 µs
    10         32.9          36.7 µs
    32        125.0         109.0 µs
   100        523.0         353.0 µs
  1000     20,300.0       3,280.0 µs
 10000  2,470,000.0      46,700.0 µs
 31900 27,800,000.0     219,000.0 µs

lookup-only:
              total         total
  KEYS      without          with

     1          2.5           2.7 µs
    10         25.4          24.4 µs
    32        106.0          72.6 µs
   100        591.0         352.0 µs
  1000     22,400.0       2,250.0 µs
 10000  2,510,000.0      25,700.0 µs
 31900 28,200,000.0     115,000.0 µs

My provisional conclusion is that on my system this patch improves the
performance consistently from about n ~= 30, and below 30 the slowdown,
if any, is more than reasonable; it should be inconsequential for
properly written programs and of a limited impact on programs doing lots
of get() calls on small sets of ipc objects.


I agree: for smaller numbers of keys the overhead is negligible, and the
old O(n) cost quickly kicks in, to the point that this patch also helps
in more normal scenarios, where we don't have unrealistically high
numbers of keys.

Thanks,
Davidlohr


Re: [PATCH] ipc: optimize semget/shmget/msgget for lots of keys

2017-07-31 Thread Guillaume Knispel
On Mon, Jul 31, 2017 at 08:45:58AM -0700, Davidlohr Bueso wrote:
> On Mon, 31 Jul 2017, Guillaume Knispel wrote:
> >ipc_findkey() scanned all objects to look for the wanted key. This is
> >slow when using a high number of keys, for example on an i5 laptop the
> >following loop took 17 s, with last semget calls taking ~1 ms each.
> 
> I would argue that this is not the common case.

Well, Linux allows for 32000 objects, and if you want to allocate them
with keys, this initial (possibly spread over time) cost is irreducible,
and it is O(n²).

Besides, I maintain a program which, in some of its versions, uses tens
of thousands of semaphore sets with keys, and destroys and creates new
ones all the time.

> >This change adds an rhashtable of kern_ipc_perm objects in ipc_ids, so
> >that one lookup ceases to be O(n). The above loop now takes ~10 ms. Each
> >lookup-only semget() call can take between ~120 ns and a few µs. Rarely,
> >some creations can take a few dozen µs.
> 
> Could you please provide numbers for smaller amounts of keys?

On 4.13-rc3 without and with the patch, the following loop takes on
my laptop, according to clock_gettime CLOCK_MONOTONIC calls not
shown here, for each value of KEYS starting right after a reboot
with initially 0 semaphore sets:

for (int i = 0, k = 0x424242; i < KEYS; ++i)
        semget(k++, 1, IPC_CREAT | 0600);

              total         total     max single   max single
  KEYS      without          with   call without    call with

     1          3.5           4.9 µs          3.5          4.9
    10          7.6           8.6 µs          3.7          4.7
    32         16.2          15.9 µs          4.3          5.3
   100         72.9          41.8 µs          3.7          4.7
  1000      5,630.0         502.0 µs            *            *
 10000  1,340,000.0       7,240.0 µs            *            *
 31900 17,600,000.0      22,200.0 µs            *            *

Repeating the test immediately (before the next reboot) for the same
value of KEYS gives the times without creation (lookup only):

              total         total     max single   max single
  KEYS      without          with   call without    call with

     1          2.1           2.5 µs          2.1          2.5
    10          4.5           4.8 µs          2.2          2.3
    32         13.0          10.8 µs          2.3          2.8
   100         82.9          25.1 µs            *          2.3
  1000      5,780.0         217.0 µs            *            *
 10000  1,470,000.0       2,520.0 µs            *            *
 31900 17,400,000.0       7,810.0 µs            *            *

*: unreliable measure: high variance

This is both on a laptop and within a VM, so even where I have not noted
high variance the figures are not very precise (especially for long
runs) but we can still see the tendencies.

I did one last benchmark, this time running each semget() in a new
process (and still only measuring the time taken by this syscall) and
got those figures (in a single run on each kernel) in µs:

creation:
              total         total
  KEYS      without          with

     1          3.7           5.0 µs
    10         32.9          36.7 µs
    32        125.0         109.0 µs
   100        523.0         353.0 µs
  1000     20,300.0       3,280.0 µs
 10000  2,470,000.0      46,700.0 µs
 31900 27,800,000.0     219,000.0 µs

lookup-only:
              total         total
  KEYS      without          with

     1          2.5           2.7 µs
    10         25.4          24.4 µs
    32        106.0          72.6 µs
   100        591.0         352.0 µs
  1000     22,400.0       2,250.0 µs
 10000  2,510,000.0      25,700.0 µs
 31900 28,200,000.0     115,000.0 µs

My provisional conclusion is that on my system this patch improves the
performance consistently from about n ~= 30, and below 30 the slowdown,
if any, is more than reasonable; it should be inconsequential for
properly written programs and of a limited impact on programs doing lots
of get() calls on small sets of ipc objects.

For a huge number of keyed ipc objects (within the limits already
accepted by Linux), this patch renders the get() performance
acceptable, where it was previously quite poor.

Cheers!
Guillaume


Re: [PATCH] ipc: optimize semget/shmget/msgget for lots of keys

2017-07-31 Thread Davidlohr Bueso

On Mon, 31 Jul 2017, Guillaume Knispel wrote:


ipc_findkey() scanned all objects to look for the wanted key. This is
slow when using a high number of keys, for example on an i5 laptop the
following loop took 17 s, with last semget calls taking ~1 ms each.


I would argue that this is not the common case.


for (int i = 0, k = 0x424242; i < 31900; ++i)
        semget(k++, 1, IPC_CREAT | 0600);

This change adds an rhashtable of kern_ipc_perm objects in ipc_ids, so
that one lookup ceases to be O(n). The above loop now takes ~10 ms. Each
lookup-only semget() call can take between ~120 ns and a few µs. Rarely,
some creations can take a few dozen µs.


Could you please provide numbers for smaller amounts of keys?

Thanks,
Davidlohr

