[PATCH v2 07/11] mm/slab: racy access/modify the slab color

2016-04-11 Thread js1304
From: Joonsoo Kim 

The slab color does not need to be updated strictly.  Taking a lock to
change the slab color could cause more lock contention, so this patch
accesses and modifies the slab color racily.  This is a preparation step
for implementing a lockless allocation path when there are no free objects
in the kmem_cache.
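
As a rough illustration of the racy update described above, here is a
minimal userspace sketch.  The struct names and next_colour_offset() are
hypothetical stand-ins, not kernel API; only the fields touched by this
patch are modelled, and the sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-node state; not the kernel struct. */
struct node_sketch {
	size_t colour_next;	/* per-node cursor, updated without a lock */
};

/* Hypothetical stand-in for the two kmem_cache fields used here. */
struct cache_sketch {
	size_t colour;		/* number of distinct colours */
	size_t colour_off;	/* byte offset between two colours */
};

/*
 * Same sequence as the lines this patch adds: advance the shared cursor,
 * wrap it, then re-read it and range-check the local copy.
 */
static size_t next_colour_offset(struct node_sketch *n,
				 const struct cache_sketch *c)
{
	size_t offset;

	assert(n != NULL && c != NULL);

	n->colour_next++;
	if (n->colour_next >= c->colour)
		n->colour_next = 0;

	/*
	 * Without the lock, another CPU may change colour_next between
	 * the wrap check above and this read, so the local copy is
	 * checked again before it is used.
	 */
	offset = n->colour_next;
	if (offset >= c->colour)
		offset = 0;

	return offset * c->colour_off;
}
```

With colour = 3 and colour_off = 64, successive calls starting from
colour_next = 0 return 64, 128, 0, 64.  Note that the cursor is now
advanced before it is read, so the first slab gets colour 1 rather than 0.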

Below are the results of the concurrent allocation/free test in the slab
allocation benchmark written by Christoph a long time ago.  I have
simplified the output.  The numbers show cycle counts for alloc/free
respectively, so lower is better.

* Before
Kmalloc N*alloc N*free(32): Average=365/806
Kmalloc N*alloc N*free(64): Average=452/690
Kmalloc N*alloc N*free(128): Average=736/886
Kmalloc N*alloc N*free(256): Average=1167/985
Kmalloc N*alloc N*free(512): Average=2088/1125
Kmalloc N*alloc N*free(1024): Average=4115/1184
Kmalloc N*alloc N*free(2048): Average=8451/1748
Kmalloc N*alloc N*free(4096): Average=16024/2048

* After
Kmalloc N*alloc N*free(32): Average=355/750
Kmalloc N*alloc N*free(64): Average=452/812
Kmalloc N*alloc N*free(128): Average=559/1070
Kmalloc N*alloc N*free(256): Average=1176/980
Kmalloc N*alloc N*free(512): Average=1939/1189
Kmalloc N*alloc N*free(1024): Average=3521/1278
Kmalloc N*alloc N*free(2048): Average=7152/1838
Kmalloc N*alloc N*free(4096): Average=13438/2013

This shows that contention is reduced for object sizes >= 1024, where the
alloc cycle count drops by roughly 15%.
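
The "roughly 15%" figure can be checked against the alloc columns above
with a small helper.  reduction_pct() is a hypothetical name used only for
this sketch.

```c
#include <stddef.h>

/* Percentage by which `after` is lower than `before` (negative if higher). */
static double reduction_pct(unsigned int before, unsigned int after)
{
	return 100.0 * ((double)before - (double)after) / (double)before;
}
```

For the alloc cycle counts, reduction_pct(4115, 3521) is about 14.4,
reduction_pct(8451, 7152) is about 15.4 and reduction_pct(16024, 13438) is
about 16.1.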

Acked-by: Christoph Lameter 
Signed-off-by: Joonsoo Kim 
---
 mm/slab.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 6e61461..a3422bc 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2561,20 +2561,7 @@ static int cache_grow(struct kmem_cache *cachep,
 	}
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
-	/* Take the node list lock to change the colour_next on this node */
 	check_irq_off();
-	n = get_node(cachep, nodeid);
-	spin_lock(&n->list_lock);
-
-	/* Get colour for the slab, and cal the next value. */
-	offset = n->colour_next;
-	n->colour_next++;
-	if (n->colour_next >= cachep->colour)
-		n->colour_next = 0;
-	spin_unlock(&n->list_lock);
-
-	offset *= cachep->colour_off;
-
 	if (gfpflags_allow_blocking(local_flags))
 		local_irq_enable();
 
@@ -2595,6 +2582,19 @@ static int cache_grow(struct kmem_cache *cachep,
 	if (!page)
 		goto failed;
 
+	n = get_node(cachep, nodeid);
+
+	/* Get colour for the slab, and cal the next value. */
+	n->colour_next++;
+	if (n->colour_next >= cachep->colour)
+		n->colour_next = 0;
+
+	offset = n->colour_next;
+	if (offset >= cachep->colour)
+		offset = 0;
+
+	offset *= cachep->colour_off;
+
 	/* Get slab management. */
 	freelist = alloc_slabmgmt(cachep, page, offset,
 			local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
-- 
1.9.1


