Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-04 Thread Rao Shoaib



On 04/04/2018 12:16 AM, Rao Shoaib wrote:



On 04/03/2018 07:23 PM, Matthew Wilcox wrote:

On Tue, Apr 03, 2018 at 05:55:55PM -0700, Rao Shoaib wrote:

On 04/03/2018 01:58 PM, Matthew Wilcox wrote:

I think you might be better off with an IDR.  The IDR can always
contain one entry, so there's no need for this 'rbf_list_head' or
__rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
an array (if that array can be allocated), so it's compatible with the
kfree_bulk() interface.

I have just familiarized myself with what IDR is by reading your 
article. If

I am incorrect please correct me.

The list and head you have pointed are only used  if the container 
can not
be allocated. That could happen with IDR as well. Note that the 
containers

are allocated at boot time and are re-used.

No, it can't happen with the IDR.  The IDR can always contain one entry
without allocating anything.  If you fail to allocate the second entry,
just free the first entry.


IDR seems to have some overhead, such as I have to specifically add the
pointer and free the ID, plus radix tree maintenance.

... what?  Adding a pointer is simply idr_alloc(), and you get back an
integer telling you which index it has.  Your data structure has its
own set of overhead.
The only overhead is a pointer that points to the head and an int to 
keep count. If I use idr, I would have to allocate an struct idr which 
is much larger. idr_alloc()/idr_destroy() operations are much more 
costly than updating two pointers. As the pointers are stored in 
slots/nodes corresponding to the id, I would  have to retrieve the 
pointers by calling idr_remove() to pass them to be freed, the 
slots/nodes would constantly be allocated and freed.


IDR is a very useful interface for allocating/managing ID's but I 
really do not see the justification for using it over here, perhaps 
you can elaborate more on the benefits and also on how I can just pass 
the array to be freed.


Shoaib

I may have mis-understood your comment. You are probably suggesting that 
I use IDR instead of allocating following containers.


+   struct  rcu_bulk_free_container *rbf_container;
+   struct  rcu_bulk_free_container *rbf_cached_container;


IDR uses radix_tree_node which allocates following two arrays. since I 
do not need any ID's why not just use the radix_tree_node directly, but 
I do not need a radix tree either, so why not just use an array. That is 
what I am doing.


void __rcu  *slots[RADIX_TREE_MAP_SIZE];
unsigned long   tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS]; ==> Not 
needed


As far as allocation failure is concerned, the allocation has to be done 
at run time. If the allocation of a container can fail, so can the 
allocation of radix_tree_node as it also requires memory.


I really do not see any advantages of using IDR. The structure I have is 
much simpler and does exactly what I need.


Shoaib




Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-04 Thread Rao Shoaib



On 04/04/2018 12:16 AM, Rao Shoaib wrote:



On 04/03/2018 07:23 PM, Matthew Wilcox wrote:

On Tue, Apr 03, 2018 at 05:55:55PM -0700, Rao Shoaib wrote:

On 04/03/2018 01:58 PM, Matthew Wilcox wrote:

I think you might be better off with an IDR.  The IDR can always
contain one entry, so there's no need for this 'rbf_list_head' or
__rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
an array (if that array can be allocated), so it's compatible with the
kfree_bulk() interface.

I have just familiarized myself with what IDR is by reading your 
article. If

I am incorrect please correct me.

The list and head you have pointed are only used  if the container 
can not
be allocated. That could happen with IDR as well. Note that the 
containers

are allocated at boot time and are re-used.

No, it can't happen with the IDR.  The IDR can always contain one entry
without allocating anything.  If you fail to allocate the second entry,
just free the first entry.


IDR seems to have some overhead, such as I have to specifically add the
pointer and free the ID, plus radix tree maintenance.

... what?  Adding a pointer is simply idr_alloc(), and you get back an
integer telling you which index it has.  Your data structure has its
own set of overhead.
The only overhead is a pointer that points to the head and an int to 
keep count. If I use idr, I would have to allocate an struct idr which 
is much larger. idr_alloc()/idr_destroy() operations are much more 
costly than updating two pointers. As the pointers are stored in 
slots/nodes corresponding to the id, I would  have to retrieve the 
pointers by calling idr_remove() to pass them to be freed, the 
slots/nodes would constantly be allocated and freed.


IDR is a very useful interface for allocating/managing ID's but I 
really do not see the justification for using it over here, perhaps 
you can elaborate more on the benefits and also on how I can just pass 
the array to be freed.


Shoaib

I may have mis-understood your comment. You are probably suggesting that 
I use IDR instead of allocating following containers.


+   struct  rcu_bulk_free_container *rbf_container;
+   struct  rcu_bulk_free_container *rbf_cached_container;


IDR uses radix_tree_node which allocates following two arrays. since I 
do not need any ID's why not just use the radix_tree_node directly, but 
I do not need a radix tree either, so why not just use an array. That is 
what I am doing.


void __rcu  *slots[RADIX_TREE_MAP_SIZE];
unsigned long   tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS]; ==> Not 
needed


As far as allocation failure is concerned, the allocation has to be done 
at run time. If the allocation of a container can fail, so can the 
allocation of radix_tree_node as it also requires memory.


I really do not see any advantages of using IDR. The structure I have is 
much simpler and does exactly what I need.


Shoaib




Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-04 Thread Rao Shoaib



On 04/02/2018 10:20 AM, Christopher Lameter wrote:

On Sun, 1 Apr 2018, rao.sho...@oracle.com wrote:


kfree_rcu() should use the new kfree_bulk() interface for freeing
rcu structures as it is more efficient.

It would be even better if this approach could also use

kmem_cache_free_bulk()

or

kfree_bulk()
Sorry I do not understand your comment. The patch is using kfree_bulk() 
which is an inline function.


Shoaib


Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-04 Thread Rao Shoaib



On 04/02/2018 10:20 AM, Christopher Lameter wrote:

On Sun, 1 Apr 2018, rao.sho...@oracle.com wrote:


kfree_rcu() should use the new kfree_bulk() interface for freeing
rcu structures as it is more efficient.

It would be even better if this approach could also use

kmem_cache_free_bulk()

or

kfree_bulk()
Sorry I do not understand your comment. The patch is using kfree_bulk() 
which is an inline function.


Shoaib


Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-04 Thread Rao Shoaib



On 04/03/2018 07:23 PM, Matthew Wilcox wrote:

On Tue, Apr 03, 2018 at 05:55:55PM -0700, Rao Shoaib wrote:

On 04/03/2018 01:58 PM, Matthew Wilcox wrote:

I think you might be better off with an IDR.  The IDR can always
contain one entry, so there's no need for this 'rbf_list_head' or
__rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
an array (if that array can be allocated), so it's compatible with the
kfree_bulk() interface.


I have just familiarized myself with what IDR is by reading your article. If
I am incorrect please correct me.

The list and head you have pointed are only used  if the container can not
be allocated. That could happen with IDR as well. Note that the containers
are allocated at boot time and are re-used.

No, it can't happen with the IDR.  The IDR can always contain one entry
without allocating anything.  If you fail to allocate the second entry,
just free the first entry.


IDR seems to have some overhead, such as I have to specifically add the
pointer and free the ID, plus radix tree maintenance.

... what?  Adding a pointer is simply idr_alloc(), and you get back an
integer telling you which index it has.  Your data structure has its
own set of overhead.
The only overhead is a pointer that points to the head and an int to 
keep count. If I use idr, I would have to allocate an struct idr which 
is much larger. idr_alloc()/idr_destroy() operations are much more 
costly than updating two pointers. As the pointers are stored in 
slots/nodes corresponding to the id, I would  have to retrieve the 
pointers by calling idr_remove() to pass them to be freed, the 
slots/nodes would constantly be allocated and freed.


IDR is a very useful interface for allocating/managing ID's but I really 
do not see the justification for using it over here, perhaps you can 
elaborate more on the benefits and also on how I can just pass the array 
to be freed.


Shoaib



Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-04 Thread Rao Shoaib



On 04/03/2018 07:23 PM, Matthew Wilcox wrote:

On Tue, Apr 03, 2018 at 05:55:55PM -0700, Rao Shoaib wrote:

On 04/03/2018 01:58 PM, Matthew Wilcox wrote:

I think you might be better off with an IDR.  The IDR can always
contain one entry, so there's no need for this 'rbf_list_head' or
__rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
an array (if that array can be allocated), so it's compatible with the
kfree_bulk() interface.


I have just familiarized myself with what IDR is by reading your article. If
I am incorrect please correct me.

The list and head you have pointed are only used  if the container can not
be allocated. That could happen with IDR as well. Note that the containers
are allocated at boot time and are re-used.

No, it can't happen with the IDR.  The IDR can always contain one entry
without allocating anything.  If you fail to allocate the second entry,
just free the first entry.


IDR seems to have some overhead, such as I have to specifically add the
pointer and free the ID, plus radix tree maintenance.

... what?  Adding a pointer is simply idr_alloc(), and you get back an
integer telling you which index it has.  Your data structure has its
own set of overhead.
The only overhead is a pointer that points to the head and an int to 
keep count. If I use idr, I would have to allocate an struct idr which 
is much larger. idr_alloc()/idr_destroy() operations are much more 
costly than updating two pointers. As the pointers are stored in 
slots/nodes corresponding to the id, I would  have to retrieve the 
pointers by calling idr_remove() to pass them to be freed, the 
slots/nodes would constantly be allocated and freed.


IDR is a very useful interface for allocating/managing ID's but I really 
do not see the justification for using it over here, perhaps you can 
elaborate more on the benefits and also on how I can just pass the array 
to be freed.


Shoaib



Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-03 Thread Matthew Wilcox
On Tue, Apr 03, 2018 at 05:55:55PM -0700, Rao Shoaib wrote:
> On 04/03/2018 01:58 PM, Matthew Wilcox wrote:
> > I think you might be better off with an IDR.  The IDR can always
> > contain one entry, so there's no need for this 'rbf_list_head' or
> > __rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
> > an array (if that array can be allocated), so it's compatible with the
> > kfree_bulk() interface.
> > 
> I have just familiarized myself with what IDR is by reading your article. If
> I am incorrect please correct me.
> 
> The list and head you have pointed are only used  if the container can not
> be allocated. That could happen with IDR as well. Note that the containers
> are allocated at boot time and are re-used.

No, it can't happen with the IDR.  The IDR can always contain one entry
without allocating anything.  If you fail to allocate the second entry,
just free the first entry.

> IDR seems to have some overhead, such as I have to specifically add the
> pointer and free the ID, plus radix tree maintenance.

... what?  Adding a pointer is simply idr_alloc(), and you get back an
integer telling you which index it has.  Your data structure has its
own set of overhead.

IDR has a bulk-free option (idr_destroy()), but it doesn't have a get-bulk
function yet.  I think that's a relatively straightforward function to
add ...

/*
 * Return: number of elements pointed to by 'ptrs'.
 */
int idr_get_bulk(struct idr *idr, void __rcu ***ptrs, u32 *start)
{
struct radix_tree_iter iter;
void __rcu **slot;
unsigned long base = idr->idr_base;
unsigned long id = *start;

id = (id < base) ? 0 : id - base;
slot = radix_tree_iter_find(>idr_rt, , id);
if (!slot)
return 0;
*start = iter.index + base;
*ptrs = slot;
return iter.next_index - iter.index;
}

(completely untested, but you get the idea.  For your case, it's just
going to return a pointer to the first slot).

> The change would also require retesting. So I would like to keep the current
> design.

That's not how review works.


Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-03 Thread Matthew Wilcox
On Tue, Apr 03, 2018 at 05:55:55PM -0700, Rao Shoaib wrote:
> On 04/03/2018 01:58 PM, Matthew Wilcox wrote:
> > I think you might be better off with an IDR.  The IDR can always
> > contain one entry, so there's no need for this 'rbf_list_head' or
> > __rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
> > an array (if that array can be allocated), so it's compatible with the
> > kfree_bulk() interface.
> > 
> I have just familiarized myself with what IDR is by reading your article. If
> I am incorrect please correct me.
> 
> The list and head you have pointed are only used  if the container can not
> be allocated. That could happen with IDR as well. Note that the containers
> are allocated at boot time and are re-used.

No, it can't happen with the IDR.  The IDR can always contain one entry
without allocating anything.  If you fail to allocate the second entry,
just free the first entry.

> IDR seems to have some overhead, such as I have to specifically add the
> pointer and free the ID, plus radix tree maintenance.

... what?  Adding a pointer is simply idr_alloc(), and you get back an
integer telling you which index it has.  Your data structure has its
own set of overhead.

IDR has a bulk-free option (idr_destroy()), but it doesn't have a get-bulk
function yet.  I think that's a relatively straightforward function to
add ...

/*
 * Return: number of elements pointed to by 'ptrs'.
 */
int idr_get_bulk(struct idr *idr, void __rcu ***ptrs, u32 *start)
{
struct radix_tree_iter iter;
void __rcu **slot;
unsigned long base = idr->idr_base;
unsigned long id = *start;

id = (id < base) ? 0 : id - base;
slot = radix_tree_iter_find(>idr_rt, , id);
if (!slot)
return 0;
*start = iter.index + base;
*ptrs = slot;
return iter.next_index - iter.index;
}

(completely untested, but you get the idea.  For your case, it's just
going to return a pointer to the first slot).

> The change would also require retesting. So I would like to keep the current
> design.

That's not how review works.


Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-03 Thread Rao Shoaib


On 04/03/2018 01:58 PM, Matthew Wilcox wrote:

On Tue, Apr 03, 2018 at 10:22:53AM -0700, rao.sho...@oracle.com wrote:

+++ b/mm/slab.h
@@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
unsigned long size;
  } kmalloc_info[];
  
+#define	RCU_MAX_ACCUMULATE_SIZE	25

+
+struct rcu_bulk_free_container {
+   struct  rcu_head rbfc_rcu;
+   int rbfc_entries;
+   void*rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
+   struct  rcu_bulk_free *rbfc_rbf;
+};
+
+struct rcu_bulk_free {
+   struct  rcu_head rbf_rcu; /* used to schedule monitor process */
+   spinlock_t  rbf_lock;
+   struct  rcu_bulk_free_container *rbf_container;
+   struct  rcu_bulk_free_container *rbf_cached_container;
+   struct  rcu_head *rbf_list_head;
+   int rbf_list_size;
+   int rbf_cpu;
+   int rbf_empty;
+   int rbf_polled;
+   boolrbf_init;
+   boolrbf_monitor;
+};

I think you might be better off with an IDR.  The IDR can always
contain one entry, so there's no need for this 'rbf_list_head' or
__rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
an array (if that array can be allocated), so it's compatible with the
kfree_bulk() interface.

I have just familiarized myself with what IDR is by reading your 
article. If I am incorrect please correct me.


The list and head you have pointed are only used  if the container can 
not be allocated. That could happen with IDR as well. Note that the 
containers are allocated at boot time and are re-used.


IDR seems to have some overhead, such as I have to specifically add the 
pointer and free the ID, plus radix tree maintenance.


The change would also require retesting. So I would like to keep the 
current design.


Regards,

Shoaib





Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-03 Thread Rao Shoaib


On 04/03/2018 01:58 PM, Matthew Wilcox wrote:

On Tue, Apr 03, 2018 at 10:22:53AM -0700, rao.sho...@oracle.com wrote:

+++ b/mm/slab.h
@@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
unsigned long size;
  } kmalloc_info[];
  
+#define	RCU_MAX_ACCUMULATE_SIZE	25

+
+struct rcu_bulk_free_container {
+   struct  rcu_head rbfc_rcu;
+   int rbfc_entries;
+   void*rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
+   struct  rcu_bulk_free *rbfc_rbf;
+};
+
+struct rcu_bulk_free {
+   struct  rcu_head rbf_rcu; /* used to schedule monitor process */
+   spinlock_t  rbf_lock;
+   struct  rcu_bulk_free_container *rbf_container;
+   struct  rcu_bulk_free_container *rbf_cached_container;
+   struct  rcu_head *rbf_list_head;
+   int rbf_list_size;
+   int rbf_cpu;
+   int rbf_empty;
+   int rbf_polled;
+   boolrbf_init;
+   boolrbf_monitor;
+};

I think you might be better off with an IDR.  The IDR can always
contain one entry, so there's no need for this 'rbf_list_head' or
__rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
an array (if that array can be allocated), so it's compatible with the
kfree_bulk() interface.

I have just familiarized myself with what IDR is by reading your 
article. If I am incorrect please correct me.


The list and head you have pointed are only used  if the container can 
not be allocated. That could happen with IDR as well. Note that the 
containers are allocated at boot time and are re-used.


IDR seems to have some overhead, such as I have to specifically add the 
pointer and free the ID, plus radix tree maintenance.


The change would also require retesting. So I would like to keep the 
current design.


Regards,

Shoaib





Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-03 Thread Matthew Wilcox
On Tue, Apr 03, 2018 at 10:22:53AM -0700, rao.sho...@oracle.com wrote:
> +++ b/mm/slab.h
> @@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
>   unsigned long size;
>  } kmalloc_info[];
>  
> +#define  RCU_MAX_ACCUMULATE_SIZE 25
> +
> +struct rcu_bulk_free_container {
> + struct  rcu_head rbfc_rcu;
> + int rbfc_entries;
> + void*rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
> + struct  rcu_bulk_free *rbfc_rbf;
> +};
> +
> +struct rcu_bulk_free {
> + struct  rcu_head rbf_rcu; /* used to schedule monitor process */
> + spinlock_t  rbf_lock;
> + struct  rcu_bulk_free_container *rbf_container;
> + struct  rcu_bulk_free_container *rbf_cached_container;
> + struct  rcu_head *rbf_list_head;
> + int rbf_list_size;
> + int rbf_cpu;
> + int rbf_empty;
> + int rbf_polled;
> + boolrbf_init;
> + boolrbf_monitor;
> +};

I think you might be better off with an IDR.  The IDR can always
contain one entry, so there's no need for this 'rbf_list_head' or
__rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
an array (if that array can be allocated), so it's compatible with the
kfree_bulk() interface.



Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-03 Thread Matthew Wilcox
On Tue, Apr 03, 2018 at 10:22:53AM -0700, rao.sho...@oracle.com wrote:
> +++ b/mm/slab.h
> @@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
>   unsigned long size;
>  } kmalloc_info[];
>  
> +#define  RCU_MAX_ACCUMULATE_SIZE 25
> +
> +struct rcu_bulk_free_container {
> + struct  rcu_head rbfc_rcu;
> + int rbfc_entries;
> + void*rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
> + struct  rcu_bulk_free *rbfc_rbf;
> +};
> +
> +struct rcu_bulk_free {
> + struct  rcu_head rbf_rcu; /* used to schedule monitor process */
> + spinlock_t  rbf_lock;
> + struct  rcu_bulk_free_container *rbf_container;
> + struct  rcu_bulk_free_container *rbf_cached_container;
> + struct  rcu_head *rbf_list_head;
> + int rbf_list_size;
> + int rbf_cpu;
> + int rbf_empty;
> + int rbf_polled;
> + boolrbf_init;
> + boolrbf_monitor;
> +};

I think you might be better off with an IDR.  The IDR can always
contain one entry, so there's no need for this 'rbf_list_head' or
__rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
an array (if that array can be allocated), so it's compatible with the
kfree_bulk() interface.



[PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-03 Thread rao . shoaib
From: Rao Shoaib 

kfree_rcu() should use the new kfree_bulk() interface for freeing
rcu structures as it is more efficient.

Signed-off-by: Rao Shoaib 
---
 include/linux/mm.h   |   5 ++
 include/linux/rcupdate.h |   4 +-
 include/linux/rcutiny.h  |   8 ++-
 kernel/sysctl.c  |  40 
 mm/slab.h|  23 +++
 mm/slab_common.c | 166 ++-
 6 files changed, 242 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..fb1e54c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2673,5 +2673,10 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int sysctl_kfree_rcu_drain_limit;
+extern int sysctl_kfree_rcu_poll_limit;
+extern int sysctl_kfree_rcu_empty_limit;
+extern int sysctl_kfree_rcu_caching_allowed;
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 6338fb6..102a93f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,8 +55,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 #definecall_rcucall_rcu_sched
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
-/* only for use by kfree_call_rcu() */
-void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 
 void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
@@ -210,6 +208,8 @@ do { \
 
 #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
 #include 
+/* only for use by kfree_call_rcu() */
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 #elif defined(CONFIG_TINY_RCU)
 #include 
 #else
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ce9beec..b9e9025 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -84,10 +84,16 @@ static inline void synchronize_sched_expedited(void)
synchronize_sched();
 }
 
+static inline void call_rcu_lazy(struct rcu_head *head,
+rcu_callback_t func)
+{
+   call_rcu(head, func);
+}
+
 static inline void kfree_call_rcu(struct rcu_head *head,
  rcu_callback_t func)
 {
-   call_rcu(head, func);
+   call_rcu_lazy(head, func);
 }
 
 #define rcu_note_context_switch(preempt) \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f98f28c..ab70c99 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1650,6 +1650,46 @@ static struct ctl_table vm_table[] = {
.extra2 = (void *)_rnd_compat_bits_max,
},
 #endif
+   {
+   .procname   = "kfree_rcu_drain_limit",
+   .data   = _kfree_rcu_drain_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_drain_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = _hundred,
+   },
+
+   {
+   .procname   = "kfree_rcu_poll_limit",
+   .data   = _kfree_rcu_poll_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_poll_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = _hundred,
+   },
+
+   {
+   .procname   = "kfree_rcu_empty_limit",
+   .data   = _kfree_rcu_empty_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_empty_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = ,
+   },
+
+   {
+   .procname   = "kfree_rcu_caching_allowed",
+   .data   = _kfree_rcu_caching_allowed,
+   .maxlen = sizeof(sysctl_kfree_rcu_caching_allowed),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = ,
+   },
+
{ }
 };
 
diff --git a/mm/slab.h b/mm/slab.h
index 5181323..a332ea6 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
unsigned long size;
 } kmalloc_info[];
 
+#defineRCU_MAX_ACCUMULATE_SIZE 25
+
+struct rcu_bulk_free_container {
+   struct  rcu_head rbfc_rcu;
+   int rbfc_entries;
+   void*rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
+   struct  rcu_bulk_free *rbfc_rbf;
+};
+
+struct rcu_bulk_free {
+   struct  rcu_head rbf_rcu; /* used to schedule monitor process */
+   spinlock_t  rbf_lock;
+   struct  rcu_bulk_free_container *rbf_container;
+   struct  rcu_bulk_free_container 

[PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-03 Thread rao . shoaib
From: Rao Shoaib 

kfree_rcu() should use the new kfree_bulk() interface for freeing
rcu structures as it is more efficient.

Signed-off-by: Rao Shoaib 
---
 include/linux/mm.h   |   5 ++
 include/linux/rcupdate.h |   4 +-
 include/linux/rcutiny.h  |   8 ++-
 kernel/sysctl.c  |  40 
 mm/slab.h|  23 +++
 mm/slab_common.c | 166 ++-
 6 files changed, 242 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..fb1e54c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2673,5 +2673,10 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int sysctl_kfree_rcu_drain_limit;
+extern int sysctl_kfree_rcu_poll_limit;
+extern int sysctl_kfree_rcu_empty_limit;
+extern int sysctl_kfree_rcu_caching_allowed;
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 6338fb6..102a93f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,8 +55,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 #definecall_rcucall_rcu_sched
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
-/* only for use by kfree_call_rcu() */
-void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 
 void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
@@ -210,6 +208,8 @@ do { \
 
 #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
 #include 
+/* only for use by kfree_call_rcu() */
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 #elif defined(CONFIG_TINY_RCU)
 #include 
 #else
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ce9beec..b9e9025 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -84,10 +84,16 @@ static inline void synchronize_sched_expedited(void)
synchronize_sched();
 }
 
+static inline void call_rcu_lazy(struct rcu_head *head,
+rcu_callback_t func)
+{
+   call_rcu(head, func);
+}
+
 static inline void kfree_call_rcu(struct rcu_head *head,
  rcu_callback_t func)
 {
-   call_rcu(head, func);
+   call_rcu_lazy(head, func);
 }
 
 #define rcu_note_context_switch(preempt) \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f98f28c..ab70c99 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1650,6 +1650,46 @@ static struct ctl_table vm_table[] = {
.extra2 = (void *)_rnd_compat_bits_max,
},
 #endif
+   {
+   .procname   = "kfree_rcu_drain_limit",
+   .data   = _kfree_rcu_drain_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_drain_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = _hundred,
+   },
+
+   {
+   .procname   = "kfree_rcu_poll_limit",
+   .data   = _kfree_rcu_poll_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_poll_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = _hundred,
+   },
+
+   {
+   .procname   = "kfree_rcu_empty_limit",
+   .data   = _kfree_rcu_empty_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_empty_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = ,
+   },
+
+   {
+   .procname   = "kfree_rcu_caching_allowed",
+   .data   = _kfree_rcu_caching_allowed,
+   .maxlen = sizeof(sysctl_kfree_rcu_caching_allowed),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = ,
+   },
+
{ }
 };
 
diff --git a/mm/slab.h b/mm/slab.h
index 5181323..a332ea6 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
unsigned long size;
 } kmalloc_info[];
 
+#defineRCU_MAX_ACCUMULATE_SIZE 25
+
+struct rcu_bulk_free_container {
+   struct  rcu_head rbfc_rcu;
+   int rbfc_entries;
+   void*rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
+   struct  rcu_bulk_free *rbfc_rbf;
+};
+
+struct rcu_bulk_free {
+   struct  rcu_head rbf_rcu; /* used to schedule monitor process */
+   spinlock_t  rbf_lock;
+   struct  rcu_bulk_free_container *rbf_container;
+   struct  rcu_bulk_free_container *rbf_cached_container;
+   struct  rcu_head 

Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-02 Thread Christopher Lameter
On Sun, 1 Apr 2018, rao.sho...@oracle.com wrote:

> kfree_rcu() should use the new kfree_bulk() interface for freeing
> rcu structures as it is more efficient.

It would be even better if this approach could also use

kmem_cache_free_bulk()

or

kfree_bulk()


Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-02 Thread Christopher Lameter
On Sun, 1 Apr 2018, rao.sho...@oracle.com wrote:

> kfree_rcu() should use the new kfree_bulk() interface for freeing
> rcu structures as it is more efficient.

It would be even better if this approach could also use

kmem_cache_free_bulk()

or

kfree_bulk()


Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-02 Thread kbuild test robot
Hi Rao,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on rcu/rcu/next]
[also build test ERROR on v4.16 next-20180329]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/rao-shoaib-oracle-com/Move-kfree_rcu-out-of-rcu-code-and-use-kfree_bulk/20180402-135939
base:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 
rcu/next
config: x86_64-randconfig-x010-201813 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from include/linux/rcupdate.h:214:0,
from include/linux/srcu.h:33,
from include/linux/notifier.h:16,
from include/linux/memory_hotplug.h:7,
from include/linux/mmzone.h:775,
from include/linux/gfp.h:6,
from include/linux/slab.h:15,
from include/linux/crypto.h:24,
from arch/x86/kernel/asm-offsets.c:9:
>> include/linux/rcutiny.h:87:20: error: static declaration of 'call_rcu_lazy' 
>> follows non-static declaration
static inline void call_rcu_lazy(struct rcu_head *head,
   ^
   In file included from include/linux/srcu.h:33:0,
from include/linux/notifier.h:16,
from include/linux/memory_hotplug.h:7,
from include/linux/mmzone.h:775,
from include/linux/gfp.h:6,
from include/linux/slab.h:15,
from include/linux/crypto.h:24,
from arch/x86/kernel/asm-offsets.c:9:
   include/linux/rcupdate.h:59:6: note: previous declaration of 'call_rcu_lazy' 
was here
void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 ^
   make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +/call_rcu_lazy +87 include/linux/rcutiny.h

86  
  > 87  static inline void call_rcu_lazy(struct rcu_head *head,
88   rcu_callback_t func)
89  {
90  call_rcu(head, func);
91  }
92  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-02 Thread kbuild test robot
Hi Rao,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on rcu/rcu/next]
[also build test ERROR on v4.16 next-20180329]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/rao-shoaib-oracle-com/Move-kfree_rcu-out-of-rcu-code-and-use-kfree_bulk/20180402-135939
base:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 
rcu/next
config: x86_64-randconfig-x010-201813 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from include/linux/rcupdate.h:214:0,
from include/linux/srcu.h:33,
from include/linux/notifier.h:16,
from include/linux/memory_hotplug.h:7,
from include/linux/mmzone.h:775,
from include/linux/gfp.h:6,
from include/linux/slab.h:15,
from include/linux/crypto.h:24,
from arch/x86/kernel/asm-offsets.c:9:
>> include/linux/rcutiny.h:87:20: error: static declaration of 'call_rcu_lazy' 
>> follows non-static declaration
static inline void call_rcu_lazy(struct rcu_head *head,
   ^
   In file included from include/linux/srcu.h:33:0,
from include/linux/notifier.h:16,
from include/linux/memory_hotplug.h:7,
from include/linux/mmzone.h:775,
from include/linux/gfp.h:6,
from include/linux/slab.h:15,
from include/linux/crypto.h:24,
from arch/x86/kernel/asm-offsets.c:9:
   include/linux/rcupdate.h:59:6: note: previous declaration of 'call_rcu_lazy' 
was here
void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 ^
   make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +/call_rcu_lazy +87 include/linux/rcutiny.h

86  
  > 87  static inline void call_rcu_lazy(struct rcu_head *head,
88   rcu_callback_t func)
89  {
90  call_rcu(head, func);
91  }
92  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-01 Thread rao . shoaib
From: Rao Shoaib 

kfree_rcu() should use the new kfree_bulk() interface for freeing
rcu structures as it is more efficient.

Signed-off-by: Rao Shoaib 
---
 include/linux/mm.h  |   5 ++
 include/linux/rcutiny.h |   8 ++-
 kernel/sysctl.c |  40 
 mm/slab.h   |  23 +++
 mm/slab_common.c| 164 +++-
 5 files changed, 238 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..fb1e54c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2673,5 +2673,10 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int sysctl_kfree_rcu_drain_limit;
+extern int sysctl_kfree_rcu_poll_limit;
+extern int sysctl_kfree_rcu_empty_limit;
+extern int sysctl_kfree_rcu_caching_allowed;
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ce9beec..b9e9025 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -84,10 +84,16 @@ static inline void synchronize_sched_expedited(void)
synchronize_sched();
 }
 
+static inline void call_rcu_lazy(struct rcu_head *head,
+rcu_callback_t func)
+{
+   call_rcu(head, func);
+}
+
 static inline void kfree_call_rcu(struct rcu_head *head,
  rcu_callback_t func)
 {
-   call_rcu(head, func);
+   call_rcu_lazy(head, func);
 }
 
 #define rcu_note_context_switch(preempt) \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f98f28c..ab70c99 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1650,6 +1650,46 @@ static struct ctl_table vm_table[] = {
.extra2 = (void *)_rnd_compat_bits_max,
},
 #endif
+   {
+   .procname   = "kfree_rcu_drain_limit",
+   .data   = _kfree_rcu_drain_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_drain_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = _hundred,
+   },
+
+   {
+   .procname   = "kfree_rcu_poll_limit",
+   .data   = _kfree_rcu_poll_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_poll_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = _hundred,
+   },
+
+   {
+   .procname   = "kfree_rcu_empty_limit",
+   .data   = _kfree_rcu_empty_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_empty_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = ,
+   },
+
+   {
+   .procname   = "kfree_rcu_caching_allowed",
+   .data   = _kfree_rcu_caching_allowed,
+   .maxlen = sizeof(sysctl_kfree_rcu_caching_allowed),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = ,
+   },
+
{ }
 };
 
diff --git a/mm/slab.h b/mm/slab.h
index 5181323..a332ea6 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
unsigned long size;
 } kmalloc_info[];
 
+#defineRCU_MAX_ACCUMULATE_SIZE 25
+
+struct rcu_bulk_free_container {
+   struct  rcu_head rbfc_rcu;
+   int rbfc_entries;
+   void*rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
+   struct  rcu_bulk_free *rbfc_rbf;
+};
+
+struct rcu_bulk_free {
+   struct  rcu_head rbf_rcu; /* used to schedule monitor process */
+   spinlock_t  rbf_lock;
+   struct  rcu_bulk_free_container *rbf_container;
+   struct  rcu_bulk_free_container *rbf_cached_container;
+   struct  rcu_head *rbf_list_head;
+   int rbf_list_size;
+   int rbf_cpu;
+   int rbf_empty;
+   int rbf_polled;
+   boolrbf_init;
+   boolrbf_monitor;
+};
+
 #ifndef CONFIG_SLOB
 /* Kmalloc array related functions */
 void setup_kmalloc_cache_index_table(void);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 2ea9866..6e8afff 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -1525,13 +1526,174 @@ void kzfree(const void *p)
 }
 EXPORT_SYMBOL(kzfree);
 
+static DEFINE_PER_CPU(struct rcu_bulk_free, cpu_rbf);
+
+/* drain if atleast these many objects */
+int sysctl_kfree_rcu_drain_limit __read_mostly = 10;
+
+/* time to 

[PATCH 2/2] kfree_rcu() should use kfree_bulk() interface

2018-04-01 Thread rao . shoaib
From: Rao Shoaib 

kfree_rcu() should use the new kfree_bulk() interface for freeing
rcu structures as it is more efficient.

Signed-off-by: Rao Shoaib 
---
 include/linux/mm.h  |   5 ++
 include/linux/rcutiny.h |   8 ++-
 kernel/sysctl.c |  40 
 mm/slab.h   |  23 +++
 mm/slab_common.c| 164 +++-
 5 files changed, 238 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..fb1e54c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2673,5 +2673,10 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int sysctl_kfree_rcu_drain_limit;
+extern int sysctl_kfree_rcu_poll_limit;
+extern int sysctl_kfree_rcu_empty_limit;
+extern int sysctl_kfree_rcu_caching_allowed;
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ce9beec..b9e9025 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -84,10 +84,16 @@ static inline void synchronize_sched_expedited(void)
synchronize_sched();
 }
 
+static inline void call_rcu_lazy(struct rcu_head *head,
+rcu_callback_t func)
+{
+   call_rcu(head, func);
+}
+
 static inline void kfree_call_rcu(struct rcu_head *head,
  rcu_callback_t func)
 {
-   call_rcu(head, func);
+   call_rcu_lazy(head, func);
 }
 
 #define rcu_note_context_switch(preempt) \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f98f28c..ab70c99 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1650,6 +1650,46 @@ static struct ctl_table vm_table[] = {
.extra2 = (void *)_rnd_compat_bits_max,
},
 #endif
+   {
+   .procname   = "kfree_rcu_drain_limit",
+   .data   = _kfree_rcu_drain_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_drain_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = _hundred,
+   },
+
+   {
+   .procname   = "kfree_rcu_poll_limit",
+   .data   = _kfree_rcu_poll_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_poll_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = _hundred,
+   },
+
+   {
+   .procname   = "kfree_rcu_empty_limit",
+   .data   = _kfree_rcu_empty_limit,
+   .maxlen = sizeof(sysctl_kfree_rcu_empty_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = ,
+   },
+
+   {
+   .procname   = "kfree_rcu_caching_allowed",
+   .data   = _kfree_rcu_caching_allowed,
+   .maxlen = sizeof(sysctl_kfree_rcu_caching_allowed),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = ,
+   },
+
{ }
 };
 
diff --git a/mm/slab.h b/mm/slab.h
index 5181323..a332ea6 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
unsigned long size;
 } kmalloc_info[];
 
+#defineRCU_MAX_ACCUMULATE_SIZE 25
+
+struct rcu_bulk_free_container {
+   struct  rcu_head rbfc_rcu;
+   int rbfc_entries;
+   void*rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
+   struct  rcu_bulk_free *rbfc_rbf;
+};
+
+struct rcu_bulk_free {
+   struct  rcu_head rbf_rcu; /* used to schedule monitor process */
+   spinlock_t  rbf_lock;
+   struct  rcu_bulk_free_container *rbf_container;
+   struct  rcu_bulk_free_container *rbf_cached_container;
+   struct  rcu_head *rbf_list_head;
+   int rbf_list_size;
+   int rbf_cpu;
+   int rbf_empty;
+   int rbf_polled;
+   boolrbf_init;
+   boolrbf_monitor;
+};
+
 #ifndef CONFIG_SLOB
 /* Kmalloc array related functions */
 void setup_kmalloc_cache_index_table(void);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 2ea9866..6e8afff 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -1525,13 +1526,174 @@ void kzfree(const void *p)
 }
 EXPORT_SYMBOL(kzfree);
 
+static DEFINE_PER_CPU(struct rcu_bulk_free, cpu_rbf);
+
+/* drain if atleast these many objects */
+int sysctl_kfree_rcu_drain_limit __read_mostly = 10;
+
+/* time to poll if fewer than drain_limit */
+int