Re: [PATCH 44/51] xen, balloon: Fix CPU hotplug callback registration

2014-02-05 Thread Boris Ostrovsky

- srivatsa.b...@linux.vnet.ibm.com wrote:

> Subsystems that want to register CPU hotplug callbacks, as well as perform
> initialization for the CPUs that are already online, often do it as shown
> below:
>
> 	get_online_cpus();
>
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
>
> 	register_cpu_notifier(&foobar_cpu_notifier);
>
> 	put_online_cpus();
>
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
> with CPU hotplug operations).
>
> Interestingly, the balloon code in xen can actually prevent double
> initialization and hence can use the following simplified form of callback
> registration:
>
> 	register_cpu_notifier(&foobar_cpu_notifier);
>
> 	get_online_cpus();
>
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
>
> 	put_online_cpus();
>
> A hotplug operation that occurs between registering the notifier and calling
> get_online_cpus(), won't disrupt anything, because the code takes care to
> perform the memory allocations only once.
>
> So reorganize the balloon code in xen this way to fix the deadlock with
> callback registration.
> 
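> Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Cc: David Vrabel <david.vra...@citrix.com>
> Cc: xen-de...@lists.xenproject.org
> Signed-off-by: Srivatsa S. Bhat <srivatsa.b...@linux.vnet.ibm.com>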
> ---
> 
>  drivers/xen/balloon.c |   35 +++++++++++++++++++++++------------
>  1 file changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 37d06ea..afe1a3f 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -592,19 +592,29 @@ static void __init balloon_add_region(unsigned long start_pfn,
>  	}
>  }
>  
> +static int alloc_balloon_scratch_page(int cpu)
> +{
> +	if (per_cpu(balloon_scratch_page, cpu) != NULL)
> +		return 0;
> +
> +	per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> +	if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> +		pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +
>  static int balloon_cpu_notify(struct notifier_block *self,
>  				    unsigned long action, void *hcpu)
>  {
>  	int cpu = (long)hcpu;
>  	switch (action) {
>  	case CPU_UP_PREPARE:
> -		if (per_cpu(balloon_scratch_page, cpu) != NULL)
> -			break;
> -		per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> -		if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> -			pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		if (alloc_balloon_scratch_page(cpu))
>  			return NOTIFY_BAD;
> -		}
>  		break;
>  	default:
>  		break;
> @@ -624,15 +634,16 @@ static int __init balloon_init(void)
>  		return -ENODEV;
>  
>  	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> -		for_each_online_cpu(cpu)
> -		{
> -			per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> -			if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> -				pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		register_cpu_notifier(&balloon_cpu_notifier);
> +
> +		get_online_cpus();
> +		for_each_online_cpu(cpu) {
> +			if (alloc_balloon_scratch_page(cpu)) {
> +				put_online_cpus();
>  				return -ENOMEM;


Not that the original code did a particularly thorough job of cleaning up on
allocation failure, but if it couldn't get memory it would not register the
notifier. So perhaps you should unregister it before returning here.

I am also not sure how we were susceptible to the deadlock here, since we didn't
call get_online_cpus(). (We probably should have, but then the commit description
should say so.)

-boris

>  			}
>  		}
> -		register_cpu_notifier(&balloon_cpu_notifier);
> +		put_online_cpus();
>  	}
>  
>  	pr_info("Initialising balloon driver\n");
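
The review comment above asks for the notifier to be unregistered when an allocation fails after registration. The error path can be sketched as a small userspace model. This is only an illustration under stated assumptions: every `model_*` name is a hypothetical stand-in and not a kernel API, `malloc()` replaces `alloc_page()`, the `get_online_cpus()`/`put_online_cpus()` bracket is left out, and freeing the pages that were already allocated goes a step beyond what the review requests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
bool model_notifier_registered;
void *model_page[MODEL_NR_CPUS];
int model_fail_cpu = -1;	/* test hook: allocation fails for this CPU */

static void model_register_notifier(void)
{
	model_notifier_registered = true;
}

static void model_unregister_notifier(void)
{
	model_notifier_registered = false;
}

/* Mirrors alloc_balloon_scratch_page(): allocate at most once per CPU. */
static int model_alloc_page(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (model_page[cpu] != NULL)
		return 0;
	if (cpu == model_fail_cpu)
		return -ENOMEM;
	model_page[cpu] = malloc(4096);
	return model_page[cpu] ? 0 : -ENOMEM;
}

/* Register first, then initialize; undo the registration on failure. */
int model_balloon_init(void)
{
	int cpu, err = 0;

	model_register_notifier();
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		err = model_alloc_page(cpu);
		if (err)
			break;
	}
	if (err) {
		/* The step the review asks for: do not leave the notifier behind. */
		model_unregister_notifier();
		/* Extra cleanup (an assumption): release pages allocated so far. */
		while (--cpu >= 0) {
			free(model_page[cpu]);
			model_page[cpu] = NULL;
		}
	}
	return err;
}
```

In this model, setting `model_fail_cpu` to a valid CPU number makes `model_balloon_init()` return `-ENOMEM` with the notifier flag cleared again, which is the behaviour the pre-patch code had by virtue of registering last.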
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 44/51] xen, balloon: Fix CPU hotplug callback registration

2014-02-05 Thread Srivatsa S. Bhat

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Interestingly, the balloon code in xen can actually prevent double
initialization and hence can use the following simplified form of callback
registration:

	register_cpu_notifier(&foobar_cpu_notifier);

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	put_online_cpus();

A hotplug operation that occurs between registering the notifier and calling
get_online_cpus(), won't disrupt anything, because the code takes care to
perform the memory allocations only once.

So reorganize the balloon code in xen this way to fix the deadlock with
callback registration.

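Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: David Vrabel <david.vra...@citrix.com>
Cc: xen-de...@lists.xenproject.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.b...@linux.vnet.ibm.com>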
---

 drivers/xen/balloon.c |   35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 37d06ea..afe1a3f 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -592,19 +592,29 @@ static void __init balloon_add_region(unsigned long start_pfn,
 	}
 }
 
+static int alloc_balloon_scratch_page(int cpu)
+{
+	if (per_cpu(balloon_scratch_page, cpu) != NULL)
+		return 0;
+
+	per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
+	if (per_cpu(balloon_scratch_page, cpu) == NULL) {
+		pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+
 static int balloon_cpu_notify(struct notifier_block *self,
 				    unsigned long action, void *hcpu)
 {
 	int cpu = (long)hcpu;
 	switch (action) {
 	case CPU_UP_PREPARE:
-		if (per_cpu(balloon_scratch_page, cpu) != NULL)
-			break;
-		per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
-		if (per_cpu(balloon_scratch_page, cpu) == NULL) {
-			pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
+		if (alloc_balloon_scratch_page(cpu))
 			return NOTIFY_BAD;
-		}
 		break;
 	default:
 		break;
@@ -624,15 +634,16 @@ static int __init balloon_init(void)
 		return -ENODEV;
 
 	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		for_each_online_cpu(cpu)
-		{
-			per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
-			if (per_cpu(balloon_scratch_page, cpu) == NULL) {
-				pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
+		register_cpu_notifier(&balloon_cpu_notifier);
+
+		get_online_cpus();
+		for_each_online_cpu(cpu) {
+			if (alloc_balloon_scratch_page(cpu)) {
+				put_online_cpus();
 				return -ENOMEM;
 			}
 		}
-		register_cpu_notifier(&balloon_cpu_notifier);
+		put_online_cpus();
 	}
 
 	pr_info("Initialising balloon driver\n");
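
The commit message relies on the allocation being idempotent: once the notifier is registered first, a CPU may be initialized both by the hotplug callback and by the loop over online CPUs, and the second attempt must be a no-op. A minimal userspace sketch of that property follows. All `sketch_*` names are hypothetical and exist only for illustration, `malloc()` stands in for `alloc_page()`, and locking is omitted, so this models the "allocate only once" guard and not the kernel's hotplug synchronization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical per-CPU slots; NULL means "not yet initialized". */
void *sketch_scratch[SKETCH_NR_CPUS];
int sketch_allocations;		/* counts real allocations, for inspection */

/* Same shape as alloc_balloon_scratch_page(): a repeat call is a no-op. */
int sketch_alloc_scratch(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (sketch_scratch[cpu] != NULL)
		return 0;
	sketch_scratch[cpu] = malloc(64);
	if (sketch_scratch[cpu] == NULL)
		return -1;
	sketch_allocations++;
	return 0;
}

/* Stand-in for the CPU_UP_PREPARE branch of the notifier callback. */
int sketch_cpu_up_prepare(int cpu)
{
	return sketch_alloc_scratch(cpu);
}

/* Stand-in for the for_each_online_cpu() loop in the init function. */
int sketch_init_online_cpus(void)
{
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (sketch_alloc_scratch(cpu))
			return -1;
	return 0;
}
```

If `sketch_cpu_up_prepare(1)` runs before `sketch_init_online_cpus()`, as could happen when a CPU comes up between registration and the loop, CPU 1 is still allocated exactly once and the total stays at one allocation per CPU.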
