Hi,

On 12/04/18 22:31, Stefano Stabellini wrote:
On Thu, 12 Apr 2018, Julien Grall wrote:
On 12/04/18 00:46, Stefano Stabellini wrote:
On Wed, 11 Apr 2018, Julien Grall wrote:
On 11/04/18 14:19, Mirela Simonovic wrote:
Freeing percpu area is done when a non-boot CPU is disabled upon
suspend.
This use to be scheduled for execution after a period of time, what
caused
the following racing issues. If CPU is enabled after it is disabled and
before the freeing of percpu area is performed, Xen would crash upon
initializing percpu area because per cpu offset is not marked as
INVALID_PERCPU_AREA (this suppose to happen when cpu area is freed).
To resolve the racing issue, free percpu area right away instead
scheduling it for later.


The reason of using the RCU is you want to make sure that none of the
other
CPUs will access that percpu data before freeing it. So I don't think this
patch is valid.

It looks like to me a rcu barrier is missing after calling cpu_down
somewhere
in the CPU off path. I am not entirely sure where.

We need a rcu_barrier(). Perhaps, it could be added on cpu_on before
initializing the percpu area?

Do you mind giving a bit more details on your thought? cpu_up looks a strange
place as no one should access the percpu area after the CPU is down. So it
feels the rcu_barrier should be somewhere before PSCI_cpu_off is called.

Yes, it feels strange to do it on cpu_on, it would be more obvious on
cpu_off, but we don't actually need to _free_percpu_area on cpu_off,
right? We only need to make sure it is done before cpu_percpu_callback
is called on cpu_on.

My suggestion would be to evaluate if it is possible to introduce the
rcu_barrier() on the resume path before cpu_percpu_callback, maybe in
start_secondary.

Well, cpu_percpu_callback is not called by start_secondary. It is called when preparing the CPU from another CPU. So anything in start_secondary will not work.


I was also looking at xen/arch/x86/acpi/power.c:enter_state and noticed
that they chose to call rcu_barrier() on enable_cpu before
enable_nonboot_cpus().

I guess the rcu_barrier() in the function handling suspend/resume works. But that doesn't cover the hotplug case. Looking at x86, suspend/resume case. For the hotplug case, there are an rcu_barrier in cpu_{up,down}_helper but they are only present in the case of cpu_{up,down} failed. I am not entirely sure how this is handled in x86

Andrew, Jan, do you know when the percpu will be free on hotplug? It is call to call_rcu(...) but I am not sure when this is going to be executed.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to