Re: [linux-linus test] 183794: regressions - FAIL

2023-11-23 Thread Julien Grall

Hi Juergen,

On 23/11/2023 05:57, Juergen Gross wrote:

On 23.11.23 00:07, Stefano Stabellini wrote:

On Wed, 22 Nov 2023, Juergen Gross wrote:

On 22.11.23 04:07, Stefano Stabellini wrote:

On Mon, 20 Nov 2023, Stefano Stabellini wrote:

On Mon, 20 Nov 2023, Juergen Gross wrote:

On 20.11.23 03:21, osstest service owner wrote:

flight 183794 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183794/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
    test-arm64-arm64-examine  8 reboot   fail REGR. vs. 183766


I'm seeing the following in the serial log:

Nov 20 00:25:41.586712 [    0.567318] kernel BUG at
arch/arm64/xen/../../arm/xen/enlighten.c:164!
Nov 20 00:25:41.598711 [    0.574002] Internal error: Oops - BUG:
f2000800 [#1] PREEMPT SMP

The related source code lines in the kernel are:

err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info,
xen_vcpu_nr(cpu),
 );
BUG_ON(err);

I suspect commit 20f3b8eafe0ba to be the culprit.

Stefano, could you please have a look?


The good news and bad news is that I cannot repro this either with or
without CONFIG_UNMAP_KERNEL_AT_EL0. I looked at commit 20f3b8eafe0ba but
I cannot see anything wrong with it. Looking at the register dump, from:

x0 : fffa

I am guessing the error was -ENXIO which is returned from map_guest_area
in Xen.

Could it be that the struct is crossing a page boundary? Or that it is
not 64-bit aligned? Do we need to do something like the following?

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 9afdc4c4a5dc..5326070c5dc0 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -484,7 +485,7 @@ static int __init xen_guest_init(void)
    * for secondary CPUs as they are brought up.
    * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
    */
-    xen_vcpu_info = alloc_percpu(struct vcpu_info);
+    xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info), PAGE_SIZE);
   if (xen_vcpu_info == NULL)
   return -ENOMEM;


May I suggest to use a smaller alignment? What about:

1 << fls(sizeof(struct vcpu_info) - 1)


See below

---
[PATCH] arm/xen: fix xen_vcpu_info allocation alignment


Stefano, are you going to submit the patch formally?



xen_vcpu_info is a percpu area that needs to be mapped by Xen.
Currently, it could cross a page boundary resulting in Xen being unable
to map it:

[    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
[    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP


Fix the issue by using __alloc_percpu and requesting alignment for the
memory allocation.

Signed-off-by: Stefano Stabellini 


I am guessing we want to backport it. So should this contain a tag to 
indicate the intention?




diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 9afdc4c4a5dc..09eb74a07dfc 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -484,7 +484,8 @@ static int __init xen_guest_init(void)
   * for secondary CPUs as they are brought up.
   * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
   */
-    xen_vcpu_info = alloc_percpu(struct vcpu_info);
+    xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info),
+   1 << fls(sizeof(struct vcpu_info) - 1));


Nit: one tab less, please (can be fixed while committing).


  if (xen_vcpu_info == NULL)
  return -ENOMEM;


Reviewed-by: Juergen Gross 


Juergen, looking at the x86 code, you seem to use DEFINE_PER_CPU(). So 
what guarantees that this is not going to cross a page?


Cheers,

--
Julien Grall



Re: [linux-linus test] 183794: regressions - FAIL

2023-11-22 Thread Juergen Gross

On 23.11.23 00:07, Stefano Stabellini wrote:

On Wed, 22 Nov 2023, Juergen Gross wrote:

On 22.11.23 04:07, Stefano Stabellini wrote:

On Mon, 20 Nov 2023, Stefano Stabellini wrote:

On Mon, 20 Nov 2023, Juergen Gross wrote:

On 20.11.23 03:21, osstest service owner wrote:

flight 183794 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183794/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
test-arm64-arm64-examine  8 reboot   fail REGR.
vs.
183766


I'm seeing the following in the serial log:

Nov 20 00:25:41.586712 [0.567318] kernel BUG at
arch/arm64/xen/../../arm/xen/enlighten.c:164!
Nov 20 00:25:41.598711 [0.574002] Internal error: Oops - BUG:
f2000800 [#1] PREEMPT SMP

The related source code lines in the kernel are:

err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info,
xen_vcpu_nr(cpu),
 );
BUG_ON(err);

I suspect commit 20f3b8eafe0ba to be the culprit.

Stefano, could you please have a look?


The good news and bad news is that I cannot repro this either with or
without CONFIG_UNMAP_KERNEL_AT_EL0. I looked at commit 20f3b8eafe0ba but
I cannot see anything wrong with it. Looking at the register dump, from:

x0 : fffa

I am guessing the error was -ENXIO which is returned from map_guest_area
in Xen.

Could it be that the struct is crossing a page boundary? Or that it is
not 64-bit aligned? Do we need to do something like the following?

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 9afdc4c4a5dc..5326070c5dc0 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -484,7 +485,7 @@ static int __init xen_guest_init(void)
 * for secondary CPUs as they are brought up.
 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
 */
-   xen_vcpu_info = alloc_percpu(struct vcpu_info);
+   xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info), PAGE_SIZE);
if (xen_vcpu_info == NULL)
return -ENOMEM;
   


May I suggest to use a smaller alignment? What about:

1 << fls(sizeof(struct vcpu_info) - 1)


See below

---
[PATCH] arm/xen: fix xen_vcpu_info allocation alignment

xen_vcpu_info is a percpu area that needs to be mapped by Xen.
Currently, it could cross a page boundary resulting in Xen being unable
to map it:

[0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
[0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP

Fix the issue by using __alloc_percpu and requesting alignment for the
memory allocation.

Signed-off-by: Stefano Stabellini 

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 9afdc4c4a5dc..09eb74a07dfc 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -484,7 +484,8 @@ static int __init xen_guest_init(void)
 * for secondary CPUs as they are brought up.
 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
 */
-   xen_vcpu_info = alloc_percpu(struct vcpu_info);
+   xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info),
+  1 << fls(sizeof(struct vcpu_info) - 1));


Nit: one tab less, please (can be fixed while committing).


if (xen_vcpu_info == NULL)
return -ENOMEM;
  


Reviewed-by: Juergen Gross 


Juergen


Re: [linux-linus test] 183794: regressions - FAIL

2023-11-22 Thread Stefano Stabellini
On Wed, 22 Nov 2023, Juergen Gross wrote:
> On 22.11.23 04:07, Stefano Stabellini wrote:
> > On Mon, 20 Nov 2023, Stefano Stabellini wrote:
> > > On Mon, 20 Nov 2023, Juergen Gross wrote:
> > > > On 20.11.23 03:21, osstest service owner wrote:
> > > > > flight 183794 linux-linus real [real]
> > > > > http://logs.test-lab.xenproject.org/osstest/logs/183794/
> > > > > 
> > > > > Regressions :-(
> > > > > 
> > > > > Tests which did not succeed and are blocking,
> > > > > including tests which could not be run:
> > > > >test-arm64-arm64-examine  8 reboot   fail REGR.
> > > > > vs.
> > > > > 183766
> > > > 
> > > > I'm seeing the following in the serial log:
> > > > 
> > > > Nov 20 00:25:41.586712 [0.567318] kernel BUG at
> > > > arch/arm64/xen/../../arm/xen/enlighten.c:164!
> > > > Nov 20 00:25:41.598711 [0.574002] Internal error: Oops - BUG:
> > > > f2000800 [#1] PREEMPT SMP
> > > > 
> > > > The related source code lines in the kernel are:
> > > > 
> > > > err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info,
> > > > xen_vcpu_nr(cpu),
> > > >  );
> > > > BUG_ON(err);
> > > > 
> > > > I suspect commit 20f3b8eafe0ba to be the culprit.
> > > > 
> > > > Stefano, could you please have a look?
> > 
> > The good news and bad news is that I cannot repro this either with or
> > without CONFIG_UNMAP_KERNEL_AT_EL0. I looked at commit 20f3b8eafe0ba but
> > I cannot see anything wrong with it. Looking at the register dump, from:
> > 
> > x0 : fffa
> > 
> > I am guessing the error was -ENXIO which is returned from map_guest_area
> > in Xen.
> > 
> > Could it be that the struct is crossing a page boundary? Or that it is
> > not 64-bit aligned? Do we need to do something like the following?
> > 
> > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > index 9afdc4c4a5dc..5326070c5dc0 100644
> > --- a/arch/arm/xen/enlighten.c
> > +++ b/arch/arm/xen/enlighten.c
> > @@ -484,7 +485,7 @@ static int __init xen_guest_init(void)
> >  * for secondary CPUs as they are brought up.
> >  * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
> >  */
> > -   xen_vcpu_info = alloc_percpu(struct vcpu_info);
> > +   xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info), PAGE_SIZE);
> > if (xen_vcpu_info == NULL)
> > return -ENOMEM;
> >   
> 
> May I suggest to use a smaller alignment? What about:
> 
> 1 << fls(sizeof(struct vcpu_info) - 1)

See below

---
[PATCH] arm/xen: fix xen_vcpu_info allocation alignment

xen_vcpu_info is a percpu area that needs to be mapped by Xen.
Currently, it could cross a page boundary resulting in Xen being unable
to map it:

[0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
[0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP

Fix the issue by using __alloc_percpu and requesting alignment for the
memory allocation.

Signed-off-by: Stefano Stabellini 

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 9afdc4c4a5dc..09eb74a07dfc 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -484,7 +484,8 @@ static int __init xen_guest_init(void)
 * for secondary CPUs as they are brought up.
 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
 */
-   xen_vcpu_info = alloc_percpu(struct vcpu_info);
+   xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info),
+  1 << fls(sizeof(struct vcpu_info) - 1));
if (xen_vcpu_info == NULL)
return -ENOMEM;
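
To illustrate why the requested alignment matters, here is a minimal userspace sketch, not kernel code. It assumes a 4096-byte page, and the sizes and alignments passed in are hypothetical; crosses_page and count_crossings are illustrative names that do not exist in the kernel. It counts the start offsets at which an object would straddle a page boundary.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096u	/* assumed page size, for illustration only */

/* True if the byte range [start, start + size) touches more than one page. */
static bool crosses_page(size_t start, size_t size)
{
	return (start / PAGE_SIZE_SKETCH) !=
	       ((start + size - 1) / PAGE_SIZE_SKETCH);
}

/*
 * Walk every start offset in [0, limit) that is a multiple of align and
 * count how many of them place an object of the given size across a page
 * boundary. Returns 0 for a zero alignment or a zero size.
 */
static unsigned int count_crossings(size_t size, size_t align, size_t limit)
{
	unsigned int crossings = 0;
	size_t start;

	if (align == 0 || size == 0)
		return 0;

	for (start = 0; start < limit; start += align)
		if (crosses_page(start, size))
			crossings++;

	return crossings;
}
```

For a hypothetical 64-byte object, count_crossings(64, 8, 8192) is nonzero while count_crossings(64, 64, 8192) is zero: a power-of-two alignment that is no smaller than the object and no larger than the page always divides the page evenly, so the object cannot straddle a boundary.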
 

Re: [linux-linus test] 183794: regressions - FAIL

2023-11-22 Thread Juergen Gross

On 22.11.23 04:07, Stefano Stabellini wrote:

On Mon, 20 Nov 2023, Stefano Stabellini wrote:

On Mon, 20 Nov 2023, Juergen Gross wrote:

On 20.11.23 03:21, osstest service owner wrote:

flight 183794 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183794/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
   test-arm64-arm64-examine  8 reboot   fail REGR. vs.
183766


I'm seeing the following in the serial log:

Nov 20 00:25:41.586712 [0.567318] kernel BUG at
arch/arm64/xen/../../arm/xen/enlighten.c:164!
Nov 20 00:25:41.598711 [0.574002] Internal error: Oops - BUG:
f2000800 [#1] PREEMPT SMP

The related source code lines in the kernel are:

err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu),
 );
BUG_ON(err);

I suspect commit 20f3b8eafe0ba to be the culprit.

Stefano, could you please have a look?


The good news and bad news is that I cannot repro this either with or
without CONFIG_UNMAP_KERNEL_AT_EL0. I looked at commit 20f3b8eafe0ba but
I cannot see anything wrong with it. Looking at the register dump, from:

x0 : fffa

I am guessing the error was -ENXIO which is returned from map_guest_area
in Xen.

Could it be that the struct is crossing a page boundary? Or that it is
not 64-bit aligned? Do we need to do something like the following?

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 9afdc4c4a5dc..5326070c5dc0 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -484,7 +485,7 @@ static int __init xen_guest_init(void)
 * for secondary CPUs as they are brought up.
 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
 */
-   xen_vcpu_info = alloc_percpu(struct vcpu_info);
+   xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info), PAGE_SIZE);
if (xen_vcpu_info == NULL)
return -ENOMEM;
  


May I suggest to use a smaller alignment? What about:

1 << fls(sizeof(struct vcpu_info) - 1)
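
As a worked illustration of that expression, here is a userspace sketch under stated assumptions. fls_sketch is a hypothetical stand-in that mirrors the documented behaviour of the kernel's fls() (1-based position of the highest set bit, 0 for an input of 0), and pow2_align_for is an illustrative name. The expression rounds the size up to the next power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's fls(): returns the 1-based
 * position of the most significant set bit, or 0 when x is 0.
 */
static int fls_sketch(unsigned int x)
{
	int pos = 0;

	while (x != 0) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Smallest power of two that is >= size, i.e. 1 << fls(size - 1). */
static size_t pow2_align_for(size_t size)
{
	/* Keep the shift well inside the range of the types used here. */
	assert(size >= 1 && size <= (1u << 30));
	return (size_t)1 << fls_sketch((unsigned int)(size - 1));
}
```

With a size of 64 this yields 64, with 65 it yields 128, and with 48 it yields 64, so a structure that is already a power of two in size gets no extra padding.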


Juergen


Re: [linux-linus test] 183794: regressions - FAIL

2023-11-21 Thread Stefano Stabellini
On Mon, 20 Nov 2023, Stefano Stabellini wrote:
> On Mon, 20 Nov 2023, Juergen Gross wrote:
> > On 20.11.23 03:21, osstest service owner wrote:
> > > flight 183794 linux-linus real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/183794/
> > > 
> > > Regressions :-(
> > > 
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >   test-arm64-arm64-examine  8 reboot   fail REGR. vs.
> > > 183766
> > 
> > I'm seeing the following in the serial log:
> > 
> > Nov 20 00:25:41.586712 [0.567318] kernel BUG at
> > arch/arm64/xen/../../arm/xen/enlighten.c:164!
> > Nov 20 00:25:41.598711 [0.574002] Internal error: Oops - BUG:
> > f2000800 [#1] PREEMPT SMP
> > 
> > The related source code lines in the kernel are:
> > 
> > err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, 
> > xen_vcpu_nr(cpu),
> >  );
> > BUG_ON(err);
> > 
> > I suspect commit 20f3b8eafe0ba to be the culprit.
> > 
> > Stefano, could you please have a look?

The good news and bad news is that I cannot repro this either with or
without CONFIG_UNMAP_KERNEL_AT_EL0. I looked at commit 20f3b8eafe0ba but
I cannot see anything wrong with it. Looking at the register dump, from:

x0 : fffa

I am guessing the error was -ENXIO which is returned from map_guest_area
in Xen.
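
For reference, a small sketch of how a value ending in fffa reads as -ENXIO. It assumes the upper bits of x0 are all set (only the low digits appear in the dump quoted here) and that ENXIO is 6, as on Linux. decode_low16 and is_enxio are hypothetical helpers, not kernel or Xen code.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: sign-extend the low 16 bits of a register value.
 * Assumes the register's upper bits simply repeat the sign bit.
 */
static int decode_low16(uint16_t low_bits)
{
	/* Values with bit 15 set represent negative numbers. */
	if (low_bits >= 0x8000)
		return (int)low_bits - 0x10000;
	return (int)low_bits;
}

/* Returns 1 if the low bits decode to -ENXIO, 0 otherwise. */
static int is_enxio(uint16_t low_bits)
{
	return decode_low16(low_bits) == -ENXIO;
}
```

Under those assumptions decode_low16(0xfffa) gives -6, which matches -ENXIO.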

Could it be that the struct is crossing a page boundary? Or that it is
not 64-bit aligned? Do we need to do something like the following?

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 9afdc4c4a5dc..5326070c5dc0 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -484,7 +485,7 @@ static int __init xen_guest_init(void)
 * for secondary CPUs as they are brought up.
 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
 */
-   xen_vcpu_info = alloc_percpu(struct vcpu_info);
+   xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info), PAGE_SIZE);
if (xen_vcpu_info == NULL)
return -ENOMEM;
 

Re: [linux-linus test] 183794: regressions - FAIL

2023-11-20 Thread Stefano Stabellini
On Mon, 20 Nov 2023, Juergen Gross wrote:
> On 20.11.23 03:21, osstest service owner wrote:
> > flight 183794 linux-linus real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/183794/
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >   test-arm64-arm64-examine  8 reboot   fail REGR. vs.
> > 183766
> 
> I'm seeing the following in the serial log:
> 
> Nov 20 00:25:41.586712 [0.567318] kernel BUG at
> arch/arm64/xen/../../arm/xen/enlighten.c:164!
> Nov 20 00:25:41.598711 [0.574002] Internal error: Oops - BUG:
> f2000800 [#1] PREEMPT SMP
> 
> The related source code lines in the kernel are:
> 
> err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu),
>  );
> BUG_ON(err);
> 
> I suspect commit 20f3b8eafe0ba to be the culprit.
> 
> Stefano, could you please have a look?

The original email somehow escaped my email filters and managed to skip
my inbox. Hence, this is the first time I am seeing this commit. Today I
ran out of time but I'll look at it tomorrow.

Re: [linux-linus test] 183794: regressions - FAIL

2023-11-19 Thread Juergen Gross

On 20.11.23 03:21, osstest service owner wrote:

flight 183794 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183794/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
  test-arm64-arm64-examine  8 reboot   fail REGR. vs. 183766


I'm seeing the following in the serial log:

Nov 20 00:25:41.586712 [0.567318] kernel BUG at 
arch/arm64/xen/../../arm/xen/enlighten.c:164!
Nov 20 00:25:41.598711 [0.574002] Internal error: Oops - BUG: 
f2000800 [#1] PREEMPT SMP


The related source code lines in the kernel are:

err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu),
 );
BUG_ON(err);

I suspect commit 20f3b8eafe0ba to be the culprit.

Stefano, could you please have a look?


Juergen

