Re: [Intel-gfx] Regression on linux-next (next-20231016)

2023-10-20 Thread Borah, Chaitanya Kumar
Hello Lorenzo,

> -----Original Message-----
> From: Lorenzo Stoakes 
> Sent: Friday, October 20, 2023 12:08 PM
> To: Borah, Chaitanya Kumar 
> Cc: intel-gfx@lists.freedesktop.org; Kurmi, Suresh Kumar; Saarinen, Jani
> Subject: Re: Regression on linux-next (next-20231016)
> 
> On Fri, 20 Oct 2023 at 06:52, Borah, Chaitanya Kumar wrote:
> >
> > Hello Lorenzo,
> >
> > Hope you are doing well. I am Chaitanya from the linux graphics team in
> > Intel.
> >
> > This mail is regarding a regression we are seeing in our CI runs[1] on
> > linux-next repository.
> >
> 
> Thanks for reporting :) It is reassuring that this has been picked up from
> multiple sources.
> 
> [snip]
> 
> > We didn't see the issue on next-20231018. Is there a fix already available
> > for this? If not, could you please check why this patch causes the regression
> > and if we can find a solution for it soon?
> 
> This is because I submitted a fix on Monday [0] which has now been taken into
> the weds revision of -next which resolves this issue altogether, so this
> regression -> not regression is expected and intentional.
> 
> Apologies for the noise!
> 

No problem! Thank you for the fix and a quick response.

Regards

Chaitanya

> [0]:https://lore.kernel.org/all/c9eb4cc6-7db4-4c2b-838d-43a0b319a4f0@lucifer.local/
> 
> Thanks, Lorenzo


Re: [Intel-gfx] Regression on linux-next (next-20231016)

2023-10-19 Thread Lorenzo Stoakes
On Fri, 20 Oct 2023 at 06:52, Borah, Chaitanya Kumar wrote:
>
> Hello Lorenzo,
>
> Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on 
> linux-next repository.
>

Thanks for reporting :) It is reassuring that this has been picked up
from multiple sources.

[snip]

> We didn't see the issue on next-20231018. Is there a fix already available 
> for this? If not, could you please check why this patch causes the regression 
> and if we can find a solution for it soon?

This is because I submitted a fix on Monday [0] which has now been
taken into the weds revision of -next which resolves this issue
altogether, so this regression -> not regression is expected and
intentional.

Apologies for the noise!

[0]:https://lore.kernel.org/all/c9eb4cc6-7db4-4c2b-838d-43a0b319a4f0@lucifer.local/

Thanks, Lorenzo


[Intel-gfx] Regression on linux-next (next-20231016)

2023-10-19 Thread Borah, Chaitanya Kumar
Hello Lorenzo,

Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.

This mail is regarding a regression we are seeing in our CI runs[1] on 
linux-next repository.

Since version next-20231016 [2], we have been seeing the following error:
```
<6>[4.550196] e1000e :00:1f.6 enp0s31f6: renamed from eth0
<1>[4.581173] BUG: kernel NULL pointer dereference, address: 01b8
<1>[4.581178] #PF: supervisor read access in kernel mode
<1>[4.581180] #PF: error_code(0x) - not-present page
<6>[4.581182] PGD 0 P4D 0
<4>[4.581184] Oops:  [#1] PREEMPT SMP NOPTI
<4>[4.581186] CPU: 6 PID: 460 Comm: apache2 Not tainted 6.6.0-rc6-next-20231016-next-20231016-g4d0515b235de+ #1
<4>[4.581189] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.3157.A00.2204200131 04/20/2022
<4>[4.581193] RIP: 0010:mmap_region+0x803/0xa50
```

A detailed log can be found in [3].

After bisecting the tree, the following patch [4] seems to be causing the 
regression.

```
1db41d29b79ad271674081c752961edd064bbbac is the first bad commit
commit 1db41d29b79ad271674081c752961edd064bbbac
Author: Lorenzo Stoakes lstoa...@gmail.com
Date:   Thu Oct 12 18:04:30 2023 +0100

mm: perform the mapping_map_writable() check after call_mmap()

In order for a F_SEAL_WRITE sealed memfd mapping to have an opportunity to
clear VM_MAYWRITE, we must be able to invoke the appropriate
vm_ops->mmap() handler to do so.  We would otherwise fail the
mapping_map_writable() check before we had the opportunity to avoid it.

This patch moves this check after the call_mmap() invocation.  Only memfd
actively denies write access causing a potential failure here (in
memfd_add_seals()), so there should be no impact on non-memfd cases.

This patch makes the userland-visible change that MAP_SHARED, PROT_READ
mappings of an F_SEAL_WRITE sealed memfd mapping will now succeed.

There is a delicate situation with cleanup paths assuming that a writable
mapping must have occurred in circumstances where it may now not have.  In
order to ensure we do not accidentally mark a writable file unwritable by
mistake, we explicitly track whether we have a writable mapping and unmap
only if we do.
```

We also verified that reverting the patch fixes the issue.

We didn't see the issue on next-20231018. Is there a fix already available for 
this? If not, could you please check why this patch causes the regression and 
if we can find a solution for it soon?

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231016
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20231016/bat-rpls-1/boot0.txt
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231016&id=1db41d29b79ad271674081c752961edd064bbbac