Re: [git pull] drm urgent for 6.10-rc1

2024-05-17 Thread Linus Torvalds
On Fri, 17 May 2024 at 06:55, Alex Deucher  wrote:
>
> Can you try this patch?
> https://patchwork.freedesktop.org/patch/594539/

Ack. This seems to fix it for me - unless the problem is random and
only happens sometimes, and I've just been *very* unlucky until now.

Linus


Re: [git pull] drm urgent for 6.10-rc1

2024-05-17 Thread Linus Torvalds
On Thu, 16 May 2024 at 18:08, Dave Airlie  wrote:
>
> Linus, do you see it a boot straight away?

Ok, back at that computer now, and yes, I see those messages right
away. In fact, they seem to happen before gnome even starts up, ie I
see those messages long before the first messages from gnome-session:

May 17 12:07:17 tr3970x kernel: WARNING: CPU: 4 PID: 1067 at
drivers/gpu/drm/drm_buddy.c:198 __force_merge+0x184/0x1b0 [drm_buddy]
.. lots and lots and lots of them ..
...
May 17 12:07:23 tr3970x systemd-cryptsetup[982]: ...
...
May 17 12:07:25 tr3970x systemd[1]: Reached target basic.target
...
May 17 12:07:25 tr3970x systemd[1]: Mounted sysroot.mount - /sysroot.
...
May 17 12:07:25 tr3970x systemd[1]: Switching root.
...
May 17 12:07:36 tr3970x gnome-session[2824]: ..
...
May 17 12:07:36 tr3970x gnome-shell[2836]: Obtained a high
priority EGL context
May 17 12:07:36 tr3970x kernel: WARNING: CPU: 31 PID: 2836 at
drivers/gpu/drm/drm_buddy.c:198 __force_merge+0x184/0x1b0 [drm_buddy]
.. lots of warnings resume ...

IOW, it happens already during the graphical boot before I have even
typed in my disk encryption password.

Then it starts again when gnome starts.

I just checked: I have exactly 8192 warnings from the early boot
before the first gnome warning. Which sounds like too round a number
to be an accident.

I will try the patch Alex pointed at next:

https://patchwork.freedesktop.org/patch/594539/

and see if that fixes it for me.

 Linus


Re: [git pull] drm urgent for 6.10-rc1

2024-05-16 Thread Linus Torvalds
On Wed, 15 May 2024 at 19:54, Dave Airlie  wrote:
>
> Here is the buddy allocator fix I picked up from the list, please apply.

So I removed my reverts, and am running a kernel that includes the
merge 972a2543e3dd ("Merge tag 'drm-next-2024-05-16' of
https://gitlab.freedesktop.org/drm/kernel;) but I still see a lot of
warnings as per below.

I was going to say that the difference is that now they trigger
through the page fault path (amdgpu_gem_fault) while previously they
triggered through the system call path and amdgpu_drm_ioctl. But it
turns out it's both in both cases, and it just happened to be one or
the other in the particular warnings that I cut-and-pasted.

As before, there are tens of thousands of them after being up for less
than an hour, so this is not some kind of rare thing.

The machine hasn't _crashed_ yet, though. But I'm going to be out and
about and working on my laptop the rest of the day, so I won't be able
to test.

(And that kernel version of "6.9.0-08295-gfd39ab3b5289" that is quoted
in the WARN isn't some official kernel, I have about ten private
patches that I keep testing in my tree, so if you wondered what the
heck that git version is, it's not going to match anything you see,
but the ~ten patches also aren't relevant to this).

Nothing unusual in the config, although this is clang-built. Shouldn't
matter, never has before.

Linus

---
CPU: 28 PID: 3326 Comm: mutter-x11-fram Tainted: GW
6.9.0-08295-gfd39ab3b5289 #64
Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS MASTER/TRX40
AORUS MASTER, BIOS F7 09/07/2022
RIP: 0010:__force_merge+0x14f/0x180 [drm_buddy]
Code: 74 0d 49 8b 44 24 18 48 d3 e0 49 29 44 24 30 4c 89 e7 ba 01 00
00 00 e8 9f 00 00 00 44 39 e8 73 1f 49 8b 04 24 e9 25 ff ff ff <0f> 0b
4c 39 c3 75 a3 eb 99 b8 f4 ff ff ff c3 b8 f4 ff ff ff eb 02
RSP: :9e350314baa0 EFLAGS: 00010246
RAX: 974a227a4a00 RBX: 974a2d024b88 RCX: 0b8eb800
RDX: 974a2d024bf8 RSI: 974a2d024bd0 RDI: 974a2d024bb0
RBP:  R08: 974a2d024b88 R09: 1000
R10: 0800 R11:  R12: 974a2198fa18
R13: 0009 R14: 1000 R15: 
FS:  7f56a78b6540() GS:97591e70() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f568804 CR3: 000198cc9000 CR4: 00350ef0
Call Trace:
 
 ? __warn+0xc1/0x190
 ? __force_merge+0x14f/0x180 [drm_buddy]
 ? report_bug+0x129/0x1a0
 ? handle_bug+0x3d/0x70
 ? exc_invalid_op+0x16/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? __force_merge+0x14f/0x180 [drm_buddy]
 drm_buddy_alloc_blocks+0x249/0x400 [drm_buddy]
 ? __cond_resched+0x16/0x40
 amdgpu_vram_mgr_new+0x204/0x3f0 [amdgpu]
 ttm_resource_alloc+0x31/0x120 [ttm]
 ttm_bo_alloc_resource+0xbc/0x260 [ttm]
 ? memcg_account_kmem+0x4a/0xe0
 ? ttm_resource_compatible+0xbb/0xe0 [ttm]
 ttm_bo_validate+0x9f/0x210 [ttm]
 ? __alloc_pages+0x129/0x210
 amdgpu_bo_fault_reserve_notify+0x98/0x110 [amdgpu]
 amdgpu_gem_fault+0x53/0xd0 [amdgpu]
 __do_fault+0x41/0x140
 do_pte_missing+0x453/0xfd0
 handle_mm_fault+0x73c/0x1090
 do_user_addr_fault+0x2e2/0x6f0
 exc_page_fault+0x56/0x110
 asm_exc_page_fault+0x22/0x30


Re: [git pull] drm for 6.10-rc1

2024-05-15 Thread Linus Torvalds
On Wed, 15 May 2024 at 16:51, Dave Airlie  wrote:
>
> > Let's see if the machine ends up being stable now. It took several
> > hours for the "scary messages" state to turn into the "hung machine"
> > state, so they *could* have been independent issues, but it seems a
> > bit unlikely.
>
> This worries me actually, it's possible this warn could cause a
> problem, but I'm not convinced it should have machine ending
> properties without some sort of different error at the end, so I'd
> keep an eye open here.

Well, since I'm a big believer in dogfooding, I always run my own
kernel even during the merge window. I don't reboot between each pull,
but I try to basically reboot daily.

And it's entirely possible that the eventual "bad page flags" error -
which is what I think triggered the eventual hang - is something else
that came in during this merge window.

I haven't actually gotten the -mm changes from Andrew yet, but it did
happen in the btrfs kworker, and I have merged the btrfs changes for
6.10.  So maybe they are the cause.

I was blaming the DRM case mainly because it clearly *was* about some
kind of allocation management, and I got a *lot* of those warnings:

$ journalctl -b -1 | grep 'WARNING: CPU' | wc -1
  16015

but let's see if it happens with my amdgpu reverts in place, and no
drm warnings.

It most definitely wouldn't be the first time we had multiple
independent bugs during the merge window ;/

  Linus


Re: [git pull] drm for 6.10-rc1

2024-05-15 Thread Linus Torvalds
On Wed, 15 May 2024 at 16:17, Dave Airlie  wrote:
>
> It's also possible it's just that hey there's a few others in the tree
>
> KVM_WERROR not tied to it
> PPC_WERROR (why does CXL uses this?)

Yeah, that should be fixed too, but at least KVM_WERROR predates the
whole-kernel WERROR.

And PPC_WERROR predates it by over a decade.

But yes, good catch - both of those should be silenced if we already
have the global WERROR enabled.

I mainly notice new questions (because I use "make oldconfig"), so old
pre-existing illogical ones don't trigger my "why are they asking?"
reaction.

> AMDGPU, I915 and XE all have !COMPILE_TEST on their variants

Hmm.  It turns out that I didn't notice the AMDGPU one because my
Threadripper - that has AMDGPU enabled - I have actually turned off
EXPERT on, so it's hidden by that for me.

But yes, both of those should be "depends on !WERROR" too.

Or maybe they should just go away entirely, and be subsumed by the
DRM_WERROR thing.

   Linus


Re: [git pull] drm for 6.10-rc1

2024-05-15 Thread Linus Torvalds
On Wed, 15 May 2024 at 15:45, Dave Airlie  wrote:
>
>   The drm subsystem enables more warnings than the kernel default, so
>   this config option is disabled by default.

Irrelevant.

If the *main* CONFIG_WERROR is on, then it does NOT MATTER if somebody
sets CONFIG_DRM_WERROR or not. It's a no-op. It's pointless.

And that means that it's also entirely pointless to ask. It's only annoying.

> depends on DRM && EXPERT
>
> so we aren't throwing it at random users.

Yes you are.

Because - rightly or wrongly - distros enable EXPERT by default. At
least Fedora does. So any user that starts from a distro config will
have EXPERT enabled.

> should we rename it CONFIG_DRM_WERROR_MORE or something?

Renaming does nothing. If it's pointless, it's pointless even if it's renamed.

It needs to have a

   depends on !WERROR

because if WERROR is already true, then it's stupid and wrong to ask AGAIN.

To summarize: if the main WERROR is enabled, then the DRM tree is
*ALREADY* built with WERROR. Asking for DRM_WERROR is wrong.

I keep harping on bad config variables because our kernel config thing
is already much too messy and is by far the most difficult part of
building your own kernel.

Everything else is literally just "make" followed by "make
modules_install" and "make install". Very straightforward.

But doing a kernel config? Nasty. And made nastier by bad and
nonsensical questions.

Linus


Re: [git pull] drm for 6.10-rc1

2024-05-15 Thread Linus Torvalds
On Tue, 14 May 2024 at 23:21, Dave Airlie  wrote:
>
> This is the main pull request for the drm subsystems for 6.10.

.. and now that I look more at this pull request, I find other things wrong.

Why is the DRM code asking if I want to enable -Werror? I have Werror
enabled *already*.

I hate stupid config questions. They only confuse users.

If the global WERROR config is enabled, then the DRM config certainly
shouldn't ask whether you want even more -Werror. It does nothing but
annoy people.

And no, we are not going to have subsystems that can *weaken* the
existing CONFIG_WERROR. Happily, that doesn't seem to be what the DRM
code wants to do, it just wants to add -Werror, but as mentioned, its'
crazy to do that when we already have it globally enabled.

Now, it might make more sense to ask if you want -Wextra. A lot of
those warnings are bogus.

   Linus


Re: [git pull] drm for 6.10-rc1

2024-05-15 Thread Linus Torvalds
On Wed, 15 May 2024 at 13:24, Linus Torvalds
 wrote:
>
> I have to revert both
>
>   a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
>   e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")
>
> to make things build cleanly. Next step: see if it boots and fixes the
> problem for me.

Well, perhaps not surprisingly, the WARN_ON() no longer triggers with
this, and everything looks fine.

Let's see if the machine ends up being stable now. It took several
hours for the "scary messages" state to turn into the "hung machine"
state, so they *could* have been independent issues, but it seems a
bit unlikely.

   Linus


Re: [git pull] drm for 6.10-rc1

2024-05-15 Thread Linus Torvalds
On Wed, 15 May 2024 at 13:21, Linus Torvalds
 wrote:
>
> I guess I'll try to revert the later commit that enables it for amdgpu
> (commit a68c7eaa7a8f) and see if it at least makes the horrendous
> messages go away.

I have to revert both

  a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
  e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")

to make things build cleanly. Next step: see if it boots and fixes the
problem for me.

  Linus


Re: [git pull] drm for 6.10-rc1

2024-05-15 Thread Linus Torvalds
On Wed, 15 May 2024 at 13:06, Linus Torvalds
 wrote:
>
> Hmm. There's something seriously wrong with amdgpu.
>
> I'm getting a ton of__force_merge warnings:
>
>   WARNING: CPU: 0 PID: 1069 at drivers/gpu/drm/drm_buddy.c:199
> __force_merge+0x14f/0x180 [drm_buddy]

Adding likely culprits to the participants, since it looks like this
is all new with commit 96950929eb23 ("drm/buddy: Implement tracking
clear page feature").

Sadly I can't juist revert that commit to check, because there are
many subsequent commits that then depend on it.

I guess I'll try to revert the later commit that enables it for amdgpu
(commit a68c7eaa7a8f) and see if it at least makes the horrendous
messages go away.

Anyway, this is some old Radeon graphics card in my Threadripper:

49:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
(prog-if 00 [VGA controller])
Subsystem: Sapphire Technology Limited Radeon RX 570 Pulse 4GB
Flags: bus master, fast devsel, latency 0, IRQ 130, IOMMU group 32
Memory at c000 (64-bit, prefetchable) [size=256M]
Memory at d000 (64-bit, prefetchable) [size=2M]
I/O ports at 8000 [size=256]
Memory at d1c0 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 000c [disabled] [size=128K]
Capabilities: 
Kernel driver in use: amdgpu
Kernel modules: amdgpu

I think it's a "Sapphire Radeon Pulse RX 580" or something like that.

Linus


Re: [git pull] drm for 6.10-rc1

2024-05-15 Thread Linus Torvalds
On Tue, 14 May 2024 at 23:21, Dave Airlie  wrote:
>
> In drivers the main thing is a new driver for ARM Mali firmware based
> GPUs, otherwise there are a lot of changes to amdgpu/xe/i915/msm and
> scattered changes to everything else.

Hmm. There's something seriously wrong with amdgpu.

I'm getting a ton of__force_merge warnings:

  WARNING: CPU: 0 PID: 1069 at drivers/gpu/drm/drm_buddy.c:199
__force_merge+0x14f/0x180 [drm_buddy]
  Modules linked in: hid_logitech_hidpp hid_logitech_dj uas
usb_storage amdgpu drm_ttm_helper ttm video drm_exec
drm_suballoc_helper amdxcp drm_buddy gpu_sched drm_display_helper
drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm
ghash_clmulni_intel igb atlantic nvme dca macsec ccp i2c_algo_bit
nvme_core sp5100_tco wmi ip6_tables ip_tables fuse
  CPU: 0 PID: 1069 Comm: plymouthd Not tainted 6.9.0-07381-g3860ca371740 #60
  Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS
MASTER/TRX40 AORUS MASTER, BIOS F7 09/07/2022
  RIP: 0010:__force_merge+0x14f/0x180 [drm_buddy]
  Code: 74 0d 49 8b 44 24 18 48 d3 e0 49 29 44 24 30 4c 89 e7 ba 01 00
00 00 e8 9f 00 00 00 44 39 e8 73 1f 49 8b 04 24 e9 25 ff ff ff <0f> 0b
4c 39 c3 75 a3 eb 99 b8 f4 ff ff ff c3 b8 f4 ff ff ff eb 02
  RSP: 0018:b87a81cb7908 EFLAGS: 00010246
  RAX: 9b1915de8000 RBX: 9b1919478288 RCX: 0800
  RDX: 9b19194782f8 RSI: 9b19194782d0 RDI: 9b19194782b0
  RBP:  R08: 9b1919478288 R09: 1000
  R10: 0800 R11:  R12: 9b192590fa18
  R13: 000d R14: 1000 R15: 
  FS:  7fa06bfa9740() GS:9b281e00() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 555adb857000 CR3: 00011b516000 CR4: 00350ef0
  Call Trace:
   ? __force_merge+0x14f/0x180 [drm_buddy]
   drm_buddy_alloc_blocks+0x249/0x400 [drm_buddy]
   ? __cond_resched+0x16/0x40
   amdgpu_vram_mgr_new+0x204/0x3f0 [amdgpu]
   ttm_resource_alloc+0x31/0x120 [ttm]
   ttm_bo_alloc_resource+0xbc/0x260 [ttm]
   ttm_bo_validate+0x9f/0x210 [ttm]
   ttm_bo_init_reserved+0x103/0x130 [ttm]
   amdgpu_bo_create+0x246/0x400 [amdgpu]
   ? amdgpu_bo_destroy+0x70/0x70 [amdgpu]
   amdgpu_bo_create_user+0x29/0x40 [amdgpu]
   amdgpu_mode_dumb_create+0x108/0x190 [amdgpu]
   ? amdgpu_bo_destroy+0x70/0x70 [amdgpu]
   ? drm_mode_create_dumb+0xa0/0xa0 [drm]
   drm_ioctl_kernel+0xad/0xd0 [drm]
   drm_ioctl+0x330/0x4b0 [drm]
   ? drm_mode_create_dumb+0xa0/0xa0 [drm]
   amdgpu_drm_ioctl+0x41/0x80 [amdgpu]
   __x64_sys_ioctl+0xd2a/0xe00
   ? update_process_times+0x89/0xa0
   ? tick_nohz_handler+0xe2/0x120
   ? timerqueue_add+0x94/0xa0
   ? __hrtimer_run_queues+0x12b/0x250
   ? ktime_get+0x34/0xb0
   ? lapic_next_event+0x12/0x20
   ? clockevents_program_event+0x78/0xd0
   ? hrtimer_interrupt+0x118/0x390
   ? sched_clock+0x5/0x10
   do_syscall_64+0x68/0x130
   ? __irq_exit_rcu+0x53/0xb0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

and eventually the whole thing just crashes entirely, with a bad page
state in the VM:

  BUG: Bad page state in process kworker/u261:13  pfn:31fb9a
  page: refcount:0 mapcount:0 mapping:ff0b239e index:0x37ce8
pfn:0x31fb9a
  aops:btree_aops ino:1
  flags: 
0x2fffc60020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x3fff)
  page_type: 0x()

which comes from a btrfs worker (btrfs-delayed-meta
btrfs_work_helper), but I would not be surprised if that was caused by
whatever odd thing is going on with the DRM code. IOW, it *looks* like
this code ends up just corrupting memory in horrible ways.

Linus

Linus


Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-09 Thread Linus Torvalds
On Thu, 9 May 2024 at 04:39, Christian Brauner  wrote:
>
> Not worth it without someone explaining in detail why imho. First pass
> should be to try and replace kcmp() in scenarios where it's obviously
> not needed or overkill.

Ack.

> I've added a CLASS(fd_raw) in a preliminary patch since we'll need that
> anyway which means that your comparison patch becomes even simpler imho.
> I've also added a selftest patch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.misc

LGTM.

Maybe worth adding an explicit test for "open same file, but two
separate opens, F_DUPFD_QUERY returns 0? Just to clarify the "it's not
testing the file on the filesystem for equality, but the file pointer
itself".

 Linus


Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-08 Thread Linus Torvalds
On Wed, 8 May 2024 at 09:19, Linus Torvalds
 wrote:
>
> So since we already have two versions of F_DUPFD (the other being
> F_DUPFD_CLOEXEC) I decided that the best thing to do is to just extend
> on that existing naming pattern, and called it F_DUPFD_QUERY instead.
>
> I'm not married to the name, so if somebody hates it, feel free to
> argue otherwise.

Side note: with this patch, doing

   ret = fcntl(fd1, F_DUPFD_QUERY, fd2);

will result in:

 -1 (EBADF): 'fd1' is not a valid file descriptor
 -1 (EINVAL): old kernel that doesn't support F_DUPFD_QUERY
 0: fd2 does not refer to the same file as fd1
 1: fd2 is the same 'struct file' as fd1

and it might be worth noting a couple of things here:

 (a) fd2 being an invalid file descriptor does not cause EBADF, it
just causes "does not match".

 (b) we *could* use more bits for more equality

IOW, it would possibly make sense to extend the 0/1 result to be

- bit #0: same file pointer
- bit #1: same path
- bit #2: same dentry
- bit #3: same inode

which are all different levels of "sameness".

Does anybody care? Do we want to extend on this "sameness"? I'm not
convinced, but it might be a good idea to document this as a possibly
future extension, ie *if* what you care about is "same file pointer",
maybe you should make sure to only look at bit #0.

   Linus


Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-08 Thread Linus Torvalds
On Tue, 7 May 2024 at 12:07, Linus Torvalds
 wrote:
>
> That example thing shows that we shouldn't make it a FISAME ioctl - we
> should make it a fcntl() instead, and it would just be a companion to
> F_DUPFD.
>
> Doesn't that strike everybody as a *much* cleaner interface? I think
> F_ISDUP would work very naturally indeed with F_DUPFD.

So since we already have two versions of F_DUPFD (the other being
F_DUPFD_CLOEXEC) I decided that the best thing to do is to just extend
on that existing naming pattern, and called it F_DUPFD_QUERY instead.

I'm not married to the name, so if somebody hates it, feel free to
argue otherwise.

But with that, the suggested patch would end up looking something like
the attached (I also re-ordered the existing "F_LINUX_SPECIFIC_BASE"
users, since one of them was out of numerical order).

This really feels like a very natural thing, and yes, the 'same_fd()'
function in systemd that Christian also pointed at could use this very
naturally.

Also note that I obviously haven't tested this. Because obviously this
is trivially correct and cannot possibly have any bugs. Right? RIGHT?

And yes, I did check - despite the odd jump in numbers, we've never
had anything between F_NOTIFY (+2) and F_CANCELLK (+5).

We added F_SETLEASE (+0) , F_GETLEASE (+1) and F_NOTIFY (+2) in
2.4.0-test9 (roughly October 2000, I didn't dig deeper).

And then back in 2007 we suddenly jumped to F_CANCELLK (+5) in commit
9b9d2ab4154a ("locks: add lock cancel command"). I don't know why 3/4
were shunned.

After that we had 22d2b35b200f ("F_DUPFD_CLOEXEC implementation") add
F_DUPFD_CLOEXEC (+6).

I'd have loved to put F_DUPFD_QUERY next to it, but +5 and +7 are both used.

Linus
 fs/fcntl.c | 23 +++
 include/uapi/linux/fcntl.h | 14 --
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 54cc85d3338e..1ddb63f70445 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -327,6 +327,25 @@ static long fcntl_set_rw_hint(struct file *file, unsigned int cmd,
 	return 0;
 }
 
+/*
+ * Is the file descriptor a dup of the file?
+ */
+static long f_dupfd_query(int fd, struct file *filp)
+{
+	struct fd f = fdget_raw(fd);
+
+	/*
+	 * We can do the 'fdput()' immediately, as the only thing that
+	 * matters is the pointer value which isn't changed by the fdput.
+	 *
+	 * Technically we didn't need a ref at all, and 'fdget()' was
+	 * overkill, but given our lockless file pointer lookup, the
+	 * alternatives are complicated.
+	 */
+	fdput(f);
+	return f.file == filp;
+}
+
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		struct file *filp)
 {
@@ -342,6 +361,9 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_DUPFD_CLOEXEC:
 		err = f_dupfd(argi, filp, O_CLOEXEC);
 		break;
+	case F_DUPFD_QUERY:
+		err = f_dupfd_query(argi, filp);
+		break;
 	case F_GETFD:
 		err = get_close_on_exec(fd) ? FD_CLOEXEC : 0;
 		break;
@@ -446,6 +468,7 @@ static int check_fcntl_cmd(unsigned cmd)
 	switch (cmd) {
 	case F_DUPFD:
 	case F_DUPFD_CLOEXEC:
+	case F_DUPFD_QUERY:
 	case F_GETFD:
 	case F_SETFD:
 	case F_GETFL:
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 282e90aeb163..c0bcc185fa48 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -8,6 +8,14 @@
 #define F_SETLEASE	(F_LINUX_SPECIFIC_BASE + 0)
 #define F_GETLEASE	(F_LINUX_SPECIFIC_BASE + 1)
 
+/*
+ * Request nofications on a directory.
+ * See below for events that may be notified.
+ */
+#define F_NOTIFY	(F_LINUX_SPECIFIC_BASE + 2)
+
+#define F_DUPFD_QUERY	(F_LINUX_SPECIFIC_BASE + 3)
+
 /*
  * Cancel a blocking posix lock; internal use only until we expose an
  * asynchronous lock api to userspace:
@@ -17,12 +25,6 @@
 /* Create a file descriptor with FD_CLOEXEC set. */
 #define F_DUPFD_CLOEXEC	(F_LINUX_SPECIFIC_BASE + 6)
 
-/*
- * Request nofications on a directory.
- * See below for events that may be notified.
- */
-#define F_NOTIFY	(F_LINUX_SPECIFIC_BASE+2)
-
 /*
  * Set and get of pipe page size array
  */


Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-07 Thread Linus Torvalds
On Tue, 7 May 2024 at 11:04, Daniel Vetter  wrote:
>
> On Tue, May 07, 2024 at 09:46:31AM -0700, Linus Torvalds wrote:
>
> > I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
> > too, if this is possibly a more common thing. and not just DRM wants
> > it.
> >
> > Would something like that work for you?
>
> Yes.
>
> Adding Simon and Pekka as two of the usual suspects for this kind of
> stuff. Also example code (the int return value is just so that callers know
> when kcmp isn't available, they all only care about equality):
>
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/os_file.c#L239

That example thing shows that we shouldn't make it a FISAME ioctl - we
should make it a fcntl() instead, and it would just be a companion to
F_DUPFD.

Doesn't that strike everybody as a *much* cleaner interface? I think
F_ISDUP would work very naturally indeed with F_DUPFD.

Yes? No?

   Linus


Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-07 Thread Linus Torvalds
On Tue, 7 May 2024 at 04:03, Daniel Vetter  wrote:
>
> It's really annoying that on some distros/builds we don't have that, and
> for gpu driver stack reasons we _really_ need to know whether a fd is the
> same as another, due to some messy uniqueness requirements on buffer
> objects various drivers have.

It's sad that such a simple thing would require two other horrid
models (EPOLL or KCMP).

There'[s a reason that KCMP is a config option - *some* of that is
horrible code - but the "compare file descriptors for equality" is not
that reason.

Note that KCMP really is a broken mess. It's also a potential security
hole, even for the simple things, because of how it ends up comparing
kernel pointers (ie it doesn't just say "same file descriptor", it
gives an ordering of them, so you can use KCMP to sort things in
kernel space).

And yes, it orders them after obfuscating the pointer, but it's still
not something I would consider sane as a baseline interface. It was
designed for checkpoint-restore, it's the wrong thing to use for some
"are these file descriptors the same".

The same argument goes for using EPOLL for that. Disgusting hack.

Just what are the requirements for the GPU stack? Is one of the file
descriptors "trusted", IOW, you know what kind it is?

Because dammit, it's *so* easy to do. You could just add a core DRM
ioctl for it. Literally just

struct fd f1 = fdget(fd1);
struct fd f2 = fdget(fd2);
int same;

same = f1.file && f1.file == f2.file;
fdput(fd1);
fdput(fd2);
return same;

where the only question is if you also woudl want to deal with O_PATH
fd's, in which case the "fdget()" would be "fdget_raw()".

Honestly, adding some DRM ioctl for this sounds hacky, but it sounds
less hacky than relying on EPOLL or KCMP.

I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl
too, if this is possibly a more common thing. and not just DRM wants
it.

Would something like that work for you?

 Linus


Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)

2024-05-06 Thread Linus Torvalds
On Mon, 6 May 2024 at 10:46, Stefan Metzmacher  wrote:
>
> I think it's a very important detail that epoll does not take
> real references. Otherwise an application level 'close()' on a socket
> would not trigger a tcp disconnect, when an fd is still registered with
> epoll.

Yes, exactly.

epoll() ends up actually wanting the lifetime of the ep item be the
lifetime of the file _descriptor_, not the lifetime of the file
itself.

We approximate that - badly - with epoll not taking a reference on the
file pointer, and then at final fput() it tears things down.

But that has two real issues, and one of them is that "oh, now epoll
has file pointers that are actually dead" that caused this thread.

The other issue is that "approximates" thing, where it means that
duplicating the file pointer (dup*() and fork() end unix socket file
sending etc) will not mean that the epoll ref is also out of sync with
the lifetime of the file descriptor.

That's why I suggested that "clean up epoll references at
file_close_fd() time instead:

  
https://lore.kernel.org/all/CAHk-=wj6xl9mgcd_nuzrj6saken0tsyttzdfpgdw34r+zmz...@mail.gmail.com/

because it would actually really *fix* the lifetime issue of ep items.

In the process, it would make it possible to actually take a f_count
reference at EPOLL_CTL_ADD time, since now the lifetime of the EP
wouldn't be tied to the lifetime of the 'struct file *' pointer, it
would be properly tied to the lifetime of the actual file descriptor
that you are adding.

So it would be a huge conceptual cleanup too.

(Of course - at that point EPOLL_CTL_ADD still doesn't actually _need_
a reference to the file, since the file being open in itself is
already that reference - but the point here being that there would
*be* a reference that the epoll code would effectively have, and you'd
never be in the situation we were in where there would be stale "dead"
file pointers that just haven't gone through the cleanup yet).

But I'd rather not touch the epoll code more than I have to.

Which is why I applied the minimal patch for just "refcount over
vfs_poll()", and am just mentioning my suggestion in the hope that
some eager beaver would like to see how painful it would do to make
the bigger surgery...

   Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-05 Thread Linus Torvalds
On Sun, 5 May 2024 at 13:30, Al Viro  wrote:
>
> 0.  special-cased ->f_count rule for ->poll() is a wart and it's
> better to get rid of it.
>
> 1.  fs/eventpoll.c is a steaming pile of shit and I'd be glad to see
> git rm taken to it.  Short of that, by all means, let's grab reference
> in there around the call of vfs_poll() (see (0)).

Agreed on 0/1.

> 2.  having ->poll() instances grab extra references to file passed
> to them is not something that should be encouraged; there's a plenty
> of potential problems, and "caller has it pinned, so we are fine with
> grabbing extra refs" is nowhere near enough to eliminate those.

So it's not clear why you hate it so much, since those extra
references are totally normal in all the other VFS paths.

I mean, they are perhaps not the *common* case, but we have a lot of
random get_file() calls sprinkled around in various places when you
end up passing a file descriptor off to some asynchronous operation
thing.

Yeah, I think most of them tend to be special operations (eg the tty
TIOCCONS ioctl to redirect the console), but it's not like vfs_ioctl()
is *that* different from vfs_poll. Different operation, not somehow
"one is more special than the other".

cachefiles and backing-file does it for regular IO, and drop it at IO
completion - not that different from what dma-buf does. It's in
->read_iter() rather than ->poll(), but again: different operations,
but not "one of them is somehow fundamentally different".

> 3.  dma-buf uses of get_file() are probably safe (epoll shite aside),
> but they do look fishy.  That has nothing to do with epoll.

Now, what dma-buf basically seems to do is to avoid ref-counting its
own fundamental data structure, and replaces that by refcounting the
'struct file' that *points* to it instead.

And it is a bit odd, but it actually makes some amount of sense,
because then what it passes around is that file pointer (and it allows
passing it around from user space *as* that file).

And honestly, if you look at why it then needs to add its refcount to
it all, it actually makes sense.  dma-bufs have this notion of
"fences" that are basically completion points for the asynchronous
DMA. Doing a "poll()" operation will add a note to the fence to get
that wakeup when it's done.

And yes, logically it takes a ref to the "struct dma_buf", but because
of how the lifetime of the dma_buf is associated with the lifetime of
the 'struct file', that then turns into taking a ref on the file.

Unusual? Yes. But not illogical. Not obviously broken. Tying the
lifetime of the dma_buf to the lifetime of a file that is passed along
makes _sense_ for that use.

I'm sure dma-bufs could add another level of refcounting on the
'struct dma_buf' itself, and not make it be 1:1 with the file, but
it's not clear to me what the advantage would really be, or why it
would be wrong to re-use a refcount that is already there.

 Linus


Re: [PATCH v2] epoll: be better about file lifetimes

2024-05-05 Thread Linus Torvalds
On Sun, 5 May 2024 at 13:02, David Laight  wrote:
>
> How much is the extra pair of atomics going to hurt performance?
> IIRC a lot of work was done to (try to) make epoll lockless.

If this makes people walk away from epoll, that would be absolutely
*lovely*. Maybe they'd start using io_uring instead, which has had its
problems, but is a lot more capable in the end.

Yes, doing things right is likely more expensive than doing things
wrong. Bugs are cheap. That doesn't make buggy code better.

Epoll really isn't important enough to screw over the VFS subsystem over.

I did point out elsewhere how this could be fixed by epoll() removing
the ep items at a different point:

  
https://lore.kernel.org/all/CAHk-=wj6xl9mgcd_nuzrj6saken0tsyttzdfpgdw34r+zmz...@mail.gmail.com/

so if somebody actually wants to fix up epoll properly, that would
probably be great.

In fact, that model would allow epoll() to just keep a proper refcount
as an fd is added to the poll events, and would probably fix a lot of
nastiness. Right now those ep items stay around for basically random
amounts of time.

But maybe there are other ways to fix it. I don't think we have an
actual eventpoll maintainer any more, but what I'm *not* willing to
happen is eventpoll messing up other parts of the kernel. It was
always a ugly performance hack, and was only acceptable as such. "ugly
hack" is ok. "buggy ugly hack" is not.

  Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-05 Thread Linus Torvalds
On Sun, 5 May 2024 at 12:46, Al Viro  wrote:
>
> I've no problem with having epoll grab a reference, but if we make that
> a universal requirement ->poll() instances can rely upon,

Al, we're note "making that a requirement".

It always has been.

Otgherwise, the docs should have shouted out DAMN LOUDLY that you
can't rely on all the normal refcounting of 'struct file' THAT EVERY
SINGLE NORMAL VFS FUNCTION CAN.

Lookie herte: epoll is unimportant and irrelevant garbage compared to
something fundamental like "read()", and what does read() do?

It does this:

struct fd f = fdget_pos(fd);

if (f.file) {
...

which is being DAMN CAREFUL to make sure that the file has the proper
refcounts before it then calls "vfs_read()". There's a lot of very
careful and subtle code in fdget_pos() to make this all proper, and
that even if the file is closed by another thread concurrently, we
*always* have a refcount to it, and it's always live over the whole
'vfs_read()' sequence.

'vfs_poll()' is NOT DIFFERENT in this regard. Not at all.

Now, you have two choices that are intellectually honest:

 - admit that epoll() - which is a hell of a lot less important -
should spend a small fraction of that effort on making its vfs_poll()
use sane

 - say that all this fdget_pos() care is uncalled for in the read()
path, and we should make all the filesystem .read() functions be aware
that the file pointer they get may be garbage, and they should use
get_file_active() to make sure every 'struct file *' use they have is
safe?

because if your choice is that "epoll can do whatever the f* it
wants", then it's in clear violation of all the effort we go to in a
MUCH MORE IMPORTANT code path, and is clearly not consistent or
logical.

Neither you nor Christian have explained why you think it's ok for
that epoll() garbage to magically violate all our regular rules.

Your claim that those regular rules are some new conditional
requirement that we'd be imposing. NO. They are the rules that
*anybody* who gets a 'struct file *' pointer should always be able to
rely on by default: it's damn well a ref-counted thing, and the caller
holds the refcount.

The exceptional case is exactly the other way around: if you do random
crap with unrefcounted poitners, it's *your* problem, and *you* are
the one who has to be careful. Not some unrelated poor driver that
didn't know about your f*

Dammit, epoll is CLEARLY BUGGY. It's passing off random kernel
pointers without holding a refcount to them. THAT'S A BUG.

And fixing that bug is *not* somehow changing existing rules as you
are trying to claim. No. It's just fixing a bug.

So stop claiming that this is some "new requirement". It is absolutely
nothing of the sort. epoll() actively MISUSED file pointer, because
file pointers are fundamentally refcounted (as are pretty much all
sane kernel interfaces).

Linus


[PATCH v2] epoll: be better about file lifetimes

2024-05-05 Thread Linus Torvalds
epoll can call out to vfs_poll() with a file pointer that may race with
the last 'fput()'. That would make f_count go down to zero, and while
the ep->mtx locking means that the resulting file pointer tear-down will
be blocked until the poll returns, it means that f_count is already
dead, and any use of it won't actually get a reference to the file any
more: it's dead regardless.

Make sure we have a valid ref on the file pointer before we call down to
vfs_poll() from the epoll routines.

Link: https://lore.kernel.org/lkml/2d631f0615918...@google.com/
Reported-by: syzbot+045b454ab35fd82a3...@syzkaller.appspotmail.com
Reviewed-by: Jens Axboe 
Signed-off-by: Linus Torvalds 
---

Changes since v1:

 - add Link, Reported-by, and Jens' reviewed-by. And sign off on it
   because it looks fine to me and we have some testing now.

 - move epi_fget() closer to the user

 - more comments about the background

 - remove the rcu_read_lock(), with the comment explaining why it's not
   needed

 - note about returning zero rather than something like EPOLLERR|POLLHUP
   for a file that is going away

 fs/eventpoll.c | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 882b89edc52a..a3f0f868adc4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -979,6 +979,37 @@ static __poll_t __ep_eventpoll_poll(struct file *file, 
poll_table *wait, int dep
return res;
 }
 
+/*
+ * The ffd.file pointer may be in the process of
+ * being torn down due to being closed, but we
+ * may not have finished eventpoll_release() yet.
+ *
+ * Normally, even with the atomic_long_inc_not_zero,
+ * the file may have been free'd and then gotten
+ * re-allocated to something else (since files are
+ * not RCU-delayed, they are SLAB_TYPESAFE_BY_RCU).
+ *
+ * But for epoll, users hold the ep->mtx mutex, and
+ * as such any file in the process of being free'd
+ * will block in eventpoll_release_file() and thus
+ * the underlying file allocation will not be free'd,
+ * and the file re-use cannot happen.
+ *
+ * For the same reason we can avoid a rcu_read_lock()
+ * around the operation - 'ffd.file' cannot go away
+ * even if the refcount has reached zero (but we must
+ * still not call out to ->poll() functions etc).
+ */
+static struct file *epi_fget(const struct epitem *epi)
+{
+   struct file *file;
+
+   file = epi->ffd.file;
+   if (!atomic_long_inc_not_zero(>f_count))
+   file = NULL;
+   return file;
+}
+
 /*
  * Differs from ep_eventpoll_poll() in that internal callers already have
  * the ep->mtx so we need to start from depth=1, such that mutex_lock_nested()
@@ -987,14 +1018,23 @@ static __poll_t __ep_eventpoll_poll(struct file *file, 
poll_table *wait, int dep
 static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt,
 int depth)
 {
-   struct file *file = epi->ffd.file;
+   struct file *file = epi_fget(epi);
__poll_t res;
 
+   /*
+* We could return EPOLLERR | EPOLLHUP or something,
+* but let's treat this more as "file doesn't exist,
+* poll didn't happen".
+*/
+   if (!file)
+   return 0;
+
pt->_key = epi->event.events;
if (!is_file_epoll(file))
res = vfs_poll(file, pt);
else
res = __ep_eventpoll_poll(file, pt, depth);
+   fput(file);
return res & epi->event.events;
 }
 
-- 
2.44.0.330.g4d18c88175



Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-05 Thread Linus Torvalds
On Sun, 5 May 2024 at 03:50, Christian Brauner  wrote:
>
> And I agree with you that for some instances it's valid to take another
> reference to a file from f_op->poll() but then they need to use
> get_file_active() imho and simply handle the case where f_count is zero.

I think this is

 (a) practically impossible to find (since most f_count updates are in
various random helpers)

 (b) not tenable in the first place, since *EVERYBODY* does a f_count
update as part of the bog-standard pollwait

So (b) means that the notion of "warn if somebody increments f_count
from zero" is broken to begin with - but it's doubly broken because it
wouldn't find anything *anyway*, since this never happens in any
normal situation.

And (a) means that any non-automatic finding of this is practically impossible.

> And we need to document that in Documentation/filesystems/file.rst or
> locking.rst.

WHY?

Why cannot you and Al just admit that the problem is in epoll. Always
has been, always will be.

The fact is, it's not dma-buf that is violating any rules. It's epoll.
It's calling out to random driver functions with a file pointer that
is no longer valid.

It really is that simple.

I don't see why you are arguing for "unknown number of drivers - we
know at least *one* - have to be fixed for a bug that is in epoll".

If it was *easy* to fix, and if it was *easy* to validate, then  sure.
But that just isn't the case.

In contrast, in epoll it's *trivial* to fix the one case where it does
a VFS call-out, and just say "you have to follow the rules".

So explain to me again why you want to mess up the driver interface
and everybody who has a '.poll()' function, and not just fix the ONE
clearly buggy piece of code.

Because dammit,. epoll is clearly buggy. It's not enough to say "the
file allocation isn't going away", and claim that that means that it's
not buggy - when the file IS NO LONGER VALID!

  Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-04 Thread Linus Torvalds
On Sat, 4 May 2024 at 08:32, Linus Torvalds
 wrote:
>
> Lookie here, the fundamental issue is that epoll can call '->poll()'
> on a file descriptor that is being closed concurrently.

Thinking some more about this, and replying to myself...

Actually, I wonder if we could *really* fix this by simply moving the
eventpoll_release() to where it really belongs.

If we did it in file_close_fd_locked(),  it would actually make a
*lot* more sense. Particularly since eventpoll actually uses this:

struct epoll_filefd {
struct file *file;
int fd;
} __packed;

ie it doesn't just use the 'struct file *', it uses the 'fd' itself
(for ep_find()).

(Strictly speaking, it should also have a pointer to the 'struct
files_struct' to make the 'int fd' be meaningful).

IOW, eventpoll already considers the file _descriptor_ relevant, not
just the file pointer, and that's destroyed at *close* time, not at
'fput()' time.

Yeah, yeah, the locking situation in file_close_fd_locked() is a bit
inconvenient, but if we can solve that, it would solve the problem in
a fundamentally different way: remove the ep iterm before the
file->f_count has actually been decremented, so the whole "race with
fput()" would just go away entirely.

I dunno. I think that would be the right thing to do, but I wouldn't
be surprised if some disgusting eventpoll user then might depend on
the current situation where the eventpoll thing stays around even
after the close() if you have another copy of the file open.

 Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-04 Thread Linus Torvalds
On Sat, 4 May 2024 at 08:40, Linus Torvalds
 wrote:
>
> And maybe it's even *only* dma-buf that does that fget() in its
> ->poll() function. Even *then* it's not a dma-buf.c bug.

They all do in the sense that they do

  poll_wait
-> __pollwait
 -> get_file (*boom*)

but the boom is very small because the poll_wait() will be undone by
poll_freewait(), and normally poll/select has held the file count
elevated.

Except for epoll. Which leaves those pollwait entries around until
it's done - but again will be held up on the ep->mtx before it does
so.

So everybody does some f_count games, but possibly dma-buf is the only
one that ends up expecting to hold on to the f_count for longer
periods.

 Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-04 Thread Linus Torvalds
On Sat, 4 May 2024 at 08:32, Linus Torvalds
 wrote:
>
> Now, during this TOTALLY INNOCENT sock_poll(), in another thread, the
> file closing completes, eventpoll_release() finishes [..]

Actually, Al is right that ep_item_poll() should be holding the
ep->mtx, so eventpoll_release() -> eventpoll_release_file_file() ->
mutex_lock(>mtx) should block and the file doesn't actually get
released.

So I guess the sock_poll() issue cannot happen. It does need some
poll() function that does 'fget()', and believes that it works.

But because the f_count has already gone down to zero, fget() doesn't
work, and doesn't keep the file around, and you have the bug.

The cases that do fget() in poll() are probably race, but they aren't
buggy. epoll is buggy.

So my example wasn't going to work, but the argument isn't really any
different, it's just a much more limited case that breaks.

And maybe it's even *only* dma-buf that does that fget() in its
->poll() function. Even *then* it's not a dma-buf.c bug.

   Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-04 Thread Linus Torvalds
On Sat, 4 May 2024 at 02:37, Christian Brauner  wrote:
>
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -244,13 +244,18 @@ static __poll_t dma_buf_poll(struct file *file, 
> poll_table *poll)
> if (!dmabuf || !dmabuf->resv)
> return EPOLLERR;
>
> +   if (!get_file_active(>file))
> +   return EPOLLERR;
[...]

I *really* don't think anything that touches dma-buf.c can possibly be right.

This is not a dma-buf.c bug.

This is purely an epoll bug.

Lookie here, the fundamental issue is that epoll can call '->poll()'
on a file descriptor that is being closed concurrently.

That means that *ANY* driver that relies on *any* data structure that
is managed by the lifetime of the 'struct file' will have problems.

Look, here's sock_poll():

static __poll_t sock_poll(struct file *file, poll_table *wait)
{
struct socket *sock = file->private_data;

and that first line looks about as innocent as it possibly can, right?

Now, imagine that this is called from 'epoll' concurrently with the
file being closed for the last time (but it just hasn't _quite_
reached eventpoll_release() yet).

Now, imagine that the kernel is built with preemption, and the epoll
thread gets preempted _just_ before it loads 'file->private_data'.

Furthermore, the machine is under heavy load, and it just stays off
its CPU a long time.

Now, during this TOTALLY INNOCENT sock_poll(), in another thread, the
file closing completes, eventpoll_release() finishes, and the
preemption of the poll() thing just takes so long that you go through
an RCU period too, so that the actual file has been released too.

So now that totally innoced file->private_data load in the poll() is
probably going to get random data.

Yes, the file is allocated as SLAB_TYPESAFE_BY_RCU, so it's probably
still a file. Not guaranteed, even the slab will get fully free'd in
some situations. And yes, the above case is impossible to hit in
practice. You have to hit quite the small race window with an
operation that practically never happens in the first place.

But my point is that the fact that the problem with file->f_count
lifetimes happens for that dmabuf driver is not the fault of the
dmabuf code. Not at all.

It is *ENTIRELY* a bug in epoll, and the dmabuf code is probably just
easier to hit because it has a poll() function that does things that
have longer lifetimes than most things, and interacts more directly
with that f_count.

So I really don't understand why Al thinks this is "dmabuf does bad
things with f_count". It damn well does not. dma-buf is the GOOD GUY
here. It's doing things *PROPERLY*. It's taking refcounts like it damn
well should.

The fact that it takes ref-counts on something that the epoll code has
messed up is *NOT* its fault.

Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-03 Thread Linus Torvalds
On Fri, 3 May 2024 at 16:39, Al Viro  wrote:
>
> *IF* those files are on purely internal filesystem, that's probably
> OK; do that with something on something mountable (char device,
> sysfs file, etc.) and you have a problem with filesystem staying
> busy.

Yeah, I agree, it's a bit annoying in general. That said, it's easy to
do: stash a file descriptor in a unix domain socket, and that's
basically exactly what you have: a random reference to a 'struct file'
that will stay around for as long as you just keep that socket around,
long after the "real" file descriptor has been closed, and entirely
separately from it.

And yes, that's exactly why unix domain socket transfers have caused
so many problems over the years, with both refcount overflows and
nasty garbage collection issues.

So randomly taking references to file descriptors certainly isn't new.

In fact, it's so common that I find the epoll pattern annoying, in
that it does something special and *not* taking a ref - and it does
that special thing to *other* ("innocent") file descriptors. Yes,
dma-buf is a bit like those unix domain sockets in that it can keep
random references alive for random times, but at least it does it just
to its own file descriptors, not random other targets.

So the dmabuf thing is very much a "I'm a special file that describes
a dma buffer", and shouldn't really affect anything outside of active
dmabuf uses (which admittedly is a large portion of the GPU drivers,
and has been expanding from there...). I

So the reason I'm annoyed at epoll in this case is that I think epoll
triggered the bug in some entirely innocent subsystem. dma-buf is
doing something differently odd, yes, but at least it's odd in a "I'm
a specialized thing" sense, not in some "I screw over others" sense.

 Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-03 Thread Linus Torvalds
On Fri, 3 May 2024 at 16:23, Kees Cook  wrote:
>
> static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf)
> {
> return atomic_long_inc_not_zero(>file->f_count) != 0L;
> }
>
> If we end up adding epi_fget(), we'll have 2 cases of using
> "atomic_long_inc_not_zero" for f_count. Do we need some kind of blessed
> helper to live in file.h or something, with appropriate comments?

I wonder if we could try to abstract this out a bit more.

These games with non-ref-counted file structures *feel* a bit like the
games we play with non-ref-counted (aka "stashed") 'struct dentry'
that got fairly recently cleaned up with path_from_stashed() when both
nsfs and pidfs started doing the same thing.

I'm not loving the TTM use of this thing, but at least the locking and
logic feels a lot more straightforward (ie the
atomic_long_inc_not_zero() here is clealy under the 'prime->mutex'
lock

IOW, the tty use looks correct to me, and it has fairly simple locking
and is just catching the the race between 'fput()' decrementing the
refcount and and 'file->f_op->release()' doing the actual release.

You are right that it's similar to the epoll thing in that sense, it
just looks a _lot_ more straightforward to me (and, unlike epoll,
doesn't look actively buggy right now).

Could we abstract out this kind of "stashed file pointer" so that we'd
have a *common* form for this? Not just the inc_not_zero part, but the
locking rule too?

  Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-03 Thread Linus Torvalds
On Fri, 3 May 2024 at 15:07, Al Viro  wrote:
>
> Suppose your program calls select() on a pipe and dmabuf, sees data to be read
> from pipe, reads it, closes both pipe and dmabuf and exits.
>
> Would you expect that dmabuf file would stick around for hell knows how long
> after that?  I would certainly be very surprised by running into that...

Why?

That's the _point_ of refcounts. They make the thing they refcount
stay around until it's no longer referenced.

Now, I agree that dmabuf's are a bit odd in how they use a 'struct
file' *as* their refcount, but hey, it's a specialty use. Unusual
perhaps, but not exactly wrong.

I suspect that if you saw a dmabuf just have its own 'refcount_t' and
stay around until it was done, you wouldn't bat an eye at it, and it's
really just the "it uses a struct file for counting" that you are
reacting to.

Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-03 Thread Linus Torvalds
On Fri, 3 May 2024 at 14:45, Al Viro  wrote:
>
> How do you get through eventpoll_release_file() while someone
> has entered ep_item_poll()?

Doesn't matter.

Look, it's enough that the file count has gone down to zero. You may
not even have gotten to eventpoll_release_file() yet - the important
part is that you're on your *way* to it.

That means that the file will be released - and it means that you have
violated all the refcounting rules for poll().

So I think you're barking up the wrong tree.

  Linus


Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)

2024-05-03 Thread Linus Torvalds
On Fri, 3 May 2024 at 14:36, Al Viro  wrote:
>
> ... the last part is no-go - poll_wait() must be able to grab a reference
> (well, the callback in it must)

Yeah. I really think that *poll* itself is doing everything right. It
knows that it's called with a file pointer with a reference, and it
adds its own references as needed.

And I think that's all fine - both for dmabuf in particular, but for
poll in general. That's how things are *supposed* to work. You can
keep references to other things in your 'struct file *', knowing that
files are properly refcounted, and won't go away while you are dealing
with them.

The problem, of course, is that then epoll violates that "called with
reference" part.  epoll very much by design does *not* take references
to the files it keeps track of, and then tears them down at close()
time.

Now, epoll has its reasons for doing that. They are even good reasons.
But that does mean that when epoll needs to deal with that hackery.

I wish we could remove epoll entirely, but that isn't an option, so we
need to just make sure that when it accesses the ffd.file pointer, it
does so more carefully.

  Linus


Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-03 Thread Linus Torvalds
On Fri, 3 May 2024 at 14:24, Al Viro  wrote:
>
> Can we get to ep_item_poll(epi, ...) after eventpoll_release_file()
> got past __ep_remove()?  Because if we can, we have a worse problem -
> epi freed under us.

Look at the hack in __ep_remove(): if it is concurrent with
eventpoll_release_file(), it will hit this code

spin_lock(>f_lock);
if (epi->dying && !force) {
spin_unlock(>f_lock);
return false;
}

and not free the epi.

But as far as I can tell, almost nothing else cares about the f_lock
and dying logic.

And in fact, I don't think doing

spin_lock(>f_lock);

is even valid in the places that look up file through "epi->ffd.file",
because the lock itself is inside the thing that you can't trust until
you've taken the lock...

So I agree with Kees about the use of "atomic_dec_not_zero()" kind of
logic - but it also needs to be in an RCU-readlocked region, I think.

I wish epoll() just took the damn file ref itself. But since it relies
on the file refcount to release the data structure, that obviously
can't work.

Linus


Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)

2024-05-03 Thread Linus Torvalds
On Fri, 3 May 2024 at 14:11, Al Viro  wrote:
>
> What we need is
> * promise that ep_item_poll() won't happen after 
> eventpoll_release_file().
> AFAICS, we do have that.
> * ->poll() not playing silly buggers.

No. That is not enough at all.

Because even with perfectly normal "->poll()", and even with the
ep_item_poll() happening *before* eventpoll_release_file(), you have
this trivial race:

  ep_item_poll()
 ->poll()

and *between* those two operations, another CPU does "close()", and
that causes eventpoll_release_file() to be called, and now f_count
goes down to zero while ->poll() is running.

So you do need to increment the file count around the ->poll() call, I feel.

Or, alternatively, you'd need to serialize with
eventpoll_release_file(), but that would need to be some sleeping lock
held over the ->poll() call.

> As it is, dma_buf ->poll() is very suspicious regardless of that
> mess - it can grab reference to file for unspecified interval.

I think that's actually much preferable to what epoll does, which is
to keep using files without having reference counts to them (and then
relying on magically not racing with eventpoll_release_file().

Linus


[PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-03 Thread Linus Torvalds
epoll is a mess, and does various invalid things in the name of
performance.

Let's try to rein it in a bit. Something like this, perhaps?

Not-yet-signed-off-by: Linus Torvalds 
---

This is entirely untested, thus the "Not-yet-signed-off-by".  But I
think this may be kind of the right path forward. 

I suspect the ->poll() call is the main case that matters, but there are
other places where eventpoll just looks up the file pointer without then
being very careful about it.  The sock_from_file(epi->ffd.file) uses in
particular should probably also use this to look up the file. 

Comments?

 fs/eventpoll.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 882b89edc52a..bffa8083ff36 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -285,6 +285,30 @@ static inline void free_ephead(struct epitems_head *head)
kmem_cache_free(ephead_cache, head);
 }
 
+/*
+ * The ffd.file pointer may be in the process of
+ * being torn down due to being closed, but we
+ * may not have finished eventpoll_release() yet.
+ *
+ * Technically, even with the atomic_long_inc_not_zero,
+ * the file may have been free'd and then gotten
+ * re-allocated to something else (since files are
+ * not RCU-delayed, they are SLAB_TYPESAFE_BY_RCU).
+ *
+ * But for epoll, we don't much care.
+ */
+static struct file *epi_fget(const struct epitem *epi)
+{
+   struct file *file;
+
+   rcu_read_lock();
+   file = epi->ffd.file;
+   if (!atomic_long_inc_not_zero(>f_count))
+   file = NULL;
+   rcu_read_unlock();
+   return file;
+}
+
 static void list_file(struct file *file)
 {
struct epitems_head *head;
@@ -987,14 +1011,18 @@ static __poll_t __ep_eventpoll_poll(struct file *file, 
poll_table *wait, int dep
 static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt,
 int depth)
 {
-   struct file *file = epi->ffd.file;
+   struct file *file = epi_fget(epi);
__poll_t res;
 
+   if (!file)
+   return 0;
+
pt->_key = epi->event.events;
if (!is_file_epoll(file))
res = vfs_poll(file, pt);
else
res = __ep_eventpoll_poll(file, pt, depth);
+   fput(file);
return res & epi->event.events;
 }
 
-- 
2.44.0.330.g4d18c88175



Re: [git pull] drm for 6.9-rc1

2024-03-13 Thread Linus Torvalds
On Tue, 12 Mar 2024 at 21:07, Dave Airlie  wrote:
>
> I've done a trial merge into your tree from a few hours ago, there
> are definitely some slighty messy conflicts, I've pushed a sample
> branch here:

I appreciate your sample merges since I like verifying my end result,
but I think your merge is wrong.

I got two differences when I did the merge. The one in
intel_dp_detect() I think is just syntactic - I ended up placing the

if (!intel_dp_is_edp(intel_dp))
intel_psr_init_dpcd(intel_dp);

differently than you did (I did it *after* the tunnel_detect()).

I don't _think,_ that placement matters, but somebody more familiar
with the code should check it out. Added Animesh and Jani to the
participants.

But I think your merge gets the TP_printk() for the xe_bo_move trace
event is actively wrong. You don't have the destination for the move
in the printk.

Or maybe I got it wrong. Our merges end up _close_, but not identical.

   Linus


Re: [PATCH 1/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing

2024-03-03 Thread Linus Torvalds
On Fri, 1 Mar 2024 at 02:27, Nikolai Kondrashov  wrote:
>
> I agree, it's hard to imagine even a simple majority agreeing on how GitLab CI
> should be done. Still, we would like to help people, who are interested in
> this kind of thing, to set it up. How about we reframe this contribution as a
> sort of template, or a reference for people to start their setup with,
> assuming that most maintainers would want to tweak it? We would also be glad
> to stand by for questions and help, as people try to use it.

Ack. I think seeing it as a library for various gitlab CI models would
be a lot more palatable. Particularly if you can then show that yes,
it is also relevant to our currently existing drm case.

So I'm not objecting to having (for example) some kind of CI helper
templates - I think a logical place would be in tools/ci/ which is
kind of alongside our tools/testing subdirectory.

(And then perhaps have a 'gitlab' directory under that. I'm not sure
whether - and how much - commonality there might be between the
different CI models of different hosts).

Just to clarify: when I say "a logical place", I very much want to
emphasize the "a" - maybe there are better places, and I'm not saying
that is the only possible place. But it sounds more logical to me than
some.

Linus


Re: [PATCH 1/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing

2024-03-01 Thread Linus Torvalds
On Thu, 29 Feb 2024 at 01:23, Nikolai Kondrashov  wrote:
>
> However, I think a better approach would be *not* to add the .gitlab-ci.yaml
> file in the root of the source tree, but instead change the very same repo
> setting to point to a particular entry YAML, *inside* the repo (somewhere
> under "ci" directory) instead.

I really don't want some kind of top-level CI for the base kernel project.

We already have the situation that the drm people have their own ci
model. II'm ok with that, partly because then at least the maintainers
of that subsystem can agree on the rules for that one subsystem.

I'm not at all interested in having something that people will then
either fight about, or - more likely - ignore, at the top level
because there isn't some global agreement about what the rules are.

For example, even just running checkpatch is often a stylistic thing,
and not everybody agrees about all the checkpatch warnings.

I would suggest the CI project be separate from the kernel.

And having that slack channel that is restricted to particular
companies is just another sign of this whole disease.

If you want to make a google/microsoft project to do kernel CI, then
more power to you, but don't expect it to be some kind of agreed-upon
kernel project when it's a closed system.

   Linus


Re: [PATCH next v2 08/11] minmax: Add min_const() and max_const()

2024-02-25 Thread Linus Torvalds
On Sun, 25 Feb 2024 at 08:53, David Laight  wrote:
>
> The expansions of min() and max() contain statement expressions so are
> not valid for static intialisers.
> min_const() and max_const() are expressions so can be used for static
> initialisers.

I hate the name.

Naming shouldn't be about an implementation detail, particularly not
an esoteric one like the "C constant expression" rule. That can be
useful for some internal helper functions or macros, but not for
something that random people are supposed to USE.

Telling some random developer that inside an array size declaration or
a static initializer you need to use "max_const()" because it needs to
syntactically be a constant expression, and our regular "max()"
function isn't that, is just *horrid*.

No, please just use the traditional C model of just using ALL CAPS for
macro names that don't act like a function.

Yes, yes, that may end up requiring getting rid of some current users of

  #define MIN(a,b) ((a)<(b) ? (a):(b))

but dammit, we don't actually have _that_ many of them, and why should
we have random drivers doing that anyway?

  Linus


Re: [PATCH 2/9] lib/cmdline: Fix an invalid format specifier in an assertion msg

2024-02-22 Thread Linus Torvalds
On Thu, 22 Feb 2024 at 09:36, Daniel Latypov  wrote:
>
> Copying the line for context, it's about `p-r` where
>   p = memchr_inv([1], 0, sizeof(r) - sizeof(r[0]));
> `p-r` should never be negative unless something has gone horribly
> horribly wrong.

Sure it would - if 'p' is NULL.

Of course, then a negative value wouldn't be helpful either, and in
this case that's what the EXPECT_PTR_EQ checking is testing in the
first place, so it's a non-issue.

IOW, in practice clearly the sign should simply not matter here.

I do think that the default case for pointer differences should be
that they are signed, because they *can* be.

Just because of that "default case", unless there's some actual reason
to use '%tu', I think '%td' should be seen as the normal case to use.

That said, just as a quick aside: be careful with pointer differences
in the kernel.

For this particular case, when we're talking about just 'char *', it's
not a big deal, but we've had code where people didn't think about
what it means to do a pointer difference in C, and how it can be often
unnecessarily expensive due to the implied "divide by the size of the
pointed object".

Sometimes it's actually worth writing the code in ways that avoids
pointer differences entirely (which might involve passing around
indexes instead of pointers).

 Linus


Re: [PATCH 8/9] drm/xe/tests: Fix printf format specifiers in xe_migrate test

2024-02-21 Thread Linus Torvalds
On Wed, 21 Feb 2024 at 21:05, Lucas De Marchi  wrote:
>
> this has a potential to cause conflicts with upcoming work, so I think
> it's better to apply this through drm-xe-next. Let me know if you agree.

I disagree. Violently.

For this to be fixed, we need to have the printf format checking enabled.

And we can't enable it until all the problems have been fixed.

Which means that we should *not* have to wait for [N] different trees
to fix their issues separately.

This should get fixed in the Kunit tree, so that the Kunit tree can
just send a pull request to me to enable format checking for the KUnit
tests, together with all the fixes.  Trying to spread those fixes out
to different git branches will only result in pain and pointless
dependencies between different trees.

Honestly, the reason I noticed the problem in the first place was that
the drm tree had a separate bug, that had been apparently noted in
linux-next, and *despite* that it made it into a pull request to me
and caused new build failures in rc5.

So as things are, I am not IN THE LEAST interested in some kind of
"let us fix this in the drm tree separately" garbage.  We're not
making things worse by trying to fix this in different trees.

We're fixing this in the Kunit tree, and I do not want to get *more*
problems from the drm side. I've had enough.

   Linus


Re: [PATCH next 10/11] block: Use a boolean expression instead of max() on booleans

2024-01-28 Thread Linus Torvalds
On Sun, 28 Jan 2024 at 14:22, David Laight  wrote:
>
> H blame gcc :-)

I do agree that the gcc warning quoting is unnecessarily ugly (even
just visually), but..

> The error message displays as '0' but is e2:80:98 30 e2:80:99
> I HATE UTF-8, it wouldn't be as bad if it were a bijection.

No, that's not the problem. The UTF-8 that gcc emits is fine.

And your email was also UTF-8:

Content-Type: text/plain; charset=UTF-8

The problem is that you clearly then used some other tool in between
that took the UTF-8 byte stream, and used it as (presumably) Latin1,
which is bogus.

If you just make everything use and stay as UTF-8, it all works out
beautifully. But I suspect you have an editor or a MUA that is fixed
in some 1980s mindset, and when you cut-and-pasted the UTF-8, it
treated it as Latin1.

Just make all your environment be utf-8, like it should be. It's not
the 80s any more. We don't do mullets, and we don't do Latin1, ok?

Linus


Re: [PATCH next 10/11] block: Use a boolean expression instead of max() on booleans

2024-01-28 Thread Linus Torvalds
On Sun, 28 Jan 2024 at 11:36, David Laight  wrote:
>
> However it generates:
> error: comparison of constant ‘0’ with boolean expression is always true 
> [-Werror=bool-compare]
> inside the signedness check that max() does unless a '+ 0' is added.

Please fix your locale. You have random garbage characters there,
presumably because you have some incorrect locale setting somewhere in
your toolchain.

   Linus


Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4

2024-01-22 Thread Linus Torvalds
On Mon, 22 Jan 2024 at 16:56, Bhardwaj, Rajneesh
 wrote:
>
> I think a fix might already be in flight. Please see  Linux-Kernel Archive: 
> Re: [PATCH] drm/ttm: fix ttm pool initialization for no-dma-device drivers 
> (iu.edu)

Please use lore.kernel.org that doesn't corrupt whitespace in patches
or lose header information:

  https://lore.kernel.org/lkml/20240113213347.9562-1-pchel...@ispras.ru/

although that seems to be a strange definition of "in flight". It was
sent out 8 days ago, and apparently nobody thought to include it in
the drm fixes pile that came in last Friday.

So it made it into rc1, even though it was reported a week before.

It also looks like some mailing list there is mangling emails - if you
use 'all' instead of 'lkml', lore reports multiple emails with the
same message-id, and it all looks messier as a result.

I assume it's dri-devel@lists.freedesktop.org that messes up, mainly
because I don't tend to see this behaviour when only the usual
kernel.org mailing lists are involved.

  Linus


Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4

2024-01-22 Thread Linus Torvalds
On Mon, 22 Jan 2024 at 15:17, Steven Rostedt  wrote:
>
> Perhaps this is the real fix?

If you send a signed-off version, I'll apply it asap.

Thanks,
 Linus


Re: [git pull] drm for 6.8

2024-01-12 Thread Linus Torvalds
On Wed, 10 Jan 2024 at 11:49, Dave Airlie  wrote:
>
> Let me know if there are any issues,

Your testing is seriously lacking.

This doesn't even build. The reason seems to be that commit
b49e894c3fd8 ("drm/i915: Replace custom intel runtime_pm tracker with
ref_tracker library") changed the 'intel_wakeref_t' type from a
'depot_stack_handle_t' to 'unsigned long', and as a result did this:

-   drm_dbg(>drm, "async_put_wakeref %u\n",
+   drm_dbg(>drm, "async_put_wakeref %lu\n",
power_domains->async_put_wakeref);

meanwhile, the Xe driver has this:

  drivers/gpu/drm/xe/compat-i915-headers/intel_wakeref.h:
typedef bool intel_wakeref_t;

which has never been valid, but now the build fails with

  drivers/gpu/drm/i915/display/intel_display_power.c: In function
‘print_async_put_domains_state’:
  drivers/gpu/drm/i915/display/intel_display_power.c:408:29: error:
format ‘%lu’ expects argument of type ‘long unsigned int’, but
argument 5 has type ‘int’ [-Werror=format=]

because the drm header files have this disgusting thing where a
*header* file includes a *C* file:

  In file included from ./include/drm/drm_mm.h:51,
 from drivers/gpu/drm/xe/xe_bo_types.h:11,
 from drivers/gpu/drm/xe/xe_bo.h:11,
 from
./drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h:11,
 from ./drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h:15,
 from drivers/gpu/drm/i915/display/intel_display_power.c:8:

nasty.

I made it build by fixing that broken Xe compat header file, but this
is definitely *NOT* how things should have worked. How did this ever
get to me without any kind of build testing?

And why the %^!@$% does a header file include a C file? That's wrong
regardless of this bug.

   Linus


Re: [git pull] drm fixes for 6.8

2024-01-04 Thread Linus Torvalds
On Wed, 3 Jan 2024 at 18:30, Dave Airlie  wrote:
>
> These were from over the holiday period, mainly i915, a couple of
> qaic, bridge and an mgag200.
>
> I have a set of nouveau fixes that I'll send after this, that might be
> too rich for you at this point.
>
> I expect there might also be some more regular fixes before 6.8, but
> they should be minor.

I'm assuming you're just confused about the numbering, and meant 6.7
here and in the subject line.

This seems to be too small of a pull to be an early pull request for
the 6.8 merge window.

   Linus


Re: [GIT PULL] fbdev fixes and updates for v6.7-rc3

2023-11-26 Thread Linus Torvalds
On Sat, 25 Nov 2023 at 22:58, Helge Deller  wrote:
>
> please pull some small fbdev fixes for 6.7-rc3.

These all seem to be pure cleanups, not bug fixes.

Please resend during the merge window.

Linus


github version complaints about the gitlab CI requirements.txt

2023-11-12 Thread Linus Torvalds
So every time I push to my github mirror, github now ends up having a
'dependabot' thing that warns about some of the CI version
requirements for the gitlab automated testing file.

It wants to update the pip requirements from 23.2.1 to 23.3

 - When installing a package from a Mercurial VCS URL, e.g. pip install
   hg+..., with pip prior to v23.3, the specified Mercurial revision
   could be used to inject arbitrary configuration options to the hg
   clone call (e.g. --config). Controlling the Mercurial configuration
   can modify how and which repository is installed. This vulnerability
   does not affect users who aren't installing from Mercurial.

and upgrade the urllib3 requirements from 2.0.4 to 2.0.7:

 - urllib3's request body not stripped after redirect from 303 status
   changes request method to GET

 - `Cookie` HTTP header isn't stripped on cross-origin redirects

And it's not like any of this looks like a big deal, but I'd like to
shut up the messages I get.

I can either just close those issues, or I can apply a patch something
like the attached (which also adds a missing newline at the end).

I thought I should ask the people who actually set this up. Comments?

   Linus
 drivers/gpu/drm/ci/xfails/requirements.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ci/xfails/requirements.txt b/drivers/gpu/drm/ci/xfails/requirements.txt
index d8856d1581fd..e9994c9db799 100644
--- a/drivers/gpu/drm/ci/xfails/requirements.txt
+++ b/drivers/gpu/drm/ci/xfails/requirements.txt
@@ -5,7 +5,7 @@ termcolor==2.3.0
 certifi==2023.7.22
 charset-normalizer==3.2.0
 idna==3.4
-pip==23.2.1
+pip==23.3
 python-gitlab==3.15.0
 requests==2.31.0
 requests-toolbelt==1.0.0
@@ -13,5 +13,5 @@ ruamel.yaml==0.17.32
 ruamel.yaml.clib==0.2.7
 setuptools==68.0.0
 tenacity==8.2.3
-urllib3==2.0.4
-wheel==0.41.1
\ No newline at end of file
+urllib3==2.0.7
+wheel==0.41.1


Re: drm/vkms: deadlock between dev->event_lock and timer

2023-09-13 Thread Linus Torvalds
On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa
 wrote:
>
> Hello. A deadlock was reported in drivers/gpu/drm/vkms/ .
> It looks like this locking pattern remains as of 6.6-rc1. Please fix.
>
> void drm_crtc_vblank_off(struct drm_crtc *crtc) {
>   spin_lock_irq(>event_lock);
>   drm_vblank_disable_and_save(dev, pipe) {
> __disable_vblank(dev, pipe) {
>   crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank {
> hrtimer_cancel(>vblank_hrtimer) { // Retries with 
> dev->event_lock held until lock_hrtimer_base() succeeds.
>   ret = hrtimer_try_to_cancel(timer) {
> base = lock_hrtimer_base(timer, ); // Fails forever because 
> vkms_vblank_simulate() is in progress.

Heh. Ok. This is clearly a bug, but it does seem to be limited to just
the vkms driver, and literally only to the "simulate vblank" case.

The worst part about it is that it's so subtle and not obvious.

Some light grepping seems to show that amdgpu has almost the exact
same pattern in its own vkms thing, except it uses

hrtimer_try_to_cancel(_crtc->vblank_timer);

directly, which presumably fixes the livelock, but means that the
cancel will fail if it's currently running.

So just doing the same thing in the vkms driver probably fixes things.

Maybe the vkms people need to add a flag to say "it's canceled" so
that it doesn't then get re-enabled?  Or maybe it doesn't matter
and/or already happens for some reason I didn't look into.

   Linus


Re: [git pull] drm CI integration

2023-09-10 Thread Linus Torvalds
On Wed, 30 Aug 2023 at 18:00, Dave Airlie  wrote:
>
> This is a PR to add drm-ci support files to the upstream tree.

So I finally had no other pull requests pending, and spent some time
looking at this, and I see nothing offensive.

I did wonder how this then expands to having more than one subsystem
using this (and mixing them possibly on the same CI system), but
that's more about my ignorance about how the gitlab CI works than
anything else, so that's certainly not a real concern.

The other side of that "I do wonder" coin is for when others want to
use the same tests but using some other CI infrastructure, whether
it's some AWS or google cloud thing, or github or whatever.

Anyway, considering that both of my idle curiosity reactions were
about "if this is successful", I think me having those questions only
means that I should pull this, rather than questioning the pull
itself.

If it works out so well that others want to actually do this and
integrate our other selftests in similar manners, I think that would
be lovely.

And if - as you say - this is a failure and the whole thing gets
deleted a year from now as "this didn't go anywhere", it doesn't look
like it should cause a ton of problems either.

Anyway, it's in my tree now, let's see where it goes.

   Linus


Re: [git pull] drm fixes for 6.6-rc1

2023-09-07 Thread Linus Torvalds
On Thu, 7 Sept 2023 at 19:45, Dave Airlie  wrote:
>
> Just a poke about the outstanding drm CI support pull request since I
> haven't see any movement on that in the week, hopefully it's not as
> difficult a problem as bcachefs :-)

I was assuming that it wouldn't interfere with anything else... and
that I could just ignore it until I have all my "real" pulls done. I
didn't want to even look at it until I was "done".

Yes, it's past the mid-way point of the second week of the merge
window, and I mostly _should_ be "done".

But I still keep finding new pull requests in my inbox that aren't
just fixes and updates for previous main pull requests.

Linus


Re: mainline build failure due to 501126083855 ("fbdev/g364fb: Use fbdev I/O helpers")

2023-08-31 Thread Linus Torvalds
On Thu, 31 Aug 2023 at 11:48, Sudip Mukherjee (Codethink)
 wrote:
> The latest mainline kernel branch fails to build mips jazz_defconfig with
> the error:
>
> drivers/video/fbdev/g364fb.c:115:9: error: 'FB_DEFAULT_IOMEM_HELPERS' 
> undeclared here (not in a function); did you mean 'FB_DEFAULT_IOMEM_OPS'?
>   115 | FB_DEFAULT_IOMEM_HELPERS,
>   | ^~~~
>   | FB_DEFAULT_IOMEM_OPS
>
>
> git bisect pointed to 501126083855 ("fbdev/g364fb: Use fbdev I/O helpers").
>
> Reverting the commit has fixed the build failure.
>
> I will be happy to test any patch or provide any extra log if needed.

Would you mind testing the exact thing that the compiler suggested?

So instead of the revert, just replace FB_DEFAULT_IOMEM_HELPERS with
FB_DEFAULT_IOMEM_OPS.

I think it's just a typo / confusion with the config variable (which
is called FB_IOMEM_HELPERS).

   Linus


Re: [git pull] drm fixes for 6.5-rc4

2023-07-28 Thread Linus Torvalds
On Thu, 27 Jul 2023 at 19:20, Dave Airlie  wrote:
>
> Regular scheduled fixes, msm and amdgpu leading the way, with some
> i915 and a single misc fbdev, all seems fine.

Pulled.

Tangentially related: where do you keep your pgp key? The one I have
is long expired, and doing a refresh doesn't get any updates...

Linus


Re: [pull] amdgpu drm-fixes-6.4

2023-06-23 Thread Linus Torvalds
On Fri, 23 Jun 2023 at 14:18, Alex Deucher  wrote:
>
> are available in the Git repository at:
>
>   https://gitlab.freedesktop.org/agd5f/linux.git 
> tags/amd-drm-fixes-6.4-2023-06-23

That's not a valid signed tag.

Yes, it's a tag.  But it doesn't actually have any cryptographic
signing, and I'm not willing to pull random content from git sites
that I can't verify. In fact, these days I ask even kernel.org pull
requests to be proper signed tags, although I haven't really gotten to
the point where I *require* it.

So please sign your tags - use "git tag -s" (or "-u keyname" if you
have some specific key you want to use, rather than one described by
your regular git config file).

  Linus


Re: mainline build failure due to 322458c2bb1a ("drm/ttm: Reduce the number of used allocation orders for TTM pages")

2023-04-26 Thread Linus Torvalds
On Wed, Apr 26, 2023 at 10:44 AM Sudip Mukherjee (Codethink)
 wrote:
>
> drivers/gpu/drm/ttm/ttm_pool.c:73:29: error: variably modified 
> 'global_write_combined' at file scope
>73 | static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
>   | ^

Ugh.

This is because we have

  #define TTM_DIM_ORDER (__TTM_DIM_ORDER <= MAX_ORDER ?
__TTM_DIM_ORDER : MAX_ORDER)

which looks perfectly fine as a constant ("pick the smaller of
MAX_ORDER and __TTM_DIM_ORDER").

But:

  #define __TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
  #define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)

which still _looks_ fine, but  on 64-bit powerpc, we then have

   #define PTE_INDEX_SIZE  __pte_index_size

so that __TTM_DIM_ORDER constant isn't actually a constant at all.

I would suggest that the DRM code just make those fixed-size arrays
use "MAX_ORDER", because even though it might be a bit bigger than
necessary, that's still not a very big constant (ie typically it's
"11", but it could be up to 64).

It's a bit sad how that macro that _looks_ like a constant (and is one
pretty much everywhere else) isn't actually constant on powerpc, but
looking around it looks fairly unavoidable.

Maybe the DRM code could try to avoid using things like PMD_SHIFT entirely?

Linus


Disabling -Warray-bounds for gcc-13 too

2023-04-23 Thread Linus Torvalds
Kees,
  I made the mistake of upgrading my M2 Macbook Air to Fedora-38, and
in the process I got gcc-13 which is not WERROR-clean because we only
limited the 'array-bounds' warning to gcc-11 and gcc-12. But gcc-13
has all the same issues.

And I want to be able to do my arm64 builds with WERROR on still...

I guess it never made much sense to hope it was going to go away
without having a confirmation, so I just changed it to be gcc-11+.

A lot of the warnings seem just crazy, with gcc just not getting the
bounds right, and then being upset about us going backwards with
'container_of()' etc. Ok, so the kernel is special. We do odd things.
I get it, gcc ends up being confused.

But before I disabled it, I did take a look at a couple of warnings
that didn't look like the sea of crazy.

And one of them is from you.

In particular, commit 4076ea2419cf ("drm/nouveau/disp: Fix
nvif_outp_acquire_dp() argument size") cannot possibly be right, It
changes

 nvif_outp_acquire_dp(struct nvif_outp *outp, u8 dpcd[16],

to

 nvif_outp_acquire_dp(struct nvif_outp *outp, u8 dpcd[DP_RECEIVER_CAP_SIZE],

and then does

memcpy(args.dp.dpcd, dpcd, sizeof(args.dp.dpcd));

where that 'args.dp.dpcd' is a 16-byte array, and DP_RECEIVER_CAP_SIZE is 15.

So yeah, it's copying 16 bytes from an argument that claims to be 15
bytes in size.

I think that commit was wrong, and the problem is that the 'dpcd'
array is something 15 and sometimes 16. For example, we have

  struct nouveau_encoder {
...
union {
struct {
...
u8 dpcd[DP_RECEIVER_CAP_SIZE];
} dp;
};

so there it's indeed 15 bytes, but then we have

union nvif_outp_acquire_args {
struct nvif_outp_acquire_v0 {
...
union {
...
struct {
...
__u8 dpcd[16];
} dp;

where it's 16.

I think it's all entirely harmless from a code generation standpoint,
because the 15-byte field will be padded out to 16 bytes in the
structure that contains it, but it's most definitely buggy.

So that warning does find real cases of wrong code. But when those
real cases are hidden by hundreds of lines of unfixable false
positives, we don't have much choice.

But could the Nouveau driver *please* pick a size for the dhcp[] array
and stick with it?

The other driver where the warnings didn't look entirely crazy was the
ath/carl9170 wireless driver, but I didn't look closer at that one.

 Linus


Re: Linux 6.3-rc3

2023-03-22 Thread Linus Torvalds
On Wed, Mar 22, 2023 at 9:40 AM Sedat Dilek  wrote:
>
> You have to pass `make LLVM=1` in any case... to `oldconfig` or when
> adding any MAKEFLAGS like -j${number-of-available-cpus}.

I actually think we should look (again) at just making the compiler
choice (and the prefix) be a Kconfig option.

That would simplify *so* many use cases.

It used to be that gcc was "THE compiler" and anything else was just
an odd toy special case, but that's clearly not true any more.

So it would be lovely to make the kernel choice a Kconfig choice - so
you'd set it only at config time, and then after that a kernel build
wouldn't need special flags any more, and you'd never need to play
games with GNUmakefile or anything like that.

Yes, you'd still use environment variables (or make arguments) for
that initial Kconfig, but that's no different from the other
environment variables we already have, like KCONFIG_SEED that kconfig
uses internally, but also things like "$(ARCH)" that we already use
*inside* the Kconfig files themselves.

I really dislike how you have to set ARCH and CROSS_COMPILE etc
externally, and can't just have them *in* the config file.

So when you do cross-compiles, right now you have to do something like

make ARCH=i386 allmodconfig

to build the .config file, but then you have to *repeat* that
ARCH=i386 when you actually build things:

make ARCH=i386

because the ARCH choice ends up being in the .config file, but the
makefiles themselves always take it from the environment.

There are good historical reasons for our behavior (and probably a
number of extant practical reasons too), but it's a bit annoying, and
it would be lovely if we could start moving away from this model.

Linus


Re: Linux 6.3-rc3

2023-03-20 Thread Linus Torvalds
On Mon, Mar 20, 2023 at 3:06 PM Nathan Chancellor  wrote:
>
> Right, this seems like a subtle difference in semantics between
> -Wuninitialized between clang and GCC.

I guess it's a bit ambiguous whether it's

 "X may be USED uninitialized"

or whether it is

 "X may BE uninitialized"

and then depending on how you see that ambiguity, the control flow matters.

In this case, there is absolutely no question that the variable is
uninitialized (since there is no write to it at all).

So it is very clearly and unambiguously uninitialized. And I do think
that as a result, "-Wuninitialized" should warn.

But at the same time, whether it is *used* or not depends on that
conditional, so I can see how it could be confusing and not be so
clear an unambiguous.

On the whole, I do wish that the logic would be "after dead code
removal, if some pseudo has no initializer, it should always warn,
regardless of any remaining dynamic conditoinals".

That "after dead code removal" might matter, because I could see where
config things (#ifdef's etc) would just remove the initialization of
some variable, and if the use is behind some static "if (0)", then
warning about it is all kinds of silly.

 Linus


Re: Linux 6.3-rc3

2023-03-20 Thread Linus Torvalds
On Mon, Mar 20, 2023 at 1:05 PM Guenter Roeck  wrote:
>
> I have noticed that gcc doesn't always warn about uninitialized variables
> in most architectures.

Yeah, I'm getting the feeling that when the gcc people were trying to
make -Wmaybe-uninitialized work better (when moving it into "-Wall"),
they ended up moving a lot of "clearly uninitialized" cases into it.

So then because we disable the "maybe" case (with
-Wno-maybe-uninitialized) because it had too many random false
positives, we end up not seeing the obvious cases either.

  Linus


Re: Linux 6.3-rc3

2023-03-20 Thread Linus Torvalds
On Mon, Mar 20, 2023 at 11:56 AM Nathan Chancellor  wrote:
>
> I did see a patch fly by to fix that:
>
> https://lore.kernel.org/20230316082035.567520-3-christian.koe...@amd.com/
>
> It seems like the DRM_TEGRA half of it is broken though:
>
> https://lore.kernel.org/202303170635.a2rsq1wu-...@intel.com/

Hmm. x86-64 has 'vmap()' too.

So I think that DRM_TEGRA breakage is likely just due to a missing
header file include that then (by luck and mistake) gets included on
arm.

You need  for 'vmap()'.

There might be something else going on, I didn't look deeply at it.

   Linus


Re: Linux 6.3-rc3

2023-03-20 Thread Linus Torvalds
On Mon, Mar 20, 2023 at 11:26 AM Linus Torvalds
 wrote:
>
> Hmm. I do my arm64 allmodconfig builds with gcc, and I'm surprised
> that gcc doesn't warn about this.

Side note: I'm also wondering why that TEGRA_HOST1X config has that
ARM dependency in

depends on ARCH_TEGRA || (ARM && COMPILE_TEST)

because it seems to build just fine at least on x86-64 if I change it to be just

depends on ARCH_TEGRA || COMPILE_TEST

ie there seems to be nothing ARM-specific in there.

Limiting it to just the tegra platform by default makes 100% sense,
but that "only do compile-testing on ARM" seems a bit bogus.

That limit goes back to forever (commit 6f44c2b5280f: "gpu: host1x:
Increase compile test coverage" back in Nov 2013), so maybe things
didn't use to work as well back in the dark ages?

None of this explains why gcc didn't catch it, but at least allowing
the build on x86-64 would likely have made it easier for people to see
clang catching this.

   Linus


Re: Linux 6.3-rc3

2023-03-20 Thread Linus Torvalds
On Mon, Mar 20, 2023 at 11:05 AM Nathan Chancellor  wrote:
>
> On the clang front, I am still seeing the following warning turned error
> for arm64 allmodconfig at least:
>
>   drivers/gpu/host1x/dev.c:520:6: error: variable 'syncpt_irq' is 
> uninitialized when used here [-Werror,-Wuninitialized]
>   if (syncpt_irq < 0)
>   ^~

Hmm. I do my arm64 allmodconfig builds with gcc, and I'm surprised
that gcc doesn't warn about this.

That syncpt_irq thing isn't written to anywhere, so that's pretty egregious.

We use -Wno-maybe-uninitialized because gcc gets it so wrong, but
that's different from the "-Wuninitialized" thing (without the
"maybe").

I've seen gcc mess this up when there is one single assignment,
because then the SSA format makes it *so* easy to just use that
assignment out-of-order (or unconditionally), but this case looks
unusually clear-cut.

So the fact that gcc doesn't warn about it is outright odd.

> If that does not come to you through other means before -rc4, could you
> just apply it directly so that I can stop applying it to our CI? :)

Bah. I took it now, there's no excuse for that thing.

Do we have any gcc people around that could explain why gcc failed so
miserably at this trivial case?

   Linus


Re: [Intel-gfx] [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()

2023-03-17 Thread Linus Torvalds
On Wed, Mar 15, 2023 at 5:22 PM Steven Rostedt  wrote:
>
> I hope that this gets in by -rc3, as I want to start basing my next branch
> on that tag.

My tree should have it now as commit c00133a9e87e ("drm/ttm: drop
extra ttm_bo_put in ttm_bo_cleanup_refs").

Linus


Re: [PATCH 0/2] docs & checkpatch: allow Closes tags with links

2023-03-16 Thread Linus Torvalds
On Thu, Mar 16, 2023 at 4:43 AM Matthieu Baerts
 wrote:
>
> @Linus: in short, we would like to continue using the "Closes:" tag (or
> similar, see below) with a URL in commit messages. They are useful to
> have public bug trackers doing automated actions like closing a specific
> ticket. Any objection from your side?

As long as it's a public link, I guess that just documents what the
drm people have been doing.

I'm not convinced "Closes" is actually any better than just "Link:",
though. I would very much hope and expect that the actual closing of
any bug report is actually done separately and verified, rather than
some kind of automated "well, the commit says it closes it, so.."

So honestly, I feel like "Link:" is just a better thing, and I worry
that "Closes:" is then going to be used for random internal crap.
We've very much seen people wanting to do that - having their own
private bug trackers, and then using the commit message to refer to
them, which I am *violently* against. If it's only useful to some
closed community, it shouldn't be in the public commits.

And while the current GPU people seem to use "Closes:" the right way
(and maybe some other groups do too - but it does seem to be mostly a
freedesktop thing), I really think it is amenable to mis-use in ways
"Link:" is not.

The point of "Link:" is explicitly two-fold:

 - it makes it quite obvious that you expect an actual valid web-link,
not some internal garbage

 - random people always want random extensions, and "Link:" is
_designed_ to counter-act that creeping "let's add a random new tag"
disease. It's very explicitly "any relevant link".

and I really question the value of adding new types of tags,
particularly ones that seem almost designed to be mis-used.

So I'm not violently against it, and 99% of the existing uses seem
fine. But I do note that some of the early "Closes:" tags in the
kernel were very much complete garbage, and exactly the kind of thing
that I absolutely detest.

What does

Closes: 10437

mean? That's crazy talk. (And yes, in that case it was a
kernel.bugzilla.org number, which is perfectly fine, but I'm using it
as a very real example of how "Closes:" ends up being very naturally
to mis-use).

End result: I don't hate our current "Closes:" uses. But I'm very wary of it.

I'm not at all convinced that it really adds a lot of value over
"Link:", and I am, _very_ aware of how easily it can be then taken to
be a "let's use our own bug tracker cookies here".

So I will neither endorse nor condemn it, but if I see people using it
wrong, I will absolutely put my foot down.

Linus


Re: drm next for 6.3-rc1

2023-02-24 Thread Linus Torvalds
On Fri, Feb 24, 2023 at 5:30 PM Dave Airlie  wrote:
>
> Any issues with this? I get nervous around 48hrs :-)

It was merged on Wednesday evening. See commit a5c95ca18a98.

If you were waiting for a pr-tracker-bot reply, I think you need to
put "{GIT PULL]" in the subject line for the automation to trigger on
it.

 Linus


Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

2023-01-16 Thread Linus Torvalds
[ Back from travel, so trying to make sense of this series.. ]

On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park  wrote:
>
> I've been developing a tool for detecting deadlock possibilities by
> tracking wait/event rather than lock(?) acquisition order to try to
> cover all synchonization machanisms. It's done on v6.2-rc2.

Ugh. I hate how this adds random patterns like

if (timeout == MAX_SCHEDULE_TIMEOUT)
sdt_might_sleep_strong(NULL);
else
sdt_might_sleep_strong_timeout(NULL);
   ...
sdt_might_sleep_finish();

to various places, it seems so very odd and unmaintainable.

I also recall this giving a fair amount of false positives, are they all fixed?

Anyway, I'd really like the lockdep people to comment and be involved.
We did have a fairly recent case of "lockdep doesn't track page lock
dependencies because it fundamentally cannot" issue, so DEPT might fix
those kinds of missing dependency analysis. See

https://lore.kernel.org/lkml/60d41f05f139a...@google.com/

for some context to that one, but at teh same time I would *really*
want the lockdep people more involved and acking this work.

Maybe I missed the email where you reported on things DEPT has found
(and on the lack of false positives)?

   Linus


Re: [git pull] drm for 6.2-rc1

2022-12-14 Thread Linus Torvalds
On Wed, Dec 14, 2022 at 12:05 AM Christian König
 wrote:
>
> Anyway we need to re-apply b09d6acba1d9 which should be trivial.

Note that my resolution did exactly that (*), it's just that when I
double-checked against Dave's suggested merge that I noticed I'd done
things differently than he did.

(*) Well, when I say "did exactly that" I don't actually know some of
the other fencing changes that happened, so there may be a reason why
something further should still be done.  So I can only point to my
merge commit a594533df0f6 and ask people to verify.

It does all at least work for me. Knock wood.

  Linus


Re: [git pull] drm for 6.2-rc1

2022-12-13 Thread Linus Torvalds
On Mon, Dec 12, 2022 at 6:56 PM Dave Airlie  wrote:
>
> There are a bunch of conflicts, one in amdgpu is a bit nasty, I've
> cc'ed Christian/Alex to make sure they know to check whatever
> resolution you find. The one I have is what we have in drm-tip tree.

Hmm. My merge resolution is slightly different from yours.

You seem to have basically dropped commit b09d6acba1d9 ("drm/amdgpu:
handle gang submit before VMID").

Now, there are other fence changes in the drm tree that may mean that
that commit *should* be dropped, so it's entirely possible that my
resolution which kept that ordering change might be wrong and your
resolution that just took the drm tip code is the right one.

Christian? Alex? Can you please double-check what I just pushed out?

Linus


Re: [PATCH RFC 16/19] mm/frame-vector: remove FOLL_FORCE usage

2022-11-22 Thread Linus Torvalds
On Tue, Nov 22, 2022 at 4:25 AM Hans Verkuil  wrote:
>
> I tracked the use of 'force' all the way back to the first git commit
> (2.6.12-rc1) in the very old video-buf.c. So it is very, very old and the
> reason is lost in the mists of time.

Well, not entirely.

For archaeology reasons, I went back to the old BK history, which
exists as a git conversion in

https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/

and there you can actually find it.

Not with a lot of explanations, though - it's commit b7649ef789
("[PATCH] videobuf update"):

This updates the video-buf.c module (helper module for video buffer
management).  Some memory management fixes, also some adaptions to the
final v4l2 api.

but it went from

 err = get_user_pages(current,current->mm,
-data, dma->nr_pages,
-rw == READ, 0, /* don't force */
+data & PAGE_MASK, dma->nr_pages,
+rw == READ, 1, /* force */
 dma->pages, NULL);

in that commit.

So it goes back to October 2002.

> Looking at this old LWN article https://lwn.net/Articles/28548/ suggests
> that it might be related to calling get_user_pages for write buffers

The timing roughly matches.

> I assume that removing FOLL_FORCE from 'FOLL_FORCE|FOLL_WRITE' will still
> allow drivers to read from the buffer?

The issue with some of the driver hacks has been that

 - they only want to read, and the buffer may be read-only

 - they then used FOLL_WRITE despite that, because they want to break
COW (due to the issue that David is now fixing with his series)

 - but that means that the VM layer says "nope, you can't write to
this read-only user mapping"

 - ... and then they use FOLL_FORCE to say "yes, I can".

iOW, the FOLL_FORCE may be entirely due to an (incorrect, but
historically needed) FOLL_WRITE.

 Linus


Re: [PATCH mm-unstable v1 20/20] mm: rename FOLL_FORCE to FOLL_PTRACE

2022-11-17 Thread Linus Torvalds
On Thu, Nov 17, 2022 at 2:58 PM Kees Cook  wrote:
>
> Oh, er, why does get_arg_page() even need FOLL_FORCE? This is writing the
> new stack contents to the nascent brpm->vma, which was newly allocated
> with VM_STACK_FLAGS, which an arch can override, but they all appear to 
> include
> VM_WRITE | VM_MAYWRITE.

Yeah, it does seem entirely superfluous.

It's been there since the very beginning (although in that original
commit b6a2fea39318 it was there as a '1' to the 'force' argument to
get_user_pages()).

I *think* it can be just removed. But as long as it exists, it should
most definitely not be renamed to FOLL_PTRACE.

There's a slight worry that it currently hides some other setup issue
that makes it matter, since it's been that way so long, but I can't
see what it is.

 Linus


Re: [PATCH mm-unstable v1 20/20] mm: rename FOLL_FORCE to FOLL_PTRACE

2022-11-16 Thread Linus Torvalds
On Wed, Nov 16, 2022 at 2:30 AM David Hildenbrand  wrote:
>
> Let's make it clearer that functionality provided by FOLL_FORCE is
> really only for ptrace access.

I'm not super-happy about this one.

I do understand the "let's rename the bit so that no new user shows up".

And it's true that the main traditional use is ptrace.

But from the patch itself it becomes obvious that no, it's not *just*
ptrace. At least not yet.

It's used for get_arg_page(), which uses it to basically look up (and
install) pages in the newly created VM.

Now, I'm not entirely sure why it even uses FOLL_FORCE, - I think it
might be historical, because the target should always be the new stack
vma.

Following the history of it is a big of a mess, because there's a
number of renamings and re-organizations, but it seems to go back to
2007 and commit b6a2fea39318 ("mm: variable length argument support").

Before that commit, we kept our own array of "this is the set of pages
that I will install in the new VM". That commit basically just inserts
the pages directly into the VM instead, getting rid of the array size
limitation.

So at a minimum, I think that FOLL_FORCE would need to be removed
before any renaming to FOLL_PTRACE, because that's not some kind of
small random case.

It *might* be as simple as just removing it, but maybe there's some
reason for having it that I don't immediately see.

There _are_ also small random cases too, like get_cmdline(). Maybe
that counts as ptrace, but the execve() case most definitely does not.

Linus


Re: [PATCH RFC 00/19] mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O long-term pinning)

2022-11-07 Thread Linus Torvalds
On Mon, Nov 7, 2022 at 8:18 AM David Hildenbrand  wrote:
>
> So instead, make R/O long-term pinning work as expected, by breaking COW
> in a COW mapping early, such that we can remove any FOLL_FORCE usage from
> drivers.

Nothing makes me unhappy from a quick scan through these patches.

And I'd really love to just have this long saga ended, and FOLL_FORCE
finally relegated to purely ptrace accesses.

So an enthusiastic Ack from me.

   Linus


Re: [PATCH v4a 00/38] timers: Use timer_shutdown*() before freeing timers

2022-11-05 Thread Linus Torvalds
On Sat, Nov 5, 2022 at 2:03 PM Jason A. Donenfeld  wrote:
>
> Something that might help here is changing the `...` into
> `... when exists` or into `... when != ptr` or similar.

I actually tried that.

You don't want "when exists", you'd want "when forall", but that seems
to be the default.

And trying "when != ptr->timer" actually does the right thing in that
it gets rid of the case where the timer is modified outside of the
del_timer() case, *but* it also causes odd other changes to the
output.

Look at what it generates for that

   drivers/media/usb/pvrusb2/pvrusb2-hdw.c

file, which finds a lot of triggers with the "when !=  ptr->timer",
but only does one without it.

So I gave up, just because I clearly don't understand the rules.

(Comparing output is also fun because the ordering of the patches is
random, so consecutive runs with the same rule will give different
patches. I assume that it's just because it's done in parallel, but it
doesn't help the "try to see what changes when you change the script"
;)

 Linus


Re: [PATCH v4a 00/38] timers: Use timer_shutdown*() before freeing timers

2022-11-05 Thread Linus Torvalds
On Sat, Nov 5, 2022 at 11:04 AM Steven Rostedt  wrote:
>
> Here's the changes I made after running the script

Please. No.

What part of "I don't want extra crud" was I unclear on?

I'm not interested in converting everything. That's clearly a 6.,2
issue, possibly even longer considering how complicated the networking
side has been.

I'm not AT ALL interested in "oh, I then added my own small cleanups
on top to random files because I happened to notice them".

Repeat after me: "If the script didn't catch them, they weren't
trivially obvious".

And it does seem that right now the script itself is a bit too
generous, which is why it didn't notice that sometimes there wasn't a
kfree after all because of a goto around it. So clearly that "..."
doesn't really work, I think it accepts "_any_ path leads to the
second situation" rather than "_all_ paths lead to the second
situation".

But yeah, my coccinelle-foo is very weak too, and maybe there's no
pattern for "no flow control".

I would also like the coccinelle script to notice the "timer is used
afterwards", so that it does *not* modify that case that does

del_timer(>timer);
dch->timer.function = NULL;

since now the timer is modified in between the del_timer() and the kfree.

Again, that timer modification is then made pointless by changing the
del_timer() to a "timer_shutdown()", but at that point it is no longer
a "so obvious non-semantic change that it should be scripted". At that
point it's a manual thing.

So I think the "..." in your script should be "no flow control, and no
access to the timer", but do not know how to do that in coccinelle.

Julia?

And this thread has way too many participants, I suspect some email
systems will just mark it as spam as a result. Which is partly *why* I
would like to get rid of noisy changes that really don't matter - but
I would like it to be truly mindlessly obvious that there are *zero*
questions about it, and absolutely no manual intervention because the
patch is so strict that it's just unquestionably correct.

  Linus


Re: [PATCH v4a 00/38] timers: Use timer_shutdown*() before freeing timers

2022-11-05 Thread Linus Torvalds
On Fri, Nov 4, 2022 at 11:01 PM Steven Rostedt  wrote:
>
> Patch 1 fixes an issue with sunrpc/xprt where it incorrectly uses
> del_singleshot_timer_sync() for something that is not a oneshot timer. As this
> will be converted to shutdown, this needs to be fixed first.

So this is the kind of thing that I would *not* want to get eartly.

I really would want to get just the infrastructure in to let people
start doing conversions.

And then the "mindlessly obvious patches that are done by scripting
and can not possibly matter".

The kinds that do not *need* review, because they are mechanical, and
that just cause pointless noise for the rest of the patches that *do*
want review.

Not this kind of thing that is so subtle that you have to explain it.
That's not a "scripted patch for no semantic change".

So leave the del_singleshot_timer_sync() cases alone, they are
irrelevant for the new infrastructure and for the "mindless scripted
conversion" patches.

> Patches 2-4 changes existing timer_shutdown() functions used locally in ARM 
> and
> some drivers to better namespace names.

Ok, these are relevant.

> Patch 5 implements the new timer_shutdown() and timer_shutdown_sync() 
> functions
> that disable re-arming the timer after they are called.

This is obviously what I'd want early so that people can start doign
this in their trees.

> Patches 6-28 change all the locations where there's a kfree(), kfree_rcu(),
> kmem_cache_free() and one call_rcu() call where the RCU function frees the
> timer (the workqueue patch) in the same function as the del_timer{,_sync}() is
> called on that timer, and there's no extra exit path between the del_timer and
> freeing of the timer.

So honestly, I was literally hoping for a "this is the coccinelle
script" kind of patch.

Now there seems to be a number of patches here that are actualyl
really hard to see that they are "obviously correct" and I can't tell
if they are actually scripted or not.

They don't *look* scripted, but I can't really tell.  I looked at the
patches with ten lines of context, and I didn't see the immediately
following kfree() even in that expanded patch context, so it's fairly
far away.

Others in the series were *definitely* not scripted, doing clearly
manual cleanups:

-if (dch->timer.function) {
-del_timer(>timer);
-dch->timer.function = NULL;
-}
+timer_shutdown(>timer);

so no, this does *not* make me feel "ok, this is all trivial".

IOW, I'd really want *just* the infrastructure and *just* the provably
trivial stuff. If it wasn't some scripted really obvious thing that
cannot possibly change anything and that wasn't then edited manually
for some reason, I really don't want it early.

IOW, any early conversions I'd take are literally about removing pure
mindless noise. Not about doing conversions.

And I wouldn't mind it as a single conversion patch that has the
coccinelle script as the explanation.

Really just THAT kind of "100% mindless conversion".

   Linus


Re: [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Linus Torvalds
On Fri, Nov 4, 2022 at 12:42 PM Steven Rostedt  wrote:
>
> Linus, should I also add any patches that has already been acked by the
> respective maintainer?

No, I'd prefer to keep only the ones that are 100% unambiguously not
changing any semantics.

  Linus


Re: [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Linus Torvalds
On Thu, Nov 3, 2022 at 10:48 PM Steven Rostedt  wrote:
>
> Ideally, I would have the first patch go into this rc cycle, which is mostly
> non functional as it will allow the other patches to come in via the 
> respective
> subsystems in the next merge window.

Ack.

I also wonder if we could do the completely trivially correct
conversions immediately.

I'm talking about the scripted ones where it's currently a
"del_timer_sync()", and the very next action is freeing whatever data
structure the timer is in (possibly with something like free_irq() in
between - my point is that there's an unconditional free that is very
clear and unambiguous), so that there is absolutely no question about
whether they should use "timer_shutdown_sync()" or not.

IOW, things like patches 03, 17 and 31, and at least parts others in
this series.

This series clearly has several much more complex cases that need
actual real code review, and I think it would help to have the
completely unambiguous cases out of the way, just to get rid of noise.

So I'd take that first patch, and a scripted set of "this cannot
change any semantics" patches early.

Linus


Re: [PATCH] drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n

2022-10-14 Thread Linus Torvalds
On Fri, Oct 14, 2022 at 8:22 AM Nathan Chancellor  wrote:
>
> After commit 8799c0be89eb ("drm/amd/display: Fix vblank refcount in vrr
> transition"), a build with CONFIG_DEBUG_FS=n is broken due to a
> misplaced brace, along the lines of:

Thanks, applied.

  Linus


Re: [git pull] drm fixes for 6.1-rc1

2022-10-13 Thread Linus Torvalds
On Thu, Oct 13, 2022 at 5:29 PM Dave Airlie  wrote:
>
> Round of fixes for the merge window stuff, bunch of amdgpu and i915
> changes, this should have the gcc11 warning fix, amongst other
> changes.

Some of those amd changes aren't "fixes". They are some major code changes.

We're still in the merge window, so I'm letting it slide, but calling
then "fixes" really stretches things. They are fixes exactly the same
way completely new development can "fix" things.

  Linus


Re: mainline build failure due to 5d8c3e836fc2 ("drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()")

2022-10-06 Thread Linus Torvalds
On Thu, Oct 6, 2022 at 1:50 PM Sudip Mukherjee
 wrote:
>
> > And it looks like Sudip's proposed fix for this particular code is
> > additionally fixing unsigned vs signed as well. I think -Warray-bounds
> > did its job (though, with quite a confusing index range in the report).
>
> Not my. Linus's. I just tested. :)

I suspect Kees meant Stephen's other patch that Hamza pointed at, and
that is perhaps the cleaner version.

That said, I hate how this forces us to write random code changes just
to make a compiler just randomly _happen_ to not complain about it.

   Linus


Re: [git pull] drm for 6.1-rc1

2022-10-06 Thread Linus Torvalds
On Thu, Oct 6, 2022 at 1:25 PM Dave Airlie  wrote:
>
>
> [ 1234.778760] BUG: kernel NULL pointer dereference, address: 0088
> [ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]

As far as I can tell, that's the line

struct drm_gpu_scheduler *sched = s_fence->sched;

where 's_fence' is NULL. The code is

   0: 0f 1f 44 00 00nopl   0x0(%rax,%rax,1)
   5: 41 54push   %r12
   7: 55push   %rbp
   8: 53push   %rbx
   9: 48 89 fb  mov%rdi,%rbx
   c:* 48 8b af 88 00 00 00 mov0x88(%rdi),%rbp <-- trapping instruction
  13: f0 ff 8d f0 00 00 00 lock decl 0xf0(%rbp)
  1a: 48 8b 85 80 01 00 00 mov0x180(%rbp),%rax

and that next 'lock decl' instruction would have been the

atomic_dec(>hw_rq_count);

at the top of drm_sched_job_done().

Now, as to *why* you'd have a NULL s_fence, it would seem that
drm_sched_job_cleanup() was called with an active job. Looking at that
code, it does

if (kref_read(>s_fence->finished.refcount)) {
/* drm_sched_job_arm() has been called */
dma_fence_put(>s_fence->finished);
...

but then it does

job->s_fence = NULL;

anyway, despite the job still being active. The logic of that kind of
"fake refcount" escapes me. The above looks fundamentally racy, not to
say pointless and wrong (a refcount is a _count_, not a flag, so there
could be multiple references to it, what says that you can just
decrement one of them and say "I'm done").

Now, _why_ any of that happens, I have no idea. I'm just looking at
the immediate "that pointer is NULL" thing, and reacting to what looks
like a completely bogus refcount pattern.

But that odd refcount pattern isn't new, so it's presumably some user
on the amd gpu side that changed.

The problem hasn't happened again for me, but that's not saying a lot,
since it was very random to begin with.

 Linus


Re: [git pull] drm for 6.1-rc1

2022-10-06 Thread Linus Torvalds
On Thu, Oct 6, 2022 at 12:28 PM Alex Deucher  wrote:
>
> Maybe you are seeing this which is an issue with GPU TLB flushes which
> is kind of sporadic:
> https://gitlab.freedesktop.org/drm/amd/-/issues/2113

Well, that seems to be 5.19, and while timing changes (or whatever
other software updates) could have made it start trigger, this machine
has been pretty solid otgerwise.

> Are you seeing any GPU page faults in your kernel log?

Nothing even remotely like that "no-retry page fault" in that issue
report. Of course, if it happens just before the whole thing locks
up...

   Linus


Re: [git pull] drm for 6.1-rc1

2022-10-06 Thread Linus Torvalds
On Thu, Oct 6, 2022 at 12:30 PM Dave Airlie  wrote:
>
> netconsole?

I've never been really successful with that in the past, and haven't
used it for decades. I guess I could try if nothing else works.

   Linus


Re: mainline build failure due to 5d8c3e836fc2 ("drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()")

2022-10-06 Thread Linus Torvalds
On Thu, Oct 6, 2022 at 1:51 AM Sudip Mukherjee (Codethink)
 wrote:
>
> This is only seen with gcc-11, gcc-12 builds are ok.

Hmm. This seems to be some odd gcc issue.

I *think* that what is going on is that the test

j = 0 ; j < MAX_DWB_PIPES

makes gcc decide that "hey, j is in the range [0,MAX_DWB_PIPES[", and
then since MAX_DWB_PIPES is 1, that simplifies to "j must be zero".
Good range analysis so far.

But for 'i', we start off with that lower bound of 0, but the upper
bound is not fixed (the test for "i" is: "i < stream->num_wb_info").

And then that "if (i != j)", so now gcc decides that it can simplify
that to "if (i != 0)", and then simplifies *that* to "oh, the lower
bound of 'i' in that code is actually 1.

So then it decides that "stream->writeback_info[i]" must be out of bounds.

Of course, the *reality* is that stream->num_wb_info should be <=
MAX_DWB_PIPES, and as such (with the current MAX_DWB_PIPES value of 1)
it's not that 'i' can be 1, it's that the code in question cannot be
reached at all.

What confuses me is that error message ("array subscript [0, 0] is
outside array bounds of 'struct dc_writeback_info[1]') which seems to
be aware that the value is actually 0.

So this seems to be some gcc-11 range analysis bug, but I don't know
what the fix is. I suspect some random code change would magically
just make gcc realize it's ok after all, but since it all depends on
random gcc confusion, I don't know what the random code change would
be.

The fix *MAY* be to just add a '&& i < MAX_DWB_PIPES' to that loop
too, and then gcc will see that both i and j are always 0, and that
the code is unreachable and not warn about it. Hmm? Can you test that?

And the reason gcc-12 builds are ok probably isn't that gcc-12 gets
this right, it's simply that gcc-12 gets so many *opther* things wrong
that we already disabled -Warray-bounds with gcc-12 entirely.

If somebody cannot come up with a fix, I suspect the solution is "gcc
array bounds analysis is terminally buggy" and we just need to disable
it for gcc-11 too.

Kees, any idea? Who else might be interested in fixing a -Warray-bounds issue?

 Linus


Re: [git pull] drm for 6.1-rc1

2022-10-06 Thread Linus Torvalds
On Tue, Oct 4, 2022 at 8:42 PM Dave Airlie  wrote:
>
> Lots of stuff all over, some new AMD IP support and gang
> submit support [..]

Hmm.

I have now had my main desktop lock up twice after pulling this.
Nothing in the dmesg after a reboot, and nothing in particular that
seems to trigger it, so I have a hard time even guessing what's up,
but the drm changes are the primary suspect.

I will try to see if I can get any information out of the machine, but
with the symptom being just a dead machine ...

This is the same (old) Radeon device:

   49:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)

with dual 4k monitors, running on my good old Threadripper setup.

Again, there's no explicit reason to blame the drm pull, except that
it started after that merge (that machine ran the kernel with the
networking pull for a day with no problems, and while there were other
pull requests in between them, they seem to be fairly unrelated to the
hardware I have).

But the lockup is so sporadic (twice in the last day) that I really
can't bisect it, so I'm afraid I have very very little info.

Any suggestions?

  Linus


Re: [git pull] drm for 6.1-rc1

2022-10-05 Thread Linus Torvalds
On Tue, Oct 4, 2022 at 8:42 PM Dave Airlie  wrote:
>
> This is very conflict heavy, mostly the correct answer is picking
> the version from drm-next.

Ugh, yes, that was a bit annoying.

I get the same end result as you did, but I do wonder if the drm
people should try to keep some kind of separate "fixes" branches for
things that go both into the development tree and then get sent to me
for fixes pulls?

Hopefully this "lots of pointless noise" was a one-off, but it might
be due to how you guys work..

  Linus


Re: [PATCH v13 5/9] drm/i915: Check for integer truncation on scatterlist creation

2022-09-28 Thread Linus Torvalds
On Wed, Sep 28, 2022 at 1:15 AM Gwan-gyeong Mun
 wrote:
>
> +   if (check_assign(obj->base.size >> PAGE_SHIFT, ))
> +   return -E2BIG;

I have to say, I find that new "check_assign()" macro use to be disgusting.

It's one thing to check for overflows.

It's another thing entirely to just assign something to a local variable.

This disgusting "let's check and assign" needs to die. It makes the
code a completely unreadable mess. The "user" wersion is even worse.

If you worry about overflow, then use a mix of

 (a) use a sufficiently large type to begin with

 (b) check for value range separately

and in this particular case, I also suspect that the whole range check
should have been somewhere else entirely - at the original creation of
that "obj" structure, not at one random end-point where it is used.

In other words, THIS WHOLE PATCH is just end-points checking the size
requirements of that "base.size" thing much too late, when it should
have been checked originally for some "maximum acceptable base size"
instead.

And that "maximum acceptable base size" should *not* be about "this is
the size of the variables we use". It should be a sanity check of
"this value is sane and fits in sane use cases".

Because "let's plug security checks" is most definitely not about
picking random assignments and saying "let's check this one". It's
about trying to catch things earlier than that.

Kees, you need to reign in the craziness in overflow.h.

 Linus


Re: mainline build failure for x86_64 allmodconfig with clang

2022-08-07 Thread Linus Torvalds
On Sun, Aug 7, 2022 at 10:36 AM David Laight  wrote:
>
> Or just shoot the software engineer who thinks 100 arguments
> is sane. :-)

I suspect the issue is that it's not primarily a software engineer who
wrote that code.

Hardware people writing code are about as scary as software engineers
with a soldering iron.

   Linus


Re: mainline build failure for x86_64 allmodconfig with clang

2022-08-04 Thread Linus Torvalds
On Thu, Aug 4, 2022 at 1:43 PM Nathan Chancellor  wrote:
>
> I do note that commit 1b54a0121dba ("drm/amd/display: Reduce stack size
> in the mode support function") did have a workaround for GCC. It appears
> clang will still inline mode_support_configuration(). If I mark it as
> 'noinline', the warning disappears in that file.

That sounds like probably the best option for now. Gcc does not inline
that function (at least for allmodconfig builds in my testing), so if
that makes clang match what gcc does, it seems a reasonable thing to
do.

Linus


Re: mainline build failure for x86_64 allmodconfig with clang

2022-08-04 Thread Linus Torvalds
On Thu, Aug 4, 2022 at 11:37 AM Sudip Mukherjee (Codethink)
 wrote:
>
> git bisect points to 3876a8b5e241 ("drm/amd/display: Enable building new 
> display engine with KCOV enabled").

Ahh. So that was presumably why it was disabled before - because it
presumably does disgusting things that make KCOV generate even bigger
stack frames than it already has.

Those functions do seem to have fairly big stack footprints already (I
didn't try to look into why, I assume it's partly due to aggressive
inlining, and probably some automatic structures on stack). But gcc
doesn't seem to make it all that much worse with KCOV (and my clang
build doesn't enable KCOV).

So it's presumably some KCOV-vs-clang thing. Nathan?

  Linus


Re: [PATCH] drm/amd/display: restore plane with no modifiers code.

2022-08-04 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 10:50 PM Dave Airlie  wrote:
>
> With this applied, I get gdm back.

I can confirm that it makes thing work again for me too. Applied.

 Linus


Re: mainline build failure due to 6fdd2077ec03 ("drm/amd/amdgpu: add memory training support for PSP_V13")

2022-08-04 Thread Linus Torvalds
On Thu, Aug 4, 2022 at 12:35 AM Sudip Mukherjee (Codethink)
 wrote:
>
> I will be happy to test any patch or provide any extra log if needed.

It sounds like that file just needs to get a

#include 

there, and for some reason architectures other than alpha and mips end
up getting it accidentally through other headers.

Mind testing just adding that header file, and perhaps even sending a
patch if (when) that works for you?

Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 9:27 PM Linus Torvalds
 wrote:
>
> I'll do a few more. It's close enough already that it should be just
> four more reboots to pinpoint exactly which commit breaks.

commit 5d945cbcd4b16a29d6470a80dfb19738f9a4319f is the first bad commit.

I think it's supposed to make no semantic changes, but it clearly does.

What a pain to figure out what's wrong in there, and I assume it
doesn't revert cleanly either.

Bringing in the guilty parties. See

  
https://lore.kernel.org/all/CAHk-=wj+yzaunxiewhfcrkbdlsqkizdr1q3yjlaqpo6avq2...@mail.gmail.com/

for the beginning of this thread.

Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 9:24 PM Dave Airlie  wrote:
>
> I've reproduced it, I'll send you a revert pile when I confirm it is
> the buddy allocator.

I've bisected it to 86bd6706c404..074293dd9f61 and don't see "buddy"
in any of those commits.

I'll do a few more. It's close enough already that it should be just
four more reboots to pinpoint exactly which commit breaks.

  Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 8:53 PM Dave Airlie  wrote:
>
> > It works on my intel laptop, so it's amdgpu somewhere.
>
> I'll spin my ryzen up to see if I can reproduce, and test against the
> drm-next pre-merge tree as well.

So it's not my merge - I've had a bad result in the middle of the DRM
history too.

On a positive note, my arm64 machine works fine, but that's just using
fbdev so ...

But another datapoint to say that it's amdgpu-specific. Not that that
was really in doubt.

Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 8:37 PM Dave Airlie  wrote:
>
> Actually I did miss that so that looks good.

.. I wish it did, but I just actually test-booted my desktop with the
result, and it crashes the X server.  This seems to be the splat in
Xorg.0.log:

  (II) Initializing extension DRI2
  (II) AMDGPU(0): Setting screen physical size to 2032 x 571
  (EE)
  (EE) Backtrace:
  (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x13d) [0x55b1dc61258d]
  (EE) 1: /lib64/libc.so.6 (__sigaction+0x50) [0x7f7972a3ea70]
  (EE) 2: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
(AMDGPUCreateWindow_oneshot+0x101) [0x7f797207ddd1]
  (EE) 3: /usr/libexec/Xorg (compIsAlternateVisual+0xdc4) [0x55b1dc545fa4]
  (EE) 4: /usr/libexec/Xorg (InitRootWindow+0x17) [0x55b1dc4e0047]
  (EE) 5: /usr/libexec/Xorg (miPutImage+0xd4c) [0x55b1dc49e60b]
  (EE) 6: /lib64/libc.so.6 (__libc_start_call_main+0x80) [0x7f7972a29550]
  (EE) 7: /lib64/libc.so.6 (__libc_start_main+0x89) [0x7f7972a29609]
  (EE) 8: /usr/libexec/Xorg (_start+0x25) [0x55b1dc49f2c5]
  (EE)
  (EE) Segmentation fault at address 0x4
  (EE)
Fatal server error:
  (EE) Caught signal 11 (Segmentation fault). Server aborting

so something is going horribly wrong. No kernel oops, though.

It works on my intel laptop, so it's amdgpu somewhere.

I guess I will start bisecting. Oy vey.

 Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 7:46 PM Linus Torvalds
 wrote:
>
> I think I have it resolved, am still doing a full build test, and will
> then compare against what your suggested merge is.

Hmm.

I end up with *almost* the same thing.

Except I ended up with a

select DRM_BUDDY

for the DRM_AMDGPU config entry, and you don't have that.

I *think* my version is correct, in that clearly the amdgpu driver now
uses that buddy logic (just doing a random "grep drm_buddy_block" to
see).

But this was messy enough to resolve that I think people should
double-check my end, and maybe I just got confused at some point in
the process.

And while I seem to have gotten the same result as you did on the i915
firmware side too, again, I'd like people to re-verify.

   Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Tue, Aug 2, 2022 at 10:38 PM Dave Airlie  wrote:
>
> This is a conflicty one. The late revert in 5.19 of the amdgpu buddy
> allocator causes major conflict fallout. The buddy allocator code in
> this one works, so the resolutions are usually just to take stuff from
> this. It might actually be cleaner if you revert
> 925b6e59138cefa47275c67891c65d48d3266d57 (Revert "drm/amdgpu: add drm
> buddy support to amdgpu") first in your tree then merge this.

Ugh, what a pain. The other conflicts are also due to just randomly
duplicated commits, with *usually* your drm tree having the superset
(so "just take yours" is the easy resolution), but not always (ie the
Intel firmware-69 mess was apparently not dealt with in the
development tree).

Nasty.

I think I have it resolved, am still doing a full build test, and will
then compare against what your suggested merge is.

  Linus


Re: [git pull] drm fixes for 5.19-rc7

2022-07-16 Thread Linus Torvalds
On Sat, Jul 16, 2022 at 2:35 PM Linus Torvalds
 wrote:
>
> That said, even those type simplifications do not fix the fundamental
> issue. That "DIV_ROUND_UP()" still ends up being a 64-bit divide,
> although now it's at least a "64-by-32" bit divide.

Hmm. The "DIV_ROUND_UP()" issue could be solved by just making the
rule be that the max_segment size is always a power of two.

Then you don't need the (expensive!) DIV_ROUND_UP(), and can just use
the regular "round_up()" that works on powers-of-two.

And the simplest way to do that is to just make "max_segments" be 2GB.

The whole "round_down(UINT_MAX, page_alignment)" seems entirely
pointless. Do you really want segments that are some odd number just
under the 4GB mark, and force expensive divides?

For consistency, I used the same value in
i915_rsgt_from_buddy_resource(). I have no idea if that makes sense.

Anyway, the attached patch is COMPLETELY UNTESTED. But it at least
seems to compile. Maybe.

  Linus
 drivers/gpu/drm/i915/i915_scatterlist.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c b/drivers/gpu/drm/i915/i915_scatterlist.c
index f63b50b71e10..830dcd833d4e 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.c
+++ b/drivers/gpu/drm/i915/i915_scatterlist.c
@@ -81,7 +81,9 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
 	  u64 region_start,
 	  u64 page_alignment)
 {
-	const u64 max_segment = round_down(UINT_MAX, page_alignment);
+	// Make sure max_segment (and thus segment_pages) is
+	// a power of two that fits in 32 bits.
+	const u64 max_segment = 1ul << 31;
 	u64 segment_pages = max_segment >> PAGE_SHIFT;
 	u64 block_size, offset, prev_end;
 	struct i915_refct_sgt *rsgt;
@@ -96,7 +98,7 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
 
 	i915_refct_sgt_init(rsgt, node->size << PAGE_SHIFT);
 	st = >table;
-	if (sg_alloc_table(st, DIV_ROUND_UP(node->size, segment_pages),
+	if (sg_alloc_table(st, round_up(node->size, segment_pages),
 			   GFP_KERNEL)) {
 		i915_refct_sgt_put(rsgt);
 		return ERR_PTR(-ENOMEM);
@@ -159,7 +161,7 @@ struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
 {
 	struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
 	const u64 size = res->num_pages << PAGE_SHIFT;
-	const u64 max_segment = round_down(UINT_MAX, page_alignment);
+	const u64 max_segment = 1u << 31;
 	struct drm_buddy *mm = bman_res->mm;
 	struct list_head *blocks = _res->blocks;
 	struct drm_buddy_block *block;


Re: [git pull] drm fixes for 5.19-rc7

2022-07-16 Thread Linus Torvalds
On Fri, Jul 15, 2022 at 2:09 PM Nathan Chancellor  wrote:
>
> On Fri, Jul 15, 2022 at 01:36:17PM +1000, Dave Airlie wrote:
> > Matthew Auld (1):
> >   drm/i915/ttm: fix sg_table construction
>
> This patch breaks i386_defconfig with both GCC and clang:
>
>   ld: drivers/gpu/drm/i915/i915_scatterlist.o: in function 
> `i915_rsgt_from_mm_node':
>   i915_scatterlist.c:(.text+0x1a7): undefined reference to `__udivdi3'

Yeah, we definitely don't want arbitrary 64x64 divides in the kernel,
and the fact that we don't include libgcc.a once again caught this
horrid issue.

The offending code is

if (sg_alloc_table(st, DIV_ROUND_UP(node->size, segment_pages),
   GFP_KERNEL)) {

and I have to say that all of those games with "u64 page_alignment"
that commit aff1e0b09b54 ("drm/i915/ttm: fix sg_table construction")
introduces are absolutely disgusting.

And they are just *pointlessly* disgusting.

Why is that "page_alignment" a "u64"?

And why is it a "size", instead of being a "number of bits"?

The code literally does things like

const u64 max_segment = round_down(UINT_MAX, page_alignment);

which means that

 (a) page_alignment must be a power-of-two for this to work
(round_down() only works in powers of two)

 (b) the result obviously must fit in an "unsigned int", since it's
rounding down a UINT_MAX!

So (a) makes it doubtful that "page_alignment" should have been a
value (as opposed to mask), and (b) then questions why was that made
an "u64" value when it cannot have a u64 range?

And if max_segments cannot have a 64-bit range, then segment_pages here:

u64 segment_pages = max_segment >> PAGE_SHIFT;

sure cannot.

Fixing those then uncovers other things:

len = min(block_size, max_segment - sg->length);

now complains about mixing types ("max_segment - sg->length" being
u32), because 'block_size' is 64, bit, and that does seem to make some
amount of sense:

block_size = node->size << PAGE_SHIFT;

with the 'node->size' being from drm_mm_node, and that size is a
'u64'. That I *could* see being more than 32 bits on a 64-bit
architecture. Ok.

But then that means that 'len' cannot be a 64-bit value either, and it
should probably have been

u32 len;
..
len = min_t(u64, block_size, max_segment - sg->length);

and that would just have been a lot nicer on 32-bit x86, avoiding a
lot of pointlessly 64-bit things.

That said, even those type simplifications do not fix the fundamental
issue. That "DIV_ROUND_UP()" still ends up being a 64-bit divide,
although now it's at least a "64-by-32" bit divide.

Which needs to be handled by "do_div()" rather than the generic
DIV_ROUND_UP() helper, because sadly, at least gcc still generates a
full __udivdi3() even for the 64-by-32 divides.

Can Intel GPU people please take a look?

 Linus


  1   2   3   4   5   6   7   8   >