[Mesa-dev] [ANNOUNCE] mesa 20.0.2

2020-03-18 Thread Dylan Baker
Hi list,

I'd like to announce the availability of mesa 20.0.2. We've had a fairly busy
cycle (outside of the release management metadata), and there are fixes all
over the code base, but for the most part they were all no-fuss sorts of fixes.

Just a reminder that Eric Engestrom will be handling the release of 20.0.3.

Dylan


Git Short Log
=============

Andreas Baierl (1):
  gitlab-ci: Add add a set of lima flakes

Bas Nieuwenhuizen (2):
  amd/llvm: Fix divergent descriptor indexing. (v3)
  amd/llvm: Fix divergent descriptor regressions with radeonsi.

Danylo Piliaiev (2):
  glsl: do not crash if string literal is used outside of #include/#line
  st/mesa: Fix signed integer overflow when using util_throttle_memory_usage

Dave Airlie (1):
  gallium: fix build with latest meson and gcc10

Dylan Baker (11):
  docs: Add sha256sums for 20.0.1
  .pick_status.json: Update to 07f1ef5656e0721282d01a8421eaca056348137d
  .pick_status.json: Update to 70341d7746c177a4cd7377ef633e9f85afd11d54
  .pick_status.json: Update to 625d8705f02e211e2733c3fe12845505725c37d4
  .pick_status.json: Mark b83c9aca4a5fd02d920c90c1799137fed52dc1d9 as backported
  .pick_status.json: Update to ee9e0d1ecae307fa48200d2604d3114070253299
  .pick_status.json: Update to 3dd0d12aa5fefa94123269a541c94cdf57599e34
  .pick_status.json: Update to 94e37859a96cc56cf0c5418a5af00a3e9f5a1bf5
  Docs: Add release notes for 20.0.2
  VERSION: bump for 20.0.2 release
  docs/relnotes: Add sha256 sums for 20.0.2

Eric Anholt (1):
  glsl/tests: Fix waiting for disk_cache_put() to finish.

Eric Engestrom (7):
  bin/gen_release_notes.py: fix commit list command
  .pick_status.json: Update to 24db276d11976905b2e8a44965c684bb48c3d49f
  gen_release_notes: fix vulkan version reported
  docs/relnotes/20.0: fix vulkan version reported
  .pick_status.json: Update to ba03e308b66b0b88f60b99d9d47851a5e1522e6e
  vulkan/wsi: fix cleanup when dup() fails
  gen_release_notes: fix version in "you should wait" message

Francisco Jerez (1):
  intel/fs: Fix workaround for VxH indirect addressing bug under control flow.

Jason Ekstrand (9):
  isl: Set 3DSTATE_DEPTH_BUFFER::Depth correctly for 3D surfaces
  iris: Don't skip fast depth clears if the color changed
  anv: Parse VkPhysicalDeviceFeatures2 in CreateDevice
  vulkan/wsi: Don't leak the FD when GetImageDrmFormatModifierProperties fails
  vulkan/wsi: Return an error if dup() fails
  anv: Use the PIPE_CONTROL instead of bits for the CS stall W/A
  anv: Use a proper end-of-pipe sync instead of just CS stall
  anv: Do end-of-pipe sync around MCS/CCS ops instead of CS stall
  anv: Do an end-of-pipe sync before updating AUX table entries

José Fonseca (1):
  meson: Avoid duplicate symbols.

Kristian Høgsberg (2):
  Revert "glsl: Use a simpler formula for tanh"
  Revert "spirv: Use a simpler and more correct implementaiton of tanh()"

Marek Olšák (4):
  Revert "mesa: check for z=0 in _mesa_Vertex3dv()"
  radeonsi: add a bug workaround for NGG - LATE_ALLOC_GS
  ac: add a bug workaround for the 100% NGG culling case
  gallium/cso_context: remove cso_delete_xxx_shader helpers to fix the live cache

Martin Fuzzey (3):
  freedreno: android: fix build failure on android due to python version
  freedreno: android: add a6xx-pack.xml.h generation to android build
  freedreno: android: fix build of perfcounters.

Michel Dänzer (1):
  llvmpipe: Use uintptr_t for pointer values

Rafael Antognolli (3):
  anv: Wait for the GPU to be idle before invalidating the aux table.
  iris: Split aux map initialization from invalidation.
  iris: Wait for the GPU to be idle before invalidating the aux table.

Rob Clark (1):
  freedreno: fix FD_MESA_DEBUG=inorder

Samuel Pitoiset (5):
  aco: fix image load/store with lod and 1D images
  nir/lower_input_attachments: remove bogus assert in try_lower_input_texop()
  ac/llvm: add missing optimization barrier for 64-bit readlanes
  radv: only inject implicit subpass dependencies if necessary
  radv: fix random depth range unrestricted failures due to a cache issue

Timur Kristóf (2):
  nir: Add ability to lower non-const quad broadcasts to const ones.
  radv: Enable lowering dynamic quad broadcasts.

Vinson Lee (1):
  st/nine: Fix incompatible-pointer-types-discards-qualifiers errors.



git tag: mesa-20.0.2

https://mesa.freedesktop.org/archive/mesa-20.0.2.tar.xz
SHA256: aa54f1cb669550606aab8ceb475105d15aeb814fca5a778ce70d0fd10e98e86f  mesa-20.0.2.tar.xz
SHA512: d6ffc29bbc5b908cb0f08fa1b5a83e029b76c7b697c488a73e6bb60990a55beeb3ecdba1745868f6885ee2f660975f5debf7d2c9418e0a96e2f7049e83fd89ab  mesa-20.0.2.tar.xz
PGP:  https://mesa.freedesktop.org/archive/mesa-20.0.2.tar.xz.sig


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Nicolas Dufresne
On Wednesday, 18 March 2020 at 11:05 +0100, Michel Dänzer wrote:
> On 2020-03-17 6:21 p.m., Lucas Stach wrote:
> > That's one of the issues with implicit sync that explicit may solve: 
> > a single client taking way too much time to render something can 
> > block the whole pipeline up until the display flip. With explicit 
> > sync the compositor can just decide to use the last client buffer if 
> > the latest buffer isn't ready by some deadline.
> 
> FWIW, the compositor can do this with implicit sync as well, by polling
> a dma-buf fd for the buffer. (Currently, it has to poll for writable,
> because waiting for the exclusive fence only isn't enough with amdgpu)

That is very interesting, thanks for sharing; it could allow fixing some
issues in userspace for backward compatibility.

thanks,
Nicolas



Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Michel Dänzer
On 2020-03-17 6:21 p.m., Lucas Stach wrote:
> That's one of the issues with implicit sync that explicit may solve: 
> a single client taking way too much time to render something can 
> block the whole pipeline up until the display flip. With explicit 
> sync the compositor can just decide to use the last client buffer if 
> the latest buffer isn't ready by some deadline.

FWIW, the compositor can do this with implicit sync as well, by polling
a dma-buf fd for the buffer. (Currently, it has to poll for writable,
because waiting for the exclusive fence only isn't enough with amdgpu)
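
For illustration, a minimal sketch of what that looks like on the compositor
side, assuming the client buffer is available as a dma-buf fd (the helper
name here is made up):

    #include <poll.h>

    /* Poll the dma-buf for "writable": this waits for all fences attached
     * to the buffer (shared and exclusive) to signal, which is the point
     * at which the compositor can safely pick this buffer even under
     * implicit sync. Returns 1 if idle, 0 on timeout. */
    static int dmabuf_idle(int dmabuf_fd, int timeout_ms)
    {
        struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLOUT };
        return poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLOUT);
    }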


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Tomek Bury
> GL and GLES are not relevant. What is relevant is EGL, which defines
> interfaces to make things work on the native platform.
Yes and no. This is what the EGL spec says about sharing a texture between contexts:

"OpenGL and OpenGL ES makes no attempt to synchronize access to
texture objects. If a texture object is bound to more than one
context, then it is up to the programmer to ensure that the contents
of the object are not being changed via one context while another
context is using the texture object for rendering. The results of
changing a texture object while another context is using it are
undefined."

There are similar statements with regards to the lack of
synchronisation guarantees for EGL images or between GL and native
rendering, etc. But the main thing here is that EGL and Vulkan differ
significantly. eglSwapBuffers() is expected to post an unspecified
"back buffer" to the display system using some internal driver magic.
The EGL driver is then expected to obtain another back buffer at some
unspecified point in the future. Vulkan, on the other hand, is very
specific and explicit. vkQueuePresentKHR() is expected to post a
specific VkImage with an explicit set of semaphores. Another image is
obtained through vkAcquireNextImageKHR(), and it's the application's
decision whether it wants a fence, a semaphore, both or none with the
acquired buffer. Implicit synchronisation doesn't mix well with Vulkan
drivers and requires a lot of extra plumbing in the WSI code.
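
To make the contrast concrete, here is a minimal sketch of that explicit
handshake (error handling omitted; the device, swapchain, queue and
semaphores are assumed to have been created elsewhere):

    #include <vulkan/vulkan.h>

    /* Assumed to exist already, created during swapchain setup. */
    extern VkDevice device;
    extern VkSwapchainKHR swapchain;
    extern VkQueue queue;
    extern VkSemaphore acquire_sem, render_done_sem;

    static void present_one_frame(void)
    {
        uint32_t image_index;

        /* The WSI hands out a specific image; its availability is
         * signalled explicitly through acquire_sem (a VkFence could be
         * requested here as well). */
        vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                              acquire_sem, VK_NULL_HANDLE, &image_index);

        /* ... submit rendering that waits on acquire_sem and signals
         * render_done_sem ... */

        /* Present with an explicit list of semaphores the presentation
         * engine must wait on - nothing is ordered implicitly. */
        VkPresentInfoKHR present = {
            .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
            .waitSemaphoreCount = 1,
            .pWaitSemaphores = &render_done_sem,
            .swapchainCount = 1,
            .pSwapchains = &swapchain,
            .pImageIndices = &image_index,
        };
        vkQueuePresentKHR(queue, &present);
    }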

> If you are using EGL_WL_bind_wayland_display, then one of the things
> it is explicitly allowed/expected to do is to create a Wayland
> protocol interface between client and compositor, which can be used to
> pass buffer handles and metadata in a platform-specific way. Adding
> synchronisation is also possible.
Only one-way synchronisation is possible with this mechanism. There's
a standard protocol for recycling buffers - wl_buffer_release() - so the
buffer hand-over from the compositor to the client remains unsynchronised;
see below.
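
For reference, that compositor-to-client direction is just the wl_buffer
release event; a minimal client-side sketch (struct client_buffer and its
busy flag are made up for illustration):

    #include <stdbool.h>
    #include <wayland-client.h>

    struct client_buffer {
        struct wl_buffer *wl_buffer;
        bool busy;               /* still held by the compositor? */
    };

    static void handle_release(void *data, struct wl_buffer *wl_buffer)
    {
        /* The compositor is done with the buffer. No fence comes with
         * this event, so any reads the compositor issued must already be
         * finished (or otherwise synchronised) by the time it is sent. */
        struct client_buffer *buf = data;
        buf->busy = false;
    }

    static const struct wl_buffer_listener buffer_listener = {
        .release = handle_release,
    };

    /* Registered once per buffer:
     * wl_buffer_add_listener(buf->wl_buffer, &buffer_listener, buf); */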

> > The most troublesome part was Wayland buffer release mechanism, as it only 
> > involves a CPU signalling over Wayland IPC, without any 3D driver 
> > involvement. The choices were: explicit synchronisation extension or a 
> > buffer copy in the compositor (i.e. compositor textures from the copy, so 
> > the client can re-write the original), or some implicit synchronisation in 
> > kernel space (but that wasn't an option in Broadcom driver).
>
> You can add your own explicit synchronisation extension.
I could, but that requires implementing it in the driver and in a
number of compositors; therefore the standard
zwp_linux_explicit_synchronization_v1 extension is a much better choice
here than a custom one.

> In every cross-process and cross-subsystem usecase, synchronisation is
> obviously required. The two options for this are to implement kernel
> support for implicit synchronisation (as everyone else has done),
That would require major changes in the driver architecture, or a second
mechanism doing the same thing but in kernel space - both are
non-starters.

> or implement generic support for explicit synchronisation (as we have
> been working on with implementations inside Weston and Exosphere at
> least),
zwp_linux_explicit_synchronization_v1 is a good step forward. I'm using
this extension as the main synchronisation mechanism in the EGL and
Vulkan drivers whenever it's available. I remember that Gustavo Padovan
was working on explicit sync support in the display system some time ago.
I hope it got merged into the kernel by now, but I don't know to what
extent it's actually being used.
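
Roughly, the per-commit client-side flow with that extension looks like
this (a sketch only, using the C names generated from the unstable
protocol XML; object creation, listeners and error handling omitted):

    #include <wayland-client.h>
    /* Generated by wayland-scanner from
     * linux-explicit-synchronization-unstable-v1.xml. */
    #include "linux-explicit-synchronization-unstable-v1-client-protocol.h"

    static void commit_with_explicit_sync(
        struct zwp_linux_explicit_synchronization_v1 *explicit_sync,
        struct wl_surface *surface, struct wl_buffer *buffer,
        int acquire_fence_fd)
    {
        /* One surface_synchronization object per wl_surface. */
        struct zwp_linux_surface_synchronization_v1 *surf_sync =
            zwp_linux_explicit_synchronization_v1_get_synchronization(
                explicit_sync, surface);

        /* Acquire fence: a sync_file fd the compositor must wait on
         * before it reads the buffer attached in this commit. */
        zwp_linux_surface_synchronization_v1_set_acquire_fence(
            surf_sync, acquire_fence_fd);

        /* Per-commit release object: its fenced_release event later
         * delivers a fence for when the compositor is done with the
         * buffer; a real client adds a listener on it. */
        struct zwp_linux_buffer_release_v1 *release =
            zwp_linux_surface_synchronization_v1_get_release(surf_sync);
        (void)release;

        wl_surface_attach(surface, buffer, 0, 0);
        wl_surface_commit(surface);
    }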

> or implement private support for explicit synchronisation,
If everything else fails, that would be the last-resort scenario, but it's
far from ideal and very costly in terms of implementation and maintenance,
as it would require maintaining custom patches for various 3rd-party
components or littering them with multiple custom explicit
synchronisation schemes.

> or do nothing and then be surprised at the lack of synchronisation.
Thank you, but no, thank you :)

Cheers,
Tomek


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Tomek Bury
> vkAcquireNextImageKHR() [...] it's the application's decision whether it 
> wants a fence, a semaphore, both or none
Correction: "or none" is not allowed


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Tomek Bury
Hi Jason,

I was wrestling with the sync problems in Wayland some time ago, but
only with regard to 3D drivers.

The guarantee given by the GL/GLES spec is limited to a single graphics
context. If the same buffer is accessed by two contexts, the outcome is
unspecified. Cross-context and cross-process synchronisation is not
guaranteed. It happens to work on Mesa, because the read/write locking is
implemented in kernel space, but it didn't work on the Broadcom driver,
which has read/write interlocks in user space.

A Vulkan client makes it even worse because of conflicting requirements:
Vulkan's vkQueuePresentKHR() passes in a number of semaphores but disallows
waiting. Wayland WSI requires wl_surface_commit() to be called from
vkQueuePresentKHR(), which does require a wait, unless a synchronisation
primitive representing the Vulkan semaphores is passed between the Vulkan
client and the compositor.

The most troublesome part was the Wayland buffer release mechanism, as it only
involves a CPU signalling over Wayland IPC, without any 3D driver
involvement. The choices were: explicit synchronisation extension or a
buffer copy in the compositor (i.e. compositor textures from the copy, so
the client can re-write the original), or some implicit synchronisation in
kernel space (but that wasn't an option in Broadcom driver).

With regards to V4L2, I believe it could easily work the same way as the 3D
drivers, i.e. pass a buffer+fence pair to the next stage. Encode always
succeeds, but for capture or decode the main problem is the uncertain
outcome, I believe? If we're fine with rendering or displaying an
occasional broken frame, then a buffer+fence pair would work too. The broken
frame will go into the pipeline, but the application can drain the pipeline
and start over once the capture works again.

To answer some points raised by Laurent (although I'm unfamiliar with the
camera drivers):

> you don't know until capture complete in which buffer the frame has
> been captured
Surely you do; you just don't know in advance whether the capture will be
successful.

> but if an error occurs during capture, they can be recycled internally
> and put to the back of the queue.
That would have to change in order to use explicit synchronisation. Every
started capture becomes immediately available as a buffer+fence pair. The
fence is signalled once the capture is finished (successfully or
otherwise). The buffer must not be reused until it's released, possibly
with another fence - in that case it must not be reused until the release
fence is signalled.
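
On the consumer side that model is cheap to handle - a sketch of waiting
for the capture fence before the CPU touches the buffer (a sync_file fd is
simply pollable; the helper name is made up):

    #include <poll.h>

    /* Wait for the capture fence to signal (successfully or with an
     * error) before reading the frame. A GPU consumer would instead
     * import the fd as a semaphore/fence and let the driver wait. */
    static int wait_capture_fence(int fence_fd, int timeout_ms)
    {
        struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
        return poll(&pfd, 1, timeout_ms) > 0;
    }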

Cheers,
Tomek

On Mon, 16 Mar 2020 at 10:20, Laurent Pinchart <laurent.pinch...@ideasonboard.com> wrote:

> On Wed, Mar 11, 2020 at 04:18:55PM -0400, Nicolas Dufresne wrote:
> > (I know I'm going to be spammed by so many mailing list ...)
> >
> > > On Wednesday, 11 March 2020 at 14:21 -0500, Jason Ekstrand wrote:
> > > On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand wrote:
> > > > All,
> > > >
> > > > Sorry for casting such a broad net with this one. I'm sure most people
> > > > who reply will get at least one mailing list rejection.  However, this
> > > > is an issue that affects a LOT of components and that's why it's
> > > > thorny to begin with.  Please pardon the length of this e-mail as
> > > > well; I promise there's a concrete point/proposal at the end.
> > > >
> > > >
> > > > Explicit synchronization is the future of graphics and media.  At
> > > > least, that seems to be the consensus among all the graphics people
> > > > I've talked to.  I had a chat with one of the lead Android graphics
> > > > engineers recently who told me that doing explicit sync from the start
> > > > was one of the best engineering decisions Android ever made.  It's
> > > > also the direction being taken by more modern APIs such as Vulkan.
> > > >
> > > >
> > > > ## What are implicit and explicit synchronization?
> > > >
> > > > For those that aren't familiar with this space, GPUs, media encoders,
> > > > etc. are massively parallel and synchronization of some form is
> > > > required to ensure that everything happens in the right order and
> > > > avoid data races.  Implicit synchronization is when bits of work (3D,
> > > > compute, video encode, etc.) are implicitly based on the absolute
> > > > CPU-time order in which API calls occur.  Explicit synchronization is
> > > > when the client (whatever that means in any given context) provides
> > > > the dependency graph explicitly via some sort of synchronization
> > > > primitives.  If you're still confused, consider the following
> > > > examples:
> > > >
> > > > With OpenGL and EGL, almost everything is implicit sync.  Say you have
> > > > two OpenGL contexts sharing an image where one writes to it and the
> > > > other textures from it.  The way the OpenGL spec works, the client has
> > > > to make the API calls to render to the image before (in CPU time) it
> > > > makes the API calls which texture from the image.  As long as it does
> > > > this

Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Tomek Bury
> That's not true; you can post back a sync token every time the client
> buffer is used by the compositor.
Technically, yes, but it's very cumbersome and invasive to the point
where it becomes impractical. Explicit sync is a much cleaner solution.

> For instance, Mesa adds the `wl_drm` extension, which is
> used for bidirectional communication between the EGL implementations
> in the client and compositor address spaces, without modifying either.
The Broadcom driver adds a "wl_nexus" extension which serves a similar
purpose for both EGL and the Vulkan WSI.

> OK. As it stands, everyone else has the kernel mechanism (e.g. via
> dmabuf resv), so in this case if you are reinventing the underlying
> platform in a proprietary stack, you get to solve the same problems
> yourselves.
That's an important point. In the explicit synchronisation scenario
the sync token is passed along with the buffer. It becomes irrelevant where
the token originated from, as long as it's a commonly used type of
token, i.e. a dma_fence in kernel space or a sync_fd in user space. That
allows for greater flexibility and works both with and without dma
reservation objects.

Cheers,
Tomek


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Tomek Bury
>  As long as we can fall back to not using fences then we should be fine.
Buffers written by the camera are trivial because you control what
happens - just don't attach a fence, so that the capture can be used
immediately. For recycled buffers there's an extra bit of work to do,
because it won't be up to the camera driver to decide whether the buffer
comes back with or without a fence.


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Jacob Lifshay
On Tue, Mar 17, 2020 at 11:35 PM Jason Ekstrand  wrote:
>
> On Wed, Mar 18, 2020 at 12:20 AM Jacob Lifshay  
> wrote:
> >
> > The main issue with doing everything immediately is that a lot of the
> > function calls that games expect to take a very short time (e.g.
> > vkQueueSubmit) would instead take a much longer time, potentially
> > causing problems.
>
> Do you have any evidence that it will cause problems?  What I said
> above is what SwiftShader is doing and they're running real apps and
> I've not heard of it causing any problems.  It's also worth noting
> that you would only really have to stall at sync_file export.  You can
> async as much as you want internally.

Ok, seems worth trying out.

> > One idea for a safe userspace-backed sync_file is to have a step
> > counter that counts down until the sync_file is ready, where if
> > userspace doesn't tell it to count any steps in a certain amount of
> > time, then the sync_file switches to the error state. This way, it
> > will error shortly after a process deadlocks for some reason, while
> > still having the finite-time guarantee.
> >
> > When the sync_file is created, the step counter would be set to the
> > number of jobs that the fence is waiting on.
> >
> > It can also be set to pause the timeout to wait until another
> > sync_file signals, to handle cases where a sync_file is waiting on a
> > userspace process that is waiting on another sync_file.
> >
> > The main issue is that the kernel would have to make sure that the
> > sync_file graph doesn't have loops, maybe by erroring all sync_files
> > that it finds in the loop.
> >
> > Does that sound like a good idea?
>
> Honestly, I don't think you'll ever be able to sell that to the kernel
> community.  All of the deadlock detection would add massive complexity
> to the already non-trivial dma_fence infrastructure and for what
> benefit?  So that a software rasterizer can try to pretend to be more
> like a GPU?  You're going to have some very serious perf numbers
> and/or other proof of necessity if you want to convince the kernel to
> people to accept that level of complexity/risk.  "I designed my
> software to work this way" isn't going to convince anyone of anything
> especially when literally every other software rasterizer I'm aware of
> is immediate and they work just fine.

After some further research, it turns out that it will work to have
all the sync_files that a sync_file needs to depend on specified at
creation. That forces the dependency graph to be a DAG, since you
can't depend on a sync_file that hasn't been created yet, so loops are
impossible by design.

Since kernel deadlock detection isn't actually required, just timeouts
for the case of halted userspace, does this seem feasible?

I'd guess that it'd require maybe 200-300 lines of code in a
self-contained driver similar to the sync_file debugging driver
mentioned previously but with the additional timeout code for safety.

Jacob


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Jason Ekstrand
On Wed, Mar 18, 2020 at 12:20 AM Jacob Lifshay  wrote:
>
> On Tue, Mar 17, 2020 at 7:08 PM Jason Ekstrand  wrote:
> >
> > On Tue, Mar 17, 2020 at 7:16 PM Jacob Lifshay  
> > wrote:
> > >
> > > On Tue, Mar 17, 2020 at 11:14 AM Lucas Stach  wrote:
> > > >
> > > > On Tuesday, 17.03.2020 at 10:59 -0700, Jacob Lifshay wrote:
> > > > > I think I found a userspace-accessible way to create sync_files and
> > > > > dma_fences that would fulfill the requirements:
> > > > > https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c
> > > > >
> > > > > I'm just not sure if that's a good interface to use, since it appears
> > > > > to be designed only for debugging. Will have to check for additional
> > > > > requirements of signalling an error when the process that created the
> > > > > fence is killed.
> >
> > It is expressly only for debugging and testing.  Exposing such an API
> > to userspace would break the finite time guarantees that are relied
> > upon to keep sync_file a secure API.
>
> Ok, I was figuring that was probably the case.
>
> > > > Something like that can certainly be lifted for general use if it makes
> > > > sense. But then with a software renderer I don't really see how fences
> > > > help you at all. With a software renderer you know exactly when the
> > > > frame is finished and you can just defer pushing it over to the next
> > > > pipeline element until that time. You won't gain any parallelism by
> > > > using fences as the CPU is busy doing the rendering and will not run
> > > > other stuff concurrently, right?
> > >
> > > There definitely may be other hardware and/or processes that can
> > > process some stuff concurrently with the main application, such as the
> > > compositor and or video encoding processes (for video capture).
> > > Additionally, from what I understand, sync_file is the standard way to
> > > export and import explicit synchronization between processes and
> > > between drivers on Linux, so it seems like a good idea to support it
> > > from an interoperability standpoint even if it turns out that there
> > > aren't any scheduling/timing benefits.
> >
> > There are different ways that one can handle interoperability,
> > however.  One way is to try and make the software rasterizer look as
> > much like a GPU as possible:  lots of threads to make things as
> > asynchronous as possible, "real" implementations of semaphores and
> > fences, etc.
>
> This is basically the route I've picked, though rather than making
> lots of native threads, I'm planning on having just one thread per
> core and have a work-stealing scheduler (inspired by Rust's rayon
> crate) schedule all the individual render/compute jobs, because that
> allows making a lot more jobs to allow finer load balancing.
>
> > Another is to let a SW rasterizer be a SW rasterizer: do
> > everything immediately, thread only so you can exercise all the CPU
> > cores, and minimally implement semaphores and fences well enough to
> > maintain compatibility.  If you take the first approach, then we have
> > to solve all these problems with letting userspace create unsignaled
> > sync_files which it will signal later and figure out how to make it
> > safe.  If you take the second approach, you'll only ever have to
> > return already signaled sync_files and there's no problem with the
> > sync_file finite time guarantees.
>
> The main issue with doing everything immediately is that a lot of the
> function calls that games expect to take a very short time (e.g.
> vkQueueSubmit) would instead take a much longer time, potentially
> causing problems.

Do you have any evidence that it will cause problems?  What I said
above is what SwiftShader is doing and they're running real apps and
I've not heard of it causing any problems.  It's also worth noting
that you would only really have to stall at sync_file export.  You can
async as much as you want internally.

> One idea for a safe userspace-backed sync_file is to have a step
> counter that counts down until the sync_file is ready, where if
> userspace doesn't tell it to count any steps in a certain amount of
> time, then the sync_file switches to the error state. This way, it
> will error shortly after a process deadlocks for some reason, while
> still having the finite-time guarantee.
>
> When the sync_file is created, the step counter would be set to the
> number of jobs that the fence is waiting on.
>
> It can also be set to pause the timeout to wait until another
> sync_file signals, to handle cases where a sync_file is waiting on a
> userspace process that is waiting on another sync_file.
>
> The main issue is that the kernel would have to make sure that the
> sync_file graph doesn't have loops, maybe by erroring all sync_files
> that it finds in the loop.
>
> Does that sound like a good idea?

Honestly, I don't think you'll ever be able to sell that to the kernel
community.  All of the deadlock detection would add massive complexity
to the already